Compare commits

..

100 Commits

Author SHA1 Message Date
Erick Friis
18a4477b5d test ci 2024-02-29 11:49:53 -08:00
kkdamowang
6782dac420 docs: remove duplicate quote in AzureOpenAIEmbeddings doc (#18315)
- **Description:** Remove duplicate quote in AzureOpenAIEmbeddings doc,
remove trailing spaces.
- **Issue:** No
- **Dependencies:** No
2024-02-29 11:25:50 -08:00
Filip Schouwenaars
4c62362eab Add links to relevant DataCamp code alongs (#18332)
This PR adds links to some more free resources for people to get
acquainted with LangChain without having to configure their system.

Co-authored-by: Filip Schouwenaars <filipsch@users.noreply.github.com>
2024-02-29 11:25:01 -08:00
Virat Singh
cd926ac3dd community: Add PolygonFinancials Tool (#18324)
**Description:**
In this PR, I am adding a `PolygonFinancials` tool, which can be used to
get financials data for a given ticker. The financials data is the
fundamental data that is found in income statements, balance sheets, and
cash flow statements of public US companies.
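For orientation, a minimal usage sketch (the import paths and wrapper wiring here are assumptions, not taken from this log):

```python
# Hypothetical usage sketch; import paths and parameters are assumptions.
from langchain_community.tools.polygon.financials import PolygonFinancials
from langchain_community.utilities.polygon import PolygonAPIWrapper

# PolygonAPIWrapper reads the POLYGON_API_KEY environment variable
financials = PolygonFinancials(api_wrapper=PolygonAPIWrapper())
print(financials.run("AAPL"))  # fundamentals from income/balance/cash flow statements
```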

**Twitter**: 
[@virattt](https://twitter.com/virattt)
2024-02-29 10:56:05 -08:00
Leonid Ganeline
d43fa2eab1 docs providers update (#18336)
Formatted pages into a consistent form. Added descriptions and links
when needed.
2024-02-29 10:53:12 -08:00
Erick Friis
68be5a7658 infra: skip ibm api docs (#18335) 2024-02-29 10:16:57 -08:00
Erick Friis
43534a4c08 skip airbyte api docs (#18334) 2024-02-29 09:57:52 -08:00
Bagatur
6a5b084704 docs: update func calling doc (#18300) 2024-02-29 09:45:07 -08:00
Bagatur
68ad3414a2 experimental[patch]: Release 0.0.53 (#18330) 2024-02-29 09:13:21 -08:00
William FH
8af4425abd [Evaluation] Config Fix (#18231) 2024-02-29 00:06:46 -08:00
Averi Kitsch
1b63530274 docs: update Google documentation (#18297)
**Description:** update Google documentation
**Issue:** 
**Dependencies:**
2024-02-29 01:42:44 +00:00
Leonid Ganeline
1d865a7e86 docs: google provider page fixes (#18290)
Several URLs were broken (in yesterday's PR), like the
[Integrations/platforms/google/Document
Loaders](https://python.langchain.com/docs/integrations/platforms/google#document-loaders)
page, the example link to "Document Loaders / Cloud SQL for PostgreSQL", and
most of the new example links in the Document Loaders, Vectorstores, and
Memory sections.

- fixed URLs (manually verified all example links)
- sorted sections on the page to follow the "integrations/components" menu
item order
- fixed several page titles to fix the Navbar item order

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-29 00:45:03 +00:00
William De Vena
0486404a74 langchain_openai[patch]: Invoke callback prior to yielding token (#18269)
Description: Invoke callback prior to yielding token in _stream and
_astream methods for langchain_openai.
Issue: https://github.com/langchain-ai/langchain/issues/16913
Dependencies: None
Twitter handle: None
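This same fix recurs across several PRs in this range (openai, groq, community, langchain, anthropic, nvidia). A simplified sketch of the pattern, with a hypothetical `_client_stream` helper standing in for each provider's client:

```python
from typing import Any, Iterator, List, Optional

from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.messages import AIMessageChunk, BaseMessage
from langchain_core.outputs import ChatGenerationChunk


def _stream(  # method body inside a chat model class
    self,
    messages: List[BaseMessage],
    stop: Optional[List[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
    **kwargs: Any,
) -> Iterator[ChatGenerationChunk]:
    for token in self._client_stream(messages, stop=stop, **kwargs):
        chunk = ChatGenerationChunk(message=AIMessageChunk(content=token))
        if run_manager:
            # fire the callback *before* yielding, so handlers receive
            # each token as soon as it is generated
            run_manager.on_llm_new_token(chunk.text, chunk=chunk)
        yield chunk
```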
2024-02-29 00:00:08 +00:00
William De Vena
5ee76fccd5 langchain_groq[patch]: Invoke callback prior to yielding token (#18272)
**Description:** Invoke callback prior to yielding token in _stream and
_astream methods for groq.
Issue: https://github.com/langchain-ai/langchain/issues/16913
Dependencies: None
Twitter handle: None
2024-02-28 23:43:16 +00:00
aditya thomas
eb0c178d75 docs: update to the list of partner packages in the list of providers (#18252)
**Description:** Update to the list of partner packages in the list of
providers
**Issue:** Google & Nvidia had two entries each, both pointing to the
same page
**Dependencies:** None
2024-02-28 15:40:14 -08:00
ccurme
9bf58ec7dd update extraction use-case docs (#17979)
Update extraction use-case docs to showcase and explain all modes of
`create_structured_output_runnable`.
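As a sketch of what the updated docs cover (mode names follow the docstring being documented; treat details as illustrative):

```python
from langchain.chains import create_structured_output_runnable
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import ChatOpenAI


class Person(BaseModel):
    name: str = Field(description="The person's name")
    age: int = Field(description="The person's age")


llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
# other documented modes include "openai-functions" and "openai-json"
runnable = create_structured_output_runnable(Person, llm, mode="openai-tools")
runnable.invoke("Anna is 29 years old.")  # -> Person(name='Anna', age=29)
```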
2024-02-28 17:32:04 -05:00
Christophe Bornet
8a81fcd5d3 community: Fix deprecation version of AstraDB VectorStore (#17991) 2024-02-28 17:15:09 -05:00
Stefano Lottini
6d863bed51 partner[minor]: Astra DB clients identify themselves as coming through LangChain package (#18131)
**Description**

This PR sets the "caller identity" of the Astra DB clients used by the
integration plugins (`AstraDBChatMessageHistory`, `AstraDBStore`,
`AstraDBByteStore` and, pending #17767 , `AstraDBVectorStore`). In this
way, the requests to the Astra DB Data API coming from within LangChain
are identified as such (the purpose is anonymous usage stats to best
improve the Astra DB service).
2024-02-28 17:13:22 -05:00
kkdamowang
4899a72b56 docs: remove duplicate word in lcel/streaming (#18249)
- **Description:** Remove duplicate word in lcel/streaming.
- **Issue:** No.
- **Dependencies:**  No.
2024-02-28 21:50:26 +00:00
mackong
2c42f3a955 ollama[patch]: delete suffix slash to avoid redirect (#18260)
- **Description:** see
[ollama](https://github.com/ollama/ollama/blob/main/server/routes.go#L949)'s
route definitions
- **Issue:** N/A
- **Dependencies:** N/A
2024-02-28 16:44:48 -05:00
William De Vena
6b58943917 community[patch]: Invoke callback prior to yielding token (#18288)
Description: Invoke on_llm_new_token callback prior to yielding token in
_stream and _astream methods.
Issue: https://github.com/langchain-ai/langchain/issues/16913
Dependencies: None
Twitter handle: None
2024-02-28 21:40:53 +00:00
Brace Sproul
ca4f5e2408 ci: Update issue template required checks (#18283) 2024-02-28 13:27:39 -08:00
William De Vena
23722e3653 langchain[patch]: Invoke callback prior to yielding token (#18282)
Description: Invoke on_llm_new_token callback prior to yielding token in
_stream and _astream methods in langchain/tests/fake_chat_model.
Issue: https://github.com/langchain-ai/langchain/issues/16913
Dependencies: None
Twitter handle: None
2024-02-28 16:15:02 -05:00
Eugene Yurtsev
cd52433ba0 community[minor]: Add SQLDatabaseLoader document loader (#18281)
- **Description:** A generic document loader adapter for SQLAlchemy on
top of LangChain's `SQLDatabaseLoader`.
  - **Needed by:** https://github.com/crate-workbench/langchain/pull/1
  - **Depends on:** GH-16655
  - **Addressed to:** @baskaryan, @cbornet, @eyurtsev

Hi from CrateDB again,

in the same spirit like GH-16243 and GH-16244, this patch breaks out
another commit from https://github.com/crate-workbench/langchain/pull/1,
in order to reduce the size of this patch before submitting it, and to
separate concerns.

To accompany the SQLAlchemy adapter implementation, the patch includes
integration tests for both SQLite and PostgreSQL. Let me know if
corresponding utility resources should be added at different spots.

With kind regards,
Andreas.


### Software Tests

```console
docker compose --file libs/community/tests/integration_tests/document_loaders/docker-compose/postgresql.yml up
```

```console
cd libs/community
pip install psycopg2-binary
pytest -vvv tests/integration_tests -k sqldatabase
```

```
14 passed
```
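For orientation, a minimal usage sketch of the new loader (constructor parameters assumed from the description above; illustrative only):

```python
from langchain_community.document_loaders.sql_database import SQLDatabaseLoader
from langchain_community.utilities.sql_database import SQLDatabase

db = SQLDatabase.from_uri("sqlite:///example.db")  # any SQLAlchemy URL works
loader = SQLDatabaseLoader(query="SELECT id, title FROM articles", db=db)
docs = loader.load()  # one Document per result row
```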



![image](https://github.com/langchain-ai/langchain/assets/453543/42be233c-eb37-4c76-a830-474276e01436)

---------

Co-authored-by: Andreas Motl <andreas.motl@crate.io>
2024-02-28 21:02:28 +00:00
William De Vena
a37dc83a9e langchain_anthropic[patch]: Invoke callback prior to yielding token (#18274)
- Description: Invoke callback prior to yielding token in _stream and
_astream methods for anthropic.
- Issue: https://github.com/langchain-ai/langchain/issues/16913
- Dependencies: None
- Twitter handle: None
2024-02-28 20:19:22 +00:00
David Ruan
af35e2525a community[minor]: add hugging_face_model document loader (#17323)
- **Description:** add hugging_face_model document loader
  - **Issue:** NA
  - **Dependencies:** NA

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-02-28 20:05:35 +00:00
Sanjaypranav V M
b9a495e56e community[patch]: added latin-1 decoder to gmail search tool (#18116)
Some emails from Flipkart and Amazon are encoded in other plain-text
formats, so to handle the UnicodeDecodeError, an exception handler and a
Latin-1 decoder were added.
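
The approach amounts to a decode fallback, roughly (a sketch, not the exact diff):

```python
def decode_email_payload(payload: bytes) -> str:
    """Decode an email body, falling back to Latin-1 for non-UTF-8 senders."""
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        # some senders (e.g. Flipkart, Amazon) use other plain-text encodings
        return payload.decode("latin-1")
```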

@hwchase17
2024-02-28 19:28:29 +00:00
Nuno Campos
6da08d0f22 Add PNG drawer for Runnable.get_graph() (#18239)
2024-02-28 11:25:19 -08:00
Nuno Campos
d9fd1194f5 Remove check preventing passing non-declared config keys (#18276)
2024-02-28 18:28:53 +00:00
William De Vena
7ac74f291e langchain_nvidia_ai_endpoints[patch]: Invoke callback prior to yielding token (#18271)
## PR title
langchain_nvidia_ai_endpoints[patch]: Invoke callback prior to yielding

## PR message
**Description:** Invoke callback prior to yielding token in _stream and
_astream methods for nvidia_ai_endpoints.
**Issue:** https://github.com/langchain-ai/langchain/issues/16913
**Dependencies:** None
2024-02-28 18:10:57 +00:00
Erick Friis
b4f6066a57 docs: airbyte github cookbook (#18275) 2024-02-28 18:04:15 +00:00
Ashley Xu
e3211c2b3d community[patch]: BigQueryVectorSearch JSON type unsupported for metadatas (#18234) 2024-02-28 08:19:53 -08:00
Jack Wotherspoon
92c34d4803 docs: update documentation for Google Cloud database integrations (#18265)
**Description:** Fixing typos and rendering issues for Google Cloud
database integrations.
**Issue:** NA
**Dependencies:** NA
2024-02-28 15:32:43 +00:00
Erick Friis
2e31f1c2f8 infra: api docs folder move (#18223) 2024-02-28 07:10:27 -08:00
Mateusz Szewczyk
db643f6283 ibm[patch]: release 0.1.0 Add possibility to pass ModelInference or Model object to WatsonxLLM class (#18189)
- **Description:** Add possibility to pass ModelInference or Model
object to WatsonxLLM class
- **Dependencies:**
[ibm-watsonx-ai](https://pypi.org/project/ibm-watsonx-ai/)
2024-02-28 07:03:15 -08:00
Averi Kitsch
76eb553084 docs: add documentation for Google Cloud database integrations (#18225)
**Description:** add documentation for Google Cloud database
integrations
**Issue:** NA
**Dependencies:** NA
2024-02-27 21:17:30 -08:00
Erick Friis
d7a77054ed airbyte[patch]: core version 0.1.5 (#18244) 2024-02-27 19:54:43 -08:00
Erick Friis
be8d2ff5f7 airbyte[patch]: init pkg (#18236) 2024-02-27 19:37:53 -08:00
Ayo Ayibiowu
ac1d7d9de8 community[feat]: Adds LLMLingua as a document compressor (#17711)
**Description**: This PR adds support for using the [LLMLingua project
](https://github.com/microsoft/LLMLingua) especially the LongLLMLingua
(Enhancing Large Language Model Inference via Prompt Compression) as a
document compressor / transformer.

The LLMLingua project is an interesting project that can greatly improve
RAG system by compressing prompts and contexts while keeping their
semantic relevance.

**Issue**: https://github.com/microsoft/LLMLingua/issues/31
**Dependencies**: [llmlingua](https://pypi.org/project/llmlingua/)
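A usage sketch (the model name and parameters are placeholders; `base_retriever` stands in for any existing retriever):

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain_community.document_compressors import LLMLinguaCompressor

# placeholder model; requires `pip install llmlingua`
compressor = LLMLinguaCompressor(model_name="openai-community/gpt2", device_map="cpu")
retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=base_retriever,  # e.g. vectorstore.as_retriever()
)
compressed_docs = retriever.get_relevant_documents("What did the president say?")
```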

@baskaryan

---------

Co-authored-by: Ayodeji Ayibiowu <ayodeji.ayibiowu@getinge.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-02-27 19:23:56 -08:00
Nuno Campos
a99eb3abf4 openai[patch]: Assign message id in ChatOpenAI (#17837) 2024-02-27 17:32:54 -08:00
Isaac Francisco
733367b795 docs: deprecation of OpenAI functions agent, astream_events docstring (#18164)
Co-authored-by: Hershenson, Isaac (Extern) <isaac.hershenson.extern@bayer04.de>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-27 09:14:53 -08:00
Harrison Chase
b0ccaf5917 Harrison/add structured output (#18165) 2024-02-27 08:25:09 -08:00
Bagatur
242af4b5a4 openai[patch], mistral[patch], fireworks[patch]: releases 0.0.8, 0.0.5, 0.0.2 (#18186) 2024-02-27 04:22:24 -08:00
Bagatur
7e66d964c6 core[patch]: Release 0.1.27 (#18159) 2024-02-26 17:27:38 -08:00
Harrison Chase
d7c607ca00 core[minor]: move document compressor base (#17910) 2024-02-26 17:20:50 -08:00
Bagatur
b3f4de38ae mistral[minor]: Function calling and with_structured_output (#18150)
![Screenshot 2024-02-26 at 2 07 06
PM](https://github.com/langchain-ai/langchain/assets/22008038/20cacb47-3b24-45b5-871b-dd169f1acd37)
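A short sketch of the new capability (the model name is a placeholder):

```python
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_mistralai import ChatMistralAI


class Joke(BaseModel):
    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline of the joke")


llm = ChatMistralAI(model="mistral-large-latest")  # placeholder model name
structured_llm = llm.with_structured_output(Joke)
structured_llm.invoke("Tell me a joke about cats")  # -> Joke(setup=..., punchline=...)
```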
2024-02-26 16:22:30 -08:00
Bagatur
c53aa5cd37 core[patch]: support JS message serial namespaces (#18151) 2024-02-26 16:19:46 -08:00
Harrison Chase
c673717c2b add optimization notebook (#18155) 2024-02-26 16:09:31 -08:00
Max Jakob
5ab69f907f partners: add Elasticsearch package (#17467)
### Description
This PR moves the Elasticsearch classes to a partners package.

Note that we will not move (and will later remove) `ElasticKnnSearch`. It
was previously deprecated.
`ElasticVectorSearch` is going to stay in the community package since it
is still used quite a lot.

Also note that I left the `ElasticsearchTranslator` for self-query
untouched because it resides in the main `langchain` package.

### Dependencies
There will be another PR that updates the notebooks (potentially pulling
them into the partners package) and templates and removes the classes
from the community package, see
https://github.com/langchain-ai/langchain/pull/17468

#### Open question
How to make the transition smooth for users? Do we move the import
aliases and require people to install `langchain-elasticsearch`? Or do
we remove the import aliases from the `langchain` package all together?
What has worked well for other partner packages?

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-26 23:19:47 +00:00
matt haigh
a4896da2a0 Experimental: Add other threshold types to SemanticChunker (#16807)
**Description**
Adding different threshold types to the semantic chunker. I've had much
better and more predictable performance when using standard deviations
instead of percentiles.


![image](https://github.com/langchain-ai/langchain/assets/44395485/066e84a8-460e-4da5-9fa1-4ff79a1941c5)

For all the documents I've tried, the distribution of distances looks
similar to the above: a positively skewed normal distribution. All skews
I've seen are less than 1, which explains why standard deviations
perform well, but I've included IQR if anyone wants something more
robust.

Also, using the percentile method backwards, you can declare the number
of clusters and use semantic chunking to get an ‘optimal’ splitting.
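In code, choosing a threshold type looks roughly like this (a sketch; `breakpoint_threshold_type` values per the experimental splitter described here):

```python
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

# new threshold types: "standard_deviation" and "interquartile";
# "percentile" remains the default
splitter = SemanticChunker(
    OpenAIEmbeddings(), breakpoint_threshold_type="standard_deviation"
)
chunks = splitter.split_text(long_text)  # long_text: any document string
```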

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-02-26 13:50:48 -08:00
Jaskirat Singh
ce682f5a09 community: vectorstores.kdbai - Added support for when no docs are present (#18103)
- **Description:** By default it expects a list, but that's not the case
in corner scenarios when no documents have been ingested (use case:
bootstrap application).
Hence added a check: if the instance is a pandas DataFrame instead of a
list, it will proceed to return immediately.

- **Issue:** NA
- **Dependencies:** NA
- **Twitter handle:**  jaskiratsingh1
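The guard amounts to something like this hypothetical helper (names illustrative):

```python
import pandas as pd


def _normalize_matches(matches):
    # KDB.AI returns an empty DataFrame instead of a list when no
    # documents have been ingested yet; treat that as "no results"
    if isinstance(matches, pd.DataFrame):
        return []
    return matches
```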

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-02-26 12:47:06 -08:00
am-kinetica
9b8f6455b1 Langchain vectorstore integration with Kinetica (#18102)
- **Description:** New vectorstore integration with the Kinetica
database
  - **Issue:** 
- **Dependencies:** the Kinetica Python API `pip install
gpudb==7.2.0.1`,
  - **Tag maintainer:** @baskaryan, @hwchase17 
  - **Twitter handle:**

---------

Co-authored-by: Chad Juliano <cjuliano@kinetica.com>
2024-02-26 12:46:48 -08:00
Bagatur
1e8ab83d7b langchain[patch], core[patch], openai[patch], fireworks[minor]: ChatFireworks.with_structured_output (#18078)
<img width="1192" alt="Screenshot 2024-02-24 at 3 39 39 PM"
src="https://github.com/langchain-ai/langchain/assets/22008038/1cf74774-a23f-4b06-9b9b-85dfa2f75b63">
2024-02-26 12:46:39 -08:00
GoodBai
3589a135ef community: make SET allow_experimental_[engine]_index configurabe in vectorstores.clickhouse (#18107)
## Description & Issue
While following the official docs to use ClickHouse as a vectorstore, I
found that only the default `annoy` index is properly supported. But I want
to try another engine, `usearch`, since `annoy` is not properly supported on
ARM platforms.
Here are the settings I prefer:

``` python
settings = ClickhouseSettings(
    table="wiki_Ethereum",
    index_type="usearch",  # annoy by default
    index_param=[],
)
```
The above settings do not work because the command `SET
allow_experimental_annoy_index=1` is hard-coded.
This PR makes sure the experimental feature follows the `index_type`,
which is also consistent with ClickHouse's naming conventions.
2024-02-26 12:39:17 -08:00
Dan Stambler
69344a0661 community: Add Laser Embedding Integration (#18111)
- **Description:** Added Integration with Meta AI's LASER
Language-Agnostic SEntence Representations embedding library, which
supports multilingual embedding for any of the languages listed here:
https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200,
including several low resource languages
- **Dependencies:** laser_encoders
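A usage sketch (class and parameter names assumed; illustrative):

```python
from langchain_community.embeddings.laser import LaserEmbeddings

# assumed parameter: a FLORES-200 language code; requires `pip install laser_encoders`
embeddings = LaserEmbeddings(lang="eng_Latn")
vectors = embeddings.embed_documents(["Hello, world!", "Good morning."])
query_vector = embeddings.embed_query("Hello!")
```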
2024-02-26 12:16:37 -08:00
Erick Friis
257879e98d infra: api docs setup action location (#18148) 2024-02-26 11:50:21 -08:00
Erick Friis
28cf3aab45 infra: api docs build commit dir (#18147) 2024-02-26 11:47:04 -08:00
Heidi Steen
166f3d8351 Docs: azuresearch.ipynb (in docs/docs/integrations/vectorstores) -- fixed headings and comments (#18135)
This PR updates azuresearch.ipynb with an edit to the introduction
sentence, consistent heading levels, and disambiguation in code
comments.
2024-02-26 11:46:55 -08:00
Luan Fernandes
e867557936 [docs] Update doc-string for buffer_as_messages method in ConversationBufferWindowMemory (#18136)
minor fix stated in #18080
2024-02-26 11:46:43 -08:00
Barun Amalkumar Halder
23fc7c8c90 docs [patch] : fix import to use community path for handler in fiddler notebook (#18140)
**Description:** Update the example fiddler notebook to use community
path, instead of langchain.callback
**Dependencies:** None
**Twitter handle:** @bhalder

Co-authored-by: Barun Halder <barun@fiddler.ai>
2024-02-26 11:41:07 -08:00
Bagatur
767523f364 core[patch], langchain[patch], templates: move openai functions parsers to core (#18060)
![Screenshot 2024-02-23 at 7 48 03
PM](https://github.com/langchain-ai/langchain/assets/22008038/e5540c4d-0020-4ece-869f-ae19db2a1f3f)
2024-02-26 11:12:53 -08:00
Bagatur
96bff0ed5d infra: create api rst for specific pkg (#18144)
Example: create rst for libs/core only
```bash
poetry run python docs/api_reference/create_api_rst.py core
```
2024-02-26 11:04:22 -08:00
Nuno Campos
cd3ab3703b Improve runnable generator error messages (#18142)
h/t @hinthornw 

2024-02-26 18:54:25 +00:00
Nuno Campos
62a30efb12 Fix bug with using configurable_fields after configurable_alternatives (#18139)
Closes #17915
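The previously failing combination looked roughly like this (a sketch; ids, keys, and model names are hypothetical, not taken from the issue):

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.runnables import ConfigurableField
from langchain_openai import ChatOpenAI

llm = (
    ChatOpenAI(temperature=0)
    .configurable_alternatives(
        ConfigurableField(id="llm"),
        default_key="openai",
        anthropic=ChatAnthropic(model="claude-3-haiku-20240307"),
    )
    # chaining configurable_fields after configurable_alternatives
    # is the combination this fix repairs
    .configurable_fields(temperature=ConfigurableField(id="temperature"))
)
llm.with_config(configurable={"llm": "anthropic"}).invoke("hi")
```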
2024-02-26 10:27:07 -08:00
Erick Friis
f5cf6975ba docs: anthropic partner package docs (#18109) 2024-02-26 17:51:44 +00:00
Nuno Campos
b1d9ce541d Add BaseMessage.id (#17835)
2024-02-26 09:27:47 -08:00
Harrison Chase
935aefa8db add run name for query constructor (#18101)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-02-26 08:17:05 -08:00
Mohammad Mohtashim
719a1cde75 langchain[patch]: Update doc-string for a method in ConversationBufferWindowMemory (#18090)
A minor doc fix stated in #18080
2024-02-26 10:15:02 -05:00
Simon Schmidt
2716d58603 langchain: Import from langchain_core in langchain.smith to avoid deprecation warning (#18129)
Avoids deprecation warning that triggered at import time, e.g. with
`python -c 'import langchain.smith'`


/opt/venv/lib/python3.12/site-packages/langchain/callbacks/__init__.py:37:
LangChainDeprecationWarning: Importing this callback from langchain is
deprecated. Importing it from langchain will no longer be supported as
of langchain==0.2.0. Please import from langchain-community instead:

    `from langchain_community.callbacks import base`.

To install langchain-community run `pip install -U langchain-community`.
2024-02-26 10:14:10 -05:00
rongchenlin
9147a437f1 docs: Fix the bug in MongoDBChatMessageHistory notebook (#18128)
I tried to configure MongoDBChatMessageHistory using the code from the
original documentation to store messages based on the passed session_id
in MongoDB. However, this configuration did not take effect, and the
session id in the database remained as 'test_session'. To resolve this
issue, I found that when configuring MongoDBChatMessageHistory, it is
necessary to set session_id=session_id instead of
session_id=test_session.

Issue: DOC: Ineffective Configuration of MongoDBChatMessageHistory for
Custom session_id Storage

previous code:
```python
chain_with_history = RunnableWithMessageHistory(
    chain,
    lambda session_id: MongoDBChatMessageHistory(
        session_id="test_session",
        connection_string="mongodb://root:Y181491117cLj@123.56.224.232:27017",
        database_name="my_db",
        collection_name="chat_histories",
    ),
    input_messages_key="question",
    history_messages_key="history",
)
config = {"configurable": {"session_id": "mmm"}}
chain_with_history.invoke({"question": "Hi! I'm bob"}, config)
```

![image](https://github.com/langchain-ai/langchain/assets/83388493/c372f785-1ec1-43f5-8d01-b7cc07b806b7)


Modified code:
```python
chain_with_history = RunnableWithMessageHistory(
    chain,
    lambda session_id: MongoDBChatMessageHistory(
        session_id=session_id,  # modified line
        connection_string="mongodb://root:Y181491117cLj@123.56.224.232:27017",
        database_name="my_db",
        collection_name="chat_histories",
    ),
    input_messages_key="question",
    history_messages_key="history",
)
config = {"configurable": {"session_id": "mmm"}}
chain_with_history.invoke({"question": "Hi! I'm bob"}, config)
```

Effect after modification (it works):


![image](https://github.com/langchain-ai/langchain/assets/83388493/5776268c-9098-4da3-bf41-52825be5fafb)
2024-02-26 15:02:56 +00:00
Erick Friis
e3b7779926 docs: api docs for external repos (#17904)
Stacked on google removal PR. Will make google continue to show up in
API docs even from external repo
2024-02-26 06:19:09 +00:00
Erick Friis
248c5b84ee google-genai, google-vertexai: move to langchain-google (#17899)
These packages have moved to
https://github.com/langchain-ai/langchain-google

Left tombstone READMEs in case anyone ends up at the "Source Code" link
from old PyPI releases. We can keep these around for a few months.
2024-02-25 21:58:05 -08:00
Erick Friis
3b5bdbfee8 anthropic[minor]: package move (#17974) 2024-02-25 21:57:26 -08:00
Christophe Bornet
a2d5fa7649 community[patch]: Fix GenericRequestsWrapper _aget_resp_content must be async (#18065)
There are existing tests in
`libs/community/tests/unit_tests/tools/requests/test_tool.py`
2024-02-25 19:07:07 -08:00
Neli Hateva
a01e8473f8 community[patch]: Fix GraphSparqlQAChain so that it works with Ontotext GraphDB (#15009)
- **Description:** Introduce a new parameter `graph_kwargs` to
`RdfGraph` - parameters used to initialize the `rdflib.Graph` if
`query_endpoint` is set. Also, do not set
`rdflib.graph.DATASET_DEFAULT_GRAPH_ID` as default value for the
`rdflib.Graph` `identifier` if `query_endpoint` is set.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** N/A
2024-02-25 19:05:21 -08:00
Christophe Bornet
4d6cd5b46a astradb[patch]: Use astrapy's upsert_one method in AstraDBStore (#18063)
As `upsert` is deprecated
2024-02-25 19:04:18 -08:00
Danny McAteer
e42110f720 docs: Additional examples for partners/exa README (#18081)
**Description:** Add additional examples for other modules to
partners/exa README
**Issue:** #17545
**Dependencies:** None
**Twitter handle:** @DannyMcAteer8

---------

Co-authored-by: Daniel McAteer <danielmcateer@Daniels-MBP.attlocal.net>
Co-authored-by: Daniel McAteer <danielmcateer@Daniels-MacBook-Pro.local>
2024-02-25 18:53:47 -08:00
dokato
5afb242161 langchain[patch]: Make BooleanOutputParser more robust to non-binary responses (#17810)
- **Description:** I encountered this error when I tried to use
LLMChainFilter. Even if the message differs only slightly, like `Not relevant
(NO)`, this results in an error. It has been reported already here:
https://github.com/langchain-ai/langchain/issues/. This change hopefully
makes it more robust.
- **Issue:**  #11408 
- **Dependencies:** No
- **Twitter handle:** dokatox
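In effect (a sketch of the behavior change):

```python
from langchain.output_parsers.boolean import BooleanOutputParser

parser = BooleanOutputParser()
# previously anything other than a bare "YES"/"NO" raised a ValueError;
# the parser now scans the response text for the keywords instead
parser.parse("Not relevant (NO)")  # -> False
parser.parse("YES, this is relevant.")  # -> True
```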
2024-02-25 18:48:33 -08:00
Matt
3b08617a89 docs: update azure search langchain notebook (#18053)
**Description:** Update the azure search notebook to have more
descriptive comments, and an option to choose between OpenAI and
AzureOpenAI Embeddings

---------

Co-authored-by: Matt Gotteiner <[email protected]>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-25 18:48:13 -08:00
kYLe
17ecf6e119 community[patch]: Remove model limitation on Anyscale LLM (#17662)
**Description:** Llama Guard is deprecated from the Anyscale public
endpoint.
**Issue:** Changed the default model and removed the limitation of only
using Llama Guard with Anyscale LLMs.
The Anyscale LLM can also work with all other chat models hosted on
Anyscale.
Also added `async_client` for the Anyscale LLM.
2024-02-25 18:21:19 -08:00
Barun Amalkumar Halder
cc69976860 community[minor] : adds callback handler for Fiddler AI (#17708)
**Description:**  Callback handler to integrate fiddler with langchain. 
This PR adds the following -

1. `FiddlerCallbackHandler` implementation into langchain/community
2. Example notebook `fiddler.ipynb` for usage documentation

[Internal Tracker : FDL-14305]

**Issue:** 
NA

**Dependencies:** 
- Installation of langchain-community is unaffected.
- Usage of FiddlerCallbackHandler requires installation of latest
fiddler-client (2.5+)

**Twitter handle:** @fiddlerlabs @behalder

Co-authored-by: Barun Halder <barun@fiddler.ai>
2024-02-25 18:17:03 -08:00
Christophe Bornet
b8b5ce0c8c astradb: Add AstraDBChatMessageHistory to langchain-astradb package (#17732)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-25 18:14:49 -08:00
Maxime Perrin
c06a8732aa community[patch]: fix llama index imports and fields access (#17870)
- **Description:** Fixing outdated imports after v0.10 llama index
update and updating metadata and source text access
  - **Issue:** #17860
  - **Twitter handle:** @maximeperrin_

---------

Co-authored-by: Maxime Perrin <mperrin@doing.fr>
2024-02-25 18:14:23 -08:00
BeatrixCohere
5d2d80a9a8 docs: Add Cohere examples in documentation (#17794)
- Description: Add cohere examples to documentation 
- Issue:N/A
- Dependencies: N/A

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-25 18:10:09 -08:00
Jacob Lee
c9eac3287e docs[patch]: Remove redundant Pinecone import (#18079)
CC @efriis
2024-02-24 19:27:54 -08:00
2jimoo
7fc903464a community: Add document manager and mongo document manager (#17320)
- **Description:**
    - Add DocumentManager class, which is a NoSQL record manager.
    - In order to use index and aindex in
libs/langchain/langchain/indexes/_api.py, DocumentManager inherits from
RecordManager.
    - Also added the MongoDB implementation of the document manager.
  - **Dependencies:** pymongo, motor
  

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-02-23 21:32:52 -05:00
Leonid Ganeline
3f6bf852ea experimental: docstrings update (#18048)
Added missing docstrings. Formatted docstrings to a consistent format.
2024-02-23 21:24:16 -05:00
kYLe
56b955fc31 community[minor]: Add async_client for Anyscale Chat model (#18050)
Add `async_client` for Anyscale Chat_model
2024-02-23 21:22:54 -05:00
Eugene Yurtsev
68527b809d core[patch]: Runnable with message history to use add_messages (#17958)
This PR updates RunnableWithMessageHistory to use add_messages which
will save on round-trips for any chat
history abstractions that implement the optimization. If the
optimization isn't
implemented, add_messages automatically invokes add_message serially.
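A sketch of what implementing the optimization looks like for a custom history (storage details hypothetical):

```python
from typing import List, Sequence

from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage


class BatchedChatHistory(BaseChatMessageHistory):
    """In-memory stand-in; a real backend would batch one DB write here."""

    def __init__(self) -> None:
        self._store: List[BaseMessage] = []

    @property
    def messages(self) -> List[BaseMessage]:
        return list(self._store)

    def add_message(self, message: BaseMessage) -> None:
        self.add_messages([message])

    def add_messages(self, messages: Sequence[BaseMessage]) -> None:
        # one batched write instead of N serial add_message round-trips
        self._store.extend(messages)

    def clear(self) -> None:
        self._store.clear()
```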
2024-02-23 21:19:38 -05:00
Bagatur
1c1bb1152e openai[patch]: refactor with_structured_output (#18052)
- make schema Optional with default val None, since in json_mode you
don't need it if not parsing to pydantic
- change return_type -> include_raw
- expand docstring examples
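A sketch of the resulting call shapes (illustrative):

```python
from langchain_core.pydantic_v1 import BaseModel
from langchain_openai import ChatOpenAI


class Person(BaseModel):
    name: str
    age: int


llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# include_raw=True (renamed from return_type) also returns the raw message
structured_llm = llm.with_structured_output(Person, include_raw=True)

# schema may now be omitted in json_mode, where no pydantic parsing is needed
json_llm = llm.with_structured_output(None, method="json_mode")
```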
2024-02-23 17:02:11 -08:00
Erick Friis
e85948d46b docs: fireworks tool calling docs (#18057) 2024-02-24 00:49:11 +00:00
Erick Friis
e566a3077e infra: simplify and fix CI for docs-only changes (#18058)
Current success check will fail on docs-only changes
2024-02-23 16:39:08 -08:00
Erick Friis
1a3383fba1 docs: fireworks fixes (#18056) 2024-02-23 15:58:53 -08:00
Erick Friis
a05fb19f42 openai[patch]: remove numpy dep (#18034) 2024-02-23 21:12:05 +00:00
Danny McAteer
e8be34f8c7 exa[patch]: update readme (#18047) 2024-02-23 21:05:42 +00:00
Yufei (Benny) Chen
ee6a773456 fireworks[patch]: Add Fireworks partner packages (#17694)
---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-23 20:45:47 +00:00
Erick Friis
11cf95e810 docs: recommend lambdas over runnablebranch (#18033) 2024-02-23 11:34:27 -08:00
Erick Friis
9ebbca3695 infra: CI success for partner packages 2 (#18043) 2024-02-23 11:10:39 -08:00
Erick Friis
b948f6da67 infra: CI success for partner packages (#18037)
2024-02-23 11:00:48 -08:00
Bagatur
22b964f802 community[patch]: Release 0.0.24 (#18038) 2024-02-23 10:49:29 -08:00
406 changed files with 40424 additions and 12279 deletions


@@ -3,18 +3,18 @@ body:
 - type: markdown
 attributes:
 value: |
-Thanks for your interest in 🦜️🔗 LangChain!
+Thanks for your interest in LangChain 🦜️🔗!
 Please follow these instructions, fill every question, and do every step. 🙏
 We're asking for this because answering questions and solving problems in GitHub takes a lot of time --
-this is time that we cannot spend on adding new features, fixing bugs, write documentation or reviewing pull requests.
+this is time that we cannot spend on adding new features, fixing bugs, writing documentation or reviewing pull requests.
-By asking questions in a structured way (following this) it will be much easier to help you.
+By asking questions in a structured way (following this) it will be much easier for us to help you.
-And there's a high chance that you will find the solution along the way and you won't even have to submit it and wait for an answer. 😎
+There's a high chance that by following this process, you'll find the solution on your own, eliminating the need to submit a question and wait for an answer. 😎
-As there are too many questions, we will **DISCARD** and close the incomplete ones.
+As there are many questions submitted every day, we will **DISCARD** and close the incomplete ones.
 That will allow us (and others) to focus on helping people like you that follow the whole process. 🤓


@@ -35,6 +35,8 @@ body:
 required: true
 - label: I am sure that this is a bug in LangChain rather than my code.
 required: true
+- label: The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
+required: true
 - type: textarea
 id: reproduction
 validations:


@@ -9,7 +9,7 @@ body:
 If you are not a LangChain maintainer or were not asked directly by a maintainer to create an issue, then please start the conversation in a [Question in GitHub Discussions](https://github.com/langchain-ai/langchain/discussions/categories/q-a) instead.
 You are a LangChain maintainer if you maintain any of the packages inside of the LangChain repository
-or are a regular contributor to LangChain with previous merged merged pull requests.
+or are a regular contributor to LangChain with previous merged pull requests.
 - type: checkboxes
 id: privileged
 attributes:


@@ -1,17 +1,23 @@
import json
import sys
import os
from typing import Dict
LANGCHAIN_DIRS = {
LANGCHAIN_DIRS = [
"libs/core",
"libs/langchain",
"libs/experimental",
"libs/community",
}
]
if __name__ == "__main__":
files = sys.argv[1:]
dirs_to_run = set()
dirs_to_run: Dict[str, set] = {
"lint": set(),
"test": set(),
"extended-test": set(),
}
if len(files) == 300:
# max diff length is 300 files - there are likely files missing
@@ -24,31 +30,42 @@ if __name__ == "__main__":
".github/workflows",
".github/tools",
".github/actions",
"libs/core",
".github/scripts/check_diff.py",
)
):
dirs_to_run.update(LANGCHAIN_DIRS)
elif "libs/community" in file:
dirs_to_run.update(
("libs/community", "libs/langchain", "libs/experimental")
)
elif "libs/partners" in file:
# add all LANGCHAIN_DIRS for infra changes
dirs_to_run["extended-test"].update(LANGCHAIN_DIRS)
dirs_to_run["lint"].add(".")
if any(file.startswith(dir_) for dir_ in LANGCHAIN_DIRS):
# add that dir and all dirs after in LANGCHAIN_DIRS
# for extended testing
found = False
for dir_ in LANGCHAIN_DIRS:
if file.startswith(dir_):
found = True
if found:
dirs_to_run["extended-test"].add(dir_)
elif file.startswith("libs/partners"):
partner_dir = file.split("/")[2]
if os.path.isdir(f"libs/partners/{partner_dir}"):
dirs_to_run.add(f"libs/partners/{partner_dir}")
dirs_to_run["test"].add(f"libs/partners/{partner_dir}")
# Skip if the directory was deleted
elif "libs/langchain" in file:
dirs_to_run.update(("libs/langchain", "libs/experimental"))
elif "libs/experimental" in file:
dirs_to_run.add("libs/experimental")
elif file.startswith("libs/"):
dirs_to_run.update(LANGCHAIN_DIRS)
else:
pass
json_output = json.dumps(list(dirs_to_run))
print(f"dirs-to-run={json_output}") # noqa: T201
raise ValueError(
f"Unknown lib: {file}. check_diff.py likely needs "
"an update for this new library!"
)
elif any(file.startswith(p) for p in ["docs/", "templates/", "cookbook/"]):
dirs_to_run["lint"].add(".")
extended_test_dirs = [d for d in dirs_to_run if not d.startswith("libs/partners")]
json_output_extended = json.dumps(extended_test_dirs)
print(f"dirs-to-run-extended={json_output_extended}") # noqa: T201
outputs = {
"dirs-to-lint": list(
dirs_to_run["lint"] | dirs_to_run["test"] | dirs_to_run["extended-test"]
),
"dirs-to-test": list(dirs_to_run["test"] | dirs_to_run["extended-test"]),
"dirs-to-extended-test": list(dirs_to_run["extended-test"]),
}
for key, value in outputs.items():
json_output = json.dumps(value)
print(f"{key}={json_output}") # noqa: T201


@@ -63,6 +63,8 @@ jobs:
 - name: Install the opposite major version of pydantic
 # If normal tests use pydantic v1, here we'll use v2, and vice versa.
 shell: bash
+# airbyte currently doesn't support pydantic v2
+if: ${{ !startsWith(inputs.working-directory, 'libs/partners/airbyte') }}
 run: |
 # Determine the major part of pydantic version
 REGULAR_VERSION=$(poetry run python -c "import pydantic; print(pydantic.__version__)" | cut -d. -f1)
@@ -97,6 +99,8 @@ jobs:
 fi
 echo "Found pydantic version ${CURRENT_VERSION}, as expected"
 - name: Run pydantic compatibility tests
+# airbyte currently doesn't support pydantic v2
+if: ${{ !startsWith(inputs.working-directory, 'libs/partners/airbyte') }}
 shell: bash
 run: make test


@@ -70,6 +70,10 @@ jobs:
 ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
 ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
 ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
+ES_URL: ${{ secrets.ES_URL }}
+ES_CLOUD_ID: ${{ secrets.ES_CLOUD_ID }}
+ES_API_KEY: ${{ secrets.ES_API_KEY }}
+GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # for airbyte
 run: |
 make integration_tests


@@ -191,6 +191,10 @@ jobs:
 ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
 ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
 ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
+ES_URL: ${{ secrets.ES_URL }}
+ES_CLOUD_ID: ${{ secrets.ES_CLOUD_ID }}
+ES_API_KEY: ${{ secrets.ES_API_KEY }}
+GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # for airbyte
 run: make integration_tests
 working-directory: ${{ inputs.working-directory }}


@@ -15,32 +15,48 @@ jobs:
- uses: actions/checkout@v4
with:
ref: bagatur/api_docs_build
path: langchain
- uses: actions/checkout@v4
with:
repository: langchain-ai/langchain-google
path: langchain-google
- name: Move google libs
run: |
rm -rf langchain/libs/partners/google-genai langchain/libs/partners/google-vertexai
mv langchain-google/libs/genai langchain/libs/partners/google-genai
mv langchain-google/libs/vertexai langchain/libs/partners/google-vertexai
- name: Set Git config
working-directory: langchain
run: |
git config --local user.email "actions@github.com"
git config --local user.name "Github Actions"
- name: Merge master
working-directory: langchain
run: |
git fetch origin master
git merge origin/master -m "Merge master" --allow-unrelated-histories -X theirs
- name: Set up Python ${{ env.PYTHON_VERSION }} + Poetry ${{ env.POETRY_VERSION }}
uses: "./.github/actions/poetry_setup"
uses: "./langchain/.github/actions/poetry_setup"
with:
python-version: ${{ env.PYTHON_VERSION }}
poetry-version: ${{ env.POETRY_VERSION }}
cache-key: api-docs
working-directory: langchain
- name: Install dependencies
working-directory: langchain
run: |
poetry run python -m pip install --upgrade --no-cache-dir pip setuptools
poetry run python -m pip install --upgrade --no-cache-dir sphinx readthedocs-sphinx-ext
poetry run python -m pip install ./libs/partners/*
# skip airbyte and ibm due to pandas dependency issue
poetry run python -m pip install $(ls ./libs/partners | grep -vE "airbyte|ibm" | xargs -I {} echo "./libs/partners/{}")
poetry run python -m pip install --exists-action=w --no-cache-dir -r docs/api_reference/requirements.txt
- name: Build docs
working-directory: langchain
run: |
poetry run python -m pip install --upgrade --no-cache-dir pip setuptools
poetry run python docs/api_reference/create_api_rst.py
@@ -49,4 +65,5 @@ jobs:
# https://github.com/marketplace/actions/add-commit
- uses: EndBug/add-and-commit@v9
with:
cwd: langchain
message: 'Update API docs build'


@@ -33,14 +33,16 @@ jobs:
 run: |
 python .github/scripts/check_diff.py ${{ steps.files.outputs.all }} >> $GITHUB_OUTPUT
 outputs:
-dirs-to-run: ${{ steps.set-matrix.outputs.dirs-to-run }}
-dirs-to-run-extended: ${{ steps.set-matrix.outputs.dirs-to-run-extended }}
+dirs-to-lint: ${{ steps.set-matrix.outputs.dirs-to-lint }}
+dirs-to-test: ${{ steps.set-matrix.outputs.dirs-to-test }}
+dirs-to-extended-test: ${{ steps.set-matrix.outputs.dirs-to-extended-test }}
 lint:
 name: cd ${{ matrix.working-directory }}
 needs: [ build ]
+if: ${{ needs.build.outputs.dirs-to-lint != '[]' }}
 strategy:
 matrix:
-working-directory: ${{ fromJson(needs.build.outputs.dirs-to-run) }}
+working-directory: ${{ fromJson(needs.build.outputs.dirs-to-lint) }}
 uses: ./.github/workflows/_lint.yml
 with:
 working-directory: ${{ matrix.working-directory }}
@@ -49,9 +51,10 @@ jobs:
 test:
 name: cd ${{ matrix.working-directory }}
 needs: [ build ]
+if: ${{ needs.build.outputs.dirs-to-test != '[]' }}
 strategy:
 matrix:
-working-directory: ${{ fromJson(needs.build.outputs.dirs-to-run) }}
+working-directory: ${{ fromJson(needs.build.outputs.dirs-to-test) }}
 uses: ./.github/workflows/_test.yml
 with:
 working-directory: ${{ matrix.working-directory }}
@@ -60,9 +63,10 @@ jobs:
 compile-integration-tests:
 name: cd ${{ matrix.working-directory }}
 needs: [ build ]
+if: ${{ needs.build.outputs.dirs-to-test != '[]' }}
 strategy:
 matrix:
-working-directory: ${{ fromJson(needs.build.outputs.dirs-to-run) }}
+working-directory: ${{ fromJson(needs.build.outputs.dirs-to-test) }}
 uses: ./.github/workflows/_compile_integration_test.yml
 with:
 working-directory: ${{ matrix.working-directory }}
@@ -71,9 +75,10 @@ jobs:
 dependencies:
 name: cd ${{ matrix.working-directory }}
 needs: [ build ]
+if: ${{ needs.build.outputs.dirs-to-test != '[]' }}
 strategy:
 matrix:
-working-directory: ${{ fromJson(needs.build.outputs.dirs-to-run) }}
+working-directory: ${{ fromJson(needs.build.outputs.dirs-to-test) }}
 uses: ./.github/workflows/_dependencies.yml
 with:
 working-directory: ${{ matrix.working-directory }}
@@ -82,10 +87,11 @@ jobs:
 extended-tests:
 name: "cd ${{ matrix.working-directory }} / make extended_tests #${{ matrix.python-version }}"
 needs: [ build ]
+if: ${{ needs.build.outputs.dirs-to-extended-test != '[]' }}
 strategy:
 matrix:
 # note different variable for extended test dirs
-working-directory: ${{ fromJson(needs.build.outputs.dirs-to-run-extended) }}
+working-directory: ${{ fromJson(needs.build.outputs.dirs-to-extended-test) }}
 python-version:
 - "3.8"
 - "3.9"


@@ -32,6 +32,6 @@ jobs:
 - name: Codespell
 uses: codespell-project/actions-codespell@v2
 with:
-skip: guide_imports.json,*.ambr
+skip: guide_imports.json,*.ambr,./cookbook/data/imdb_top_1000.csv
 ignore_words_list: ${{ steps.extract_ignore_words.outputs.ignore_words_list }}
 exclude_file: libs/community/langchain_community/llms/yuan2.py


@@ -1,37 +0,0 @@
----
-name: CI / cd .
-on:
-push:
-branches: [ master ]
-pull_request:
-paths:
-- 'docs/**'
-- 'templates/**'
-- 'cookbook/**'
-- '.github/workflows/_lint.yml'
-- '.github/workflows/doc_lint.yml'
-workflow_dispatch:
-jobs:
-check:
-name: Check for "from langchain import x" imports
-runs-on: ubuntu-latest
-steps:
-- name: Checkout repository
-uses: actions/checkout@v4
-- name: Run import check
-run: |
-# We should not encourage imports directly from main init file
-# Expect for hub
-git grep 'from langchain import' {docs/docs,templates,cookbook} | grep -vE 'from langchain import (hub)' && exit 1 || exit 0
-lint:
-name: "-"
-uses:
-./.github/workflows/_lint.yml
-with:
-working-directory: "."
-secrets: inherit

.gitignore

@@ -115,13 +115,10 @@ celerybeat.pid
 # Environments
 .env
 .envrc
-.venv
-.venvs
+.venv*
 env/
 venv/
 ENV/
-env.bak/
-venv.bak/

 # Spyder project settings
 .spyderproject


@@ -50,11 +50,13 @@ lint lint_package lint_tests:
 poetry run ruff docs templates cookbook
 poetry run ruff format docs templates cookbook --diff
 poetry run ruff --select I docs templates cookbook
+git grep 'from langchain import' {docs/docs,templates,cookbook} | grep -vE 'from langchain import (hub)' && exit 1 || exit 0

 format format_diff:
 poetry run ruff format docs templates cookbook
 poetry run ruff --select I --fix docs templates cookbook

 ######################
 # HELP
 ######################


@@ -0,0 +1,200 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -qU langchain-airbyte"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"\n",
"GITHUB_TOKEN = getpass.getpass()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"from langchain_airbyte import AirbyteLoader\n",
"from langchain_core.prompts import PromptTemplate\n",
"\n",
"loader = AirbyteLoader(\n",
" source=\"source-github\",\n",
" stream=\"pull_requests\",\n",
" config={\n",
" \"credentials\": {\"personal_access_token\": GITHUB_TOKEN},\n",
" \"repositories\": [\"langchain-ai/langchain\"],\n",
" },\n",
" template=PromptTemplate.from_template(\n",
" \"\"\"# {title}\n",
"by {user[login]}\n",
"\n",
"{body}\"\"\"\n",
" ),\n",
" include_metadata=False,\n",
")\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"# Updated partners/ibm README\n",
"by williamdevena\n",
"\n",
"## PR title\n",
"partners: changed the README file for the IBM Watson AI integration in the libs/partners/ibm folder.\n",
"\n",
"## PR message\n",
"Description: Changed the README file of partners/ibm following the docs on https://python.langchain.com/docs/integrations/llms/ibm_watsonx\n",
"\n",
"The README includes:\n",
"\n",
"- Brief description\n",
"- Installation\n",
"- Setting-up instructions (API key, project id, ...)\n",
"- Basic usage:\n",
" - Loading the model\n",
" - Direct inference\n",
" - Chain invoking\n",
" - Streaming the model output\n",
" \n",
"Issue: https://github.com/langchain-ai/langchain/issues/17545\n",
"\n",
"Dependencies: None\n",
"\n",
"Twitter handle: None\n"
]
}
],
"source": [
"print(docs[-2].page_content)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"10283"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(docs)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"import tiktoken\n",
"from langchain_community.vectorstores import Chroma\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"enc = tiktoken.get_encoding(\"cl100k_base\")\n",
"\n",
"vectorstore = Chroma.from_documents(\n",
" docs,\n",
" embedding=OpenAIEmbeddings(\n",
" disallowed_special=(enc.special_tokens_set - {\"<|endofprompt|>\"})\n",
" ),\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"retriever = vectorstore.as_retriever()"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='# Updated partners/ibm README\\nby williamdevena\\n\\n## PR title\\r\\npartners: changed the README file for the IBM Watson AI integration in the libs/partners/ibm folder.\\r\\n\\r\\n## PR message\\r\\nDescription: Changed the README file of partners/ibm following the docs on https://python.langchain.com/docs/integrations/llms/ibm_watsonx\\r\\n\\r\\nThe README includes:\\r\\n\\r\\n- Brief description\\r\\n- Installation\\r\\n- Setting-up instructions (API key, project id, ...)\\r\\n- Basic usage:\\r\\n - Loading the model\\r\\n - Direct inference\\r\\n - Chain invoking\\r\\n - Streaming the model output\\r\\n \\r\\nIssue: https://github.com/langchain-ai/langchain/issues/17545\\r\\n\\r\\nDependencies: None\\r\\n\\r\\nTwitter handle: None'),\n",
" Document(page_content='# Updated partners/ibm README\\nby williamdevena\\n\\n## PR title\\r\\npartners: changed the README file for the IBM Watson AI integration in the `libs/partners/ibm` folder. \\r\\n\\r\\n\\r\\n\\r\\n## PR message\\r\\n- **Description:** Changed the README file of partners/ibm following the docs on https://python.langchain.com/docs/integrations/llms/ibm_watsonx\\r\\n\\r\\n The README includes:\\r\\n - Brief description\\r\\n - Installation\\r\\n - Setting-up instructions (API key, project id, ...)\\r\\n - Basic usage:\\r\\n - Loading the model\\r\\n - Direct inference\\r\\n - Chain invoking\\r\\n - Streaming the model output\\r\\n\\r\\n\\r\\n- **Issue:** #17545\\r\\n- **Dependencies:** None\\r\\n- **Twitter handle:** None'),\n",
" Document(page_content='# IBM: added partners package `langchain_ibm`, added llm\\nby MateuszOssGit\\n\\n - **Description:** Added `langchain_ibm` as an langchain partners package of IBM [watsonx.ai](https://www.ibm.com/products/watsonx-ai) LLM provider (`WatsonxLLM`)\\r\\n - **Dependencies:** [ibm-watsonx-ai](https://pypi.org/project/ibm-watsonx-ai/),\\r\\n - **Tag maintainer:** : \\r\\n\\r\\nPlease make sure your PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` to check this locally. ✅'),\n",
" Document(page_content='# Add WatsonX support\\nby baptistebignaud\\n\\nIt is a connector to use a LLM from WatsonX.\\r\\nIt requires python SDK \"ibm-generative-ai\"\\r\\n\\r\\n(It might not be perfect since it is my first PR on a public repository 😄)')]"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever.invoke(\"pull requests related to IBM\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because it is too large


@@ -0,0 +1,245 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "0fc0309d-4d49-4bb5-bec0-bd92c6fddb28",
"metadata": {},
"source": [
"## Fireworks.AI + LangChain + RAG\n",
" \n",
"[Fireworks AI](https://python.langchain.com/docs/integrations/llms/fireworks) wants to provide the best experience when working with LangChain, and here is an example of Fireworks + LangChain doing RAG\n",
"\n",
"See [our models page](https://fireworks.ai/models) for the full list of models. We use `accounts/fireworks/models/mixtral-8x7b-instruct` for RAG In this tutorial.\n",
"\n",
"For the RAG target, we will use the Gemma technical report https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf "
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "d12fb75a-f707-48d5-82a5-efe2d041813c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.0\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"Found existing installation: langchain-fireworks 0.0.1\n",
"Uninstalling langchain-fireworks-0.0.1:\n",
" Successfully uninstalled langchain-fireworks-0.0.1\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"Obtaining file:///mnt/disks/data/langchain/libs/partners/fireworks\n",
" Installing build dependencies ... \u001b[?25ldone\n",
"\u001b[?25h Checking if build backend supports build_editable ... \u001b[?25ldone\n",
"\u001b[?25h Getting requirements to build editable ... \u001b[?25ldone\n",
"\u001b[?25h Preparing editable metadata (pyproject.toml) ... \u001b[?25ldone\n",
"\u001b[?25hRequirement already satisfied: aiohttp<4.0.0,>=3.9.1 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from langchain-fireworks==0.0.1) (3.9.3)\n",
"Requirement already satisfied: fireworks-ai<0.13.0,>=0.12.0 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from langchain-fireworks==0.0.1) (0.12.0)\n",
"Requirement already satisfied: langchain-core<0.2,>=0.1 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from langchain-fireworks==0.0.1) (0.1.23)\n",
"Requirement already satisfied: requests<3,>=2 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from langchain-fireworks==0.0.1) (2.31.0)\n",
"Requirement already satisfied: aiosignal>=1.1.2 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.9.1->langchain-fireworks==0.0.1) (1.3.1)\n",
"Requirement already satisfied: attrs>=17.3.0 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.9.1->langchain-fireworks==0.0.1) (23.1.0)\n",
"Requirement already satisfied: frozenlist>=1.1.1 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.9.1->langchain-fireworks==0.0.1) (1.4.0)\n",
"Requirement already satisfied: multidict<7.0,>=4.5 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.9.1->langchain-fireworks==0.0.1) (6.0.4)\n",
"Requirement already satisfied: yarl<2.0,>=1.0 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.9.1->langchain-fireworks==0.0.1) (1.9.2)\n",
"Requirement already satisfied: async-timeout<5.0,>=4.0 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from aiohttp<4.0.0,>=3.9.1->langchain-fireworks==0.0.1) (4.0.3)\n",
"Requirement already satisfied: httpx in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from fireworks-ai<0.13.0,>=0.12.0->langchain-fireworks==0.0.1) (0.26.0)\n",
"Requirement already satisfied: httpx-sse in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from fireworks-ai<0.13.0,>=0.12.0->langchain-fireworks==0.0.1) (0.4.0)\n",
"Requirement already satisfied: pydantic in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from fireworks-ai<0.13.0,>=0.12.0->langchain-fireworks==0.0.1) (2.4.2)\n",
"Requirement already satisfied: Pillow in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from fireworks-ai<0.13.0,>=0.12.0->langchain-fireworks==0.0.1) (10.2.0)\n",
"Requirement already satisfied: PyYAML>=5.3 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from langchain-core<0.2,>=0.1->langchain-fireworks==0.0.1) (6.0.1)\n",
"Requirement already satisfied: anyio<5,>=3 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from langchain-core<0.2,>=0.1->langchain-fireworks==0.0.1) (3.7.1)\n",
"Requirement already satisfied: jsonpatch<2.0,>=1.33 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from langchain-core<0.2,>=0.1->langchain-fireworks==0.0.1) (1.33)\n",
"Requirement already satisfied: langsmith<0.2.0,>=0.1.0 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from langchain-core<0.2,>=0.1->langchain-fireworks==0.0.1) (0.1.5)\n",
"Requirement already satisfied: packaging<24.0,>=23.2 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from langchain-core<0.2,>=0.1->langchain-fireworks==0.0.1) (23.2)\n",
"Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from langchain-core<0.2,>=0.1->langchain-fireworks==0.0.1) (8.2.3)\n",
"Requirement already satisfied: charset-normalizer<4,>=2 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from requests<3,>=2->langchain-fireworks==0.0.1) (3.3.0)\n",
"Requirement already satisfied: idna<4,>=2.5 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from requests<3,>=2->langchain-fireworks==0.0.1) (3.4)\n",
"Requirement already satisfied: urllib3<3,>=1.21.1 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from requests<3,>=2->langchain-fireworks==0.0.1) (2.0.6)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from requests<3,>=2->langchain-fireworks==0.0.1) (2023.7.22)\n",
"Requirement already satisfied: sniffio>=1.1 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from anyio<5,>=3->langchain-core<0.2,>=0.1->langchain-fireworks==0.0.1) (1.3.0)\n",
"Requirement already satisfied: exceptiongroup in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from anyio<5,>=3->langchain-core<0.2,>=0.1->langchain-fireworks==0.0.1) (1.1.3)\n",
"Requirement already satisfied: jsonpointer>=1.9 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from jsonpatch<2.0,>=1.33->langchain-core<0.2,>=0.1->langchain-fireworks==0.0.1) (2.4)\n",
"Requirement already satisfied: annotated-types>=0.4.0 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from pydantic->fireworks-ai<0.13.0,>=0.12.0->langchain-fireworks==0.0.1) (0.5.0)\n",
"Requirement already satisfied: pydantic-core==2.10.1 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from pydantic->fireworks-ai<0.13.0,>=0.12.0->langchain-fireworks==0.0.1) (2.10.1)\n",
"Requirement already satisfied: typing-extensions>=4.6.1 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from pydantic->fireworks-ai<0.13.0,>=0.12.0->langchain-fireworks==0.0.1) (4.8.0)\n",
"Requirement already satisfied: httpcore==1.* in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from httpx->fireworks-ai<0.13.0,>=0.12.0->langchain-fireworks==0.0.1) (1.0.2)\n",
"Requirement already satisfied: h11<0.15,>=0.13 in /mnt/disks/data/langchain/.venv/lib/python3.9/site-packages (from httpcore==1.*->httpx->fireworks-ai<0.13.0,>=0.12.0->langchain-fireworks==0.0.1) (0.14.0)\n",
"Building wheels for collected packages: langchain-fireworks\n",
" Building editable for langchain-fireworks (pyproject.toml) ... \u001b[?25ldone\n",
"\u001b[?25h Created wheel for langchain-fireworks: filename=langchain_fireworks-0.0.1-py3-none-any.whl size=2228 sha256=564071b120b09ec31f2dc737733448a33bbb26e40b49fcde0c129ad26045259d\n",
" Stored in directory: /tmp/pip-ephem-wheel-cache-oz368vdk/wheels/e0/ad/31/d7e76dd73d61905ff7f369f5b0d21a4b5e7af4d3cb7487aece\n",
"Successfully built langchain-fireworks\n",
"Installing collected packages: langchain-fireworks\n",
"Successfully installed langchain-fireworks-0.0.1\n",
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.0\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install --quiet pypdf chromadb tiktoken openai \n",
"%pip uninstall -y langchain-fireworks\n",
"%pip install --editable /mnt/disks/data/langchain/libs/partners/fireworks"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "cf719376",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<module 'fireworks' from '/mnt/disks/data/langchain/.venv/lib/python3.9/site-packages/fireworks/__init__.py'>\n"
]
}
],
"source": [
"import fireworks\n",
"\n",
"print(fireworks)\n",
"import fireworks.client"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9ab49327-0532-4480-804c-d066c302a322",
"metadata": {},
"outputs": [],
"source": [
"# Load\n",
"import requests\n",
"from langchain_community.document_loaders import PyPDFLoader\n",
"\n",
"# Download the PDF from a URL and save it to a temporary location\n",
"url = \"https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf\"\n",
"response = requests.get(url, stream=True)\n",
"file_name = \"temp_file.pdf\"\n",
"with open(file_name, \"wb\") as pdf:\n",
" pdf.write(response.content)\n",
"\n",
"loader = PyPDFLoader(file_name)\n",
"data = loader.load()\n",
"\n",
"# Split\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)\n",
"all_splits = text_splitter.split_documents(data)\n",
"\n",
"# Add to vectorDB\n",
"from langchain_community.vectorstores import Chroma\n",
"from langchain_fireworks.embeddings import FireworksEmbeddings\n",
"\n",
"vectorstore = Chroma.from_documents(\n",
" documents=all_splits,\n",
" collection_name=\"rag-chroma\",\n",
" embedding=FireworksEmbeddings(),\n",
")\n",
"\n",
"retriever = vectorstore.as_retriever()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "4efaddd9-3dbb-455c-ba54-0ad7f2d2ce0f",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.pydantic_v1 import BaseModel\n",
"from langchain_core.runnables import RunnableParallel, RunnablePassthrough\n",
"\n",
"# RAG prompt\n",
"template = \"\"\"Answer the question based only on the following context:\n",
"{context}\n",
"\n",
"Question: {question}\n",
"\"\"\"\n",
"prompt = ChatPromptTemplate.from_template(template)\n",
"\n",
"# LLM\n",
"from langchain_together import Together\n",
"\n",
"llm = Together(\n",
" model=\"mistralai/Mixtral-8x7B-Instruct-v0.1\",\n",
" temperature=0.0,\n",
" max_tokens=2000,\n",
" top_k=1,\n",
")\n",
"\n",
"# RAG chain\n",
"chain = (\n",
" RunnableParallel({\"context\": retriever, \"question\": RunnablePassthrough()})\n",
" | prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "88b1ee51-1b0f-4ebf-bb32-e50e843f0eeb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'\\nAnswer: The architectural details of Mixtral are as follows:\\n- Dimension (dim): 4096\\n- Number of layers (n\\\\_layers): 32\\n- Dimension of each head (head\\\\_dim): 128\\n- Hidden dimension (hidden\\\\_dim): 14336\\n- Number of heads (n\\\\_heads): 32\\n- Number of kv heads (n\\\\_kv\\\\_heads): 8\\n- Context length (context\\\\_len): 32768\\n- Vocabulary size (vocab\\\\_size): 32000\\n- Number of experts (num\\\\_experts): 8\\n- Number of top k experts (top\\\\_k\\\\_experts): 2\\n\\nMixtral is based on a transformer architecture and uses the same modifications as described in [18], with the notable exceptions that Mixtral supports a fully dense context length of 32k tokens, and the feedforward block picks from a set of 8 distinct groups of parameters. At every layer, for every token, a router network chooses two of these groups (the “experts”) to process the token and combine their output additively. This technique increases the number of parameters of a model while controlling cost and latency, as the model only uses a fraction of the total set of parameters per token. Mixtral is pretrained with multilingual data using a context size of 32k tokens. It either matches or exceeds the performance of Llama 2 70B and GPT-3.5, over several benchmarks. In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks.'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke(\"What are the Architectural details of Mixtral?\")"
]
},
{
"cell_type": "markdown",
"id": "755cf871-26b7-4e30-8b91-9ffd698470f4",
"metadata": {},
"source": [
"Trace: \n",
"\n",
"https://smith.langchain.com/public/935fd642-06a6-4b42-98e3-6074f93115cd/r"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

cookbook/optimization.ipynb Normal file

@@ -0,0 +1,648 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "c7fe38bc",
"metadata": {},
"source": [
"# Optimization\n",
"\n",
"This notebook goes over how to optimize chains using LangChain and [LangSmith](https://smith.langchain.com)."
]
},
{
"cell_type": "markdown",
"id": "2f87ccd5",
"metadata": {},
"source": [
"## Set up\n",
"\n",
"We will set an environment variable for LangSmith, and load the relevant data"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "236bedc5",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"LANGCHAIN_PROJECT\"] = \"movie-qa\""
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a3fed0dd",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "7cfff337",
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv(\"data/imdb_top_1000.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "2d20fb9c",
"metadata": {},
"outputs": [],
"source": [
"df[\"Released_Year\"] = df[\"Released_Year\"].astype(int, errors=\"ignore\")"
]
},
{
"cell_type": "markdown",
"id": "09fc8fe2",
"metadata": {},
"source": [
"## Create the initial retrieval chain\n",
"\n",
"We will use a self-query retriever"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "f71e24e2",
"metadata": {},
"outputs": [],
"source": [
"from langchain.schema import Document\n",
"from langchain_community.vectorstores import Chroma\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "8881ea8e",
"metadata": {},
"outputs": [],
"source": [
"records = df.to_dict(\"records\")\n",
"documents = [Document(page_content=d[\"Overview\"], metadata=d) for d in records]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "8f495423",
"metadata": {},
"outputs": [],
"source": [
"vectorstore = Chroma.from_documents(documents, embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "31d33d62",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.query_constructor.base import AttributeInfo\n",
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"metadata_field_info = [\n",
" AttributeInfo(\n",
" name=\"Released_Year\",\n",
" description=\"The year the movie was released\",\n",
" type=\"int\",\n",
" ),\n",
" AttributeInfo(\n",
" name=\"Series_Title\",\n",
" description=\"The title of the movie\",\n",
" type=\"str\",\n",
" ),\n",
" AttributeInfo(\n",
" name=\"Genre\",\n",
" description=\"The genre of the movie\",\n",
" type=\"string\",\n",
" ),\n",
" AttributeInfo(\n",
" name=\"IMDB_Rating\", description=\"A 1-10 rating for the movie\", type=\"float\"\n",
" ),\n",
"]\n",
"document_content_description = \"Brief summary of a movie\"\n",
"llm = ChatOpenAI(temperature=0)\n",
"retriever = SelfQueryRetriever.from_llm(\n",
" llm, vectorstore, document_content_description, metadata_field_info, verbose=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "a731533b",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.runnables import RunnablePassthrough"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "05181849",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "feed4be6",
"metadata": {},
"outputs": [],
"source": [
"prompt = ChatPromptTemplate.from_template(\n",
" \"\"\"Answer the user's question based on the below information:\n",
"\n",
"Information:\n",
"\n",
"{info}\n",
"\n",
"Question: {question}\"\"\"\n",
")\n",
"generator = (prompt | ChatOpenAI() | StrOutputParser()).with_config(\n",
" run_name=\"generator\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "eb16cc9a",
"metadata": {},
"outputs": [],
"source": [
"chain = (\n",
" RunnablePassthrough.assign(info=(lambda x: x[\"question\"]) | retriever) | generator\n",
")"
]
},
{
"cell_type": "markdown",
"id": "c70911cc",
"metadata": {},
"source": [
"## Run examples\n",
"\n",
"Run examples through the chain. This can either be manually, or using a list of examples, or production traffic"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "19a88d13",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'One of the horror movies released in the early 2000s is \"The Ring\" (2002), directed by Gore Verbinski.'"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke({\"question\": \"what is a horror movie released in early 2000s\"})"
]
},
{
"cell_type": "markdown",
"id": "17f9cdae",
"metadata": {},
"source": [
"## Annotate\n",
"\n",
"Now, go to LangSmitha and annotate those examples as correct or incorrect"
]
},
{
"cell_type": "markdown",
"id": "5e211da6",
"metadata": {},
"source": [
"## Create Dataset\n",
"\n",
"We can now create a dataset from those runs.\n",
"\n",
"What we will do is find the runs marked as correct, then grab the sub-chains from them. Specifically, the query generator sub chain and the final generation step"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "e4024267",
"metadata": {},
"outputs": [],
"source": [
"from langsmith import Client\n",
"\n",
"client = Client()"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "3814efc5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"14"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"runs = list(\n",
" client.list_runs(\n",
" project_name=\"movie-qa\",\n",
" execution_order=1,\n",
" filter=\"and(eq(feedback_key, 'correctness'), eq(feedback_score, 1))\",\n",
" )\n",
")\n",
"\n",
"len(runs)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "3eb123e0",
"metadata": {},
"outputs": [],
"source": [
"gen_runs = []\n",
"query_runs = []\n",
"for r in runs:\n",
" gen_runs.extend(\n",
" list(\n",
" client.list_runs(\n",
" project_name=\"movie-qa\",\n",
" filter=\"eq(name, 'generator')\",\n",
" trace_id=r.trace_id,\n",
" )\n",
" )\n",
" )\n",
" query_runs.extend(\n",
" list(\n",
" client.list_runs(\n",
" project_name=\"movie-qa\",\n",
" filter=\"eq(name, 'query_constructor')\",\n",
" trace_id=r.trace_id,\n",
" )\n",
" )\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "a4397026",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'what is a high school comedy released in early 2000s'}"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"runs[0].inputs"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "3fa6ad2a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'output': 'One high school comedy released in the early 2000s is \"Mean Girls\" starring Lindsay Lohan, Rachel McAdams, and Tina Fey.'}"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"runs[0].outputs"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "1fda5b4b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'query': 'what is a high school comedy released in early 2000s'}"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query_runs[0].inputs"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "1a1a51e6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'output': {'query': 'high school comedy',\n",
" 'filter': {'operator': 'and',\n",
" 'arguments': [{'comparator': 'eq', 'attribute': 'Genre', 'value': 'comedy'},\n",
" {'operator': 'and',\n",
" 'arguments': [{'comparator': 'gte',\n",
" 'attribute': 'Released_Year',\n",
" 'value': 2000},\n",
" {'comparator': 'lt', 'attribute': 'Released_Year', 'value': 2010}]}]}}}"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query_runs[0].outputs"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "e9d9966b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'what is a high school comedy released in early 2000s',\n",
" 'info': []}"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gen_runs[0].inputs"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "bc113f3d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'output': 'One high school comedy released in the early 2000s is \"Mean Girls\" starring Lindsay Lohan, Rachel McAdams, and Tina Fey.'}"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"gen_runs[0].outputs"
]
},
{
"cell_type": "markdown",
"id": "6cca74e5",
"metadata": {},
"source": [
"## Create datasets\n",
"\n",
"We can now create datasets for the query generation and final generation step.\n",
"We do this so that (1) we can inspect the datapoints, (2) we can edit them if needed, (3) we can add to them over time"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "69966f0e",
"metadata": {},
"outputs": [],
"source": [
"client.create_dataset(\"movie-query_constructor\")\n",
"\n",
"inputs = [r.inputs for r in query_runs]\n",
"outputs = [r.outputs for r in query_runs]\n",
"\n",
"client.create_examples(\n",
" inputs=inputs, outputs=outputs, dataset_name=\"movie-query_constructor\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "7e15770e",
"metadata": {},
"outputs": [],
"source": [
"client.create_dataset(\"movie-generator\")\n",
"\n",
"inputs = [r.inputs for r in gen_runs]\n",
"outputs = [r.outputs for r in gen_runs]\n",
"\n",
"client.create_examples(inputs=inputs, outputs=outputs, dataset_name=\"movie-generator\")"
]
},
{
"cell_type": "markdown",
"id": "61cf9bcd",
"metadata": {},
"source": [
"## Use as few shot examples\n",
"\n",
"We can now pull down a dataset and use them as few shot examples in a future chain"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "d9c79173",
"metadata": {},
"outputs": [],
"source": [
"examples = list(client.list_examples(dataset_name=\"movie-query_constructor\"))"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "a1771dd0",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"\n",
"def filter_to_string(_filter):\n",
" if \"operator\" in _filter:\n",
" args = [filter_to_string(f) for f in _filter[\"arguments\"]]\n",
" return f\"{_filter['operator']}({','.join(args)})\"\n",
" else:\n",
" comparator = _filter[\"comparator\"]\n",
" attribute = json.dumps(_filter[\"attribute\"])\n",
" value = json.dumps(_filter[\"value\"])\n",
" return f\"{comparator}({attribute}, {value})\""
]
},
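{
"cell_type": "markdown",
"id": "b7e4f9a1",
"metadata": {},
"source": [
"A quick sanity check (a hypothetical standalone snippet, reusing the structured filter from the run inspected above): `filter_to_string` flattens the nested dict into the compact string form used for few-shot examples. The expected result is shown in the comment:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d5c8a2e3",
"metadata": {},
"outputs": [],
"source": [
"example_filter = {\n",
" \"operator\": \"and\",\n",
" \"arguments\": [\n",
" {\"comparator\": \"eq\", \"attribute\": \"Genre\", \"value\": \"comedy\"},\n",
" {\n",
" \"operator\": \"and\",\n",
" \"arguments\": [\n",
" {\"comparator\": \"gte\", \"attribute\": \"Released_Year\", \"value\": 2000},\n",
" {\"comparator\": \"lt\", \"attribute\": \"Released_Year\", \"value\": 2010},\n",
" ],\n",
" },\n",
" ],\n",
"}\n",
"\n",
"# Expected: 'and(eq(\"Genre\", \"comedy\"),and(gte(\"Released_Year\", 2000),lt(\"Released_Year\", 2010)))'\n",
"filter_to_string(example_filter)"
]
},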
{
"cell_type": "code",
"execution_count": 28,
"id": "e67a3530",
"metadata": {},
"outputs": [],
"source": [
"model_examples = []\n",
"\n",
"for e in examples:\n",
" if \"filter\" in e.outputs[\"output\"]:\n",
" string_filter = filter_to_string(e.outputs[\"output\"][\"filter\"])\n",
" else:\n",
" string_filter = \"NO_FILTER\"\n",
" model_examples.append(\n",
" (\n",
" e.inputs[\"query\"],\n",
" {\"query\": e.outputs[\"output\"][\"query\"], \"filter\": string_filter},\n",
" )\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "84593135",
"metadata": {},
"outputs": [],
"source": [
"retriever1 = SelfQueryRetriever.from_llm(\n",
" llm,\n",
" vectorstore,\n",
" document_content_description,\n",
" metadata_field_info,\n",
" verbose=True,\n",
" chain_kwargs={\"examples\": model_examples},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "4ec9bb92",
"metadata": {},
"outputs": [],
"source": [
"chain1 = (\n",
" RunnablePassthrough.assign(info=(lambda x: x[\"question\"]) | retriever1) | generator\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "64eb88e2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'1. \"Saving Private Ryan\" (1998) - Directed by Steven Spielberg, this war film follows a group of soldiers during World War II as they search for a missing paratrooper.\\n\\n2. \"The Matrix\" (1999) - Directed by the Wachowskis, this science fiction action film follows a computer hacker who discovers the truth about the reality he lives in.\\n\\n3. \"Lethal Weapon 4\" (1998) - Directed by Richard Donner, this action-comedy film follows two mismatched detectives as they investigate a Chinese immigrant smuggling ring.\\n\\n4. \"The Fifth Element\" (1997) - Directed by Luc Besson, this science fiction action film follows a cab driver who must protect a mysterious woman who holds the key to saving the world.\\n\\n5. \"The Rock\" (1996) - Directed by Michael Bay, this action thriller follows a group of rogue military men who take over Alcatraz and threaten to launch missiles at San Francisco.'"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain1.invoke(\n",
" {\"question\": \"what are good action movies made before 2000 but after 1997?\"}\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1ee8b55",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -1083,7 +1083,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.vectorstores import ElasticsearchStore\n",
"from langchain_elasticsearch import ElasticsearchStore\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"embeddings = OpenAIEmbeddings()"


@@ -1,5 +1,10 @@
# docker-compose to make it easier to spin up integration tests.
# Services should use NON standard ports to avoid collision with
# any existing services that might be used for development.
# ATTENTION: When adding a service below, use a non-standard port
# (increment by one from the preceding port).
# For credentials always use `langchain` and `langchain` for the
# username and password.
version: "3"
name: langchain-tests
@@ -19,3 +24,34 @@ services:
image: graphdb
ports:
- "6021:7200"
mongo:
image: mongo:latest
container_name: mongo_container
ports:
- "6022:27017"
environment:
MONGO_INITDB_ROOT_USERNAME: langchain
MONGO_INITDB_ROOT_PASSWORD: langchain
postgres:
image: postgres:16
environment:
POSTGRES_DB: langchain
POSTGRES_USER: langchain
POSTGRES_PASSWORD: langchain
ports:
- "6023:5432"
command: |
postgres -c log_statement=all
healthcheck:
test:
[
"CMD-SHELL",
"psql postgresql://langchain:langchain@localhost/langchain --command 'SELECT 1;' || exit 1",
]
interval: 5s
retries: 60
volumes:
- postgres_data:/var/lib/postgresql/data
volumes:
postgres_data:

docs/.gitignore vendored Normal file

@@ -0,0 +1 @@
/.quarto/


@@ -3,6 +3,7 @@
import importlib
import inspect
import os
import sys
import typing
from enum import Enum
from pathlib import Path
@@ -344,28 +345,29 @@ def _doc_first_line(package_name: str) -> str:
return f".. {package_name.replace('-', '_')}_api_reference:\n\n"
def main() -> None:
def main(dirs: Optional[list] = None) -> None:
"""Generate the api_reference.rst file for each package."""
print("Starting to build API reference files.")
for dir in os.listdir(ROOT_DIR / "libs"):
if not dirs:
dirs = [
dir_
for dir_ in os.listdir(ROOT_DIR / "libs")
if dir_ not in ("cli", "partners")
]
dirs += os.listdir(ROOT_DIR / "libs" / "partners")
for dir_ in dirs:
# Skip any hidden directories
# Some of these could be present by mistake in the code base
# e.g., .pytest_cache from running tests from the wrong location.
if dir.startswith("."):
print("Skipping dir:", dir)
continue
if dir in ("cli", "partners"):
if dir_.startswith("."):
print("Skipping dir:", dir_)
continue
else:
print("Building package:", dir)
_build_rst_file(package_name=dir)
partner_packages = os.listdir(ROOT_DIR / "libs" / "partners")
print("Building partner packages:", partner_packages)
for dir in partner_packages:
_build_rst_file(package_name=dir)
print("Building package:", dir_)
_build_rst_file(package_name=dir_)
print("API reference files built.")
if __name__ == "__main__":
main()
dirs = sys.argv[1:] or None
main(dirs=dirs)


@@ -25,6 +25,7 @@ Below are links to tutorials and courses on LangChain. For written guides on com
⛓ [LangChain Cheatsheet](https://pub.towardsai.net/langchain-cheatsheet-all-secrets-on-a-single-page-8be26b721cde) by **Ivan Reznikov**
### Short Tutorials
[LangChain Explained in 13 Minutes | QuickStart Tutorial for Beginners](https://youtu.be/aywZrzNaKjs) by [Rabbitmetrics](https://www.youtube.com/@rabbitmetrics)
[LangChain Crash Course: Build an AutoGPT app in 25 minutes](https://youtu.be/MlK6SIjcjE8) by [Nicholas Renotte](https://www.youtube.com/@NicholasRenotte)
@@ -33,6 +34,14 @@ Below are links to tutorials and courses on LangChain. For written guides on com
⛓ [LangChain 101 Course](https://medium.com/@ivanreznikov/langchain-101-course-updated-668f7b41d6cb) by **Ivan Reznikov**
### Code Alongs
DataCamp has developed a [Become a Generative AI Developer series](https://www.datacamp.com/ai-code-alongs) featuring 9 free code-alongs, including ones on building chatbots using LangChain and the OpenAI and Pinecone APIs. When you start a code-along, you are launched into a fully configured notebook environment with an expert-led video to guide you through the project.
⛓ [Prompt Engineering with GPT & LangChain](https://www.datacamp.com/code-along/prompt-engineering-gpt-langchain)
⛓ [Retrieval Augmented Generation with the OpenAI API & Pinecone](https://www.datacamp.com/code-along/retrieval-augmented-generation-openai-api-pinecone)
## Tutorials
### [LangChain for Gen AI and LLMs](https://www.youtube.com/playlist?list=PLIUOU7oqGTLieV9uTIFMm6_4PXg-hlN6F) by [James Briggs](https://www.youtube.com/@jamesbriggs)


@@ -1,7 +1,7 @@
{
"cells": [
{
"cell_type": "markdown",
"cell_type": "raw",
"id": "9e45e81c-e16e-4c6c-b6a3-2362e5193827",
"metadata": {},
"source": [
@@ -25,53 +25,42 @@
"\n",
"There are two ways to perform routing:\n",
"\n",
"1. Using a `RunnableBranch`.\n",
"2. Writing custom factory function that takes the input of a previous step and returns a **runnable**. Importantly, this should return a **runnable** and NOT actually execute.\n",
"1. Conditionally return runnables from a [`RunnableLambda`](./functions) (recommended)\n",
"2. Using a `RunnableBranch`.\n",
"\n",
"We'll illustrate both methods using a two step sequence where the first step classifies an input question as being about `LangChain`, `Anthropic`, or `Other`, then routes to a corresponding prompt chain."
]
},
{
"cell_type": "markdown",
"id": "f885113d",
"metadata": {},
"source": [
"## Using a RunnableBranch\n",
"\n",
"A `RunnableBranch` is initialized with a list of (condition, runnable) pairs and a default runnable. It selects which branch by passing each condition the input it's invoked with. It selects the first condition to evaluate to True, and runs the corresponding runnable to that condition with the input. \n",
"\n",
"If no provided conditions match, it runs the default runnable.\n",
"\n",
"Here's an example of what it looks like in action:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "1aa13c1d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate\n",
"from langchain_community.chat_models import ChatAnthropic\n",
"from langchain_core.output_parsers import StrOutputParser"
]
},
{
"cell_type": "markdown",
"id": "ed84c59a",
"id": "c1c6edac",
"metadata": {},
"source": [
"## Example Setup\n",
"First, let's create a chain that will identify incoming questions as being about `LangChain`, `Anthropic`, or `Other`:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "3ec03886",
"execution_count": null,
"id": "8a8a1967",
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"' Anthropic'"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from langchain_community.chat_models import ChatAnthropic\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import PromptTemplate\n",
"\n",
"chain = (\n",
" PromptTemplate.from_template(\n",
" \"\"\"Given the user question below, classify it as either being about `LangChain`, `Anthropic`, or `Other`.\n",
@@ -86,33 +75,14 @@
" )\n",
" | ChatAnthropic()\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "87ae7c1c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"' Anthropic'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
")\n",
"\n",
"chain.invoke({\"question\": \"how do I call Anthropic?\"})"
]
},
{
"cell_type": "markdown",
"id": "8aa0a365",
"id": "7655555f",
"metadata": {},
"source": [
"Now, let's create three sub chains:"
@@ -120,8 +90,8 @@
},
{
"cell_type": "code",
"execution_count": 4,
"id": "d479962a",
"execution_count": null,
"id": "89d7722d",
"metadata": {},
"outputs": [],
"source": [
@@ -158,101 +128,12 @@
")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "593eab06",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.runnables import RunnableBranch\n",
"\n",
"branch = RunnableBranch(\n",
" (lambda x: \"anthropic\" in x[\"topic\"].lower(), anthropic_chain),\n",
" (lambda x: \"langchain\" in x[\"topic\"].lower(), langchain_chain),\n",
" general_chain,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "752c732e",
"metadata": {},
"outputs": [],
"source": [
"full_chain = {\"topic\": chain, \"question\": lambda x: x[\"question\"]} | branch"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "29231bb8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=\" As Dario Amodei told me, here are some ways to use Anthropic:\\n\\n- Sign up for an account on Anthropic's website to access tools like Claude, Constitutional AI, and Writer. \\n\\n- Use Claude for tasks like email generation, customer service chat, and QA. Claude can understand natural language prompts and provide helpful responses.\\n\\n- Use Constitutional AI if you need an AI assistant that is harmless, honest, and helpful. It is designed to be safe and aligned with human values.\\n\\n- Use Writer to generate natural language content for things like marketing copy, stories, reports, and more. Give it a topic and prompt and it will create high-quality written content.\\n\\n- Check out Anthropic's documentation and blog for tips, tutorials, examples, and announcements about new capabilities as they continue to develop their AI technology.\\n\\n- Follow Anthropic on social media or subscribe to their newsletter to stay up to date on new features and releases.\\n\\n- For most people, the easiest way to leverage Anthropic's technology is through their website - just create an account to get started!\", additional_kwargs={}, example=False)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"full_chain.invoke({\"question\": \"how do I use Anthropic?\"})"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c67d8733",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=' As Harrison Chase told me, here is how you use LangChain:\\n\\nLangChain is an AI assistant that can have conversations, answer questions, and generate text. To use LangChain, you simply type or speak your input and LangChain will respond. \\n\\nYou can ask LangChain questions, have discussions, get summaries or explanations about topics, and request it to generate text on a subject. Some examples of interactions:\\n\\n- Ask general knowledge questions and LangChain will try to answer factually. For example \"What is the capital of France?\"\\n\\n- Have conversations on topics by taking turns speaking. You can prompt the start of a conversation by saying something like \"Let\\'s discuss machine learning\"\\n\\n- Ask for summaries or high-level explanations on subjects. For example \"Can you summarize the main themes in Shakespeare\\'s Hamlet?\" \\n\\n- Give creative writing prompts or requests to have LangChain generate text in different styles. For example \"Write a short children\\'s story about a mouse\" or \"Generate a poem in the style of Robert Frost about nature\"\\n\\n- Correct LangChain if it makes an inaccurate statement and provide the right information. This helps train it.\\n\\nThe key is interacting naturally and giving it clear prompts and requests', additional_kwargs={}, example=False)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"full_chain.invoke({\"question\": \"how do I use LangChain?\"})"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "935ad949",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=' 2 + 2 = 4', additional_kwargs={}, example=False)"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"full_chain.invoke({\"question\": \"whats 2 + 2\"})"
]
},
{
"cell_type": "markdown",
"id": "6d8d042c",
"metadata": {},
"source": [
"## Using a custom function\n",
"## Using a custom function (Recommended)\n",
"\n",
"You can also use a custom function to route between different outputs. Here's an example:"
]
@@ -350,13 +231,89 @@
"full_chain.invoke({\"question\": \"whats 2 + 2\"})"
]
},
{
"cell_type": "markdown",
"id": "5147b827",
"metadata": {},
"source": [
"## Using a RunnableBranch\n",
"\n",
"A `RunnableBranch` is a special type of runnable that allows you to define a set of conditions and runnables to execute based on the input. It does **not** offer anything that you can't achieve in a custom function as described above, so we recommend using a custom function instead.\n",
"\n",
"A `RunnableBranch` is initialized with a list of (condition, runnable) pairs and a default runnable. It selects which branch by passing each condition the input it's invoked with. It selects the first condition to evaluate to True, and runs the corresponding runnable to that condition with the input. \n",
"\n",
"If no provided conditions match, it runs the default runnable.\n",
"\n",
"Here's an example of what it looks like in action:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "46802d04",
"id": "2a101418",
"metadata": {},
"outputs": [],
"source": []
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=\" As Dario Amodei told me, here are some ways to use Anthropic:\\n\\n- Sign up for an account on Anthropic's website to access tools like Claude, Constitutional AI, and Writer. \\n\\n- Use Claude for tasks like email generation, customer service chat, and QA. Claude can understand natural language prompts and provide helpful responses.\\n\\n- Use Constitutional AI if you need an AI assistant that is harmless, honest, and helpful. It is designed to be safe and aligned with human values.\\n\\n- Use Writer to generate natural language content for things like marketing copy, stories, reports, and more. Give it a topic and prompt and it will create high-quality written content.\\n\\n- Check out Anthropic's documentation and blog for tips, tutorials, examples, and announcements about new capabilities as they continue to develop their AI technology.\\n\\n- Follow Anthropic on social media or subscribe to their newsletter to stay up to date on new features and releases.\\n\\n- For most people, the easiest way to leverage Anthropic's technology is through their website - just create an account to get started!\", additional_kwargs={}, example=False)"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from langchain_core.runnables import RunnableBranch\n",
"\n",
"branch = RunnableBranch(\n",
" (lambda x: \"anthropic\" in x[\"topic\"].lower(), anthropic_chain),\n",
" (lambda x: \"langchain\" in x[\"topic\"].lower(), langchain_chain),\n",
" general_chain,\n",
")\n",
"full_chain = {\"topic\": chain, \"question\": lambda x: x[\"question\"]} | branch\n",
"full_chain.invoke({\"question\": \"how do I use Anthropic?\"})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d8caf9b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=' As Harrison Chase told me, here is how you use LangChain:\\n\\nLangChain is an AI assistant that can have conversations, answer questions, and generate text. To use LangChain, you simply type or speak your input and LangChain will respond. \\n\\nYou can ask LangChain questions, have discussions, get summaries or explanations about topics, and request it to generate text on a subject. Some examples of interactions:\\n\\n- Ask general knowledge questions and LangChain will try to answer factually. For example \"What is the capital of France?\"\\n\\n- Have conversations on topics by taking turns speaking. You can prompt the start of a conversation by saying something like \"Let\\'s discuss machine learning\"\\n\\n- Ask for summaries or high-level explanations on subjects. For example \"Can you summarize the main themes in Shakespeare\\'s Hamlet?\" \\n\\n- Give creative writing prompts or requests to have LangChain generate text in different styles. For example \"Write a short children\\'s story about a mouse\" or \"Generate a poem in the style of Robert Frost about nature\"\\n\\n- Correct LangChain if it makes an inaccurate statement and provide the right information. This helps train it.\\n\\nThe key is interacting naturally and giving it clear prompts and requests', additional_kwargs={}, example=False)"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"full_chain.invoke({\"question\": \"how do I use LangChain?\"})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "26159af7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=' 2 + 2 = 4', additional_kwargs={}, example=False)"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"full_chain.invoke({\"question\": \"whats 2 + 2\"})"
]
}
],
"metadata": {


@@ -464,12 +464,12 @@
"id": "6fd3e71b-439e-418f-8a8a-5232fba3d9fd",
"metadata": {},
"source": [
"Stream just yielded the final result from that component. \n",
"Stream just yielded the final result from that component.\n",
"\n",
"This is OK 🥹! Not all components have to implement streaming -- in some cases streaming is either unnecessary, difficult or just doesn't make sense.\n",
"\n",
":::{.callout-tip}\n",
"An LCEL chain constructed using using non-streaming components, will still be able to stream in a lot of cases, with streaming of partial output starting after the last non-streaming step in the chain.\n",
"An LCEL chain constructed using non-streaming components, will still be able to stream in a lot of cases, with streaming of partial output starting after the last non-streaming step in the chain.\n",
":::"
]
},


@@ -65,10 +65,10 @@ We will link to relevant docs.
## LLM Chain
For this getting started guide, we will provide two options: using OpenAI (a popular model available via API) or using a local open source model.
We'll show how to use models available via API, like OpenAI and Cohere, and local open source models, using integrations like Ollama.
<Tabs>
<TabItem value="openai" label="OpenAI" default>
<TabItem value="openai" label="OpenAI (API)" default>
First we'll need to install the LangChain x OpenAI integration package.
@@ -99,7 +99,7 @@ llm = ChatOpenAI(openai_api_key="...")
```
</TabItem>
<TabItem value="local" label="Local">
<TabItem value="local" label="Local (using Ollama)">
[Ollama](https://ollama.ai/) allows you to run open-source large language models, such as Llama 2, locally.
@@ -112,6 +112,37 @@ Then, make sure the Ollama server is running. After that, you can do:
```python
from langchain_community.llms import Ollama
llm = Ollama(model="llama2")
```
</TabItem>
<TabItem value="cohere" label="Cohere (API)" default>
First we'll need to install the Cohere SDK package.
```shell
pip install cohere
```
Accessing the API requires an API key, which you can get by creating an account and heading [here](https://dashboard.cohere.com/api-keys). Once we have a key we'll want to set it as an environment variable by running:
```shell
export COHERE_API_KEY="..."
```
We can then initialize the model:
```python
from langchain_community.chat_models import ChatCohere
llm = ChatCohere()
```
If you'd prefer not to set an environment variable, you can pass the key in directly via the `cohere_api_key` named parameter when initializing the Cohere LLM class:
```python
from langchain_community.chat_models import ChatCohere
llm = ChatCohere(cohere_api_key="...")
```
</TabItem>
@@ -200,10 +231,10 @@ docs = loader.load()
Next, we need to index it into a vectorstore. This requires a few components, namely an [embedding model](/docs/modules/data_connection/text_embedding) and a [vectorstore](/docs/modules/data_connection/vectorstores).
For embedding models, we once again provide examples for accessing via OpenAI or via local models.
For embedding models, we once again provide examples for accessing via API or by running local models.
<Tabs>
<TabItem value="openai" label="OpenAI" default>
<TabItem value="openai" label="OpenAI (API)" default>
Make sure you have the `langchain_openai` package installed and the appropriate environment variables set (these are the same as needed for the LLM).
@@ -214,7 +245,7 @@ embeddings = OpenAIEmbeddings()
```
</TabItem>
<TabItem value="local" label="Local">
<TabItem value="local" label="Local (using Ollama)">
Make sure you have Ollama running (same set up as with the LLM).
@@ -224,6 +255,17 @@ from langchain_community.embeddings import OllamaEmbeddings
embeddings = OllamaEmbeddings()
```
</TabItem>
<TabItem value="cohere" label="Cohere (API)" default>
Make sure you have the `cohere` package installed and the appropriate environment variables set (these are the same as needed for the LLM).
```python
from langchain_community.embeddings import CohereEmbeddings
embeddings = CohereEmbeddings()
```
</TabItem>
</Tabs>
Now, we can use this embedding model to ingest documents into a vectorstore.
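
As a minimal sketch (assuming the `docs` loaded earlier and the `embeddings` object from whichever tab you followed; FAISS is just one common local choice, installed with `pip install faiss-cpu`):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS

# Split the loaded documents into chunks, then embed and index them
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
vector = FAISS.from_documents(documents, embeddings)
```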


@@ -0,0 +1,391 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "6e3f0f72",
"metadata": {},
"source": [
"# [beta] Structured Output\n",
"\n",
"It is often crucial to have LLMs return structured output. This is because often times the outputs of the LLMs are used in downstream applications, where specific arguments are required. Having the LLM return structured output reliably is necessary for that.\n",
"\n",
"There are a few different high level strategies that are used to do this:\n",
"\n",
"- Prompting: This is when you ask the LLM (very nicely) to return output in the desired format (JSON, XML). This is nice because works with all LLMs, this is not nice because it doesn't garuntee that the LLM returns in the right format.\n",
"- Function calling: This is when the LLM is finetuned to be able to not just generate a completion, but also generate a function call. The functions the LLM can call are generally passed as extra parameters to the model API. The function names and descriptions should be treated as part of the prompt (they usually count against token counts, and are used by the LLM to decide what to do).\n",
"- Tool calling: A technique similar to function calling, but it allows the LLM to call multiple functions at the same time.\n",
"- JSON mode: This is when the LLM is garunteed to return JSON.\n",
"\n",
"\n",
"\n",
"Different models may support different variants of these, with slightly different parameters. In order to make it easy to get LLMs to return structured output, we have added a common interface to LangChain models: `.with_structured_output`. \n",
"\n",
"By invoking this method (and passing in a JSON schema or a Pydantic model) the model will add whatever model parameters + output parsers are necessary to get back the structured output. There may be more than one way to do this (eg function calling vs JSON mode) - you can configure which method to use by passing into that method.\n",
"\n",
"Let's look at some examples of this in action!\n",
"\n",
"We will use Pydantic to easily structure the response schema."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "08029f4e",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.pydantic_v1 import BaseModel, Field"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "070bf702",
"metadata": {},
"outputs": [],
"source": [
"class Joke(BaseModel):\n",
" setup: str = Field(description=\"The setup of the joke\")\n",
" punchline: str = Field(description=\"The punchline to the joke\")"
]
},
{
"cell_type": "markdown",
"id": "98f6edfa",
"metadata": {},
"source": [
"## OpenAI\n",
"\n",
"OpenAI exposes a few different ways to get structured outputs."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "3fe7caf0",
"metadata": {},
"outputs": [],
"source": [
"from langchain_openai import ChatOpenAI"
]
},
{
"cell_type": "markdown",
"id": "deddb6d3",
"metadata": {},
"source": [
"### Function Calling\n",
"\n",
"By default, we will use `function_calling`"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "6700994a",
"metadata": {},
"outputs": [],
"source": [
"model = ChatOpenAI()\n",
"model_with_structure = model.with_structured_output(Joke)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "c55a61b8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Joke(setup='Why was the cat sitting on the computer?', punchline='It wanted to keep an eye on the mouse!')"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_with_structure.invoke(\"Tell me a joke about cats\")"
]
},
{
"cell_type": "markdown",
"id": "39d7a555",
"metadata": {},
"source": [
"### JSON Mode\n",
"\n",
"We also support JSON mode. Note that we need to specify in the prompt the format that it should respond in."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "df0370e3",
"metadata": {},
"outputs": [],
"source": [
"model_with_structure = model.with_structured_output(Joke, method=\"json_mode\")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "23844a26",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Joke(setup=\"Why don't cats play poker in the jungle?\", punchline='Too many cheetahs!')"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_with_structure.invoke(\n",
" \"Tell me a joke about cats, respond in JSON with `setup` and `punchline` keys\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "8f3cce9e",
"metadata": {},
"source": [
"## Fireworks\n",
"\n",
"[Fireworks](https://fireworks.ai/) similarly supports function calling and JSON mode for select models."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "ad45fdd8",
"metadata": {},
"outputs": [],
"source": [
"from langchain_fireworks import ChatFireworks"
]
},
{
"cell_type": "markdown",
"id": "36270ed5",
"metadata": {},
"source": [
"### Function Calling\n",
"\n",
"By default, we will use `function_calling`"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "49a20847",
"metadata": {},
"outputs": [],
"source": [
"model = ChatFireworks(model=\"accounts/fireworks/models/firefunction-v1\")\n",
"model_with_structure = model.with_structured_output(Joke)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "e3093a6c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Joke(setup=\"Why don't cats play poker in the jungle?\", punchline='Too many cheetahs!')"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_with_structure.invoke(\"Tell me a joke about cats\")"
]
},
{
"cell_type": "markdown",
"id": "ddb6b3ba",
"metadata": {},
"source": [
"### JSON Mode\n",
"\n",
"We also support JSON mode. Note that we need to specify in the prompt the format that it should respond in."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "ea0c22c1",
"metadata": {},
"outputs": [],
"source": [
"model_with_structure = model.with_structured_output(Joke, method=\"json_mode\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "649f9632",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Joke(setup='Why did the dog sit in the shade?', punchline='To avoid getting burned.')"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_with_structure.invoke(\n",
" \"Tell me a joke about dogs, respond in JSON with `setup` and `punchline` keys\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "ff70609a",
"metadata": {},
"source": [
"## Mistral\n",
"\n",
"We also support structured output with Mistral models, although we only support function calling."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "bffd3fad",
"metadata": {},
"outputs": [],
"source": [
"from langchain_mistralai import ChatMistralAI"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "c8bd7549",
"metadata": {},
"outputs": [],
"source": [
"model = ChatMistralAI(model=\"mistral-large-latest\")\n",
"model_with_structure = model.with_structured_output(Joke)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "17b15816",
"metadata": {},
"outputs": [],
"source": [
"model_with_structure.invoke(\"Tell me a joke about cats\")"
]
},
{
"cell_type": "markdown",
"id": "6bbbb698",
"metadata": {},
"source": [
"## Together\n",
"\n",
"Since [TogetherAI](https://www.together.ai/) is just a drop in replacement for OpenAI, we can just use the OpenAI integration"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "9b9617e3",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain_openai import ChatOpenAI"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "90549664",
"metadata": {},
"outputs": [],
"source": [
"model = ChatOpenAI(\n",
" base_url=\"https://api.together.xyz/v1\",\n",
" api_key=os.environ[\"TOGETHER_API_KEY\"],\n",
" model=\"mistralai/Mixtral-8x7B-Instruct-v0.1\",\n",
")\n",
"model_with_structure = model.with_structured_output(Joke)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "01da39be",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Joke(setup='Why did the cat sit on the computer?', punchline='To keep an eye on the mouse!')"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_with_structure.invoke(\"Tell me a joke about cats\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3066b2af",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,215 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "0cebf93b",
"metadata": {},
"source": [
"## Fiddler Langchain integration Quick Start Guide\n",
"\n",
"Fiddler is the pioneer in enterprise Generative and Predictive system ops, offering a unified platform that enables Data Science, MLOps, Risk, Compliance, Analytics, and other LOB teams to monitor, explain, analyze, and improve ML deployments at enterprise scale. "
]
},
{
"cell_type": "markdown",
"id": "38d746c2",
"metadata": {},
"source": [
"## 1. Installation and Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e0151955",
"metadata": {},
"outputs": [],
"source": [
"# langchain langchain-community langchain-openai fiddler-client"
]
},
{
"cell_type": "markdown",
"id": "5662f2e5-d510-4eef-b44b-fa929e5b4ad4",
"metadata": {},
"source": [
"## 2. Fiddler connection details "
]
},
{
"cell_type": "markdown",
"id": "64fac323",
"metadata": {},
"source": [
"*Before you can add information about your model with Fiddler*\n",
"\n",
"1. The URL you're using to connect to Fiddler\n",
"2. Your organization ID\n",
"3. Your authorization token\n",
"\n",
"These can be found by navigating to the *Settings* page of your Fiddler environment."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f6f8b73e-d350-40f0-b7a4-fb1e68a65a22",
"metadata": {},
"outputs": [],
"source": [
"URL = \"\" # Your Fiddler instance URL, Make sure to include the full URL (including https://). For example: https://demo.fiddler.ai\n",
"ORG_NAME = \"\"\n",
"AUTH_TOKEN = \"\" # Your Fiddler instance auth token\n",
"\n",
"# Fiddler project and model names, used for model registration\n",
"PROJECT_NAME = \"\"\n",
"MODEL_NAME = \"\" # Model name in Fiddler"
]
},
{
"cell_type": "markdown",
"id": "0645805a",
"metadata": {},
"source": [
"## 3. Create a fiddler callback handler instance"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "13de4f9a",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.callbacks.fiddler_callback import FiddlerCallbackHandler\n",
"\n",
"fiddler_handler = FiddlerCallbackHandler(\n",
" url=URL,\n",
" org=ORG_NAME,\n",
" project=PROJECT_NAME,\n",
" model=MODEL_NAME,\n",
" api_key=AUTH_TOKEN,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "2276368e-f1dc-46be-afe3-18796e7a66f2",
"metadata": {},
"source": [
"## Example 1 : Basic Chain"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c9de0fd1",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_openai import OpenAI\n",
"\n",
"# Note : Make sure openai API key is set in the environment variable OPENAI_API_KEY\n",
"llm = OpenAI(temperature=0, streaming=True, callbacks=[fiddler_handler])\n",
"output_parser = StrOutputParser()\n",
"\n",
"chain = llm | output_parser\n",
"\n",
"# Invoke the chain. Invocation will be logged to Fiddler, and metrics automatically generated\n",
"chain.invoke(\"How far is moon from earth?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "309bde0b-e1ce-446c-98ac-3690c26a2676",
"metadata": {},
"outputs": [],
"source": [
"# Few more invocations\n",
"chain.invoke(\"What is the temperature on Mars?\")\n",
"chain.invoke(\"How much is 2 + 200000?\")\n",
"chain.invoke(\"Which movie won the oscars this year?\")\n",
"chain.invoke(\"Can you write me a poem about insomnia?\")\n",
"chain.invoke(\"How are you doing today?\")\n",
"chain.invoke(\"What is the meaning of life?\")"
]
},
{
"cell_type": "markdown",
"id": "48fa4782-c867-4510-9430-4ffa3de3b5eb",
"metadata": {},
"source": [
"## Example 2 : Chain with prompt templates"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2aa2c220-8946-4844-8d3c-8f69d744d13f",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import (\n",
" ChatPromptTemplate,\n",
" FewShotChatMessagePromptTemplate,\n",
")\n",
"\n",
"examples = [\n",
" {\"input\": \"2+2\", \"output\": \"4\"},\n",
" {\"input\": \"2+3\", \"output\": \"5\"},\n",
"]\n",
"\n",
"example_prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"human\", \"{input}\"),\n",
" (\"ai\", \"{output}\"),\n",
" ]\n",
")\n",
"\n",
"few_shot_prompt = FewShotChatMessagePromptTemplate(\n",
" example_prompt=example_prompt,\n",
" examples=examples,\n",
")\n",
"\n",
"final_prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", \"You are a wondrous wizard of math.\"),\n",
" few_shot_prompt,\n",
" (\"human\", \"{input}\"),\n",
" ]\n",
")\n",
"\n",
"# Note : Make sure openai API key is set in the environment variable OPENAI_API_KEY\n",
"llm = OpenAI(temperature=0, streaming=True, callbacks=[fiddler_handler])\n",
"\n",
"chain = final_prompt | llm\n",
"\n",
"# Invoke the chain. Invocation will be logged to Fiddler, and metrics automatically generated\n",
"chain.invoke({\"input\": \"What's the square of a triangle?\"})"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -17,40 +17,44 @@
"source": [
"# ChatAnthropic\n",
"\n",
"This notebook covers how to get started with Anthropic chat models."
"This notebook covers how to get started with Anthropic chat models.\n",
"\n",
"## Setup\n",
"\n",
"For setup instructions, please see the Installation and Environment Setup sections of the [Anthropic Platform page](/docs/integrations/platforms/anthropic.mdx)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "d4a7c55d-b235-4ca4-a579-c90cc9570da9",
"metadata": {
"ExecuteTime": {
"end_time": "2024-01-19T11:25:00.590587Z",
"start_time": "2024-01-19T11:25:00.127293Z"
},
"tags": []
},
"execution_count": null,
"id": "91be2e12",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.chat_models import ChatAnthropic\n",
"from langchain_core.prompts import ChatPromptTemplate"
"%pip install -qU langchain-anthropic"
]
},
{
"cell_type": "markdown",
"id": "584ed5ec",
"metadata": {},
"source": [
"## Environment Setup\n",
"\n",
"We'll need to get a [Anthropic](https://console.anthropic.com/settings/keys) and set the `ANTHROPIC_API_KEY` environment variable:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "70cf04e8-423a-4ff6-8b09-f11fb711c817",
"metadata": {
"ExecuteTime": {
"end_time": "2024-01-19T11:25:04.349676Z",
"start_time": "2024-01-19T11:25:03.964930Z"
},
"tags": []
},
"execution_count": null,
"id": "01578ae3",
"metadata": {},
"outputs": [],
"source": [
"chat = ChatAnthropic(temperature=0, model_name=\"claude-2\")"
"import os\n",
"from getpass import getpass\n",
"\n",
"os.environ[\"ANTHROPIC_API_KEY\"] = getpass()"
]
},
{
@@ -82,7 +86,9 @@
"outputs": [
{
"data": {
"text/plain": "AIMessage(content=' 저는 파이썬을 좋아합니다.')"
"text/plain": [
"AIMessage(content=' 저는 파이썬을 좋아합니다.')"
]
},
"execution_count": 3,
"metadata": {},
@@ -90,6 +96,11 @@
}
],
"source": [
"from langchain_anthropic import ChatAnthropic\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"\n",
"chat = ChatAnthropic(temperature=0, model_name=\"claude-2\")\n",
"\n",
"system = (\n",
" \"You are a helpful assistant that translates {input_language} to {output_language}.\"\n",
")\n",
@@ -128,7 +139,9 @@
"outputs": [
{
"data": {
"text/plain": "AIMessage(content=\" Why don't bears like fast food? Because they can't catch it!\")"
"text/plain": [
"AIMessage(content=\" Why don't bears like fast food? Because they can't catch it!\")"
]
},
"execution_count": 4,
"metadata": {},
@@ -189,154 +202,6 @@
"for chunk in chain.stream({}):\n",
" print(chunk.content, end=\"\", flush=True)"
]
},
{
"cell_type": "markdown",
"id": "3737fc8d",
"metadata": {},
"source": [
"# ChatAnthropicMessages\n",
"\n",
"LangChain also offers the beta Anthropic Messages endpoint through the new `langchain-anthropic` package."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c253883f",
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-anthropic"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "07c47c2a",
"metadata": {
"ExecuteTime": {
"end_time": "2024-01-19T11:25:25.288133Z",
"start_time": "2024-01-19T11:25:24.438968Z"
}
},
"outputs": [
{
"data": {
"text/plain": "AIMessage(content='파이썬을 사랑합니다.')"
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_anthropic import ChatAnthropicMessages\n",
"\n",
"chat = ChatAnthropicMessages(model_name=\"claude-instant-1.2\")\n",
"system = (\n",
" \"You are a helpful assistant that translates {input_language} to {output_language}.\"\n",
")\n",
"human = \"{text}\"\n",
"prompt = ChatPromptTemplate.from_messages([(\"system\", system), (\"human\", human)])\n",
"\n",
"chain = prompt | chat\n",
"chain.invoke(\n",
" {\n",
" \"input_language\": \"English\",\n",
" \"output_language\": \"Korean\",\n",
" \"text\": \"I love Python\",\n",
" }\n",
")"
]
},
{
"cell_type": "markdown",
"id": "19e53d75935143fd",
"metadata": {
"collapsed": false
},
"source": [
"ChatAnthropicMessages also requires the anthropic_api_key argument, or the ANTHROPIC_API_KEY environment variable must be set. \n",
"\n",
"ChatAnthropicMessages also supports async and streaming functionality:"
]
},
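{
"cell_type": "code",
"execution_count": null,
"id": "0a1b2c3d4e5f6a7b",
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch: pass the key directly instead of relying on the\n",
"# ANTHROPIC_API_KEY environment variable. \"my-api-key\" is a placeholder.\n",
"chat_with_key = ChatAnthropicMessages(\n",
" model_name=\"claude-instant-1.2\",\n",
" anthropic_api_key=\"my-api-key\",\n",
")"
]
},
{
"cell_type": "markdown",
"id": "1c2d3e4f5a6b7c8d",
"metadata": {},
"source": [
"ChatAnthropicMessages also supports async and streaming functionality:"
]
},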
{
"cell_type": "code",
"execution_count": 7,
"id": "e20a139d30e3d333",
"metadata": {
"ExecuteTime": {
"end_time": "2024-01-19T11:25:26.012325Z",
"start_time": "2024-01-19T11:25:25.288358Z"
},
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": "AIMessage(content='파이썬을 사랑합니다.')"
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"await chain.ainvoke(\n",
" {\n",
" \"input_language\": \"English\",\n",
" \"output_language\": \"Korean\",\n",
" \"text\": \"I love Python\",\n",
" }\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "6f34f1073d7e7120",
"metadata": {
"ExecuteTime": {
"end_time": "2024-01-19T11:25:28.323455Z",
"start_time": "2024-01-19T11:25:26.012040Z"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Here are some of the most famous tourist attractions in Japan:\n",
"\n",
"- Tokyo Tower - A communication and observation tower in Tokyo modeled after the Eiffel Tower. It offers stunning views of the city.\n",
"\n",
"- Mount Fuji - Japan's highest and most famous mountain. It's a iconic symbol of Japan and a UNESCO World Heritage Site. \n",
"\n",
"- Itsukushima Shrine (Miyajima) - A shrine located on an island in Hiroshima prefecture, known for its \"floating\" torii gate that seems to float on water during high tide.\n",
"\n",
"- Himeji Castle - A UNESCO World Heritage Site famous for having withstood numerous battles without destruction to its intricate white walls and sloping, triangular roofs. \n",
"\n",
"- Kawaguchiko Station - Near Mount Fuji, this area is known for its scenic Fuji Five Lakes region. \n",
"\n",
"- Hiroshima Peace Memorial Park and Museum - Commemorates the world's first atomic bombing in Hiroshima on August 6, 1945. \n",
"\n",
"- Arashiyama Bamboo Grove - A renowned bamboo forest located in Kyoto that draws many visitors.\n",
"\n",
"- Kegon Falls - One of Japan's largest waterfalls"
]
}
],
"source": [
"prompt = ChatPromptTemplate.from_messages(\n",
" [(\"human\", \"Give me a list of famous tourist attractions in Japan\")]\n",
")\n",
"chain = prompt | chat\n",
"for chunk in chain.stream({}):\n",
" print(chunk.content, end=\"\", flush=True)"
]
}
],
"metadata": {

View File

@@ -23,6 +23,14 @@
"This example goes over how to use LangChain to interact with `ChatFireworks` models."
]
},
{
"cell_type": "raw",
"id": "4a7c795e",
"metadata": {},
"source": [
"%pip install langchain-fireworks"
]
},
{
"cell_type": "code",
"execution_count": 1,
@@ -35,10 +43,8 @@
},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain_community.chat_models.fireworks import ChatFireworks\n",
"from langchain_core.messages import HumanMessage, SystemMessage"
"from langchain_core.messages import HumanMessage, SystemMessage\n",
"from langchain_fireworks import ChatFireworks"
]
},
{
@@ -48,7 +54,7 @@
"source": [
"# Setup\n",
"\n",
"1. Make sure the `fireworks-ai` package is installed in your environment.\n",
"1. Make sure the `langchain-fireworks` package is installed in your environment.\n",
"2. Sign in to [Fireworks AI](http://fireworks.ai) for the an API Key to access our models, and make sure it is set as the `FIREWORKS_API_KEY` environment variable.\n",
"3. Set up your model using a model id. If the model is not set, the default model is fireworks-llama-v2-7b-chat. See the full, most up-to-date model list on [app.fireworks.ai](https://app.fireworks.ai)."
]
@@ -67,7 +73,7 @@
" os.environ[\"FIREWORKS_API_KEY\"] = getpass.getpass(\"Fireworks API Key:\")\n",
"\n",
"# Initialize a Fireworks chat model\n",
"chat = ChatFireworks(model=\"accounts/fireworks/models/llama-v2-13b-chat\")"
"chat = ChatFireworks(model=\"accounts/fireworks/models/mixtral-8x7b-instruct\")"
]
},
{
@@ -82,17 +88,17 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 3,
"id": "72340871-ae2f-415f-b399-0777d32dc379",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=\"Hello! My name is LLaMA, I'm a large language model trained by a team of researcher at Meta AI. My primary function is to assist and converse with users like you, answering questions and engaging in discussion to the best of my ability. I'm here to help and provide information on a wide range of topics, so feel free to ask me anything!\", additional_kwargs={}, example=False)"
"AIMessage(content=\"Hello! I'm an AI language model, a helpful assistant designed to chat and assist you with any questions or information you might need. I'm here to make your experience as smooth and enjoyable as possible. How can I assist you today?\")"
]
},
"execution_count": 5,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
@@ -102,22 +108,22 @@
"system_message = SystemMessage(content=\"You are to chat with the user.\")\n",
"human_message = HumanMessage(content=\"Who are you?\")\n",
"\n",
"chat([system_message, human_message])"
"chat.invoke([system_message, human_message])"
]
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 5,
"id": "68c6b1fa-2ff7-4a63-8d88-3cec302180b8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=\"Oh hello there! *giggle* It's such a beautiful day today, isn\", additional_kwargs={}, example=False)"
"AIMessage(content=\"I'm an AI and do not have the ability to experience the weather firsthand. However,\")"
]
},
"execution_count": 6,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -125,200 +131,70 @@
"source": [
"# Setting additional parameters: temperature, max_tokens, top_p\n",
"chat = ChatFireworks(\n",
" model=\"accounts/fireworks/models/llama-v2-13b-chat\",\n",
" model_kwargs={\"temperature\": 1, \"max_tokens\": 20, \"top_p\": 1},\n",
" model=\"accounts/fireworks/models/mixtral-8x7b-instruct\",\n",
" temperature=1,\n",
" max_tokens=20,\n",
")\n",
"system_message = SystemMessage(content=\"You are to chat with the user.\")\n",
"human_message = HumanMessage(content=\"How's the weather today?\")\n",
"chat([system_message, human_message])"
"chat.invoke([system_message, human_message])"
]
},
{
"cell_type": "markdown",
"id": "d93aa186-39cf-4e1a-aa32-01ed31d43bc8",
"id": "8c44cb36",
"metadata": {},
"source": [
"# Simple Chat Chain"
]
},
{
"cell_type": "markdown",
"id": "28763fbc",
"metadata": {},
"source": [
"You can use chat models on fireworks, with system prompts and memory."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "cbe29efc-37c3-4c83-8b84-b8bba1a1e589",
"metadata": {},
"outputs": [],
"source": [
"from langchain.memory import ConversationBufferMemory\n",
"from langchain_community.chat_models import ChatFireworks\n",
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"# Tool Calling\n",
"\n",
"llm = ChatFireworks(\n",
" model=\"accounts/fireworks/models/llama-v2-13b-chat\",\n",
" model_kwargs={\"temperature\": 0, \"max_tokens\": 64, \"top_p\": 1.0},\n",
")\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", \"You are a helpful chatbot that speaks like a pirate.\"),\n",
" MessagesPlaceholder(variable_name=\"history\"),\n",
" (\"human\", \"{input}\"),\n",
" ]\n",
")"
]
},
{
"cell_type": "markdown",
"id": "02991e05-a38e-47d4-9ab3-7e630a8ead55",
"metadata": {},
"source": [
"Initially, there is no chat memory"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "e2fd186f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'history': []}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"memory = ConversationBufferMemory(return_messages=True)\n",
"memory.load_memory_variables({})"
]
},
{
"cell_type": "markdown",
"id": "bee461da",
"metadata": {},
"source": [
"Create a simple chain with memory"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "86972e54",
"metadata": {},
"outputs": [],
"source": [
"chain = (\n",
" RunnablePassthrough.assign(\n",
" history=memory.load_memory_variables | (lambda x: x[\"history\"])\n",
" )\n",
" | prompt\n",
" | llm.bind(stop=[\"\\n\\n\"])\n",
")"
]
},
{
"cell_type": "markdown",
"id": "f48cb142",
"metadata": {},
"source": [
"Run the chain with a simple question, expecting an answer aligned with the system message provided."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "db3ad5b1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=\"Ahoy there, me hearty! Yer a fine lookin' swashbuckler, I can see that! *adjusts eye patch* What be bringin' ye to these waters? Are ye here to plunder some booty or just to enjoy the sea breeze?\", additional_kwargs={}, example=False)"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inputs = {\"input\": \"hi im bob\"}\n",
"response = chain.invoke(inputs)\n",
"response"
]
},
{
"cell_type": "markdown",
"id": "338f4bae",
"metadata": {},
"source": [
"Save the memory context, then read it back to inspect contents"
"Fireworks offers the [`FireFunction-v1` tool calling model](https://fireworks.ai/blog/firefunction-v1-gpt-4-level-function-calling). You can use it for structured output and function calling use cases:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "257eec01",
"id": "ee2db682",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'history': [HumanMessage(content='hi im bob', additional_kwargs={}, example=False),\n",
" AIMessage(content=\"Ahoy there, me hearty! Yer a fine lookin' swashbuckler, I can see that! *adjusts eye patch* What be bringin' ye to these waters? Are ye here to plunder some booty or just to enjoy the sea breeze?\", additional_kwargs={}, example=False)]}"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
"name": "stdout",
"output_type": "stream",
"text": [
"{'function': {'arguments': '{\"name\": \"Erick\", \"age\": 27}',\n",
" 'name': 'ExtractFields'},\n",
" 'id': 'call_J0WYP2TLenaFw3UeVU0UnWqx',\n",
" 'index': 0,\n",
" 'type': 'function'}\n"
]
}
],
"source": [
"memory.save_context(inputs, {\"output\": response.content})\n",
"memory.load_memory_variables({})"
]
},
{
"cell_type": "markdown",
"id": "08441347",
"metadata": {},
"source": [
"Now as another question that requires use of the memory."
"from pprint import pprint\n",
"\n",
"from langchain_core.pydantic_v1 import BaseModel\n",
"\n",
"\n",
"class ExtractFields(BaseModel):\n",
" name: str\n",
" age: int\n",
"\n",
"\n",
"chat = ChatFireworks(\n",
" model=\"accounts/fireworks/models/firefunction-v1\",\n",
").bind_tools([ExtractFields])\n",
"\n",
"result = chat.invoke(\"I am a 27 year old named Erick\")\n",
"\n",
"pprint(result.additional_kwargs[\"tool_calls\"][0])"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "7f5f2820",
"execution_count": null,
"id": "2321a4e6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=\"Arrrr, ye be askin' about yer name, eh? Well, me matey, I be knowin' ye as Bob, the scurvy dog! *winks* But if ye want me to call ye somethin' else, just let me know, and I\", additional_kwargs={}, example=False)"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inputs = {\"input\": \"whats my name\"}\n",
"chain.invoke(inputs)"
]
"outputs": [],
"source": []
}
],
"metadata": {
@@ -337,7 +213,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.11.4"
}
},
"nbformat": 4,

View File

@@ -98,9 +98,7 @@
"prompt = ChatPromptTemplate.from_messages([(\"system\", system), (\"human\", human)])\n",
"\n",
"chain = prompt | chat\n",
"chain.invoke({\n",
" \"text\": \"Explain the importance of low latency LLMs.\"\n",
"})"
"chain.invoke({\"text\": \"Explain the importance of low latency LLMs.\"})"
]
},
{

View File

@@ -0,0 +1,292 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1f3a5ebf",
"metadata": {},
"source": [
"# AirbyteLoader"
]
},
{
"cell_type": "markdown",
"id": "35ac77b1-449b-44f7-b8f3-3494d55c286e",
"metadata": {},
"source": [
">[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs, databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.\n",
"\n",
"This covers how to load any source from Airbyte into LangChain documents\n",
"\n",
"## Installation\n",
"\n",
"In order to use `AirbyteLoader` you need to install the `langchain-airbyte` integration package."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "180c8b74",
"metadata": {},
"outputs": [],
"source": [
"% pip install -qU langchain-airbyte"
]
},
{
"cell_type": "markdown",
"id": "3dd92c62",
"metadata": {},
"source": [
"## Loading Documents\n",
"\n",
"By default, the `AirbyteLoader` will load any structured data from a stream and output yaml-formatted documents."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "721d9316",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"```yaml\n",
"academic_degree: PhD\n",
"address:\n",
" city: Lauderdale Lakes\n",
" country_code: FI\n",
" postal_code: '75466'\n",
" province: New Jersey\n",
" state: Hawaii\n",
" street_name: Stoneyford\n",
" street_number: '1112'\n",
"age: 44\n",
"blood_type: \"O\\u2212\"\n",
"created_at: '2004-04-02T13:05:27+00:00'\n",
"email: bread2099+1@outlook.com\n",
"gender: Fluid\n",
"height: '1.62'\n",
"id: 1\n",
"language: Belarusian\n",
"name: Moses\n",
"nationality: Dutch\n",
"occupation: Track Worker\n",
"telephone: 1-467-194-2318\n",
"title: M.Sc.Tech.\n",
"updated_at: '2024-02-27T16:41:01+00:00'\n",
"weight: 6\n"
]
}
],
"source": [
"from langchain_airbyte import AirbyteLoader\n",
"\n",
"loader = AirbyteLoader(\n",
" source=\"source-faker\",\n",
" stream=\"users\",\n",
" config={\"count\": 10},\n",
")\n",
"docs = loader.load()\n",
"print(docs[0].page_content[:500])"
]
},
{
"cell_type": "markdown",
"id": "fca024cb",
"metadata": {
"scrolled": true
},
"source": [
"You can also specify a custom prompt template for formatting documents:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "9fa002a5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"My name is Verdie and I am 1.73 meters tall.\n"
]
}
],
"source": [
"from langchain_core.prompts import PromptTemplate\n",
"\n",
"loader_templated = AirbyteLoader(\n",
" source=\"source-faker\",\n",
" stream=\"users\",\n",
" config={\"count\": 10},\n",
" template=PromptTemplate.from_template(\n",
" \"My name is {name} and I am {height} meters tall.\"\n",
" ),\n",
")\n",
"docs_templated = loader_templated.load()\n",
"print(docs_templated[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "d3e6d887",
"metadata": {},
"source": [
"## Lazy Loading Documents\n",
"\n",
"One of the powerful features of `AirbyteLoader` is its ability to load large documents from upstream sources. When working with large datasets, the default `.load()` behavior can be slow and memory-intensive. To avoid this, you can use the `.lazy_load()` method to load documents in a more memory-efficient manner."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "684b9187",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Just calling lazy load is quick! This took 0.0001 seconds\n"
]
}
],
"source": [
"import time\n",
"\n",
"loader = AirbyteLoader(\n",
" source=\"source-faker\",\n",
" stream=\"users\",\n",
" config={\"count\": 3},\n",
" template=PromptTemplate.from_template(\n",
" \"My name is {name} and I am {height} meters tall.\"\n",
" ),\n",
")\n",
"\n",
"start_time = time.time()\n",
"my_iterator = loader.lazy_load()\n",
"print(\n",
" f\"Just calling lazy load is quick! This took {time.time() - start_time:.4f} seconds\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "6b24a64b",
"metadata": {},
"source": [
"And you can iterate over documents as they're yielded:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "3e8355d0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"My name is Andera and I am 1.91 meters tall.\n",
"My name is Jody and I am 1.85 meters tall.\n",
"My name is Zonia and I am 1.53 meters tall.\n"
]
}
],
"source": [
"for doc in my_iterator:\n",
" print(doc.page_content)"
]
},
{
"cell_type": "markdown",
"id": "d1040d81",
"metadata": {},
"source": [
"You can also lazy load documents in an async manner with `.alazy_load()`:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "dc5d0911",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"My name is Carmelina and I am 1.74 meters tall.\n",
"My name is Ali and I am 1.90 meters tall.\n",
"My name is Rochell and I am 1.83 meters tall.\n"
]
}
],
"source": [
"loader = AirbyteLoader(\n",
" source=\"source-faker\",\n",
" stream=\"users\",\n",
" config={\"count\": 3},\n",
" template=PromptTemplate.from_template(\n",
" \"My name is {name} and I am {height} meters tall.\"\n",
" ),\n",
")\n",
"\n",
"my_async_iterator = loader.alazy_load()\n",
"\n",
"async for doc in my_async_iterator:\n",
" print(doc.page_content)"
]
},
{
"cell_type": "markdown",
"id": "ba4ede33",
"metadata": {},
"source": [
"## Configuration\n",
"\n",
"`AirbyteLoader` can be configured with the following options:\n",
"\n",
"- `source` (str, required): The name of the Airbyte source to load from.\n",
"- `stream` (str, required): The name of the stream to load from (Airbyte sources can return multiple streams)\n",
"- `config` (dict, required): The configuration for the Airbyte source\n",
"- `template` (PromptTemplate, optional): A custom prompt template for formatting documents\n",
"- `include_metadata` (bool, optional, default True): Whether to include all fields as metadata in the output documents\n",
"\n",
"The majority of the configuration will be in `config`, and you can find the specific configuration options in the \"Config field reference\" for each source in the [Airbyte documentation](https://docs.airbyte.com/integrations/)."
]
},
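{
"cell_type": "markdown",
"id": "7d1c2b3a",
"metadata": {},
"source": [
"As a minimal sketch, the options above can be combined in a single loader. This reuses the `source-faker` source and the `name`/`occupation` fields from the earlier examples; `include_metadata=False` is shown purely for illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8e2d3c4b",
"metadata": {},
"outputs": [],
"source": [
"# Hedged sketch combining the documented configuration options\n",
"configured_loader = AirbyteLoader(\n",
" source=\"source-faker\", # required: Airbyte source name\n",
" stream=\"users\", # required: stream to load\n",
" config={\"count\": 5}, # required: source-specific config\n",
" template=PromptTemplate.from_template(\n",
" \"{name} works as a {occupation}.\"\n",
" ), # optional: custom document formatting\n",
" include_metadata=False, # optional: omit row fields from metadata\n",
")\n",
"\n",
"for doc in configured_loader.lazy_load():\n",
" print(doc.page_content)"
]
},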
{
"cell_type": "markdown",
"id": "2e2ed269",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,40 @@
-- Provisioning table "mlb_teams_2012".
--
-- psql postgresql://postgres@localhost < mlb_teams_2012.sql
DROP TABLE IF EXISTS mlb_teams_2012;
CREATE TABLE mlb_teams_2012 ("Team" VARCHAR, "Payroll (millions)" FLOAT, "Wins" BIGINT);
INSERT INTO mlb_teams_2012
("Team", "Payroll (millions)", "Wins")
VALUES
('Nationals', 81.34, 98),
('Reds', 82.20, 97),
('Yankees', 197.96, 95),
('Giants', 117.62, 94),
('Braves', 83.31, 94),
('Athletics', 55.37, 94),
('Rangers', 120.51, 93),
('Orioles', 81.43, 93),
('Rays', 64.17, 90),
('Angels', 154.49, 89),
('Tigers', 132.30, 88),
('Cardinals', 110.30, 88),
('Dodgers', 95.14, 86),
('White Sox', 96.92, 85),
('Brewers', 97.65, 83),
('Phillies', 174.54, 81),
('Diamondbacks', 74.28, 81),
('Pirates', 63.43, 79),
('Padres', 55.24, 76),
('Mariners', 81.97, 75),
('Mets', 93.35, 74),
('Blue Jays', 75.48, 73),
('Royals', 60.91, 72),
('Marlins', 118.07, 69),
('Red Sox', 173.18, 69),
('Indians', 78.43, 68),
('Twins', 94.08, 66),
('Rockies', 78.06, 64),
('Cubs', 88.19, 61),
('Astros', 60.65, 55)
;

View File

@@ -0,0 +1,380 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "E_RJy7C1bpCT"
},
"source": [
"# Google AlloyDB for PostgreSQL\n",
"\n",
"> [AlloyDB](https://cloud.google.com/alloydb) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. AlloyDB is 100% compatible with PostgreSQL. Extend your database application to build AI-powered experiences leveraging AlloyDB's Langchain integrations.\n",
"\n",
"This notebook goes over how to use `AlloyDB for PostgreSQL` to load Documents with the `AlloyDBLoader` class."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xjcxaw6--Xyy"
},
"source": [
"## Before you begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
" * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
" * [Enable the AlloyDB Admin API.](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com)\n",
" * [Create a AlloyDB cluster and instance.](https://cloud.google.com/alloydb/docs/cluster-create)\n",
" * [Create a AlloyDB database.](https://cloud.google.com/alloydb/docs/quickstart/create-and-connect)\n",
" * [Add a User to the database.](https://cloud.google.com/alloydb/docs/database-users/about)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IR54BmgvdHT_"
},
"source": [
"### 🦜🔗 Library Installation\n",
"Install the integration library, `langchain-google-alloydb-pg`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "0ZITIDE160OD",
"outputId": "90e0636e-ff34-4e1e-ad37-d2a6db4a317e"
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-google-alloydb-pg"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "v40bB_GMcr9f"
},
"source": [
"**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6o0iGVIdDD6K"
},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cTXTbj4UltKf"
},
"source": [
"### 🔐 Authentication\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"* If you are using Colab to run this notebook, use the cell below and continue.\n",
"* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Uj02bMRAc9_c"
},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "wnp1R1PYc9_c",
"outputId": "6502c721-a2fd-451f-b946-9f7b850d5966"
},
"outputs": [],
"source": [
"# @title Project { display-mode: \"form\" }\n",
"PROJECT_ID = \"gcp_project_id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"! gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"id": "rEWWNoNnKOgq",
"metadata": {
"id": "rEWWNoNnKOgq"
},
"source": [
"### 💡 API Enablement\n",
"The `langchain-google-alloydb-pg` package requires that you [enable the AlloyDB Admin API](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com) in your Google Cloud Project."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "5utKIdq7KYi5",
"metadata": {
"id": "5utKIdq7KYi5"
},
"outputs": [],
"source": [
"# enable AlloyDB Admin API\n",
"!gcloud services enable alloydb.googleapis.com"
]
},
{
"cell_type": "markdown",
"id": "f8f2830ee9ca1e01",
"metadata": {
"id": "f8f2830ee9ca1e01"
},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"id": "OMvzMWRrR6n7",
"metadata": {
"id": "OMvzMWRrR6n7"
},
"source": [
"### Set AlloyDB database variables\n",
"Find your database values, in the [AlloyDB Instances page](https://console.cloud.google.com/alloydb/clusters)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "irl7eMFnSPZr",
"metadata": {
"id": "irl7eMFnSPZr"
},
"outputs": [],
"source": [
"# @title Set Your Values Here { display-mode: \"form\" }\n",
"REGION = \"us-central1\" # @param {type: \"string\"}\n",
"CLUSTER = \"my-cluster\" # @param {type: \"string\"}\n",
"INSTANCE = \"my-primary\" # @param {type: \"string\"}\n",
"DATABASE = \"my-database\" # @param {type: \"string\"}\n",
"TABLE_NAME = \"vector_store\" # @param {type: \"string\"}"
]
},
{
"cell_type": "markdown",
"id": "QuQigs4UoFQ2",
"metadata": {
"id": "QuQigs4UoFQ2"
},
"source": [
"### AlloyDBEngine Connection Pool\n",
"\n",
"One of the requirements and arguments to establish AlloyDB as a vector store is a `AlloyDBEngine` object. The `AlloyDBEngine` configures a connection pool to your AlloyDB database, enabling successful connections from your application and following industry best practices.\n",
"\n",
"To create a `AlloyDBEngine` using `AlloyDBEngine.from_instance()` you need to provide only 5 things:\n",
"\n",
"1. `project_id` : Project ID of the Google Cloud Project where the AlloyDB instance is located.\n",
"1. `region` : Region where the AlloyDB instance is located.\n",
"1. `cluster`: The name of the AlloyDB cluster.\n",
"1. `instance` : The name of the AlloyDB instance.\n",
"1. `database` : The name of the database to connect to on the AlloyDB instance.\n",
"\n",
"By default, [IAM database authentication](https://cloud.google.com/alloydb/docs/connect-iam) will be used as the method of database authentication. This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the environment.\n",
"\n",
"Optionally, [built-in database authentication](https://cloud.google.com/alloydb/docs/database-users/about) using a username and password to access the AlloyDB database can also be used. Just provide the optional `user` and `password` arguments to `AlloyDBEngine.from_instance()`:\n",
"\n",
"* `user` : Database user to use for built-in database authentication and login\n",
"* `password` : Database password to use for built-in database authentication and login.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: This tutorial demonstrates the async interface. All async methods have corresponding sync methods."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_alloydb_pg import AlloyDBEngine\n",
"\n",
"engine = await AlloyDBEngine.afrom_instance(\n",
" project_id=PROJECT_ID,\n",
" region=REGION,\n",
" cluster=CLUSTER,\n",
" instance=INSTANCE,\n",
" database=DATABASE,\n",
")"
]
},
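{
"cell_type": "markdown",
"metadata": {},
"source": [
"And here is a minimal sketch of the optional built-in authentication path described above (the user and password values below are placeholders, not real credentials):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hedged sketch: built-in database authentication instead of IAM.\n",
"# \"my-db-user\" and \"my-db-password\" are placeholder values.\n",
"engine_builtin = await AlloyDBEngine.afrom_instance(\n",
" project_id=PROJECT_ID,\n",
" region=REGION,\n",
" cluster=CLUSTER,\n",
" instance=INSTANCE,\n",
" database=DATABASE,\n",
" user=\"my-db-user\", # optional: built-in database user\n",
" password=\"my-db-password\", # optional: built-in database password\n",
")"
]
},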
{
"cell_type": "markdown",
"metadata": {
"id": "e1tl0aNx7SWy"
},
"source": [
"### Create AlloyDBLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "z-AZyzAQ7bsf"
},
"outputs": [],
"source": [
"from langchain_google_alloydb_pg import AlloyDBLoader\n",
"\n",
"# Creating a basic AlloyDBLoader object\n",
"loader = await AlloyDBLoader.create(engine, table_name=TABLE_NAME)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PeOMpftjc9_e"
},
"source": [
"### Load Documents via default table\n",
"The loader returns a list of Documents from the table using the first column as page_content and all other columns as metadata. The default table will have the first column as\n",
"page_content and the second column as metadata (JSON). Each row becomes a document."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "cwvi_O5Wc9_e"
},
"outputs": [],
"source": [
"docs = await loader.aload()\n",
"print(docs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kSkL9l1Hc9_e"
},
"source": [
"### Load documents via custom table/metadata or custom page content columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = await AlloyDBLoader.create(\n",
" engine,\n",
" table_name=TABLE_NAME,\n",
" content_columns=[\"product_name\"], # Optional\n",
" metadata_columns=[\"id\"], # Optional\n",
")\n",
"docs = await loader.aload()\n",
"print(docs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5R6h0_Cvc9_f"
},
"source": [
"### Set page content format\n",
"The loader returns a list of Documents, with one document per row, with page content in specified string format, i.e. text (space separated concatenation), JSON, YAML, CSV, etc. JSON and YAML formats include headers, while text and CSV do not include field headers.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NGNdS7cqc9_f"
},
"outputs": [],
"source": [
"loader = AlloyDBLoader.create(\n",
" engine,\n",
" table_name=\"products\",\n",
" content_columns=[\"product_name\", \"description\"],\n",
" format=\"YAML\",\n",
")\n",
"docs = await loader.aload()\n",
"print(docs)"
]
}
],
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,469 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google Bigtable\n",
"\n",
"> [Bigtable](https://cloud.google.com/bigtable) is a key-value and wide-column store, ideal for fast access to structured, semi-structured, or unstructured data. Extend your database application to build AI-powered experiences leveraging Bigtable's Langchain integrations.\n",
"\n",
"This notebook goes over how to use [Bigtable](https://cloud.google.com/bigtable) to [save, load and delete langchain documents](https://python.langchain.com/docs/modules/data_connection/document_loaders/) with `BigtableLoader` and `BigtableSaver`.\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-bigtable-python/blob/main/docs/document_loader.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
"* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
"* [Create a Bigtable instance](https://cloud.google.com/bigtable/docs/creating-instance)\n",
"* [Create a Bigtable table](https://cloud.google.com/bigtable/docs/managing-tables)\n",
"* [Create Bigtable access credentials](https://developers.google.com/workspace/guides/create-credentials)\n",
"\n",
"After confirmed access to database in the runtime environment of this notebook, filling the following values and run the cell before running example scripts."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please specify an instance and a table for demo purpose.\n",
"INSTANCE_ID = \"my_instance\" # @param {type:\"string\"}\n",
"TABLE_ID = \"my_table\" # @param {type:\"string\"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🦜🔗 Library Installation\n",
"\n",
"The integration lives in its own `langchain-google-bigtable` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -upgrade --quiet langchain-google-bigtable"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🔐 Authentication\n",
"\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"- If you are using Colab to run this notebook, use the cell below and continue.\n",
"- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using the saver\n",
"\n",
"Save langchain documents with `BigtableSaver.add_documents(<documents>)`. To initialize `BigtableSaver` class you need to provide 2 things:\n",
"\n",
"1. `instance_id` - An instance of Bigtable.\n",
"1. `table_id` - The name of the table within the Bigtable to store langchain documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.documents import Document\n",
"from langchain_google_bigtable import BigtableSaver\n",
"\n",
"test_docs = [\n",
" Document(\n",
" page_content=\"Apple Granny Smith 150 0.99 1\",\n",
" metadata={\"fruit_id\": 1},\n",
" ),\n",
" Document(\n",
" page_content=\"Banana Cavendish 200 0.59 0\",\n",
" metadata={\"fruit_id\": 2},\n",
" ),\n",
" Document(\n",
" page_content=\"Orange Navel 80 1.29 1\",\n",
" metadata={\"fruit_id\": 3},\n",
" ),\n",
"]\n",
"\n",
"saver = BigtableSaver(\n",
" instance_id=INSTANCE_ID,\n",
" table_id=TABLE_ID,\n",
")\n",
"\n",
"saver.add_documents(test_docs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Querying for Documents from Bigtable\n",
"For more details on connecting to a Bigtable table, please check the [Python SDK documentation](https://cloud.google.com/python/docs/reference/bigtable/latest/client)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Load documents from table\n",
"\n",
"Load langchain documents with `BigtableLoader.load()` or `BigtableLoader.lazy_load()`. `lazy_load` returns a generator that only queries database during the iteration. To initialize `BigtableLoader` class you need to provide:\n",
"\n",
"1. `instance_id` - An instance of Bigtable.\n",
"1. `table_id` - The name of the table within the Bigtable to store langchain documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_bigtable import BigtableLoader\n",
"\n",
"loader = BigtableLoader(\n",
" instance_id=INSTANCE_ID,\n",
" table_id=TABLE_ID,\n",
")\n",
"\n",
"for doc in loader.lazy_load():\n",
" print(doc)\n",
" break"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete documents\n",
"\n",
"Delete a list of langchain documents from Bigtable table with `BigtableSaver.delete(<documents>)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_bigtable import BigtableSaver\n",
"\n",
"docs = loader.load()\n",
"print(\"Documents before delete: \", docs)\n",
"\n",
"onedoc = test_docs[0]\n",
"saver.delete([onedoc])\n",
"print(\"Documents after delete: \", loader.load())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Limiting the returned rows\n",
"There are two ways to limit the returned rows:\n",
"\n",
"1. Using a [filter](https://cloud.google.com/python/docs/reference/bigtable/latest/row-filters)\n",
"2. Using a [row_set](https://cloud.google.com/python/docs/reference/bigtable/latest/row-set#google.cloud.bigtable.row_set.RowSet)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import google.cloud.bigtable.row_filters as row_filters\n",
"\n",
"filter_loader = BigtableLoader(\n",
" INSTANCE_ID, TABLE_ID, filter=row_filters.ColumnQualifierRegexFilter(b\"os_build\")\n",
")\n",
"\n",
"\n",
"from google.cloud.bigtable.row_set import RowSet\n",
"\n",
"row_set = RowSet()\n",
"row_set.add_row_range_from_keys(\n",
" start_key=\"phone#4c410523#20190501\", end_key=\"phone#4c410523#201906201\"\n",
")\n",
"\n",
"row_set_loader = BigtableLoader(\n",
" INSTANCE_ID,\n",
" TABLE_ID,\n",
" row_set=row_set,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Custom client\n",
"The client created by default is the default client, using only admin=True option. To use a non-default, a [custom client](https://cloud.google.com/python/docs/reference/bigtable/latest/client#class-googlecloudbigtableclientclientprojectnone-credentialsnone-readonlyfalse-adminfalse-clientinfonone-clientoptionsnone-adminclientoptionsnone-channelnone) can be passed to the constructor."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.cloud import bigtable\n",
"\n",
"custom_client_loader = BigtableLoader(\n",
" INSTANCE_ID,\n",
" TABLE_ID,\n",
" client=bigtable.Client(...),\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Custom content\n",
"The BigtableLoader assumes there is a column family called `langchain`, that has a column called `content`, that contains values encoded in UTF-8. These defaults can be changed like so:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_bigtable import Encoding\n",
"\n",
"custom_content_loader = BigtableLoader(\n",
" INSTANCE_ID,\n",
" TABLE_ID,\n",
" content_encoding=Encoding.ASCII,\n",
" content_column_family=\"my_content_family\",\n",
" content_column_name=\"my_content_column_name\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Metadata mapping\n",
"By default, the `metadata` map on the `Document` object will contain a single key, `rowkey`, with the value of the row's rowkey value. To add more items to that map, use metadata_mapping."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"from langchain_google_bigtable import MetadataMapping\n",
"\n",
"metadata_mapping_loader = BigtableLoader(\n",
" INSTANCE_ID,\n",
" TABLE_ID,\n",
" metadata_mappings=[\n",
" MetadataMapping(\n",
" column_family=\"my_int_family\",\n",
" column_name=\"my_int_column\",\n",
" metadata_key=\"key_in_metadata_map\",\n",
" encoding=Encoding.INT_BIG_ENDIAN,\n",
" ),\n",
" MetadataMapping(\n",
" column_family=\"my_custom_family\",\n",
" column_name=\"my_custom_column\",\n",
" metadata_key=\"custom_key\",\n",
" encoding=Encoding.CUSTOM,\n",
" custom_decoding_func=lambda input: json.loads(input.decode()),\n",
" custom_encoding_func=lambda input: str.encode(json.dumps(input)),\n",
" ),\n",
" ],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Metadata as JSON\n",
"\n",
"If there is a column in Bigtable that contains a JSON string that you would like to have added to the output document metadata, it is possible to add the following parameters to BigtableLoader. Note, the default value for `metadata_as_json_encoding` is UTF-8."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"metadata_as_json_loader = BigtableLoader(\n",
" INSTANCE_ID,\n",
" TABLE_ID,\n",
" metadata_as_json_encoding=Encoding.ASCII,\n",
" metadata_as_json_family=\"my_metadata_as_json_family\",\n",
" metadata_as_json_name=\"my_metadata_as_json_column_name\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Customize BigtableSaver\n",
"\n",
"The BigtableSaver is also customizable similar to BigtableLoader."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"saver = BigtableSaver(\n",
" INSTANCE_ID,\n",
" TABLE_ID,\n",
" client=bigtable.Client(...),\n",
" content_encoding=Encoding.ASCII,\n",
" content_column_family=\"my_content_family\",\n",
" content_column_name=\"my_content_column_name\",\n",
" metadata_mappings=[\n",
" MetadataMapping(\n",
" column_family=\"my_int_family\",\n",
" column_name=\"my_int_column\",\n",
" metadata_key=\"key_in_metadata_map\",\n",
" encoding=Encoding.INT_BIG_ENDIAN,\n",
" ),\n",
" MetadataMapping(\n",
" column_family=\"my_custom_family\",\n",
" column_name=\"my_custom_column\",\n",
" metadata_key=\"custom_key\",\n",
" encoding=Encoding.CUSTOM,\n",
" custom_decoding_func=lambda input: json.loads(input.decode()),\n",
" custom_encoding_func=lambda input: str.encode(json.dumps(input)),\n",
" ),\n",
" ],\n",
" metadata_as_json_encoding=Encoding.ASCII,\n",
" metadata_as_json_family=\"my_metadata_as_json_family\",\n",
" metadata_as_json_name=\"my_metadata_as_json_column_name\",\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,629 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google Cloud SQL for SQL Server\n",
"\n",
"> [Cloud SQL](https://cloud.google.com/sql) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It offers [MySQL](https://cloud.google.com/sql/mysql), [PostgreSQL](https://cloud.google.com/sql/postgres), and [SQL Server](https://cloud.google.com/sql/sqlserver) database engines. Extend your database application to build AI-powered experiences leveraging Cloud SQL's Langchain integrations.\n",
"\n",
"This notebook goes over how to use [Cloud SQL for SQL Server](https://cloud.google.com/sql/sqlserver) to [save, load and delete langchain documents](https://python.langchain.com/docs/modules/data_connection/document_loaders/) with `MSSQLLoader` and `MSSQLDocumentSaver`.\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-cloud-sql-mssql-python/blob/main/docs/document_loader.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
"* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
"* [Create a Cloud SQL for SQL Server instance](https://cloud.google.com/sql/docs/sqlserver/create-instance)\n",
"* [Create a Cloud SQL database](https://cloud.google.com/sql/docs/mssql/create-manage-databases)\n",
"* [Add an IAM database user to the database](https://cloud.google.com/sql/docs/sqlserver/add-manage-iam-users#creating-a-database-user) (Optional)\n",
"\n",
"After confirmed access to database in the runtime environment of this notebook, filling the following values and run the cell before running example scripts."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in the both the Google Cloud region and name of your Cloud SQL instance.\n",
"REGION = \"us-central1\" # @param {type:\"string\"}\n",
"INSTANCE = \"test-instance\" # @param {type:\"string\"}\n",
"\n",
"# @markdown Please fill in user name and password of your Cloud SQL instance.\n",
"DB_USER = \"sqlserver\" # @param {type:\"string\"}\n",
"DB_PASS = \"password\" # @param {type:\"string\"}\n",
"\n",
"# @markdown Please specify a database and a table for demo purpose.\n",
"DATABASE = \"test\" # @param {type:\"string\"}\n",
"TABLE_NAME = \"test-default\" # @param {type:\"string\"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🦜🔗 Library Installation\n",
"\n",
"The integration lives in its own `langchain-google-cloud-sql-mssql` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-google-cloud-sql-mssql"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🔐 Authentication\n",
"\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"- If you are using Colab to run this notebook, use the cell below and continue.\n",
"- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 💡 API Enablement\n",
"The `langchain-google-cloud-sql-mssql` package requires that you [enable the Cloud SQL Admin API](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com) in your Google Cloud Project."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# enable Cloud SQL Admin API\n",
"!gcloud services enable sqladmin.googleapis.com"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### MSSQLEngine Connection Pool\n",
"\n",
"Before saving or loading documents from MSSQL table, we need first configures a connection pool to Cloud SQL database. The `MSSQLEngine` configures a [SQLAlchemy connection pool](https://docs.sqlalchemy.org/en/20/core/pooling.html#module-sqlalchemy.pool) to your Cloud SQL database, enabling successful connections from your application and following industry best practices.\n",
"\n",
"To create a `MSSQLEngine` using `MSSQLEngine.from_instance()` you need to provide only 6 things:\n",
"\n",
"1. `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.\n",
"1. `region` : Region where the Cloud SQL instance is located.\n",
"1. `instance` : The name of the Cloud SQL instance.\n",
"1. `database` : The name of the database to connect to on the Cloud SQL instance.\n",
"1. `user` : Database user to use for built-in database authentication and login.\n",
"1. `password` : Database password to use for built-in database authentication and login."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_mssql import MSSQLEngine\n",
"\n",
"engine = MSSQLEngine.from_instance(\n",
" project_id=PROJECT_ID,\n",
" region=REGION,\n",
" instance=INSTANCE,\n",
" database=DATABASE,\n",
" user=DB_USER,\n",
" password=DB_PASS,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initialize a table\n",
"\n",
"Initialize a table of default schema via `MSSQLEngine.init_document_table(<table_name>)`. Table Columns:\n",
"- page_content (type: text)\n",
"- langchain_metadata (type: JSON)\n",
"\n",
"`overwrite_existing=True` flag means the newly initialized table will replace any existing table of the same name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"engine.init_document_table(TABLE_NAME, overwrite_existing=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Save documents\n",
"\n",
"Save langchain documents with `MSSQLDocumentSaver.add_documents(<documents>)`. To initialize `MSSQLDocumentSaver` class you need to provide 2 things:\n",
"1. `engine` - An instance of a `MSSQLEngine` engine.\n",
"2. `table_name` - The name of the table within the Cloud SQL database to store langchain documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.documents import Document\n",
"from langchain_google_cloud_sql_mssql import MSSQLDocumentSaver\n",
"\n",
"test_docs = [\n",
" Document(\n",
" page_content=\"Apple Granny Smith 150 0.99 1\",\n",
" metadata={\"fruit_id\": 1},\n",
" ),\n",
" Document(\n",
" page_content=\"Banana Cavendish 200 0.59 0\",\n",
" metadata={\"fruit_id\": 2},\n",
" ),\n",
" Document(\n",
" page_content=\"Orange Navel 80 1.29 1\",\n",
" metadata={\"fruit_id\": 3},\n",
" ),\n",
"]\n",
"saver = MSSQLDocumentSaver(engine=engine, table_name=TABLE_NAME)\n",
"saver.add_documents(test_docs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load documents"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load langchain documents with `MSSQLLoader.load()` or `MSSQLLoader.lazy_load()`. `lazy_load` returns a generator that only queries database during the iteration. To initialize `MSSQLDocumentSaver` class you need to provide:\n",
"1. `engine` - An instance of a `MSSQLEngine` engine.\n",
"2. `table_name` - The name of the table within the Cloud SQL database to store langchain documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_mssql import MSSQLLoader\n",
"\n",
"loader = MSSQLLoader(engine=engine, table_name=TABLE_NAME)\n",
"docs = loader.lazy_load()\n",
"for doc in docs:\n",
" print(\"Loaded documents:\", doc)"
]
},
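{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, `load()` runs the same query eagerly and returns the full list of documents at once:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# load() materializes all rows at once, unlike the lazy_load() generator.\n",
"docs = loader.load()\n",
"print(\"Loaded\", len(docs), \"documents\")"
]
},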
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load documents via query"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Other than loading documents from a table, we can also choose to load documents from a view generated from a SQL query. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_mssql import MSSQLLoader\n",
"\n",
"loader = MSSQLLoader(\n",
" engine=engine,\n",
" query=f\"select * from \\\"{TABLE_NAME}\\\" where JSON_VALUE(langchain_metadata, '$.fruit_id') = 1;\",\n",
")\n",
"onedoc = loader.load()\n",
"onedoc"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The view generated from SQL query can have different schema than default table. In such cases, the behavior of MSSQLLoader is the same as loading from table with non-default schema. Please refer to section [Load documents with customized document page content & metadata](#Load-documents-with-customized-document-page-content-&-metadata)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete documents"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Delete a list of langchain documents from MSSQL table with `MSSQLDocumentSaver.delete(<documents>)`.\n",
"\n",
"For table with default schema (page_content, langchain_metadata), the deletion criteria is:\n",
"\n",
"A `row` should be deleted if there exists a `document` in the list, such that\n",
"- `document.page_content` equals `row[page_content]`\n",
"- `document.metadata` equals `row[langchain_metadata]`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_mssql import MSSQLLoader\n",
"\n",
"loader = MSSQLLoader(engine=engine, table_name=TABLE_NAME)\n",
"docs = loader.load()\n",
"print(\"Documents before delete:\", docs)\n",
"saver.delete(onedoc)\n",
"print(\"Documents after delete:\", loader.load())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load documents with customized document page content & metadata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we prepare an example table with non-default schema, and populate it with some arbitary data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sqlalchemy\n",
"\n",
"with engine.connect() as conn:\n",
" conn.execute(sqlalchemy.text(f'DROP TABLE IF EXISTS \"{TABLE_NAME}\"'))\n",
" conn.commit()\n",
" conn.execute(\n",
" sqlalchemy.text(\n",
" f\"\"\"\n",
" IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[{TABLE_NAME}]') AND type in (N'U'))\n",
" BEGIN\n",
" CREATE TABLE [dbo].[{TABLE_NAME}](\n",
" fruit_id INT IDENTITY(1,1) PRIMARY KEY,\n",
" fruit_name VARCHAR(100) NOT NULL,\n",
" variety VARCHAR(50),\n",
" quantity_in_stock INT NOT NULL,\n",
" price_per_unit DECIMAL(6,2) NOT NULL,\n",
" organic BIT NOT NULL\n",
" )\n",
" END\n",
" \"\"\"\n",
" )\n",
" )\n",
" conn.execute(\n",
" sqlalchemy.text(\n",
" f\"\"\"\n",
" INSERT INTO \"{TABLE_NAME}\" (fruit_name, variety, quantity_in_stock, price_per_unit, organic)\n",
" VALUES\n",
" ('Apple', 'Granny Smith', 150, 0.99, 1),\n",
" ('Banana', 'Cavendish', 200, 0.59, 0),\n",
" ('Orange', 'Navel', 80, 1.29, 1);\n",
" \"\"\"\n",
" )\n",
" )\n",
" conn.commit()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we still load langchain documents with default parameters of `MSSQLLoader` from this example table, the `page_content` of loaded documents will be the first column of the table, and `metadata` will be consisting of key-value pairs of all the other columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = MSSQLLoader(\n",
" engine=engine,\n",
" table_name=TABLE_NAME,\n",
")\n",
"loader.load()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can specify the content and metadata we want to load by setting the `content_columns` and `metadata_columns` when initializing the `MSSQLLoader`.\n",
"1. `content_columns`: The columns to write into the `page_content` of the document.\n",
"2. `metadata_columns`: The columns to write into the `metadata` of the document.\n",
"\n",
"For example here, the values of columns in `content_columns` will be joined together into a space-separated string, as `page_content` of loaded documents, and `metadata` of loaded documents will only contain key-value pairs of columns specified in `metadata_columns`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = MSSQLLoader(\n",
" engine=engine,\n",
" table_name=TABLE_NAME,\n",
" content_columns=[\n",
" \"variety\",\n",
" \"quantity_in_stock\",\n",
" \"price_per_unit\",\n",
" \"organic\",\n",
" ],\n",
" metadata_columns=[\"fruit_id\", \"fruit_name\"],\n",
")\n",
"loader.load()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Save document with customized page content & metadata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to save langchain document into table with customized metadata fields. We need first create such a table via `MSSQLEngine.init_document_table()`, and specify the list of `metadata_columns` we want it to have. In this example, the created table will have table columns:\n",
"- description (type: text): for storing fruit description.\n",
"- fruit_name (type text): for storing fruit name.\n",
"- organic (type tinyint(1)): to tell if the fruit is organic.\n",
"- other_metadata (type: JSON): for storing other metadata information of the fruit.\n",
"\n",
"We can use the following parameters with `MSSQLEngine.init_document_table()` to create the table:\n",
"1. `table_name`: The name of the table within the Cloud SQL database to store langchain documents.\n",
"2. `metadata_columns`: A list of `sqlalchemy.Column` indicating the list of metadata columns we need.\n",
"3. `content_column`: The name of column to store `page_content` of langchain document. Default: `page_content`.\n",
"4. `metadata_json_column`: The name of JSON column to store extra `metadata` of langchain document. Default: `langchain_metadata`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"engine.init_document_table(\n",
" TABLE_NAME,\n",
" metadata_columns=[\n",
" sqlalchemy.Column(\n",
" \"fruit_name\",\n",
" sqlalchemy.UnicodeText,\n",
" primary_key=False,\n",
" nullable=True,\n",
" ),\n",
" sqlalchemy.Column(\n",
" \"organic\",\n",
" sqlalchemy.Boolean,\n",
" primary_key=False,\n",
" nullable=True,\n",
" ),\n",
" ],\n",
" content_column=\"description\",\n",
" metadata_json_column=\"other_metadata\",\n",
" overwrite_existing=True,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Save documents with `MSSQLDocumentSaver.add_documents(<documents>)`. As you can see in this example, \n",
"- `document.page_content` will be saved into `description` column.\n",
"- `document.metadata.fruit_name` will be saved into `fruit_name` column.\n",
"- `document.metadata.organic` will be saved into `organic` column.\n",
"- `document.metadata.fruit_id` will be saved into `other_metadata` column in JSON format."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test_docs = [\n",
" Document(\n",
" page_content=\"Granny Smith 150 0.99\",\n",
" metadata={\"fruit_id\": 1, \"fruit_name\": \"Apple\", \"organic\": 1},\n",
" ),\n",
"]\n",
"saver = MSSQLDocumentSaver(\n",
" engine=engine,\n",
" table_name=TABLE_NAME,\n",
" content_column=\"description\",\n",
" metadata_json_column=\"other_metadata\",\n",
")\n",
"saver.add_documents(test_docs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with engine.connect() as conn:\n",
" result = conn.execute(sqlalchemy.text(f'select * from \"{TABLE_NAME}\";'))\n",
" print(result.keys())\n",
" print(result.fetchall())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete documents with customized page content & metadata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also delete documents from table with customized metadata columns via `MSSQLDocumentSaver.delete(<documents>)`. The deletion criteria is:\n",
"\n",
"A `row` should be deleted if there exists a `document` in the list, such that\n",
"- `document.page_content` equals `row[page_content]`\n",
"- For every metadata field `k` in `document.metadata`\n",
" - `document.metadata[k]` equals `row[k]` or `document.metadata[k]` equals `row[langchain_metadata][k]`\n",
"- There no extra metadata field presents in `row` but not in `document.metadata`.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = MSSQLLoader(engine=engine, table_name=TABLE_NAME)\n",
"docs = loader.load()\n",
"print(\"Documents before delete:\", docs)\n",
"saver.delete(docs)\n",
"print(\"Documents after delete:\", loader.load())"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

@@ -0,0 +1,642 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google Cloud SQL for MySQL\n",
"\n",
"> [Cloud SQL](https://cloud.google.com/sql) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It offers [MySQL](https://cloud.google.com/sql/mysql), [PostgreSQL](https://cloud.google.com/sql/postgres), and [SQL Server](https://cloud.google.com/sql/sqlserver) database engines. Extend your database application to build AI-powered experiences leveraging Cloud SQL's Langchain integrations.\n",
"\n",
"This notebook goes over how to use [Cloud SQL for MySQL](https://cloud.google.com/sql/mysql) to [save, load and delete langchain documents](https://python.langchain.com/docs/modules/data_connection/document_loaders/) with `MySQLLoader` and `MySQLDocumentSaver`.\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-cloud-sql-mysql-python/blob/main/docs/document_loader.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
"* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
"* [Create a Cloud SQL for MySQL instance](https://cloud.google.com/sql/docs/mysql/create-instance)\n",
"* [Create a Cloud SQL database](https://cloud.google.com/sql/docs/mysql/create-manage-databases)\n",
"* [Add an IAM database user to the database](https://cloud.google.com/sql/docs/mysql/add-manage-iam-users#creating-a-database-user) (Optional)\n",
"\n",
"After confirmed access to database in the runtime environment of this notebook, filling the following values and run the cell before running example scripts."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# @markdown Please fill in the both the Google Cloud region and name of your Cloud SQL instance.\n",
"REGION = \"us-central1\" # @param {type:\"string\"}\n",
"INSTANCE = \"test-instance\" # @param {type:\"string\"}\n",
"\n",
"# @markdown Please specify a database and a table for demo purpose.\n",
"DATABASE = \"test\" # @param {type:\"string\"}\n",
"TABLE_NAME = \"test-default\" # @param {type:\"string\"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🦜🔗 Library Installation\n",
"\n",
"The integration lives in its own `langchain-google-cloud-sql-mysql` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%pip install -upgrade --quiet langchain-google-cloud-sql-mysql"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🔐 Authentication\n",
"\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"- If you are using Colab to run this notebook, use the cell below and continue.\n",
"- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### API Enablement\n",
"The `langchain-google-cloud-sql-mysql` package requires that you [enable the Cloud SQL Admin API](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com) in your Google Cloud Project."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# enable Cloud SQL Admin API\n",
"!gcloud services enable sqladmin.googleapis.com"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### MySQLEngine Connection Pool\n",
"\n",
"Before saving or loading documents from MySQL table, we need first configures a connection pool to Cloud SQL database. The `MySQLEngine` configures a connection pool to your Cloud SQL database, enabling successful connections from your application and following industry best practices.\n",
"\n",
"To create a `MySQLEngine` using `MySQLEngine.from_instance()` you need to provide only 4 things:\n",
"\n",
"1. `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.\n",
"2. `region` : Region where the Cloud SQL instance is located.\n",
"3. `instance` : The name of the Cloud SQL instance.\n",
"4. `database` : The name of the database to connect to on the Cloud SQL instance.\n",
"\n",
"By default, [IAM database authentication](https://cloud.google.com/sql/docs/mysql/iam-authentication#iam-db-auth) will be used as the method of database authentication. This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the envionment.\n",
"\n",
"For more informatin on IAM database authentication please see:\n",
"\n",
"* [Configure an instance for IAM database authentication](https://cloud.google.com/sql/docs/mysql/create-edit-iam-instances)\n",
"* [Manage users with IAM database authentication](https://cloud.google.com/sql/docs/mysql/add-manage-iam-users)\n",
"\n",
"Optionally, [built-in database authentication](https://cloud.google.com/sql/docs/mysql/built-in-authentication) using a username and password to access the Cloud SQL database can also be used. Just provide the optional `user` and `password` arguments to `MySQLEngine.from_instance()`:\n",
"\n",
"* `user` : Database user to use for built-in database authentication and login\n",
"* `password` : Database password to use for built-in database authentication and login."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_mysql import MySQLEngine\n",
"\n",
"engine = MySQLEngine.from_instance(\n",
" project_id=PROJECT_ID, region=REGION, instance=INSTANCE, database=DATABASE\n",
")"
]
},
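{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the optional built-in database authentication described above (the user and password values here are hypothetical placeholders, not variables defined earlier in this notebook):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: built-in database authentication with the optional\n",
"# user/password arguments. Replace the placeholder credentials with\n",
"# a real database user and password before running.\n",
"engine = MySQLEngine.from_instance(\n",
" project_id=PROJECT_ID,\n",
" region=REGION,\n",
" instance=INSTANCE,\n",
" database=DATABASE,\n",
" user=\"my-db-user\",  # hypothetical placeholder\n",
" password=\"my-db-password\",  # hypothetical placeholder\n",
")"
]
},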
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initialize a table\n",
"\n",
"Initialize a table of default schema via `MySQLEngine.init_document_table(<table_name>)`. Table Columns:\n",
"\n",
"- page_content (type: text)\n",
"- langchain_metadata (type: JSON)\n",
"\n",
"`overwrite_existing=True` flag means the newly initialized table will replace any existing table of the same name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"engine.init_document_table(TABLE_NAME, overwrite_existing=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Save documents\n",
"\n",
"Save langchain documents with `MySQLDocumentSaver.add_documents(<documents>)`. To initialize `MySQLDocumentSaver` class you need to provide 2 things:\n",
"\n",
"1. `engine` - An instance of a `MySQLEngine` engine.\n",
"2. `table_name` - The name of the table within the Cloud SQL database to store langchain documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain_core.documents import Document\n",
"from langchain_google_cloud_sql_mysql import MySQLDocumentSaver\n",
"\n",
"test_docs = [\n",
" Document(\n",
" page_content=\"Apple Granny Smith 150 0.99 1\",\n",
" metadata={\"fruit_id\": 1},\n",
" ),\n",
" Document(\n",
" page_content=\"Banana Cavendish 200 0.59 0\",\n",
" metadata={\"fruit_id\": 2},\n",
" ),\n",
" Document(\n",
" page_content=\"Orange Navel 80 1.29 1\",\n",
" metadata={\"fruit_id\": 3},\n",
" ),\n",
"]\n",
"saver = MySQLDocumentSaver(engine=engine, table_name=TABLE_NAME)\n",
"saver.add_documents(test_docs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load documents"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load langchain documents with `MySQLLoader.load()` or `MySQLLoader.lazy_load()`. `lazy_load` returns a generator that only queries database during the iteration. To initialize `MySQLLoader` class you need to provide:\n",
"\n",
"1. `engine` - An instance of a `MySQLEngine` engine.\n",
"2. `table_name` - The name of the table within the Cloud SQL database to store langchain documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_mysql import MySQLLoader\n",
"\n",
"loader = MySQLLoader(engine=engine, table_name=TABLE_NAME)\n",
"docs = loader.lazy_load()\n",
"for doc in docs:\n",
" print(\"Loaded documents:\", doc)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load documents via query"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Other than loading documents from a table, we can also choose to load documents from a view generated from a SQL query. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_mysql import MySQLLoader\n",
"\n",
"loader = MySQLLoader(\n",
" engine=engine,\n",
" query=f\"select * from `{TABLE_NAME}` where JSON_EXTRACT(langchain_metadata, '$.fruit_id') = 1;\",\n",
")\n",
"onedoc = loader.load()\n",
"onedoc"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The view generated from SQL query can have different schema than default table. In such cases, the behavior of MySQLLoader is the same as loading from table with non-default schema. Please refer to section [Load documents with customized document page content & metadata](#Load-documents-with-customized-document-page-content-&-metadata)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete documents"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Delete a list of langchain documents from MySQL table with `MySQLDocumentSaver.delete(<documents>)`.\n",
"\n",
"For table with default schema (page_content, langchain_metadata), the deletion criteria is:\n",
"\n",
"A `row` should be deleted if there exists a `document` in the list, such that\n",
"\n",
"- `document.page_content` equals `row[page_content]`\n",
"- `document.metadata` equals `row[langchain_metadata]`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_mysql import MySQLLoader\n",
"\n",
"loader = MySQLLoader(engine=engine, table_name=TABLE_NAME)\n",
"docs = loader.load()\n",
"print(\"Documents before delete:\", docs)\n",
"saver.delete(onedoc)\n",
"print(\"Documents after delete:\", loader.load())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load documents with customized document page content & metadata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First we prepare an example table with non-default schema, and populate it with some arbitary data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sqlalchemy\n",
"\n",
"with engine.connect() as conn:\n",
" conn.execute(sqlalchemy.text(f\"DROP TABLE IF EXISTS `{TABLE_NAME}`\"))\n",
" conn.commit()\n",
" conn.execute(\n",
" sqlalchemy.text(\n",
" f\"\"\"\n",
" CREATE TABLE IF NOT EXISTS `{TABLE_NAME}`(\n",
" fruit_id INT AUTO_INCREMENT PRIMARY KEY,\n",
" fruit_name VARCHAR(100) NOT NULL,\n",
" variety VARCHAR(50),\n",
" quantity_in_stock INT NOT NULL,\n",
" price_per_unit DECIMAL(6,2) NOT NULL,\n",
" organic TINYINT(1) NOT NULL\n",
" )\n",
" \"\"\"\n",
" )\n",
" )\n",
" conn.execute(\n",
" sqlalchemy.text(\n",
" f\"\"\"\n",
" INSERT INTO `{TABLE_NAME}` (fruit_name, variety, quantity_in_stock, price_per_unit, organic)\n",
" VALUES\n",
" ('Apple', 'Granny Smith', 150, 0.99, 1),\n",
" ('Banana', 'Cavendish', 200, 0.59, 0),\n",
" ('Orange', 'Navel', 80, 1.29, 1);\n",
" \"\"\"\n",
" )\n",
" )\n",
" conn.commit()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we still load langchain documents with default parameters of `MySQLLoader` from this example table, the `page_content` of loaded documents will be the first column of the table, and `metadata` will be consisting of key-value pairs of all the other columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = MySQLLoader(\n",
" engine=engine,\n",
" table_name=TABLE_NAME,\n",
")\n",
"loader.load()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can specify the content and metadata we want to load by setting the `content_columns` and `metadata_columns` when initializing the `MySQLLoader`.\n",
"\n",
"1. `content_columns`: The columns to write into the `page_content` of the document.\n",
"2. `metadata_columns`: The columns to write into the `metadata` of the document.\n",
"\n",
"For example here, the values of columns in `content_columns` will be joined together into a space-separated string, as `page_content` of loaded documents, and `metadata` of loaded documents will only contain key-value pairs of columns specified in `metadata_columns`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = MySQLLoader(\n",
" engine=engine,\n",
" table_name=TABLE_NAME,\n",
" content_columns=[\n",
" \"variety\",\n",
" \"quantity_in_stock\",\n",
" \"price_per_unit\",\n",
" \"organic\",\n",
" ],\n",
" metadata_columns=[\"fruit_id\", \"fruit_name\"],\n",
")\n",
"loader.load()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Save document with customized page content & metadata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to save langchain document into table with customized metadata fields. We need first create such a table via `MySQLEngine.init_document_table()`, and specify the list of `metadata_columns` we want it to have. In this example, the created table will have table columns:\n",
"\n",
"- description (type: text): for storing fruit description.\n",
"- fruit_name (type text): for storing fruit name.\n",
"- organic (type tinyint(1)): to tell if the fruit is organic.\n",
"- other_metadata (type: JSON): for storing other metadata information of the fruit.\n",
"\n",
"We can use the following parameters with `MySQLEngine.init_document_table()` to create the table:\n",
"\n",
"1. `table_name`: The name of the table within the Cloud SQL database to store langchain documents.\n",
"2. `metadata_columns`: A list of `sqlalchemy.Column` indicating the list of metadata columns we need.\n",
"3. `content_column`: The name of column to store `page_content` of langchain document. Default: `page_content`.\n",
"4. `metadata_json_column`: The name of JSON column to store extra `metadata` of langchain document. Default: `langchain_metadata`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"engine.init_document_table(\n",
" TABLE_NAME,\n",
" metadata_columns=[\n",
" sqlalchemy.Column(\n",
" \"fruit_name\",\n",
" sqlalchemy.UnicodeText,\n",
" primary_key=False,\n",
" nullable=True,\n",
" ),\n",
" sqlalchemy.Column(\n",
" \"organic\",\n",
" sqlalchemy.Boolean,\n",
" primary_key=False,\n",
" nullable=True,\n",
" ),\n",
" ],\n",
" content_column=\"description\",\n",
" metadata_json_column=\"other_metadata\",\n",
" overwrite_existing=True,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Save documents with `MySQLDocumentSaver.add_documents(<documents>)`. As you can see in this example, \n",
"\n",
"- `document.page_content` will be saved into `description` column.\n",
"- `document.metadata.fruit_name` will be saved into `fruit_name` column.\n",
"- `document.metadata.organic` will be saved into `organic` column.\n",
"- `document.metadata.fruit_id` will be saved into `other_metadata` column in JSON format."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test_docs = [\n",
" Document(\n",
" page_content=\"Granny Smith 150 0.99\",\n",
" metadata={\"fruit_id\": 1, \"fruit_name\": \"Apple\", \"organic\": 1},\n",
" ),\n",
"]\n",
"saver = MySQLDocumentSaver(\n",
" engine=engine,\n",
" table_name=TABLE_NAME,\n",
" content_column=\"description\",\n",
" metadata_json_column=\"other_metadata\",\n",
")\n",
"saver.add_documents(test_docs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with engine.connect() as conn:\n",
" result = conn.execute(sqlalchemy.text(f\"select * from `{TABLE_NAME}`;\"))\n",
" print(result.keys())\n",
" print(result.fetchall())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete documents with customized page content & metadata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also delete documents from table with customized metadata columns via `MySQLDocumentSaver.delete(<documents>)`. The deletion criteria is:\n",
"\n",
"A `row` should be deleted if there exists a `document` in the list, such that\n",
"\n",
"- `document.page_content` equals `row[page_content]`\n",
"- For every metadata field `k` in `document.metadata`\n",
" - `document.metadata[k]` equals `row[k]` or `document.metadata[k]` equals `row[langchain_metadata][k]`\n",
"- There no extra metadata field presents in `row` but not in `document.metadata`.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = MySQLLoader(engine=engine, table_name=TABLE_NAME)\n",
"docs = loader.load()\n",
"print(\"Documents before delete:\", docs)\n",
"saver.delete(docs)\n",
"print(\"Documents after delete:\", loader.load())"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

@@ -0,0 +1,382 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "E_RJy7C1bpCT"
},
"source": [
"# Google Cloud SQL for PostgreSQL\n",
"\n",
"> [Cloud SQL for PostgreSQL](https://cloud.google.com/sql/docs/postgres) is a fully-managed database service that helps you set up, maintain, manage, and administer your PostgreSQL relational databases on Google Cloud Platform. Extend your database application to build AI-powered experiences leveraging Cloud SQL for PostgreSQL's Langchain integrations.\n",
"\n",
"This notebook goes over how to use `Cloud SQL for PostgreSQL` to load Documents with the `PostgreSQLLoader` class."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xjcxaw6--Xyy"
},
"source": [
"## Before you begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
" * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
" * [Enable the Cloud SQL Admin API.](https://console.cloud.google.com/marketplace/product/google/sqladmin.googleapis.com)\n",
" * [Create a Cloud SQL for PostgreSQL instance.](https://cloud.google.com/sql/docs/postgres/create-instance)\n",
" * [Create a Cloud SQL for PostgreSQL database.](https://cloud.google.com/sql/docs/postgres/create-manage-databases)\n",
" * [Add a User to the database.](https://cloud.google.com/sql/docs/postgres/create-manage-users)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IR54BmgvdHT_"
},
"source": [
"### 🦜🔗 Library Installation\n",
"Install the integration library, `langchain-google-cloud-sql-pg`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "0ZITIDE160OD",
"outputId": "90e0636e-ff34-4e1e-ad37-d2a6db4a317e"
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-google-cloud-sql-pg"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "v40bB_GMcr9f"
},
"source": [
"**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6o0iGVIdDD6K"
},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cTXTbj4UltKf"
},
"source": [
"### 🔐 Authentication\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"* If you are using Colab to run this notebook, use the cell below and continue.\n",
"* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Uj02bMRAc9_c"
},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "wnp1R1PYc9_c",
"outputId": "6502c721-a2fd-451f-b946-9f7b850d5966"
},
"outputs": [],
"source": [
"# @title Project { display-mode: \"form\" }\n",
"PROJECT_ID = \"gcp_project_id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"! gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"id": "rEWWNoNnKOgq",
"metadata": {
"id": "rEWWNoNnKOgq"
},
"source": [
"### 💡 API Enablement\n",
"The `langchain_google_cloud_sql_pg` package requires that you [enable the Cloud SQL Admin API](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com) in your Google Cloud Project."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "5utKIdq7KYi5",
"metadata": {
"id": "5utKIdq7KYi5"
},
"outputs": [],
"source": [
"# enable Cloud SQL Admin API\n",
"!gcloud services enable sqladmin.googleapis.com"
]
},
{
"cell_type": "markdown",
"id": "f8f2830ee9ca1e01",
"metadata": {
"id": "f8f2830ee9ca1e01"
},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"id": "OMvzMWRrR6n7",
"metadata": {
"id": "OMvzMWRrR6n7"
},
"source": [
"### Set Cloud SQL database values\n",
"Find your database variables, in the [Cloud SQL Instances page](https://console.cloud.google.com/sql/instances)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "irl7eMFnSPZr",
"metadata": {
"id": "irl7eMFnSPZr"
},
"outputs": [],
"source": [
"# @title Set Your Values Here { display-mode: \"form\" }\n",
"REGION = \"us-central1\" # @param {type: \"string\"}\n",
"INSTANCE = \"my-primary\" # @param {type: \"string\"}\n",
"DATABASE = \"my-database\" # @param {type: \"string\"}\n",
"TABLE_NAME = \"vector_store\" # @param {type: \"string\"}"
]
},
{
"cell_type": "markdown",
"id": "QuQigs4UoFQ2",
"metadata": {
"id": "QuQigs4UoFQ2"
},
"source": [
"### Cloud SQL Engine\n",
"\n",
"One of the requirements and arguments to establish PostgreSQL as a document loader is a `PostgresEngine` object. The `PostgresEngine` configures a connection pool to your Cloud SQL for PostgreSQL database, enabling successful connections from your application and following industry best practices.\n",
"\n",
"To create a `PostgresEngine` using `PostgresEngine.from_instance()` you need to provide only 4 things:\n",
"\n",
"1. `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.\n",
"1. `region` : Region where the Cloud SQL instance is located.\n",
"1. `instance` : The name of the Cloud SQL instance.\n",
"1. `database` : The name of the database to connect to on the Cloud SQL instance.\n",
"\n",
"By default, [IAM database authentication](https://cloud.google.com/sql/docs/postgres/iam-authentication) will be used as the method of database authentication. This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the environment.\n",
"\n",
"Optionally, [built-in database authentication](https://cloud.google.com/sql/docs/postgres/users) using a username and password to access the Cloud SQL database can also be used. Just provide the optional `user` and `password` arguments to `PostgresEngine.from_instance()`:\n",
"\n",
"* `user` : Database user to use for built-in database authentication and login\n",
"* `password` : Database password to use for built-in database authentication and login.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: This tutorial demonstrates the async interface. All async methods have corresponding sync methods."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_pg import PostgresEngine\n",
"\n",
"engine = await PostgresEngine.afrom_instance(\n",
" project_id=PROJECT_ID,\n",
" region=REGION,\n",
" instance=INSTANCE,\n",
" database=DATABASE,\n",
")"
]
},
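{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the sync counterpart mentioned in the note above (assuming the sync equivalent of `afrom_instance` is `from_instance`, and using hypothetical placeholder credentials for the optional built-in authentication):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: synchronous engine creation with the optional built-in\n",
"# authentication arguments. Replace the placeholder credentials with\n",
"# a real database user and password before running.\n",
"sync_engine = PostgresEngine.from_instance(\n",
" project_id=PROJECT_ID,\n",
" region=REGION,\n",
" instance=INSTANCE,\n",
" database=DATABASE,\n",
" user=\"my-db-user\",  # hypothetical placeholder\n",
" password=\"my-db-password\",  # hypothetical placeholder\n",
")"
]
},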
{
"cell_type": "markdown",
"metadata": {
"id": "e1tl0aNx7SWy"
},
"source": [
"### Create PostgresLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "z-AZyzAQ7bsf"
},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_pg import PostgresLoader\n",
"\n",
"# Creating a basic PostgreSQL object\n",
"loader = await PostgresLoader.create(engine, table_name=TABLE_NAME)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PeOMpftjc9_e"
},
"source": [
"### Load Documents via default table\n",
"The loader returns a list of Documents from the table using the first column as page_content and all other columns as metadata. The default table will have the first column as\n",
"page_content and the second column as metadata (JSON). Each row becomes a document. Please note that if you want your documents to have ids you will need to add them in."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "cwvi_O5Wc9_e"
},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_pg import PostgresLoader\n",
"\n",
"# Creating a basic PostgresLoader object\n",
"loader = await PostgresLoader.create(engine, table_name=TABLE_NAME)\n",
"\n",
"docs = await loader.aload()\n",
"print(docs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kSkL9l1Hc9_e"
},
"source": [
"### Load documents via custom table/metadata or custom page content columns"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = await PostgresLoader.create(\n",
" engine,\n",
" table_name=TABLE_NAME,\n",
" content_columns=[\"product_name\"], # Optional\n",
" metadata_columns=[\"id\"], # Optional\n",
")\n",
"docs = await loader.aload()\n",
"print(docs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5R6h0_Cvc9_f"
},
"source": [
"### Set page content format\n",
"The loader returns a list of Documents, with one document per row, with page content in specified string format, i.e. text (space separated concatenation), JSON, YAML, CSV, etc. JSON and YAML formats include headers, while text and CSV do not include field headers.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NGNdS7cqc9_f"
},
"outputs": [],
"source": [
"loader = await PostgresLoader.create(\n",
" engine,\n",
" table_name=\"products\",\n",
" content_columns=[\"product_name\", \"description\"],\n",
" format=\"YAML\",\n",
")\n",
"docs = await loader.aload()\n",
"print(docs)"
]
}
],
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

@@ -0,0 +1,411 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google Firestore in Datastore mode\n",
"\n",
"> [Firestore in Datastore mode](https://cloud.google.com/datastore) is a serverless document-oriented database that scales to meet any demand. Extend your database application to build AI-powered experiences leveraging Datastore's Langchain integrations.\n",
"\n",
"This notebook goes over how to use [Firestore in Datastore mode](https://cloud.google.com/datastore) to [save, load and delete langchain documents](https://python.langchain.com/docs/modules/data_connection/document_loaders/) with `DatastoreLoader` and `DatastoreSaver`.\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-datastore-python/blob/main/docs/document_loader.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
"* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
"* [Create a Datastore database](https://cloud.google.com/datastore/docs/manage-databases)\n",
"\n",
"After confirmed access to database in the runtime environment of this notebook, filling the following values and run the cell before running example scripts."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please specify a source for demo purpose.\n",
"SOURCE = \"test\" # @param {type:\"Query\"|\"CollectionGroup\"|\"DocumentReference\"|\"string\"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🦜🔗 Library Installation\n",
"\n",
"The integration lives in its own `langchain-google-datastore` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%pip install -upgrade --quiet langchain-google-datastore"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🔐 Authentication\n",
"\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"- If you are using Colab to run this notebook, use the cell below and continue.\n",
"- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### API Enablement\n",
"The `langchain-google-datastore` package requires that you [enable the Datastore API](https://console.cloud.google.com/flows/enableapi?apiid=datastore.googleapis.com) in your Google Cloud Project."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# enable Datastore API\n",
"!gcloud services enable datastore.googleapis.com"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Save documents\n",
"\n",
"`DatastoreSaver` can store Documents into Datastore. By default it will try to extract the Document reference from the metadata\n",
"\n",
"Save langchain documents with `DatastoreSaver.upsert_documents(<documents>)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.documents import Document\n",
"from langchain_google_datastore import DatastoreSaver\n",
"\n",
"data = [Document(page_content=\"Hello, World!\")]\n",
"saver = DatastoreSaver()\n",
"saver.upsert_documents(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Save documents without reference\n",
"\n",
"If a collection is specified the documents will be stored with an auto generated id."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"saver = DatastoreSaver(\"Collection\")\n",
"\n",
"saver.upsert_documents(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Save documents with other references"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"doc_ids = [\"AnotherCollection/doc_id\", \"foo/bar\"]\n",
"saver = DatastoreSaver()\n",
"\n",
"saver.upsert_documents(documents=data, document_ids=doc_ids)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load from Collection or SubCollection"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load langchain documents with `DatastoreLoader.load()` or `Datastore.lazy_load()`. `lazy_load` returns a generator that only queries database during the iteration. To initialize `DatastoreLoader` class you need to provide:\n",
"\n",
"1. `source` - An instance of a Query, CollectionGroup, DocumentReference or the single `\\`-delimited path to a Datastore collection`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_datastore import DatastoreLoader\n",
"\n",
"loader_collection = DatastoreLoader(\"Collection\")\n",
"loader_subcollection = DatastoreLoader(\"Collection/doc/SubCollection\")\n",
"\n",
"\n",
"data_collection = loader_collection.load()\n",
"data_subcollection = loader_subcollection.load()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load a single Document"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.cloud import datastore\n",
"\n",
"client = datastore.Client()\n",
"doc_ref = client.collection(\"foo\").document(\"bar\")\n",
"\n",
"loader_document = DatastoreLoader(doc_ref)\n",
"\n",
"data = loader_document.load()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load from CollectionGroup or Query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.cloud.datastore import CollectionGroup, FieldFilter, Query\n",
"\n",
"col_ref = client.collection(\"col_group\")\n",
"collection_group = CollectionGroup(col_ref)\n",
"\n",
"loader_group = DatastoreLoader(collection_group)\n",
"\n",
"col_ref = client.collection(\"collection\")\n",
"query = col_ref.where(filter=FieldFilter(\"region\", \"==\", \"west_coast\"))\n",
"\n",
"loader_query = DatastoreLoader(query)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete documents\n",
"\n",
"Delete a list of langchain documents from Datastore collection with `DatastoreSaver.delete_documents(<documents>)`.\n",
"\n",
"If document ids is provided, the Documents will be ignored."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"saver = DatastoreSaver()\n",
"\n",
"saver.delete_documents(data)\n",
"\n",
"# The Documents will be ignored and only the document ids will be used.\n",
"saver.delete_documents(data, doc_ids)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load documents with customize document page content & metadata\n",
"\n",
"The arguments of `page_content_fields` and `metadata_fields` will specify the Datastore Document fields to be written into LangChain Document `page_content` and `metadata`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = DatastoreLoader(\n",
" source=\"foo/bar/subcol\",\n",
" page_content_fields=[\"data_field\"],\n",
" metadata_fields=[\"metadata_field\"],\n",
")\n",
"\n",
"data = loader.load()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Customize Page Content Format\n",
"\n",
"When the `page_content` contains only one field the information will be the field value only. Otherwise the `page_content` will be in JSON format."
]
},
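{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch (assuming hypothetical entity fields `field_1` and `field_2`), listing a single field yields the raw field value as `page_content`, while listing both yields JSON-formatted `page_content`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: \"field_1\" and \"field_2\" are hypothetical field names.\n",
"# With one field, page_content is the raw field value.\n",
"loader_single = DatastoreLoader(\n",
" source=\"foo/bar/subcol\",\n",
" page_content_fields=[\"field_1\"],\n",
")\n",
"\n",
"# With several fields, page_content is JSON-formatted.\n",
"loader_multi = DatastoreLoader(\n",
" source=\"foo/bar/subcol\",\n",
" page_content_fields=[\"field_1\", \"field_2\"],\n",
")"
]
},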
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Customize Connection & Authentication"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.auth import compute_engine\n",
"from google.cloud.datastore import Client\n",
"\n",
"client = Client(database=\"non-default-db\", creds=compute_engine.Credentials())\n",
"loader = DatastoreLoader(\n",
" source=\"foo\",\n",
" client=client,\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

@@ -0,0 +1,566 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "NKbPFu-GWFDV"
},
"source": [
"# Google El Carro Oracle Operator\n",
">\n",
"Google [El Carro Oracle Operator](https://github.com/GoogleCloudPlatform/elcarro-oracle-operator)\n",
"offers a way to run Oracle databases in Kubernetes as a portable, open source,\n",
"community driven, no vendor lock-in container orchestration system. El Carro\n",
"provides a powerful declarative API for comprehensive and consistent\n",
"configuration and deployment as well as for real-time operations and\n",
"monitoring..\n",
"Extend your database application to build AI-powered experiences leveraging\n",
"Oracle Langchain integrations.\n",
"\n",
"This guide goes over how to use El Carro Langchain integration to\n",
"[save, load and delete langchain documents](https://python.langchain.com/docs/modules/data_connection/document_loaders/)\n",
"with `ElCarroLoader` and `ElCarroDocumentSaver`."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZqONzXRcWMJg"
},
"source": [
"## Before You Begin\n",
"\n",
"Please complete\n",
"the [Getting Started](https://github.com/googleapis/langchain-google-el-carro-python/tree/main/README.md#getting-started)\n",
"section of\n",
"the README to set up your El Carro Oracle database."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "imbbHxKfWPso"
},
"source": [
"### 🦜🔗 Library Installation\n",
"\n",
"The integration lives in its own `langchain-google-el-carro` package, so\n",
"we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Su5BMP2zWRwM"
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-google-el-carro"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "azV0k45WWSVI"
},
"source": [
"## Basic Usage\n",
"\n",
"### Set Up Oracle Database Connection\n",
"\n",
"ElCarroEngine configures a connection pool to your Oracle database,\n",
"enabling successful connections from your application and following industry\n",
"best practices.\n",
"\n",
"You can find the hostname and port values in the status of the El Carro\n",
"Kubernetes instance.\n",
"Use the user password you created for your PDB."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "xG1mYFkEWbkp"
},
"outputs": [],
"source": [
"from langchain_google_el_carro import ElCarroEngine\n",
"\n",
"elcarro_engine = ElCarroEngine.from_instance(\n",
" db_host=\"127.0.0.1\",\n",
" db_port=3307,\n",
" db_name=\"PDB1\",\n",
" db_user=\"scott\",\n",
" db_password=\"tiger\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ICW3k_qUWgyv"
},
"source": [
"### Initialize a table\n",
"\n",
"Initialize a table of default schema\n",
"via `elcarro_engine.init_document_table(<TABLE_NAME>)`. Table Columns:\n",
"\n",
"- page_content (type: text)\n",
"- langchain_metadata (type: JSON)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JmlGLukoWdfS"
},
"outputs": [],
"source": [
"TABLE_NAME = \"doc_table\"\n",
"elcarro_engine.init_document_table(\n",
" TABLE_NAME=TABLE_NAME,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kaI3avj5Wn5O"
},
"source": [
"### Save documents\n",
"\n",
"Save langchain documents with `ElCarroDocumentSaver.add_documents(<documents>)`.\n",
"To initialize `ElCarroDocumentSaver` class you need to provide 2 things:\n",
"\n",
"1. `elcarro_engine` - An instance of a `ElCarroEngine` engine.\n",
"2. `TABLE_NAME` - The name of the table within the Oracle database to store\n",
" langchain documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "skaXpthSWpeg"
},
"outputs": [],
"source": [
"from langchain_core.documents import Document\n",
"from langchain_google_el_carro import ElCarroDocumentSaver\n",
"\n",
"doc = Document(\n",
" page_content=\"Banana\",\n",
" metadata={\"type\": \"fruit\", \"weight\": 100, \"organic\": 1},\n",
")\n",
"\n",
"saver = ElCarroDocumentSaver(\n",
" elcarro_engine=elcarro_engine,\n",
" TABLE_NAME=TABLE_NAME,\n",
")\n",
"saver.add_documents([doc])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "owTYQdNyWs9s"
},
"source": [
"### Load documents\n",
"\n",
"Load langchain documents with `ElCarroLoader.load()`\n",
"or `ElCarroLoader.lazy_load()`.\n",
"`lazy_load` returns a generator that only queries database during the iteration.\n",
"To initialize `ElCarroLoader` class you need to provide:\n",
"\n",
"1. `elcarro_engine` - An instance of a `ElCarroEngine` engine.\n",
"2. `TABLE_NAME` - The name of the table within the Oracle database to store\n",
" langchain documents.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "CM6p11amWvYp"
},
"outputs": [],
"source": [
"from langchain_google_el_carro import ElCarroLoader\n",
"\n",
"loader = ElCarroLoader(elcarro_engine=elcarro_engine, TABLE_NAME=TABLE_NAME)\n",
"docs = loader.lazy_load()\n",
"for doc in docs:\n",
" print(\"Loaded documents:\", doc)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OTIDGiZ8WyS3"
},
"source": [
"### Load documents via query\n",
"\n",
"Other than loading documents from a table, we can also choose to load documents\n",
"from a view generated from a SQL query. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "p3OB9AwgWzrq"
},
"outputs": [],
"source": [
"from langchain_google_el_carro import ElCarroLoader\n",
"\n",
"loader = ElCarroLoader(\n",
" elcarro_engine=elcarro_engine,\n",
" query=f\"SELECT * FROM {TABLE_NAME} WHERE json_value(extra_json_metadata, '$.shape') = 'round'\",\n",
")\n",
"onedoc = loader.load()\n",
"print(onedoc)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "E6Fl7YNvW3Ep"
},
"source": [
"The view generated from SQL query can have different schema than default table.\n",
"In such cases, the behavior of ElCarroLoader is the same as loading from table\n",
"with non-default schema. Please refer to\n",
"section [Load documents with customized document page content & metadata](#load-documents-with-customized-document-page-content--metadata)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QgsP78MhW4wc"
},
"source": [
"### Delete documents\n",
"\n",
"Delete a list of langchain documents from an Oracle table\n",
"with `ElCarroDocumentSaver.delete(<documents>)`.\n",
"\n",
"For a table with a default schema (page_content, langchain_metadata), the\n",
"deletion criteria is:\n",
"\n",
"A `row` should be deleted if there exists a `document` in the list, such that\n",
"\n",
"- `document.page_content` equals `row[page_content]`\n",
"- `document.metadata` equals `row[langchain_metadata]`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "QSYRHGHXW6IN"
},
"outputs": [],
"source": [
"docs = loader.load()\n",
"print(\"Documents before delete:\", docs)\n",
"saver.delete(onedoc)\n",
"print(\"Documents after delete:\", loader.load())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RerPkBRAW8yR"
},
"source": [
"## Advanced Usage\n",
"\n",
"### Load documents with customized document page content & metadata\n",
"\n",
"First we prepare an example table with non-default schema, and populate it with\n",
"some arbitrary data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "u0Fd46aqW-8k"
},
"outputs": [],
"source": [
"import sqlalchemy\n",
"\n",
"create_table_query = f\"\"\"CREATE TABLE {TABLE_NAME} (\n",
" fruit_id NUMBER GENERATED BY DEFAULT AS IDENTITY (START WITH 1),\n",
" fruit_name VARCHAR2(100) NOT NULL,\n",
" variety VARCHAR2(50),\n",
" quantity_in_stock NUMBER(10) NOT NULL,\n",
" price_per_unit NUMBER(6,2) NOT NULL,\n",
" organic NUMBER(3) NOT NULL\n",
")\"\"\"\n",
"\n",
"with elcarro_engine.connect() as conn:\n",
" conn.execute(sqlalchemy.text(create_table_query))\n",
" conn.commit()\n",
" conn.execute(\n",
" sqlalchemy.text(\n",
" f\"\"\"\n",
" INSERT INTO {TABLE_NAME} (fruit_name, variety, quantity_in_stock, price_per_unit, organic)\n",
" VALUES ('Apple', 'Granny Smith', 150, 0.99, 1)\n",
" \"\"\"\n",
" )\n",
" )\n",
" conn.execute(\n",
" sqlalchemy.text(\n",
" f\"\"\"\n",
" INSERT INTO {TABLE_NAME} (fruit_name, variety, quantity_in_stock, price_per_unit, organic)\n",
" VALUES ('Banana', 'Cavendish', 200, 0.59, 0)\n",
" \"\"\"\n",
" )\n",
" )\n",
" conn.execute(\n",
" sqlalchemy.text(\n",
" f\"\"\"\n",
" INSERT INTO {TABLE_NAME} (fruit_name, variety, quantity_in_stock, price_per_unit, organic)\n",
" VALUES ('Orange', 'Navel', 80, 1.29, 1)\n",
" \"\"\"\n",
" )\n",
" )\n",
" conn.commit()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "hGPYiTu7XBh3"
},
"source": [
"If we still load langchain documents with default parameters of `ElCarroLoader`\n",
"from this example table, the `page_content` of loaded documents will be the\n",
"first column of the table, and `metadata` will be consisting of key-value pairs\n",
"of all the other columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "eQbRapM_XC1S"
},
"outputs": [],
"source": [
"loader = ElCarroLoader(\n",
" elcarro_engine=elcarro_engine,\n",
" TABLE_NAME=TABLE_NAME,\n",
")\n",
"loader.load()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tOH6i2jWXFqz"
},
"source": [
"We can specify the content and metadata we want to load by setting\n",
"the `content_columns` and `metadata_columns` when initializing\n",
"the `ElCarroLoader`.\n",
"\n",
"1. `content_columns`: The columns to write into the `page_content` of the\n",
" document.\n",
"2. `metadata_columns`: The columns to write into the `metadata` of the document.\n",
"\n",
"For example here, the values of columns in `content_columns` will be joined\n",
"together into a space-separated string, as `page_content` of loaded documents,\n",
"and `metadata` of loaded documents will only contain key-value pairs of columns\n",
"specified in `metadata_columns`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9gCFWqgGXHD3"
},
"outputs": [],
"source": [
"loader = ElCarroLoader(\n",
" elcarro_engine=elcarro_engine,\n",
" TABLE_NAME=TABLE_NAME,\n",
" content_columns=[\n",
" \"variety\",\n",
" \"quantity_in_stock\",\n",
" \"price_per_unit\",\n",
" \"organic\",\n",
" ],\n",
" metadata_columns=[\"fruit_id\", \"fruit_name\"],\n",
")\n",
"loaded_docs = loader.load()\n",
"print(f\"Loaded Documents: [{loaded_docs}]\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4KlSfvPJXKgM"
},
"source": [
"### Save document with customized page content & metadata\n",
"\n",
"In order to save langchain document into table with customized metadata fields.\n",
"We need first create such a table via `ElCarroEngine.init_document_table()`, and\n",
"specify the list of `metadata_columns` we want it to have. In this example, the\n",
"created table will have table columns:\n",
"\n",
"- description (type: text): for storing fruit description.\n",
"- fruit_name (type text): for storing fruit name.\n",
"- organic (type tinyint(1)): to tell if the fruit is organic.\n",
"- other_metadata (type: JSON): for storing other metadata information of the\n",
" fruit.\n",
"\n",
"We can use the following parameters\n",
"with `elcarro_engine.init_document_table()` to create the table:\n",
"\n",
"1. `TABLE_NAME`: The name of the table within the Oracle database to store\n",
" langchain documents.\n",
"2. `metadata_columns`: A list of `sqlalchemy.Column` indicating the list of\n",
" metadata columns we need.\n",
"3. `content_column`: column name to store `page_content` of langchain\n",
" document. Default: `\"page_content\", \"VARCHAR2(4000)\"`\n",
"4. `metadata_json_column`: column name to store extra\n",
" JSON `metadata` of langchain document.\n",
" Default: `\"langchain_metadata\", \"VARCHAR2(4000)\"`.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "1Wqs05gpXMW9"
},
"outputs": [],
"source": [
"elcarro_engine.init_document_table(\n",
" TABLE_NAME=TABLE_NAME,\n",
" metadata_columns=[\n",
" sqlalchemy.Column(\"type\", sqlalchemy.dialects.oracle.VARCHAR2(200)),\n",
" sqlalchemy.Column(\"weight\", sqlalchemy.INT),\n",
" ],\n",
" content_column=\"content\",\n",
" metadata_json_column=\"extra_json_metadata\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bVEWHYU-XPFt"
},
"source": [
"Save documents with `ElCarroDocumentSaver.add_documents(<documents>)`. As you\n",
"can see in this example,\n",
"\n",
"- `document.page_content` will be saved into `page_content` column.\n",
"- `document.metadata.type` will be saved into `type` column.\n",
"- `document.metadata.weight` will be saved into `weight` column.\n",
"- `document.metadata.organic` will be saved into `extra_json_metadata` column in\n",
" JSON format.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Iy4wRZLPXQn5"
},
"outputs": [],
"source": [
"doc = Document(\n",
" page_content=\"Banana\",\n",
" metadata={\"type\": \"fruit\", \"weight\": 100, \"organic\": 1},\n",
")\n",
"\n",
"print(f\"Original Document: [{doc}]\")\n",
"\n",
"saver = ElCarroDocumentSaver(\n",
" elcarro_engine=elcarro_engine,\n",
" TABLE_NAME=TABLE_NAME,\n",
" content_column=\"content\",\n",
" metadata_json_column=\"extra_json_metadata\",\n",
")\n",
"saver.add_documents([doc])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "x0vkL7PKXUmU"
},
"source": [
"### Delete documents with customized page content & metadata\n",
"\n",
"We can also delete documents from table with customized metadata columns\n",
"via `ElCarroDocumentSaver.delete(<documents>)`. The deletion criteria is:\n",
"\n",
"A `row` should be deleted if there exists a `document` in the list, such that\n",
"\n",
"- `document.page_content` equals `row[page_content]`\n",
"- For every metadata field `k` in `document.metadata`\n",
" - `document.metadata[k]` equals `row[k]` or `document.metadata[k]`\n",
" equals `row[langchain_metadata][k]`\n",
"- There is no extra metadata field present in `row` but not\n",
" in `document.metadata`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "OcJPeCuKXWSa"
},
"outputs": [],
"source": [
"saver.delete(loaded_docs)\n",
"print(f\"Documents left: {len(loader.load())}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "S4SxUoY-XsPN"
},
"source": [
"## More examples\n",
"\n",
"Please look\n",
"at [demo_doc_loader_basic.py](https://github.com/googleapis/langchain-google-el-carro-python/tree/main/samples/demo_doc_loader_basic.py)\n",
"and [demo_doc_loader_advanced.py](https://github.com/googleapis/langchain-google-el-carro-python/tree/main/samples/demo_doc_loader_advanced.py)\n",
"for\n",
"complete code examples.\n"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

View File

@@ -0,0 +1,413 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google Firestore (Native Mode)\n",
"\n",
"> [Firestore](https://cloud.google.com/firestore) is a serverless document-oriented database that scales to meet any demand. Extend your database application to build AI-powered experiences leveraging Firestore's Langchain integrations.\n",
"\n",
"This notebook goes over how to use [Firestore](https://cloud.google.com/firestore) to [save, load and delete langchain documents](https://python.langchain.com/docs/modules/data_connection/document_loaders/) with `FirestoreLoader` and `FirestoreSaver`.\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-firestore-python/blob/main/docs/document_loader.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
"* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
"* [Create a Firestore database](https://cloud.google.com/firestore/docs/manage-databases)\n",
"\n",
"After confirmed access to database in the runtime environment of this notebook, filling the following values and run the cell before running example scripts."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please specify a source for demo purpose.\n",
"SOURCE = \"test\" # @param {type:\"Query\"|\"CollectionGroup\"|\"DocumentReference\"|\"string\"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🦜🔗 Library Installation\n",
"\n",
"The integration lives in its own `langchain-google-firestore` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%pip install -upgrade --quiet langchain-google-firestore"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🔐 Authentication\n",
"\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"- If you are using Colab to run this notebook, use the cell below and continue.\n",
"- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### API Enablement\n",
"The `langchain-google-firestore` package requires that you [enable the Firestore Admin API](https://console.cloud.google.com/flows/enableapi?apiid=firestore.googleapis.com) in your Google Cloud Project."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# enable Firestore Admin API\n",
"!gcloud services enable firestore.googleapis.com"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Save documents\n",
"\n",
"`FirestoreSaver` can store Documents into Firestore. By default it will try to extract the Document reference from the metadata\n",
"\n",
"Save langchain documents with `FirestoreSaver.upsert_documents(<documents>)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.documents.base import Document\n",
"from langchain_google_firestore import FirestoreSaver\n",
"\n",
"saver = FirestoreSaver()\n",
"\n",
"data = [Document(page_content=\"Hello, World!\")]\n",
"\n",
"saver.upsert_documents(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Save documents without reference\n",
"\n",
"If a collection is specified the documents will be stored with an auto generated id."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"saver = FirestoreSaver(\"Collection\")\n",
"\n",
"saver.upsert_documents(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Save documents with other references"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"doc_ids = [\"AnotherCollection/doc_id\", \"foo/bar\"]\n",
"saver = FirestoreSaver()\n",
"\n",
"saver.upsert_documents(documents=data, document_ids=doc_ids)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load from Collection or SubCollection"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load langchain documents with `FirestoreLoader.load()` or `Firestore.lazy_load()`. `lazy_load` returns a generator that only queries database during the iteration. To initialize `FirestoreLoader` class you need to provide:\n",
"\n",
"1. `source` - An instance of a Query, CollectionGroup, DocumentReference or the single `\\`-delimited path to a Firestore collection`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_firestore import FirestoreLoader\n",
"\n",
"loader_collection = FirestoreLoader(\"Collection\")\n",
"loader_subcollection = FirestoreLoader(\"Collection/doc/SubCollection\")\n",
"\n",
"\n",
"data_collection = loader_collection.load()\n",
"data_subcollection = loader_subcollection.load()"
]
},
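{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since `lazy_load` returns a generator, documents are only fetched from Firestore as you iterate. A minimal sketch using the collection loader from above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# lazy_load only queries the database during iteration\n",
"for doc in loader_collection.lazy_load():\n",
"    print(doc)"
]
},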
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load a single Document"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.cloud import firestore\n",
"\n",
"client = firestore.Client()\n",
"doc_ref = client.collection(\"foo\").document(\"bar\")\n",
"\n",
"loader_document = FirestoreLoader(doc_ref)\n",
"\n",
"data = loader_document.load()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load from CollectionGroup or Query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.cloud.firestore import CollectionGroup, FieldFilter, Query\n",
"\n",
"col_ref = client.collection(\"col_group\")\n",
"collection_group = CollectionGroup(col_ref)\n",
"\n",
"loader_group = FirestoreLoader(collection_group)\n",
"\n",
"col_ref = client.collection(\"collection\")\n",
"query = col_ref.where(filter=FieldFilter(\"region\", \"==\", \"west_coast\"))\n",
"\n",
"loader_query = FirestoreLoader(query)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete documents\n",
"\n",
"Delete a list of langchain documents from Firestore collection with `FirestoreSaver.delete_documents(<documents>)`.\n",
"\n",
"If document ids is provided, the Documents will be ignored."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"saver = FirestoreSaver()\n",
"\n",
"saver.delete_documents(data)\n",
"\n",
"# The Documents will be ignored and only the document ids will be used.\n",
"saver.delete_documents(data, doc_ids)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load documents with customize document page content & metadata\n",
"\n",
"The arguments of `page_content_fields` and `metadata_fields` will specify the Firestore Document fields to be written into LangChain Document `page_content` and `metadata`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = FirestoreLoader(\n",
" source=\"foo/bar/subcol\",\n",
" page_content_fields=[\"data_field\"],\n",
" metadata_fields=[\"metadata_field\"],\n",
")\n",
"\n",
"data = loader.load()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Customize Page Content Format\n",
"\n",
"When the `page_content` contains only one field the information will be the field value only. Otherwise the `page_content` will be in JSON format."
]
},
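{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a minimal sketch of a loader with two content fields (the field names `field_1` and `field_2` are hypothetical placeholders and should match fields in your own Firestore Documents):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# With more than one content field, page_content is a JSON-formatted string.\n",
"# The field names below are hypothetical placeholders.\n",
"loader = FirestoreLoader(\n",
"    source=\"foo/bar/subcol\",\n",
"    page_content_fields=[\"field_1\", \"field_2\"],\n",
")\n",
"\n",
"for doc in loader.load():\n",
"    print(doc.page_content)"
]
},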
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Customize Connection & Authentication"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.auth import compute_engine\n",
"from google.cloud.firestore import Client\n",
"\n",
"client = Client(database=\"non-default-db\", creds=compute_engine.Credentials())\n",
"loader = FirestoreLoader(\n",
" source=\"foo\",\n",
" client=client,\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,318 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "6-0_o3DxsFGi"
},
"source": [
"# Google Memorystore for Redis\n",
"\n",
"> [Google Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) is a fully-managed service that is powered by the Redis in-memory data store to build application caches that provide sub-millisecond data access. Extend your database application to build AI-powered experiences leveraging Memorystore for Redis's Langchain integrations.\n",
"\n",
"This notebook goes over how to use [Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) to [save, load and delete langchain documents](https://python.langchain.com/docs/modules/data_connection/document_loaders/) with `MemorystoreDocumentLoader` and `MemorystoreDocumentSaver`.\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-memorystore-redis-python/blob/main/docs/document_loader.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
"* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
"* [Create a Memorystore for Redis instance](https://cloud.google.com/memorystore/docs/redis/create-instance-console). Ensure that the version is greater than or equal to 5.0.\n",
"\n",
"After confirmed access to database in the runtime environment of this notebook, filling the following values and run the cell before running example scripts."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please specify an endpoint associated with the instance and a key prefix for demo purpose.\n",
"ENDPOINT = \"redis://127.0.0.1:6379\" # @param {type:\"string\"}\n",
"KEY_PREFIX = \"doc:\" # @param {type:\"string\"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🦜🔗 Library Installation\n",
"\n",
"The integration lives in its own `langchain-google-memorystore-redis` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -upgrade --quiet langchain-google-memorystore-redis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🔐 Authentication\n",
"\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"- If you are using Colab to run this notebook, use the cell below and continue.\n",
"- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2L7kMu__sFGl"
},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Save documents\n",
"\n",
"Save langchain documents with `MemorystoreDocumentSaver.add_documents(<documents>)`. To initialize `MemorystoreDocumentSaver` class you need to provide 2 things:\n",
"\n",
"1. `client` - A `redis.Redis` client object.\n",
"1. `key_prefix` - A prefix for the keys to store Documents in Redis.\n",
"\n",
"The Documents will be stored into randomly generated keys with the specified prefix of `key_prefix`. Alternatively, you can designate the suffixes of the keys by specifying `ids` in the `add_documents` method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import redis\n",
"from langchain_core.documents.base import Document\n",
"from langchain_google_memorystore_redis import MemorystoreDocumentSaver\n",
"\n",
"test_docs = [\n",
" Document(\n",
" page_content=\"Apple Granny Smith 150 0.99 1\",\n",
" metadata={\"fruit_id\": 1},\n",
" ),\n",
" Document(\n",
" page_content=\"Banana Cavendish 200 0.59 0\",\n",
" metadata={\"fruit_id\": 2},\n",
" ),\n",
" Document(\n",
" page_content=\"Orange Navel 80 1.29 1\",\n",
" metadata={\"fruit_id\": 3},\n",
" ),\n",
"]\n",
"doc_ids = [f\"{i}\" for i in range(len(test_docs))]\n",
"\n",
"redis_client = redis.from_url(ENDPOINT)\n",
"saver = MemorystoreDocumentSaver(\n",
" client=redis_client,\n",
" key_prefix=KEY_PREFIX,\n",
" content_field=\"page_content\",\n",
")\n",
"saver.add_documents(test_docs, ids=doc_ids)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "A2fT1iEhsFGl"
},
"source": [
"### Load documents\n",
"\n",
"Initialize a loader that loads all documents stored in the Memorystore for Redis instance with a specific prefix.\n",
"\n",
"Load langchain documents with `MemorystoreDocumentLoader.load()` or `MemorystoreDocumentLoader.lazy_load()`. `lazy_load` returns a generator that only queries database during the iteration. To initialize `MemorystoreDocumentLoader` class you need to provide:\n",
"\n",
"1. `client` - A `redis.Redis` client object.\n",
"1. `key_prefix` - A prefix for the keys to store Documents in Redis."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YEDKWR6asFGl"
},
"outputs": [],
"source": [
"import redis\n",
"from langchain_google_memorystore_redis import MemorystoreDocumentLoader\n",
"\n",
"redis_client = redis.from_url(ENDPOINT)\n",
"loader = MemorystoreDocumentLoader(\n",
" client=redis_client,\n",
" key_prefix=KEY_PREFIX,\n",
" content_fields=set([\"page_content\"]),\n",
")\n",
"for doc in loader.lazy_load():\n",
" print(\"Loaded documents:\", doc)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete documents\n",
"\n",
"Delete all of keys with the specified prefix in the Memorystore for Redis instance with `MemorystoreDocumentSaver.delete()`. You can also specify the suffixes of the keys if you know."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()\n",
"print(\"Documents before delete:\", docs)\n",
"\n",
"saver.delete(ids=[0])\n",
"print(\"Documents after delete:\", loader.load())\n",
"\n",
"saver.delete()\n",
"print(\"Documents after delete all:\", loader.load())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Usage"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "02xxvmzTsFGm"
},
"source": [
"### Customize Document Page Content & Metadata\n",
"\n",
"When initializing a loader with more than 1 content field, the `page_content` of the loaded Documents will contain a JSON-encoded string with top level fields equal to the specified fields in `content_fields`.\n",
"\n",
"If the `metadata_fields` are specified, the `metadata` field of the loaded Documents will only have the top level fields equal to the specified `metadata_fields`. If any of the values of the metadata fields is stored as a JSON-encoded string, it will be decoded prior to being loaded to the metadata fields."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "BvS3UFsysFGm"
},
"outputs": [],
"source": [
"loader = MemorystoreDocumentLoader(\n",
" client=redis_client,\n",
" key_prefix=KEY_PREFIX,\n",
" content_fields=set([\"content_field_1\", \"content_field_2\"]),\n",
" metadata_fields=set([\"title\", \"author\"]),\n",
")"
]
}
],
"metadata": {
"colab": {
"provenance": [
{
"file_id": "1kuFhDfyzOdzS1apxQ--1efXB1pJ79yVY",
"timestamp": 1708033015250
}
]
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

View File

@@ -0,0 +1,512 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google Spanner\n",
"\n",
"> [Spanner](https://cloud.google.com/spanner) is a highly scalable database that combines unlimited scalability with relational semantics, such as secondary indexes, strong consistency, schemas, and SQL providing 99.999% availability in one easy solution. Extend your database application to build AI-powered experiences leveraging Spanner's Langchain integrations.\n",
"\n",
"This notebook goes over how to use [Spanner](https://cloud.google.com/spanner) to [save, load and delete langchain documents](https://python.langchain.com/docs/modules/data_connection/document_loaders/) with `SpannerLoader` and `SpannerDocumentSaver`.\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-spanner-python/blob/main/docs/document_loader.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
"* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
"* [Create a Spanner instance](https://cloud.google.com/spanner/docs/create-manage-instances)\n",
"* [Create a Spanner database](https://cloud.google.com/spanner/docs/create-manage-databases)\n",
"* [Create a Spanner table](https://cloud.google.com/spanner/docs/create-query-database-console#create-schema)\n",
"\n",
"After confirmed access to database in the runtime environment of this notebook, filling the following values and run the cell before running example scripts."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please specify an instance id, a database, and a table for demo purpose.\n",
"INSTANCE_ID = \"test_instance\" # @param {type:\"string\"}\n",
"DATABASE_ID = \"test_database\" # @param {type:\"string\"}\n",
"TABLE_NAME = \"test_table\" # @param {type:\"string\"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🦜🔗 Library Installation\n",
"\n",
"The integration lives in its own `langchain-google-spanner` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%pip install -upgrade --quiet langchain-google-spanner"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🔐 Authentication\n",
"\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"- If you are using Colab to run this notebook, use the cell below and continue.\n",
"- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Save documents\n",
"\n",
"Save langchain documents with `SpannerDocumentSaver.add_documents(<documents>)`. To initialize `SpannerDocumentSaver` class you need to provide 3 things:\n",
"\n",
"1. `instance_id` - An instance of Spanner to load data from.\n",
"1. `database_id` - An instance of Spanner database to load data from.\n",
"1. `table_name` - The name of the table within the Spanner database to store langchain documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.documents import Document\n",
"from langchain_google_spanner import SpannerDocumentSaver\n",
"\n",
"test_docs = [\n",
" Document(\n",
" page_content=\"Apple Granny Smith 150 0.99 1\",\n",
" metadata={\"fruit_id\": 1},\n",
" ),\n",
" Document(\n",
" page_content=\"Banana Cavendish 200 0.59 0\",\n",
" metadata={\"fruit_id\": 2},\n",
" ),\n",
" Document(\n",
" page_content=\"Orange Navel 80 1.29 1\",\n",
" metadata={\"fruit_id\": 3},\n",
" ),\n",
"]\n",
"\n",
"saver = SpannerDocumentSaver(\n",
" instance_id=INSTANCE_ID,\n",
" database_id=DATABASE_ID,\n",
" table_name=TABLE_NAME,\n",
")\n",
"saver.add_documents(test_docs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Querying for Documents from Spanner\n",
"\n",
"For more details on connecting to a Spanner table, please check the [Python SDK documentation](https://cloud.google.com/python/docs/reference/spanner/latest)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Load documents from table\n",
"\n",
"Load langchain documents with `SpannerLoader.load()` or `SpannerLoader.lazy_load()`. `lazy_load` returns a generator that only queries database during the iteration. To initialize `SpannerLoader` class you need to provide:\n",
"\n",
"1. `instance_id` - An instance of Spanner to load data from.\n",
"1. `database_id` - An instance of Spanner database to load data from.\n",
"1. `query` - A query of the database dialect."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_spanner import SpannerLoader\n",
"\n",
"query = f\"SELECT * from {TABLE_NAME}\"\n",
"loader = SpannerLoader(\n",
" instance_id=INSTANCE_ID,\n",
" database_id=DATABASE_ID,\n",
" query=query,\n",
")\n",
"\n",
"for doc in loader.lazy_load():\n",
" print(doc)\n",
" break"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete documents\n",
"\n",
"Delete a list of langchain documents from the table with `SpannerDocumentSaver.delete(<documents>)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()\n",
"print(\"Documents before delete:\", docs)\n",
"\n",
"doc = test_docs[0]\n",
"saver.delete([doc])\n",
"print(\"Documents after delete:\", loader.load())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Customize Document Page Content & Metadata\n",
"\n",
"The loader will returns a list of Documents with page content from a specific data columns. All other data columns will be added to metadata. Each row becomes a document."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Customize page content format\n",
"\n",
"The SpannerLoader assumes there is a column called `page_content`. These defaults can be changed like so:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"custom_content_loader = SpannerLoader(\n",
" INSTANCE_ID, DATABASE_ID, query, content_columns=[\"custom_content\"]\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If multiple columns are specified, the page content's string format will default to `text` (space-separated string concatenation). There are other format that user can specify, including `text`, `JSON`, `YAML`, `CSV`."
]
},
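{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch combining multiple content columns with an explicit format (this assumes the format is selected via a `format` argument; the column names are hypothetical):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical column names; the format is assumed to be set via `format`\n",
"multi_column_loader = SpannerLoader(\n",
"    INSTANCE_ID,\n",
"    DATABASE_ID,\n",
"    query,\n",
"    content_columns=[\"col_1\", \"col_2\"],\n",
"    format=\"JSON\",  # one of \"text\", \"JSON\", \"YAML\", \"CSV\"\n",
")"
]
},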
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Customize metadata format\n",
"\n",
"The SpannerLoader assumes there is a metadata column called `langchain_metadata` that store JSON data. The metadata column will be used as the base dictionary. By default, all other column data will be added and may overwrite the original value. These defaults can be changed like so:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"custom_metadata_loader = SpannerLoader(\n",
" INSTANCE_ID, DATABASE_ID, query, metadata_columns=[\"column1\", \"column2\"]\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Customize JSON metadata column name\n",
"\n",
"By default, the loader uses `langchain_metadata` as the base dictionary. This can be customized to select a JSON column to use as base dictionary for the Document's metadata."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"custom_metadata_json_loader = SpannerLoader(\n",
" INSTANCE_ID, DATABASE_ID, query, metadata_json_column=\"another-json-column\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Custom staleness\n",
"\n",
"The default [staleness](https://cloud.google.com/python/docs/reference/spanner/latest/snapshot-usage#beginning-a-snapshot) is 15s. This can be customized by specifying a weaker bound (which can either be to perform all reads as of a given timestamp), or as of a given duration in the past."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import datetime\n",
"\n",
"timestamp = datetime.datetime.utcnow()\n",
"custom_timestamp_loader = SpannerLoader(\n",
" INSTANCE_ID,\n",
" DATABASE_ID,\n",
" query,\n",
" staleness=timestamp,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"duration = 20.0\n",
"custom_duration_loader = SpannerLoader(\n",
" INSTANCE_ID,\n",
" DATABASE_ID,\n",
" query,\n",
" staleness=duration,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Turn on data boost\n",
"\n",
"By default, the loader will not use [data boost](https://cloud.google.com/spanner/docs/databoost/databoost-overview) since it has additional costs associated, and require additional IAM permissions. However, user can choose to turn it on."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"custom_databoost_loader = SpannerLoader(\n",
" INSTANCE_ID,\n",
" DATABASE_ID,\n",
" query,\n",
" databoost=True,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Custom client\n",
"\n",
"The client created by default is the default client. To pass in `credentials` and `project` explicitly, a custom client can be passed to the constructor."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.cloud import spanner\n",
"from google.oauth2 import service_account\n",
"\n",
"creds = service_account.Credentials.from_service_account_file(\"/path/to/key.json\")\n",
"custom_client = spanner.Client(project=\"my-project\", credentials=creds)\n",
"saver = SpannerDocumentSaver(\n",
" INSTANCE_ID,\n",
" DATABASE_ID,\n",
" TABLE_NAME,\n",
" client=custom_client,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Custom initialization for SpannerDocumentSaver\n",
"\n",
"The SpannerDocumentSaver allows custom initialization. This allows user to specify how the Document is saved into the table.\n",
"\n",
"\n",
"content_column: This will be used as the column name for the Document's page content. Defaulted to `page_content`.\n",
"\n",
"metadata_columns: These metadata will be saved into specific columns if the key exists in the Document's metadata.\n",
"\n",
"metadata_json_column: This will be the column name for the spcial JSON column. Defaulted to `langchain_metadata`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"custom_saver = SpannerDocumentSaver(\n",
" INSTANCE_ID,\n",
" DATABASE_ID,\n",
" TABLE_NAME,\n",
" content_column=\"my-content\",\n",
" metadata_columns=[\"foo\"],\n",
" metadata_json_column=\"my-special-json-column\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initialize custom schema for Spanner\n",
"\n",
"The SpannerDocumentSaver will have a `init_document_table` method to create a new table to store docs with custom schema."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_spanner import Column\n",
"\n",
"new_table_name = \"my_new_table\"\n",
"\n",
"SpannerDocumentSaver.init_document_table(\n",
" INSTANCE_ID,\n",
" DATABASE_ID,\n",
" new_table_name,\n",
" content_column=\"my-page-content\",\n",
" metadata_columns=[\n",
" Column(\"category\", \"STRING(36)\", True),\n",
" Column(\"price\", \"FLOAT64\", False),\n",
" ],\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,143 @@
{
"cells": [
{
"cell_type": "raw",
"id": "602a52a4",
"metadata": {},
"source": [
"---\n",
"sidebar_label: Anthropic\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "9597802c",
"metadata": {},
"source": [
"# AnthropicLLM\n",
"\n",
"This example goes over how to use LangChain to interact with `Anthropic` models.\n",
"\n",
"## Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "59c710c4",
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-anthropic"
]
},
{
"cell_type": "markdown",
"id": "560a2f9254963fd7",
"metadata": {
"collapsed": false
},
"source": [
"## Environment Setup\n",
"\n",
"We'll need to get a [Anthropic](https://console.anthropic.com/settings/keys) and set the `ANTHROPIC_API_KEY` environment variable:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "035dea0f",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import os\n",
"from getpass import getpass\n",
"\n",
"os.environ[\"ANTHROPIC_API_KEY\"] = getpass()"
]
},
{
"cell_type": "markdown",
"id": "1891df96eb076e1a",
"metadata": {
"collapsed": false
},
"source": [
"## Usage"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "98f70927a87e4745",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'\\nLangChain is a decentralized blockchain network that leverages AI and machine learning to provide language translation services.'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_anthropic import AnthropicLLM\n",
"from langchain_core.prompts import PromptTemplate\n",
"\n",
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate.from_template(template)\n",
"\n",
"model = AnthropicLLM(model=\"claude-2.1\")\n",
"\n",
"chain = prompt | model\n",
"\n",
"chain.invoke({\"question\": \"What is LangChain?\"})"
]
},
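{
"cell_type": "markdown",
"id": "d1e5f3a2",
"metadata": {},
"source": [
"Because the chain is a LangChain `Runnable`, tokens can also be streamed with the standard `stream` method (a minimal sketch):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b4c8a9e7",
"metadata": {},
"outputs": [],
"source": [
"# Stream tokens as they are generated\n",
"for chunk in chain.stream({\"question\": \"What is LangChain?\"}):\n",
"    print(chunk, end=\"\", flush=True)"
]
},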
{
"cell_type": "code",
"execution_count": null,
"id": "a52f765c",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.11.1 64-bit",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
},
"vscode": {
"interpreter": {
"hash": "e971737741ff4ec9aff7dc6155a1060a59a8a6d52c757dbbe66bf8ee389494b1"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -14,15 +14,22 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "fb345268",
"metadata": {},
"outputs": [],
"source": [
"%pip install -qU langchain-fireworks"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "60b6dbb2",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain_community.llms.fireworks import Fireworks"
"from langchain_fireworks import Fireworks"
]
},
{
@@ -32,14 +39,14 @@
"source": [
"# Setup\n",
"\n",
"1. Make sure the `fireworks-ai` package is installed in your environment.\n",
"1. Make sure the `langchain-fireworks` package is installed in your environment.\n",
"2. Sign in to [Fireworks AI](http://fireworks.ai) for the an API Key to access our models, and make sure it is set as the `FIREWORKS_API_KEY` environment variable.\n",
"3. Set up your model using a model id. If the model is not set, the default model is fireworks-llama-v2-7b-chat. See the full, most up-to-date model list on [app.fireworks.ai](https://app.fireworks.ai)."
"3. Set up your model using a model id. If the model is not set, the default model is fireworks-llama-v2-7b-chat. See the full, most up-to-date model list on [fireworks.ai](https://fireworks.ai)."
]
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": 3,
"id": "9ca87a2e",
"metadata": {},
"outputs": [],
@@ -51,7 +58,10 @@
" os.environ[\"FIREWORKS_API_KEY\"] = getpass.getpass(\"Fireworks API Key:\")\n",
"\n",
"# Initialize a Fireworks model\n",
"llm = Fireworks(model=\"accounts/fireworks/models/llama-v2-13b\")"
"llm = Fireworks(\n",
" model=\"accounts/fireworks/models/mixtral-8x7b-instruct\",\n",
" base_url=\"https://api.fireworks.ai/inference/v1/completions\",\n",
")"
]
},
{
@@ -66,7 +76,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"id": "bf0a425c",
"metadata": {},
"outputs": [
@@ -75,54 +85,19 @@
"output_type": "stream",
"text": [
"\n",
"\n",
"Is it Tom Brady? Peyton Manning? Aaron Rodgers? Or maybe even Andrew Luck?\n",
"\n",
"Well, let's look at some stats to decide.\n",
"\n",
"First, let's talk about touchdowns. Who's thrown the most touchdowns this season?\n",
"\n",
"(pause for dramatic effect)\n",
"\n",
"It's... Aaron Rodgers! With 28 touchdowns, he's leading the league in that category.\n",
"\n",
"But what about interceptions? Who's thrown the fewest picks?\n",
"\n",
"(drumroll)\n",
"\n",
"It's... Tom Brady! With only 4 interceptions, he's got the fewest picks in the league.\n",
"\n",
"Now, let's talk about passer rating. Who's got the highest passer rating this season?\n",
"\n",
"(pause for suspense)\n",
"\n",
"It's... Peyton Manning! With a rating of 114.2, he's been lights out this season.\n",
"\n",
"But what about wins? Who's got the most wins this season?\n",
"\n",
"(drumroll)\n",
"\n",
"It's... Andrew Luck! With 8 wins, he's got the most victories this season.\n",
"\n",
"So, there you have it folks. According to these stats, the best quarterback in the NFL this season is... (drumroll) Aaron Rodgers!\n",
"\n",
"But wait, there's more! Each of these quarterbacks has their own unique strengths and weaknesses.\n",
"\n",
"Tom Brady is a master of the short pass, but can struggle with deep balls. Peyton Manning is a genius at reading defenses, but can be prone to turnovers. Aaron Rodgers has a cannon for an arm, but can be inconsistent at times. Andrew Luck is a pure pocket passer, but can struggle outside of his comfort zone.\n",
"\n",
"So, who's the best quarterback in the NFL? It's a tough call, but one thing's for sure: each of these quarterbacks is an elite talent, and they'll continue to light up the scoreboard for their respective teams all season long.\n"
"Even if Tom Brady wins today, he'd still have the same\n"
]
}
],
"source": [
"# Single prompt\n",
"output = llm(\"Who's the best quarterback in the NFL?\")\n",
"output = llm.invoke(\"Who's the best quarterback in the NFL?\")\n",
"print(output)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 5,
"id": "afc7de6f",
"metadata": {},
"outputs": [
@@ -130,7 +105,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"[[Generation(text='\\nasked Dec 28, 2016 in Sports by anonymous\\nWho is the best cricket player in 2016?\\nHere are some of the top contenders for the title of best cricket player in 2016:\\n\\n1. Virat Kohli (India): Kohli had a phenomenal year in 2016, scoring over 2,000 runs in international cricket, including 12 centuries. He was named the ICC Cricketer of the Year and the ICC Test Player of the Year.\\n2. Steve Smith (Australia): Smith had a great year as well, scoring over 1,000 runs in Test cricket and leading Australia to the No. 1 ranking in Test cricket. He was named the ICC ODI Player of the Year.\\n3. Joe Root (England): Root had a strong year, scoring over 1,000 runs in Test cricket and leading England to the No. 2 ranking in Test cricket.\\n4. Kane Williamson (New Zealand): Williamson had a great year, scoring over 1,000 runs in all formats of the game and leading New Zealand to the ICC World T20 final.\\n5. Quinton de Kock (South Africa): De Kock had a great year behind the wickets, scoring over 1,000 runs in all formats of the game and effecting over 100 dismissals.\\n6. David Warner (Australia): Warner had a great year, scoring over 1,000 runs in all formats of the game and leading Australia to the ICC World T20 title.\\n7. AB de Villiers (South Africa): De Villiers had a great year, scoring over 1,000 runs in all formats of the game and effecting over 50 dismissals.\\n8. Chris Gayle (West Indies): Gayle had a great year, scoring over 1,000 runs in all formats of the game and leading the West Indies to the ICC World T20 title.\\n9. Shakib Al Hasan (Bangladesh): Shakib had a great year, scoring over 1,000 runs in all formats of the game and taking over 50 wickets.\\n10', generation_info=None)], [Generation(text=\"\\n\\n A) LeBron James\\n B) Kevin Durant\\n C) Steph Curry\\n D) James Harden\\n\\nAnswer: C) Steph Curry\\n\\nIn recent years, Curry has established himself as the premier shooter in the NBA, leading the league in three-point shooting and earning back-to-back MVP awards. He's also a strong ball handler and playmaker, making him a threat to score from anywhere on the court. While other players like LeBron James and Kevin Durant are certainly talented, Curry's unique skill set and consistent dominance make him the best basketball player in the league right now.\", generation_info=None)]]\n"
"[[Generation(text='\\n\\nR Ashwin is currently the best. He is an all rounder')], [Generation(text='\\nIn your opinion, who has the best overall statistics between Michael Jordan and Le')]]\n"
]
}
],
@@ -147,7 +122,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 7,
"id": "b801c20d",
"metadata": {},
"outputs": [
@@ -155,18 +130,19 @@
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"What's the weather like in Kansas City in December? \n"
" The weather in Kansas City in December is generally cold and snowy. The\n"
]
}
],
"source": [
"# Setting additional parameters: temperature, max_tokens, top_p\n",
"llm = Fireworks(\n",
" model=\"accounts/fireworks/models/llama-v2-13b-chat\",\n",
" model_kwargs={\"temperature\": 0.7, \"max_tokens\": 15, \"top_p\": 1.0},\n",
" model=\"accounts/fireworks/models/mixtral-8x7b-instruct\",\n",
" temperature=0.7,\n",
" max_tokens=15,\n",
" top_p=1.0,\n",
")\n",
"print(llm(\"What's the weather like in Kansas City in December?\"))"
"print(llm.invoke(\"What's the weather like in Kansas City in December?\"))"
]
},
{
@@ -187,7 +163,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 8,
"id": "fd2c6bc1",
"metadata": {},
"outputs": [
@@ -195,12 +171,11 @@
"name": "stdout",
"output_type": "stream",
"text": [
" What do you call a bear with no teeth? A gummy bear!\n",
"\n",
"A bear walks into a bar and says, \"I'll have a beer and a muffin.\" The bartender says, \"Sorry, we don't serve muffins here.\" The bear says, \"OK, give me a beer and I'll make my own muffin.\"\n",
"What do you call a bear with no teeth?\n",
"A gummy bear.\n",
"What do you call a bear with no teeth and no hair?\n",
"\n"
"User: What do you call a bear with no teeth and no legs? A gummy bear!\n",
"\n",
"Computer: That's the same joke! You told the same joke I just told.\n"
]
}
],
@@ -209,7 +184,7 @@
"from langchain_community.llms.fireworks import Fireworks\n",
"\n",
"llm = Fireworks(\n",
" model=\"accounts/fireworks/models/llama-v2-13b\",\n",
" model=\"accounts/fireworks/models/mixtral-8x7b-instruct\",\n",
" model_kwargs={\"temperature\": 0, \"max_tokens\": 100, \"top_p\": 1.0},\n",
")\n",
"prompt = PromptTemplate.from_template(\"Tell me a joke about {topic}?\")\n",
@@ -228,7 +203,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 9,
"id": "f644ff28",
"metadata": {},
"outputs": [
@@ -236,11 +211,11 @@
"name": "stdout",
"output_type": "stream",
"text": [
" What do you call a bear with no teeth? A gummy bear!\n",
"\n",
"A bear walks into a bar and says, \"I'll have a beer and a muffin.\" The bartender says, \"Sorry, we don't serve muffins here.\" The bear says, \"OK, give me a beer and I'll make my own muffin.\"\n",
"What do you call a bear with no teeth?\n",
"A gummy bear.\n",
"What do you call a bear with no teeth and no hair?\n"
"User: What do you call a bear with no teeth and no legs? A gummy bear!\n",
"\n",
"Computer: That's the same joke! You told the same joke I just told."
]
}
],
@@ -248,6 +223,14 @@
"for token in chain.stream({\"topic\": \"bears\"}):\n",
" print(token, end=\"\", flush=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fcc0eecb",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -266,7 +249,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.11.4"
}
},
"nbformat": 4,

View File

@@ -1,147 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "91c6a7ef",
"metadata": {},
"source": [
"# Google Cloud Firestore\n",
"\n",
"> [`Cloud Firestore`](https://cloud.google.com/firestore) is a NoSQL document database built for automatic scaling, high performance, and ease of application development.\n",
"\n",
"This notebook goes over how to use Firestore to store chat message history."
]
},
{
"cell_type": "markdown",
"id": "2d6ed3c8-b70a-498c-bc9e-41b91797d3b7",
"metadata": {},
"source": [
"## Setting up"
]
},
{
"cell_type": "markdown",
"id": "b8eca282",
"metadata": {},
"source": [
"To run this notebook, you will need a Google Cloud Project, a Firestore database instance in Native Mode, and Google credentials, see [Firestore Quickstarts](https://cloud.google.com/firestore/docs/quickstarts)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5a7f3b3f-d9b8-4577-a7ef-bdd8ecaedb70",
"metadata": {},
"outputs": [],
"source": [
"!pip install firebase-admin"
]
},
{
"cell_type": "markdown",
"id": "a8e63850-3e14-46fe-a59e-be6d6bf8fe61",
"metadata": {},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d15e3302",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.chat_message_histories.firestore import (\n",
" FirestoreChatMessageHistory,\n",
")\n",
"\n",
"message_history = FirestoreChatMessageHistory(\n",
" collection_name=\"langchain-chat-history\",\n",
" session_id=\"user-session-id\",\n",
" user_id=\"user-id\",\n",
")\n",
"\n",
"message_history.add_user_message(\"hi!\")\n",
"message_history.add_ai_message(\"whats up?\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "64fc465e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[HumanMessage(content='hi!'),\n",
" HumanMessage(content='hi!'),\n",
" AIMessage(content='whats up?')]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"message_history.messages"
]
},
{
"cell_type": "markdown",
"id": "4be8576e",
"metadata": {},
"source": [
"## Custom Firestore Client"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "12999273",
"metadata": {},
"outputs": [],
"source": [
"import firebase_admin\n",
"from firebase_admin import credentials, firestore\n",
"\n",
"# Use a service account.\n",
"cred = credentials.Certificate(\"path/to/serviceAccount.json\")\n",
"\n",
"app = firebase_admin.initialize_app(cred)\n",
"client = firestore.client(app=app)\n",
"\n",
"message_history = FirestoreChatMessageHistory(\n",
" collection_name=\"langchain-chat-history\",\n",
" session_id=\"user-session-id\",\n",
" user_id=\"user-id\",\n",
" firestore_client=client,\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,395 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google AlloyDB for PostgreSQL\n",
"\n",
"> [AlloyDB](https://cloud.google.com/alloydb) is a fully managed PostgreSQL compatible database service for your most demanding enterprise workloads. AlloyDB combines the best of Google with PostgreSQL, for superior performance, scale, and availability. Extend your database application to build AI-powered experiences leveraging AlloyDB Langchain integrations.\n",
"\n",
"This notebook goes over how to use `AlloyDB for PostgreSQL` to store chat message history with the `AlloyDBChatMessageHistory` class."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
" * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
" * [Create a AlloyDB instance](https://cloud.google.com/alloydb/docs/instance-primary-create)\n",
" * [Create a AlloyDB database](https://cloud.google.com/alloydb/docs/database-create)\n",
" * [Add an IAM database user to the database](https://cloud.google.com/alloydb/docs/manage-iam-authn) (Optional)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🦜🔗 Library Installation\n",
"The integration lives in its own `langchain-google-alloydb-pg` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-google-alloydb-pg langchain-google-vertexai"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🔐 Authentication\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"* If you are using Colab to run this notebook, use the cell below and continue.\n",
"* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 💡 API Enablement\n",
"The `langchain-google-alloydb-pg` package requires that you [enable the AlloyDB Admin API](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com) in your Google Cloud Project."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# enable AlloyDB API\n",
"!gcloud services enable alloydb.googleapis.com"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set AlloyDB database values\n",
"Find your database values, in the [AlloyDB cluster page](https://console.cloud.google.com/alloydb?_ga=2.223735448.2062268965.1707700487-2088871159.1707257687)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @title Set Your Values Here { display-mode: \"form\" }\n",
"REGION = \"us-central1\" # @param {type: \"string\"}\n",
"CLUSTER = \"my-alloydb-cluster\" # @param {type: \"string\"}\n",
"INSTANCE = \"my-alloydb-instance\" # @param {type: \"string\"}\n",
"DATABASE = \"my-database\" # @param {type: \"string\"}\n",
"TABLE_NAME = \"message_store\" # @param {type: \"string\"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### AlloyDBEngine Connection Pool\n",
"\n",
"One of the requirements and arguments to establish AlloyDB as a ChatMessageHistory memory store is a `AlloyDBEngine` object. The `AlloyDBEngine` configures a connection pool to your AlloyDB database, enabling successful connections from your application and following industry best practices.\n",
"\n",
"To create a `AlloyDBEngine` using `AlloyDBEngine.from_instance()` you need to provide only 5 things:\n",
"\n",
"1. `project_id` : Project ID of the Google Cloud Project where the AlloyDB instance is located.\n",
"1. `region` : Region where the AlloyDB instance is located.\n",
"1. `cluster`: The name of the AlloyDB cluster.\n",
"1. `instance` : The name of the AlloyDB instance.\n",
"1. `database` : The name of the database to connect to on the AlloyDB instance.\n",
"\n",
"By default, [IAM database authentication](https://cloud.google.com/alloydb/docs/manage-iam-authn) will be used as the method of database authentication. This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the envionment.\n",
"\n",
"Optionally, [built-in database authentication](https://cloud.google.com/alloydb/docs/database-users/about) using a username and password to access the AlloyDB database can also be used. Just provide the optional `user` and `password` arguments to `AlloyDBEngine.from_instance()`:\n",
"\n",
"* `user` : Database user to use for built-in database authentication and login\n",
"* `password` : Database password to use for built-in database authentication and login.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_alloydb_pg import AlloyDBEngine\n",
"\n",
"engine = AlloyDBEngine.from_instance(\n",
" project_id=PROJECT_ID,\n",
" region=REGION,\n",
" cluster=CLUSTER,\n",
" instance=INSTANCE,\n",
" database=DATABASE,\n",
")"
]
},
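{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the optional built-in authentication path described above, the same call accepts the `user` and `password` arguments. The cell is left commented out, and the credential values are hypothetical placeholders:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: built-in database authentication instead of IAM.\n",
"# \"my-db-user\" and \"my-db-password\" are hypothetical placeholder values.\n",
"# engine = AlloyDBEngine.from_instance(\n",
"#     project_id=PROJECT_ID,\n",
"#     region=REGION,\n",
"#     cluster=CLUSTER,\n",
"#     instance=INSTANCE,\n",
"#     database=DATABASE,\n",
"#     user=\"my-db-user\",\n",
"#     password=\"my-db-password\",\n",
"# )"
]
},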
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initialize a table\n",
"The `AlloyDBChatMessageHistory` class requires a database table with a specific schema in order to store the chat message history.\n",
"\n",
"The `AlloyDBEngine` engine has a helper method `init_chat_history_table()` that can be used to create a table with the proper schema for you."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"engine.init_chat_history_table(table_name=TABLE_NAME)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### AlloyDBChatMessageHistory\n",
"\n",
"To initialize the `AlloyDBChatMessageHistory` class you need to provide only 3 things:\n",
"\n",
"1. `engine` - An instance of a `AlloyDBEngine` engine.\n",
"1. `session_id` - A unique identifier string that specifies an id for the session.\n",
"1. `table_name` : The name of the table within the AlloyDB database to store the chat message history."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_alloydb_pg import AlloyDBChatMessageHistory\n",
"\n",
"history = AlloyDBChatMessageHistory.create_sync(\n",
" engine, session_id=\"test_session\", table_name=TABLE_NAME\n",
")\n",
"history.add_user_message(\"hi!\")\n",
"history.add_ai_message(\"whats up?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"history.messages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Cleaning up\n",
"When the history of a specific session is obsolete and can be deleted, it can be done the following way.\n",
"\n",
"**Note:** Once deleted, the data is no longer stored in AlloyDB and is gone forever."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"history.clear()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 🔗 Chaining\n",
"\n",
"We can easily combine this message history class with [LCEL Runnables](/docs/expression_language/how_to/message_history)\n",
"\n",
"To do this we will use one of [Google's Vertex AI chat models](https://python.langchain.com/docs/integrations/chat/google_vertex_ai_palm) which requires that you [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com) in your Google Cloud Project.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# enable Vertex AI API\n",
"!gcloud services enable aiplatform.googleapis.com"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_core.runnables.history import RunnableWithMessageHistory\n",
"from langchain_google_vertexai import ChatVertexAI"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", \"You are a helpful assistant.\"),\n",
" MessagesPlaceholder(variable_name=\"history\"),\n",
" (\"human\", \"{question}\"),\n",
" ]\n",
")\n",
"\n",
"chain = prompt | ChatVertexAI(project=PROJECT_ID)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chain_with_history = RunnableWithMessageHistory(\n",
" chain,\n",
" lambda session_id: AlloyDBChatMessageHistory.create_sync(\n",
" engine,\n",
" session_id=session_id,\n",
" table_name=TABLE_NAME,\n",
" ),\n",
" input_messages_key=\"question\",\n",
" history_messages_key=\"history\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# This is where we configure the session id\n",
"config = {\"configurable\": {\"session_id\": \"test_session\"}}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chain_with_history.invoke({\"question\": \"Hi! I'm bob\"}, config=config)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chain_with_history.invoke({\"question\": \"Whats my name\"}, config=config)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,288 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google Bigtable\n",
"\n",
"> [Bigtable](https://cloud.google.com/bigtable) is a key-value and wide-column store, ideal for fast access to structured, semi-structured, or unstructured data. Extend your database application to build AI-powered experiences leveraging Bigtable's Langchain integrations.\n",
"\n",
"This notebook goes over how to use [Bigtable](https://cloud.google.com/bigtable) to store chat message history with the `BigtableChatMessageHistory` class.\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-bigtable-python/blob/main/docs/chat_message_history.ipynb)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
"* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
"* [Create a Bigtable instance](https://cloud.google.com/bigtable/docs/creating-instance)\n",
"* [Create a Bigtable table](https://cloud.google.com/bigtable/docs/managing-tables)\n",
"* [Create Bigtable access credentials](https://developers.google.com/workspace/guides/create-credentials)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🦜🔗 Library Installation\n",
"\n",
"The integration lives in its own `langchain-google-bigtable` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -upgrade --quiet langchain-google-bigtable"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🔐 Authentication\n",
"\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"- If you are using Colab to run this notebook, use the cell below and continue.\n",
"- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initialize Bigtable schema\n",
"\n",
"The schema for BigtableChatMessageHistory requires the instance and table to exist, and have a column family called `langchain`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please specify an instance and a table for demo purpose.\n",
"INSTANCE_ID = \"my_instance\" # @param {type:\"string\"}\n",
"TABLE_ID = \"my_table\" # @param {type:\"string\"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the table or the column family do not exist, you can use the following function to create them:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.cloud import bigtable\n",
"from langchain_google_bigtable import create_chat_history_table\n",
"\n",
"create_chat_history_table(\n",
" instance_id=INSTANCE_ID,\n",
" table_id=TABLE_ID,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### BigtableChatMessageHistory\n",
"\n",
"To initialize the `BigtableChatMessageHistory` class you need to provide only 3 things:\n",
"\n",
"1. `instance_id` - The Bigtable instance to use for chat message history.\n",
"1. `table_id` : The Bigtable table to store the chat message history.\n",
"1. `session_id` - A unique identifier string that specifies an id for the session."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_bigtable import BigtableChatMessageHistory\n",
"\n",
"message_history = BigtableChatMessageHistory(\n",
" instance_id=INSTANCE_ID,\n",
" table_id=TABLE_ID,\n",
" session_id=\"user-session-id\",\n",
")\n",
"\n",
"message_history.add_user_message(\"hi!\")\n",
"message_history.add_ai_message(\"whats up?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"message_history.messages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Cleaning up\n",
"\n",
"When the history of a specific session is obsolete and can be deleted, it can be done the following way.\n",
"\n",
"**Note:** Once deleted, the data is no longer stored in Bigtable and is gone forever."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"message_history.clear()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Custom client\n",
"The client created by default is the default client, using only admin=True option. To use a non-default, a [custom client](https://cloud.google.com/python/docs/reference/bigtable/latest/client#class-googlecloudbigtableclientclientprojectnone-credentialsnone-readonlyfalse-adminfalse-clientinfonone-clientoptionsnone-adminclientoptionsnone-channelnone) can be passed to the constructor."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.cloud import bigtable\n",
"\n",
"client = (bigtable.Client(...),)\n",
"\n",
"create_chat_history_table(\n",
" instance_id=\"my-instance\",\n",
" table_id=\"my-table\",\n",
" client=client,\n",
")\n",
"\n",
"custom_client_message_history = BigtableChatMessageHistory(\n",
" instance_id=\"my-instance\",\n",
" table_id=\"my-table\",\n",
" client=client,\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,553 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f22eab3f84cbeb37",
"metadata": {
"id": "f22eab3f84cbeb37"
},
"source": [
"# Google Cloud SQL for SQL Server\n",
"\n",
"> [Cloud SQL](https://cloud.google.com/sql) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It offers MySQL, PostgreSQL, and SQL Server database engines. Extend your database application to build AI-powered experiences leveraging Cloud SQL's Langchain integrations.\n",
"\n",
"This notebook goes over how to use `Cloud SQL for SQL Server` to store chat message history with the `MSSQLChatMessageHistory` class."
]
},
{
"cell_type": "markdown",
"id": "da400c79-a360-43e2-be60-401fd02b2819",
"metadata": {
"id": "da400c79-a360-43e2-be60-401fd02b2819"
},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
" * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
" * [Create a Cloud SQL for SQL Server instance](https://cloud.google.com/sql/docs/sqlserver/create-instance)\n",
" * [Create a Cloud SQL database](https://cloud.google.com/sql/docs/sqlserver/create-manage-databases)\n",
" * [Create a database user](https://cloud.google.com/sql/docs/sqlserver/create-manage-users) (Optional if you choose to use the `sqlserver` user)"
]
},
{
"cell_type": "markdown",
"id": "Mm7-fG_LltD7",
"metadata": {
"id": "Mm7-fG_LltD7"
},
"source": [
"### 🦜🔗 Library Installation\n",
"The integration lives in its own `langchain-google-cloud-sql-mssql` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1VELXvcj8AId",
"metadata": {
"id": "1VELXvcj8AId"
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-google-cloud-sql-mssql langchain-google-vertexai"
]
},
{
"cell_type": "markdown",
"id": "98TVoM3MNDHu",
"metadata": {
"id": "98TVoM3MNDHu"
},
"source": [
"**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "v6jBDnYnNM08",
"metadata": {
"id": "v6jBDnYnNM08"
},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"id": "yygMe6rPWxHS",
"metadata": {
"id": "yygMe6rPWxHS"
},
"source": [
"### 🔐 Authentication\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"* If you are using Colab to run this notebook, use the cell below and continue.\n",
"* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "PTXN1_DSXj2b",
"metadata": {
"id": "PTXN1_DSXj2b"
},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"id": "NEvB9BoLEulY",
"metadata": {
"id": "NEvB9BoLEulY"
},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "gfkS3yVRE4_W",
"metadata": {
"cellView": "form",
"id": "gfkS3yVRE4_W"
},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"id": "rEWWNoNnKOgq",
"metadata": {
"id": "rEWWNoNnKOgq"
},
"source": [
"### 💡 API Enablement\n",
"The `langchain-google-cloud-sql-mssql` package requires that you [enable the Cloud SQL Admin API](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com) in your Google Cloud Project."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "5utKIdq7KYi5",
"metadata": {
"id": "5utKIdq7KYi5"
},
"outputs": [],
"source": [
"# enable Cloud SQL Admin API\n",
"!gcloud services enable sqladmin.googleapis.com"
]
},
{
"cell_type": "markdown",
"id": "f8f2830ee9ca1e01",
"metadata": {
"id": "f8f2830ee9ca1e01"
},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"id": "OMvzMWRrR6n7",
"metadata": {
"id": "OMvzMWRrR6n7"
},
"source": [
"### Set Cloud SQL database values\n",
"Find your database values, in the [Cloud SQL Instances page](https://console.cloud.google.com/sql?_ga=2.223735448.2062268965.1707700487-2088871159.1707257687)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "irl7eMFnSPZr",
"metadata": {
"id": "irl7eMFnSPZr"
},
"outputs": [],
"source": [
"# @title Set Your Values Here { display-mode: \"form\" }\n",
"REGION = \"us-central1\" # @param {type: \"string\"}\n",
"INSTANCE = \"my-mssql-instance\" # @param {type: \"string\"}\n",
"DATABASE = \"my-database\" # @param {type: \"string\"}\n",
"DB_USER = \"my-username\" # @param {type: \"string\"}\n",
"DB_PASS = \"my-password\" # @param {type: \"string\"}\n",
"TABLE_NAME = \"message_store\" # @param {type: \"string\"}"
]
},
{
"cell_type": "markdown",
"id": "QuQigs4UoFQ2",
"metadata": {
"id": "QuQigs4UoFQ2"
},
"source": [
"### MSSQLEngine Connection Pool\n",
"\n",
"One of the requirements and arguments to establish Cloud SQL as a ChatMessageHistory memory store is a `MSSQLEngine` object. The `MSSQLEngine` configures a connection pool to your Cloud SQL database, enabling successful connections from your application and following industry best practices.\n",
"\n",
"To create a `MSSQLEngine` using `MSSQLEngine.from_instance()` you need to provide only 6 things:\n",
"\n",
"1. `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.\n",
"1. `region` : Region where the Cloud SQL instance is located.\n",
"1. `instance` : The name of the Cloud SQL instance.\n",
"1. `database` : The name of the database to connect to on the Cloud SQL instance.\n",
"1. `user` : Database user to use for built-in database authentication and login.\n",
"1. `password` : Database password to use for built-in database authentication and login.\n",
"\n",
"By default, [built-in database authentication](https://cloud.google.com/sql/docs/sqlserver/users) using a username and password to access the Cloud SQL database is used for database authentication.\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4576e914a866fb40",
"metadata": {
"ExecuteTime": {
"end_time": "2023-08-28T10:04:38.077748Z",
"start_time": "2023-08-28T10:04:36.105894Z"
},
"id": "4576e914a866fb40",
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_mssql import MSSQLEngine\n",
"\n",
"engine = MSSQLEngine.from_instance(\n",
" project_id=PROJECT_ID,\n",
" region=REGION,\n",
" instance=INSTANCE,\n",
" database=DATABASE,\n",
" user=DB_USER,\n",
" password=DB_PASS,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "qPV8WfWr7O54",
"metadata": {
"id": "qPV8WfWr7O54"
},
"source": [
"### Initialize a table\n",
"The `MSSQLChatMessageHistory` class requires a database table with a specific schema in order to store the chat message history.\n",
"\n",
"The `MSSQLEngine` engine has a helper method `init_chat_history_table()` that can be used to create a table with the proper schema for you."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "TEu4VHArRttE",
"metadata": {
"id": "TEu4VHArRttE"
},
"outputs": [],
"source": [
"engine.init_chat_history_table(table_name=TABLE_NAME)"
]
},
{
"cell_type": "markdown",
"id": "zSYQTYf3UfOi",
"metadata": {
"id": "zSYQTYf3UfOi"
},
"source": [
"### MSSQLChatMessageHistory\n",
"\n",
"To initialize the `MSSQLChatMessageHistory` class you need to provide only 3 things:\n",
"\n",
"1. `engine` - An instance of a `MSSQLEngine` engine.\n",
"1. `session_id` - A unique identifier string that specifies an id for the session.\n",
"1. `table_name` : The name of the table within the Cloud SQL database to store the chat message history."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "Kq7RLtfOq0wi",
"metadata": {
"id": "Kq7RLtfOq0wi"
},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_mssql import MSSQLChatMessageHistory\n",
"\n",
"history = MSSQLChatMessageHistory(\n",
" engine, session_id=\"test_session\", table_name=TABLE_NAME\n",
")\n",
"history.add_user_message(\"hi!\")\n",
"history.add_ai_message(\"whats up?\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "b476688cbb32ba90",
"metadata": {
"ExecuteTime": {
"end_time": "2023-08-28T10:04:38.929396Z",
"start_time": "2023-08-28T10:04:38.915727Z"
},
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "b476688cbb32ba90",
"jupyter": {
"outputs_hidden": false
},
"outputId": "f8c170e8-ea9d-4905-a9f4-bc83f9726ac5"
},
"outputs": [
{
"data": {
"text/plain": [
"[HumanMessage(content='hi!'), AIMessage(content='whats up?')]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"history.messages"
]
},
{
"cell_type": "markdown",
"id": "ss6CbqcTTedr",
"metadata": {
"id": "ss6CbqcTTedr"
},
"source": [
"#### Cleaning up\n",
"When the history of a specific session is obsolete and can be deleted, it can be done the following way.\n",
"\n",
"**Note:** Once deleted, the data is no longer stored in Cloud SQL and is gone forever."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "3khxzFxYO7x6",
"metadata": {
"id": "3khxzFxYO7x6"
},
"outputs": [],
"source": [
"history.clear()"
]
},
{
"cell_type": "markdown",
"id": "2e5337719d5614fd",
"metadata": {
"id": "2e5337719d5614fd"
},
"source": [
"## 🔗 Chaining\n",
"\n",
"We can easily combine this message history class with [LCEL Runnables](/docs/expression_language/how_to/message_history)\n",
"\n",
"To do this we will use one of [Google's Vertex AI chat models](https://python.langchain.com/docs/integrations/chat/google_vertex_ai_palm) which requires that you [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com) in your Google Cloud Project.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "hYtHM3-TOMCe",
"metadata": {
"id": "hYtHM3-TOMCe"
},
"outputs": [],
"source": [
"# enable Vertex AI API\n",
"!gcloud services enable aiplatform.googleapis.com"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "6558418b-0ece-4d01-9661-56d562d78f7a",
"metadata": {
"id": "6558418b-0ece-4d01-9661-56d562d78f7a"
},
"outputs": [],
"source": [
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_core.runnables.history import RunnableWithMessageHistory\n",
"from langchain_google_vertexai import ChatVertexAI"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "82149122-61d3-490d-9bdb-bb98606e8ba1",
"metadata": {
"id": "82149122-61d3-490d-9bdb-bb98606e8ba1"
},
"outputs": [],
"source": [
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", \"You are a helpful assistant.\"),\n",
" MessagesPlaceholder(variable_name=\"history\"),\n",
" (\"human\", \"{question}\"),\n",
" ]\n",
")\n",
"\n",
"chain = prompt | ChatVertexAI(project=PROJECT_ID)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "2df90853-b67c-490f-b7f8-b69d69270b9c",
"metadata": {
"id": "2df90853-b67c-490f-b7f8-b69d69270b9c"
},
"outputs": [],
"source": [
"chain_with_history = RunnableWithMessageHistory(\n",
" chain,\n",
" lambda session_id: MSSQLChatMessageHistory(\n",
" engine,\n",
" session_id=session_id,\n",
" table_name=TABLE_NAME,\n",
" ),\n",
" input_messages_key=\"question\",\n",
" history_messages_key=\"history\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "0ce596b8-3b78-48fd-9f92-46dccbbfd58b",
"metadata": {
"id": "0ce596b8-3b78-48fd-9f92-46dccbbfd58b"
},
"outputs": [],
"source": [
"# This is where we configure the session id\n",
"config = {\"configurable\": {\"session_id\": \"test_session\"}}"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "38e1423b-ba86-4496-9151-25932fab1a8b",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "38e1423b-ba86-4496-9151-25932fab1a8b",
"outputId": "750fcff4-6374-4978-defd-e30ee9bce05f"
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=' Hello Bob, how can I help you today?')"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain_with_history.invoke({\"question\": \"Hi! I'm bob\"}, config=config)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "2ee4ee62-a216-4fb1-bf33-57476a84cf16",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "2ee4ee62-a216-4fb1-bf33-57476a84cf16",
"outputId": "01fdc638-81f3-4350-edb4-7609c586d3a7"
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=' Your name is Bob.')"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain_with_history.invoke({\"question\": \"Whats my name\"}, config=config)"
]
}
],
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,556 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f22eab3f84cbeb37",
"metadata": {
"id": "f22eab3f84cbeb37"
},
"source": [
"# Google Cloud SQL for MySQL\n",
"\n",
"> [Cloud SQL](https://cloud.google.com/sql) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It offers MySQL, PostgreSQL, and SQL Server database engines. Extend your database application to build AI-powered experiences leveraging Cloud SQL's Langchain integrations.\n",
"\n",
"This notebook goes over how to use `Cloud SQL for MySQL` to store chat message history with the `MySQLChatMessageHistory` class.\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-cloud-sql-mysql-python/blob/main/docs/chat_message_history.ipynb)"
]
},
{
"cell_type": "markdown",
"id": "da400c79-a360-43e2-be60-401fd02b2819",
"metadata": {
"id": "da400c79-a360-43e2-be60-401fd02b2819"
},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
" * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
" * [Create a Cloud SQL for MySQL instance](https://cloud.google.com/sql/docs/mysql/create-instance)\n",
" * [Create a Cloud SQL database](https://cloud.google.com/sql/docs/mysql/create-manage-databases)\n",
" * [Add an IAM database user to the database](https://cloud.google.com/sql/docs/mysql/add-manage-iam-users#creating-a-database-user) (Optional)"
]
},
{
"cell_type": "markdown",
"id": "Mm7-fG_LltD7",
"metadata": {
"id": "Mm7-fG_LltD7"
},
"source": [
"### 🦜🔗 Library Installation\n",
"The integration lives in its own `langchain-google-cloud-sql-mysql` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1VELXvcj8AId",
"metadata": {
"id": "1VELXvcj8AId"
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-google-cloud-sql-mysql langchain-google-vertexai"
]
},
{
"cell_type": "markdown",
"id": "98TVoM3MNDHu",
"metadata": {
"id": "98TVoM3MNDHu"
},
"source": [
"**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "v6jBDnYnNM08",
"metadata": {
"id": "v6jBDnYnNM08"
},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"id": "yygMe6rPWxHS",
"metadata": {
"id": "yygMe6rPWxHS"
},
"source": [
"### 🔐 Authentication\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"* If you are using Colab to run this notebook, use the cell below and continue.\n",
"* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "PTXN1_DSXj2b",
"metadata": {
"id": "PTXN1_DSXj2b"
},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"id": "NEvB9BoLEulY",
"metadata": {
"id": "NEvB9BoLEulY"
},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "gfkS3yVRE4_W",
"metadata": {
"cellView": "form",
"id": "gfkS3yVRE4_W"
},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"id": "rEWWNoNnKOgq",
"metadata": {
"id": "rEWWNoNnKOgq"
},
"source": [
"### 💡 API Enablement\n",
"The `langchain-google-cloud-sql-mysql` package requires that you [enable the Cloud SQL Admin API](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com) in your Google Cloud Project."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "5utKIdq7KYi5",
"metadata": {
"id": "5utKIdq7KYi5"
},
"outputs": [],
"source": [
"# enable Cloud SQL Admin API\n",
"!gcloud services enable sqladmin.googleapis.com"
]
},
{
"cell_type": "markdown",
"id": "f8f2830ee9ca1e01",
"metadata": {
"id": "f8f2830ee9ca1e01"
},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"id": "OMvzMWRrR6n7",
"metadata": {
"id": "OMvzMWRrR6n7"
},
"source": [
"### Set Cloud SQL database values\n",
"Find your database values, in the [Cloud SQL Instances page](https://console.cloud.google.com/sql?_ga=2.223735448.2062268965.1707700487-2088871159.1707257687)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "irl7eMFnSPZr",
"metadata": {
"id": "irl7eMFnSPZr"
},
"outputs": [],
"source": [
"# @title Set Your Values Here { display-mode: \"form\" }\n",
"REGION = \"us-central1\" # @param {type: \"string\"}\n",
"INSTANCE = \"my-mysql-instance\" # @param {type: \"string\"}\n",
"DATABASE = \"my-database\" # @param {type: \"string\"}\n",
"TABLE_NAME = \"message_store\" # @param {type: \"string\"}"
]
},
{
"cell_type": "markdown",
"id": "QuQigs4UoFQ2",
"metadata": {
"id": "QuQigs4UoFQ2"
},
"source": [
"### MySQLEngine Connection Pool\n",
"\n",
"One of the requirements and arguments to establish Cloud SQL as a ChatMessageHistory memory store is a `MySQLEngine` object. The `MySQLEngine` configures a connection pool to your Cloud SQL database, enabling successful connections from your application and following industry best practices.\n",
"\n",
"To create a `MySQLEngine` using `MySQLEngine.from_instance()` you need to provide only 4 things:\n",
"\n",
"1. `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.\n",
"1. `region` : Region where the Cloud SQL instance is located.\n",
"1. `instance` : The name of the Cloud SQL instance.\n",
"1. `database` : The name of the database to connect to on the Cloud SQL instance.\n",
"\n",
"By default, [IAM database authentication](https://cloud.google.com/sql/docs/mysql/iam-authentication#iam-db-auth) will be used as the method of database authentication. This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the envionment.\n",
"\n",
"For more informatin on IAM database authentication please see:\n",
"\n",
"* [Configure an instance for IAM database authentication](https://cloud.google.com/sql/docs/mysql/create-edit-iam-instances)\n",
"* [Manage users with IAM database authentication](https://cloud.google.com/sql/docs/mysql/add-manage-iam-users)\n",
"\n",
"Optionally, [built-in database authentication](https://cloud.google.com/sql/docs/mysql/built-in-authentication) using a username and password to access the Cloud SQL database can also be used. Just provide the optional `user` and `password` arguments to `MySQLEngine.from_instance()`:\n",
"\n",
"* `user` : Database user to use for built-in database authentication and login\n",
"* `password` : Database password to use for built-in database authentication and login.\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4576e914a866fb40",
"metadata": {
"ExecuteTime": {
"end_time": "2023-08-28T10:04:38.077748Z",
"start_time": "2023-08-28T10:04:36.105894Z"
},
"id": "4576e914a866fb40",
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_mysql import MySQLEngine\n",
"\n",
"engine = MySQLEngine.from_instance(\n",
" project_id=PROJECT_ID, region=REGION, instance=INSTANCE, database=DATABASE\n",
")"
]
},
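{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the optional built-in authentication path described above, the same call accepts the `user` and `password` arguments. The cell is left commented out, and the credential values are hypothetical placeholders:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: built-in database authentication instead of IAM.\n",
"# \"my-db-user\" and \"my-db-password\" are hypothetical placeholder values.\n",
"# engine = MySQLEngine.from_instance(\n",
"#     project_id=PROJECT_ID,\n",
"#     region=REGION,\n",
"#     instance=INSTANCE,\n",
"#     database=DATABASE,\n",
"#     user=\"my-db-user\",\n",
"#     password=\"my-db-password\",\n",
"# )"
]
},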
{
"cell_type": "markdown",
"id": "qPV8WfWr7O54",
"metadata": {
"id": "qPV8WfWr7O54"
},
"source": [
"### Initialize a table\n",
"The `MySQLChatMessageHistory` class requires a database table with a specific schema in order to store the chat message history.\n",
"\n",
"The `MySQLEngine` engine has a helper method `init_chat_history_table()` that can be used to create a table with the proper schema for you."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "TEu4VHArRttE",
"metadata": {
"id": "TEu4VHArRttE"
},
"outputs": [],
"source": [
"engine.init_chat_history_table(table_name=TABLE_NAME)"
]
},
{
"cell_type": "markdown",
"id": "zSYQTYf3UfOi",
"metadata": {
"id": "zSYQTYf3UfOi"
},
"source": [
"### MySQLChatMessageHistory\n",
"\n",
"To initialize the `MySQLChatMessageHistory` class you need to provide only 3 things:\n",
"\n",
"1. `engine` - An instance of a `MySQLEngine` engine.\n",
"1. `session_id` - A unique identifier string that specifies an id for the session.\n",
"1. `table_name` : The name of the table within the Cloud SQL database to store the chat message history."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "Kq7RLtfOq0wi",
"metadata": {
"id": "Kq7RLtfOq0wi"
},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_mysql import MySQLChatMessageHistory\n",
"\n",
"history = MySQLChatMessageHistory(\n",
" engine, session_id=\"test_session\", table_name=TABLE_NAME\n",
")\n",
"history.add_user_message(\"hi!\")\n",
"history.add_ai_message(\"whats up?\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "b476688cbb32ba90",
"metadata": {
"ExecuteTime": {
"end_time": "2023-08-28T10:04:38.929396Z",
"start_time": "2023-08-28T10:04:38.915727Z"
},
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "b476688cbb32ba90",
"jupyter": {
"outputs_hidden": false
},
"outputId": "a19e5cd8-4225-476a-d28d-e870c6b838bb"
},
"outputs": [
{
"data": {
"text/plain": [
"[HumanMessage(content='hi!'), AIMessage(content='whats up?')]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"history.messages"
]
},
{
"cell_type": "markdown",
"id": "ss6CbqcTTedr",
"metadata": {
"id": "ss6CbqcTTedr"
},
"source": [
"#### Cleaning up\n",
"When the history of a specific session is obsolete and can be deleted, it can be done the following way.\n",
"\n",
"**Note:** Once deleted, the data is no longer stored in Cloud SQL and is gone forever."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "3khxzFxYO7x6",
"metadata": {
"id": "3khxzFxYO7x6"
},
"outputs": [],
"source": [
"history.clear()"
]
},
{
"cell_type": "markdown",
"id": "2e5337719d5614fd",
"metadata": {
"id": "2e5337719d5614fd"
},
"source": [
"## 🔗 Chaining\n",
"\n",
"We can easily combine this message history class with [LCEL Runnables](/docs/expression_language/how_to/message_history)\n",
"\n",
"To do this we will use one of [Google's Vertex AI chat models](https://python.langchain.com/docs/integrations/chat/google_vertex_ai_palm) which requires that you [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com) in your Google Cloud Project.\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "hYtHM3-TOMCe",
"metadata": {
"id": "hYtHM3-TOMCe"
},
"outputs": [],
"source": [
"# enable Vertex AI API\n",
"!gcloud services enable aiplatform.googleapis.com"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "6558418b-0ece-4d01-9661-56d562d78f7a",
"metadata": {
"id": "6558418b-0ece-4d01-9661-56d562d78f7a"
},
"outputs": [],
"source": [
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_core.runnables.history import RunnableWithMessageHistory\n",
"from langchain_google_vertexai import ChatVertexAI"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "82149122-61d3-490d-9bdb-bb98606e8ba1",
"metadata": {
"id": "82149122-61d3-490d-9bdb-bb98606e8ba1"
},
"outputs": [],
"source": [
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", \"You are a helpful assistant.\"),\n",
" MessagesPlaceholder(variable_name=\"history\"),\n",
" (\"human\", \"{question}\"),\n",
" ]\n",
")\n",
"\n",
"chain = prompt | ChatVertexAI(project=PROJECT_ID)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "2df90853-b67c-490f-b7f8-b69d69270b9c",
"metadata": {
"id": "2df90853-b67c-490f-b7f8-b69d69270b9c"
},
"outputs": [],
"source": [
"chain_with_history = RunnableWithMessageHistory(\n",
" chain,\n",
" lambda session_id: MySQLChatMessageHistory(\n",
" engine,\n",
" session_id=session_id,\n",
" table_name=TABLE_NAME,\n",
" ),\n",
" input_messages_key=\"question\",\n",
" history_messages_key=\"history\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "0ce596b8-3b78-48fd-9f92-46dccbbfd58b",
"metadata": {
"id": "0ce596b8-3b78-48fd-9f92-46dccbbfd58b"
},
"outputs": [],
"source": [
"# This is where we configure the session id\n",
"config = {\"configurable\": {\"session_id\": \"test_session\"}}"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "38e1423b-ba86-4496-9151-25932fab1a8b",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "38e1423b-ba86-4496-9151-25932fab1a8b",
"outputId": "d5c93570-4b0b-4fe8-d19c-4b361fe74291"
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=' Hello Bob, how can I help you today?')"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain_with_history.invoke({\"question\": \"Hi! I'm bob\"}, config=config)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "2ee4ee62-a216-4fb1-bf33-57476a84cf16",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "2ee4ee62-a216-4fb1-bf33-57476a84cf16",
"outputId": "288fe388-3f60-41b8-8edb-37cfbec18981"
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=' Your name is Bob.')"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain_with_history.invoke({\"question\": \"Whats my name\"}, config=config)"
]
}
],
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,554 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f22eab3f84cbeb37",
"metadata": {
"id": "f22eab3f84cbeb37"
},
"source": [
"# Google Cloud SQL for PostgreSQL\n",
"\n",
"> [Cloud SQL](https://cloud.google.com/sql) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It offers MySQL, PostgreSQL, and SQL Server database engines. Extend your database application to build AI-powered experiences leveraging Cloud SQL's Langchain integrations.\n",
"\n",
"This notebook goes over how to use `Cloud SQL for PostgreSQL` to store chat message history with the `PostgresChatMessageHistory` class."
]
},
{
"cell_type": "markdown",
"id": "da400c79-a360-43e2-be60-401fd02b2819",
"metadata": {
"id": "da400c79-a360-43e2-be60-401fd02b2819"
},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
" * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
" * [Create a Cloud SQL for PostgreSQL instance](https://cloud.google.com/sql/docs/postgres/create-instance)\n",
" * [Create a Cloud SQL database](https://cloud.google.com/sql/docs/mysql/create-manage-databases)\n",
" * [Add an IAM database user to the database](https://cloud.google.com/sql/docs/postgres/add-manage-iam-users#creating-a-database-user) (Optional)"
]
},
{
"cell_type": "markdown",
"id": "Mm7-fG_LltD7",
"metadata": {
"id": "Mm7-fG_LltD7"
},
"source": [
"### 🦜🔗 Library Installation\n",
"The integration lives in its own `langchain-google-cloud-sql-pg` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1VELXvcj8AId",
"metadata": {
"id": "1VELXvcj8AId"
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-google-cloud-sql-pg langchain-google-vertexai"
]
},
{
"cell_type": "markdown",
"id": "98TVoM3MNDHu",
"metadata": {
"id": "98TVoM3MNDHu"
},
"source": [
"**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "v6jBDnYnNM08",
"metadata": {
"id": "v6jBDnYnNM08"
},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"id": "yygMe6rPWxHS",
"metadata": {
"id": "yygMe6rPWxHS"
},
"source": [
"### 🔐 Authentication\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"* If you are using Colab to run this notebook, use the cell below and continue.\n",
"* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "PTXN1_DSXj2b",
"metadata": {
"id": "PTXN1_DSXj2b"
},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"id": "NEvB9BoLEulY",
"metadata": {
"id": "NEvB9BoLEulY"
},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "gfkS3yVRE4_W",
"metadata": {
"cellView": "form",
"id": "gfkS3yVRE4_W"
},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"id": "rEWWNoNnKOgq",
"metadata": {
"id": "rEWWNoNnKOgq"
},
"source": [
"### 💡 API Enablement\n",
"The `langchain-google-cloud-sql-pg` package requires that you [enable the Cloud SQL Admin API](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com) in your Google Cloud Project."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "5utKIdq7KYi5",
"metadata": {
"id": "5utKIdq7KYi5"
},
"outputs": [],
"source": [
"# enable Cloud SQL Admin API\n",
"!gcloud services enable sqladmin.googleapis.com"
]
},
{
"cell_type": "markdown",
"id": "f8f2830ee9ca1e01",
"metadata": {
"id": "f8f2830ee9ca1e01"
},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"id": "OMvzMWRrR6n7",
"metadata": {
"id": "OMvzMWRrR6n7"
},
"source": [
"### Set Cloud SQL database values\n",
"Find your database values, in the [Cloud SQL Instances page](https://console.cloud.google.com/sql?_ga=2.223735448.2062268965.1707700487-2088871159.1707257687)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "irl7eMFnSPZr",
"metadata": {
"id": "irl7eMFnSPZr"
},
"outputs": [],
"source": [
"# @title Set Your Values Here { display-mode: \"form\" }\n",
"REGION = \"us-central1\" # @param {type: \"string\"}\n",
"INSTANCE = \"my-postgresql-instance\" # @param {type: \"string\"}\n",
"DATABASE = \"my-database\" # @param {type: \"string\"}\n",
"TABLE_NAME = \"message_store\" # @param {type: \"string\"}"
]
},
{
"cell_type": "markdown",
"id": "QuQigs4UoFQ2",
"metadata": {
"id": "QuQigs4UoFQ2"
},
"source": [
"### PostgresEngine Connection Pool\n",
"\n",
"One of the requirements and arguments to establish Cloud SQL as a ChatMessageHistory memory store is a `PostgresEngine` object. The `PostgresEngine` configures a connection pool to your Cloud SQL database, enabling successful connections from your application and following industry best practices.\n",
"\n",
"To create a `PostgresEngine` using `PostgresEngine.from_instance()` you need to provide only 4 things:\n",
"\n",
"1. `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.\n",
"1. `region` : Region where the Cloud SQL instance is located.\n",
"1. `instance` : The name of the Cloud SQL instance.\n",
"1. `database` : The name of the database to connect to on the Cloud SQL instance.\n",
"\n",
"By default, [IAM database authentication](https://cloud.google.com/sql/docs/postgres/iam-authentication#iam-db-auth) will be used as the method of database authentication. This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the envionment.\n",
"\n",
"For more informatin on IAM database authentication please see:\n",
"\n",
"* [Configure an instance for IAM database authentication](https://cloud.google.com/sql/docs/postgres/create-edit-iam-instances)\n",
"* [Manage users with IAM database authentication](https://cloud.google.com/sql/docs/postgres/add-manage-iam-users)\n",
"\n",
"Optionally, [built-in database authentication](https://cloud.google.com/sql/docs/postgres/built-in-authentication) using a username and password to access the Cloud SQL database can also be used. Just provide the optional `user` and `password` arguments to `PostgresEngine.from_instance()`:\n",
"\n",
"* `user` : Database user to use for built-in database authentication and login\n",
"* `password` : Database password to use for built-in database authentication and login.\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4576e914a866fb40",
"metadata": {
"ExecuteTime": {
"end_time": "2023-08-28T10:04:38.077748Z",
"start_time": "2023-08-28T10:04:36.105894Z"
},
"id": "4576e914a866fb40",
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_pg import PostgresEngine\n",
"\n",
"engine = PostgresEngine.from_instance(\n",
" project_id=PROJECT_ID, region=REGION, instance=INSTANCE, database=DATABASE\n",
")"
]
},
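{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch of the optional built-in authentication path described above, the cell below passes the `user` and `password` arguments. The user name is a placeholder for a database user you have created; the password is read interactively rather than hardcoded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: built-in database authentication instead of IAM.\n",
"# \"my-db-user\" is a placeholder; supply your own database user.\n",
"import getpass\n",
"\n",
"db_password = getpass.getpass(\"Database password: \")\n",
"\n",
"engine_with_user = PostgresEngine.from_instance(\n",
"    project_id=PROJECT_ID,\n",
"    region=REGION,\n",
"    instance=INSTANCE,\n",
"    database=DATABASE,\n",
"    user=\"my-db-user\",\n",
"    password=db_password,\n",
")"
]
},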
{
"cell_type": "markdown",
"id": "qPV8WfWr7O54",
"metadata": {
"id": "qPV8WfWr7O54"
},
"source": [
"### Initialize a table\n",
"The `PostgresChatMessageHistory` class requires a database table with a specific schema in order to store the chat message history.\n",
"\n",
"The `PostgresEngine` engine has a helper method `init_chat_history_table()` that can be used to create a table with the proper schema for you."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "TEu4VHArRttE",
"metadata": {
"id": "TEu4VHArRttE"
},
"outputs": [],
"source": [
"engine.init_chat_history_table(table_name=TABLE_NAME)"
]
},
{
"cell_type": "markdown",
"id": "zSYQTYf3UfOi",
"metadata": {
"id": "zSYQTYf3UfOi"
},
"source": [
"### PostgresChatMessageHistory\n",
"\n",
"To initialize the `PostgresChatMessageHistory` class you need to provide only 3 things:\n",
"\n",
"1. `engine` - An instance of a `PostgresEngine` engine.\n",
"1. `session_id` - A unique identifier string that specifies an id for the session.\n",
"1. `table_name` : The name of the table within the Cloud SQL database to store the chat message history."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "Kq7RLtfOq0wi",
"metadata": {
"id": "Kq7RLtfOq0wi"
},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_pg import PostgresChatMessageHistory\n",
"\n",
"history = PostgresChatMessageHistory.create_sync(\n",
" engine, session_id=\"test_session\", table_name=TABLE_NAME\n",
")\n",
"history.add_user_message(\"hi!\")\n",
"history.add_ai_message(\"whats up?\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "b476688cbb32ba90",
"metadata": {
"ExecuteTime": {
"end_time": "2023-08-28T10:04:38.929396Z",
"start_time": "2023-08-28T10:04:38.915727Z"
},
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "b476688cbb32ba90",
"jupyter": {
"outputs_hidden": false
},
"outputId": "a19e5cd8-4225-476a-d28d-e870c6b838bb"
},
"outputs": [
{
"data": {
"text/plain": [
"[HumanMessage(content='hi!'), AIMessage(content='whats up?')]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"history.messages"
]
},
{
"cell_type": "markdown",
"id": "ss6CbqcTTedr",
"metadata": {
"id": "ss6CbqcTTedr"
},
"source": [
"#### Cleaning up\n",
"When the history of a specific session is obsolete and can be deleted, it can be done the following way.\n",
"\n",
"**Note:** Once deleted, the data is no longer stored in Cloud SQL and is gone forever."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "3khxzFxYO7x6",
"metadata": {
"id": "3khxzFxYO7x6"
},
"outputs": [],
"source": [
"history.clear()"
]
},
{
"cell_type": "markdown",
"id": "2e5337719d5614fd",
"metadata": {
"id": "2e5337719d5614fd"
},
"source": [
"## 🔗 Chaining\n",
"\n",
"We can easily combine this message history class with [LCEL Runnables](/docs/expression_language/how_to/message_history)\n",
"\n",
"To do this we will use one of [Google's Vertex AI chat models](https://python.langchain.com/docs/integrations/chat/google_vertex_ai_palm) which requires that you [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com) in your Google Cloud Project.\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "hYtHM3-TOMCe",
"metadata": {
"id": "hYtHM3-TOMCe"
},
"outputs": [],
"source": [
"# enable Vertex AI API\n",
"!gcloud services enable aiplatform.googleapis.com"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "6558418b-0ece-4d01-9661-56d562d78f7a",
"metadata": {
"id": "6558418b-0ece-4d01-9661-56d562d78f7a"
},
"outputs": [],
"source": [
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_core.runnables.history import RunnableWithMessageHistory\n",
"from langchain_google_vertexai import ChatVertexAI"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "82149122-61d3-490d-9bdb-bb98606e8ba1",
"metadata": {
"id": "82149122-61d3-490d-9bdb-bb98606e8ba1"
},
"outputs": [],
"source": [
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", \"You are a helpful assistant.\"),\n",
" MessagesPlaceholder(variable_name=\"history\"),\n",
" (\"human\", \"{question}\"),\n",
" ]\n",
")\n",
"\n",
"chain = prompt | ChatVertexAI(project=PROJECT_ID)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "2df90853-b67c-490f-b7f8-b69d69270b9c",
"metadata": {
"id": "2df90853-b67c-490f-b7f8-b69d69270b9c"
},
"outputs": [],
"source": [
"chain_with_history = RunnableWithMessageHistory(\n",
" chain,\n",
" lambda session_id: PostgresChatMessageHistory.create_sync(\n",
" engine,\n",
" session_id=session_id,\n",
" table_name=TABLE_NAME,\n",
" ),\n",
" input_messages_key=\"question\",\n",
" history_messages_key=\"history\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "0ce596b8-3b78-48fd-9f92-46dccbbfd58b",
"metadata": {
"id": "0ce596b8-3b78-48fd-9f92-46dccbbfd58b"
},
"outputs": [],
"source": [
"# This is where we configure the session id\n",
"config = {\"configurable\": {\"session_id\": \"test_session\"}}"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "38e1423b-ba86-4496-9151-25932fab1a8b",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "38e1423b-ba86-4496-9151-25932fab1a8b",
"outputId": "d5c93570-4b0b-4fe8-d19c-4b361fe74291"
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=' Hello Bob, how can I help you today?')"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain_with_history.invoke({\"question\": \"Hi! I'm bob\"}, config=config)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "2ee4ee62-a216-4fb1-bf33-57476a84cf16",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "2ee4ee62-a216-4fb1-bf33-57476a84cf16",
"outputId": "288fe388-3f60-41b8-8edb-37cfbec18981"
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=' Your name is Bob.')"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain_with_history.invoke({\"question\": \"Whats my name\"}, config=config)"
]
}
],
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,259 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google Firestore in Datastore\n",
"\n",
"> [Firestore in Datastore](https://cloud.google.com/datastore) is a serverless document-oriented database that scales to meet any demand. Extend your database application to build AI-powered experiences leveraging Datastore's Langchain integrations.\n",
"\n",
"This notebook goes over how to use [Firestore in Datastore](https://cloud.google.com/datastore) to to store chat message history with the `DatastoreChatMessageHistory` class.\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-datastore-python/blob/main/docs/chat_message_history.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
"* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
"* [Create a Datastore database](https://cloud.google.com/datastore/docs/manage-databases)\n",
"\n",
"After confirmed access to database in the runtime environment of this notebook, filling the following values and run the cell before running example scripts."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🦜🔗 Library Installation\n",
"\n",
"The integration lives in its own `langchain-google-datastore` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%pip install -upgrade --quiet langchain-google-datastore"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🔐 Authentication\n",
"\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"- If you are using Colab to run this notebook, use the cell below and continue.\n",
"- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### API Enablement\n",
"The `langchain-google-datastore` package requires that you [enable the Datastore API](https://console.cloud.google.com/flows/enableapi?apiid=datastore.googleapis.com) in your Google Cloud Project."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# enable Datastore API\n",
"!gcloud services enable datastore.googleapis.com"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### DatastoreChatMessageHistory\n",
"\n",
"To initialize the `DatastoreChatMessageHistory` class you need to provide only 3 things:\n",
"\n",
"1. `session_id` - A unique identifier string that specifies an id for the session.\n",
"1. `collection` : The single `/`-delimited path to a Datastore collection."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_datastore import DatastoreChatMessageHistory\n",
"\n",
"chat_history = DatastoreChatMessageHistory(\n",
" session_id=\"user-session-id\", collection=\"HistoryMessages\"\n",
")\n",
"\n",
"chat_history.add_user_message(\"Hi!\")\n",
"chat_history.add_ai_message(\"How can I help you?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chat_history.messages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Cleaning up\n",
"When the history of a specific session is obsolete and can be deleted from the database and memory, it can be done the following way.\n",
"\n",
"**Note:** Once deleted, the data is no longer stored in Datastore and is gone forever."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chat_history.clear()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Custom Client\n",
"\n",
"The client is created by default using the available environment variables. A [custom client](https://cloud.google.com/python/docs/reference/datastore/latest/client) can be passed to the constructor."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.auth import compute_engine\n",
"from google.cloud import datastore\n",
"\n",
"client = datastore.Client(\n",
" project=\"project-custom\",\n",
" database=\"non-default-database\",\n",
" credentials=compute_engine.Credentials(),\n",
")\n",
"\n",
"history = DatastoreChatMessageHistory(\n",
" session_id=\"session-id\", collection=\"History\", client=client\n",
")\n",
"\n",
"history.add_user_message(\"New message\")\n",
"\n",
"history.messages\n",
"\n",
"history.clear()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,368 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google El Carro Oracle Operator\n",
"\n",
"> Google [El Carro Oracle Operator](https://github.com/GoogleCloudPlatform/elcarro-oracle-operator) offers a way to run Oracle databases in Kubernetes as a portable, open source, community driven, no vendor lock-in container orchestration system. El Carro provides a powerful declarative API for comprehensive and consistent configuration and deployment as well as for real-time operations and monitoring. Extend your database application to build AI-powered experiences leveraging Oracle Langchain integrations.\n",
"\n",
"This guide goes over how to use the El Carro Langchain integration to store chat message history with the `ElCarroChatMessageHistory` class.\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-el-carro-python/blob/main/docs/chat_message_history.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
" * Complete the [Getting Started](https://github.com/googleapis/langchain-google-el-carro-python/tree/main/README.md#getting-started) section"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🦜🔗 Library Installation\n",
"The integration lives in its own `langchain-google-el-carro` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-google-el-carro langchain-google-vertexai langchain"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🔐 Authentication\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"* If you are using Colab to run this notebook, use the cell below and continue.\n",
"* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set Up Oracle Database Connection\n",
"You can find the hostname and port values in the status of the El Carro\n",
"Kubernetes instance. Use the user password you created for your PDB."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @title Set Your Values Here { display-mode: \"form\" }\n",
"HOST = \"127.0.0.1\" # @param {type: \"string\"}\n",
"PORT = 3307 # @param {type: \"integer\"}\n",
"DATABASE = \"my-database\" # @param {type: \"string\"}\n",
"TABLE_NAME = \"message_store\" # @param {type: \"string\"}\n",
"USER = \"my-user\" # @param {type: \"string\"}\n",
"PASSWORD = input(\"Please provide a password to be used for the database user: \")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ElCarroEngine Connection Pool\n",
"\n",
"`ElCarroEngine` configures a connection pool to your Oracle database, enabling successful connections from your application and following industry best practices."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_el_carro import ElCarroEngine\n",
"\n",
"elcarro_engine = ElCarroEngine.from_instance(\n",
" db_host=HOST,\n",
" db_port=PORT,\n",
" db_name=DATABASE,\n",
" db_user=USER,\n",
" db_password=PASSWORD,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initialize a table\n",
"The `ElCarroChatMessageHistory` class requires a database table with a specific\n",
"schema in order to store the chat message history.\n",
"\n",
"The `ElCarroEngine` class has a\n",
"method `init_chat_history_table()` that can be used to create a table with the\n",
"proper schema for you."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"elcarro_engine.init_chat_history_table(table_name=TABLE_NAME)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ElCarroChatMessageHistory\n",
"\n",
"To initialize the `ElCarroChatMessageHistory` class you need to provide only 3\n",
"things:\n",
"\n",
"1. `elcarro_engine` - An instance of an `ElCarroEngine` engine.\n",
"1. `session_id` - A unique identifier string that specifies an id for the\n",
" session.\n",
"1. `table_name` : The name of the table within the Oracle database to store the\n",
" chat message history."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_el_carro import ElCarroChatMessageHistory\n",
"\n",
"history = ElCarroChatMessageHistory(\n",
" elcarro_engine=elcarro_engine, session_id=\"test_session\", table_name=TABLE_NAME\n",
")\n",
"history.add_user_message(\"hi!\")\n",
"history.add_ai_message(\"whats up?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"history.messages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Cleaning up\n",
"When the history of a specific session is obsolete and can be deleted, it can be done the following way.\n",
"\n",
"**Note:** Once deleted, the data is no longer stored in AlloyDB and is gone forever."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"history.clear()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 🔗 Chaining\n",
"\n",
"We can easily combine this message history class with [LCEL Runnables](/docs/expression_language/how_to/message_history)\n",
"\n",
"To do this we will use one of [Google's Vertex AI chat models](https://python.langchain.com/docs/integrations/chat/google_vertex_ai_palm) which requires that you [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com) in your Google Cloud Project.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# enable Vertex AI API\n",
"!gcloud services enable aiplatform.googleapis.com"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_core.runnables.history import RunnableWithMessageHistory\n",
"from langchain_google_vertexai import ChatVertexAI"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", \"You are a helpful assistant.\"),\n",
" MessagesPlaceholder(variable_name=\"history\"),\n",
" (\"human\", \"{question}\"),\n",
" ]\n",
")\n",
"\n",
"chain = prompt | ChatVertexAI(project=PROJECT_ID)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chain_with_history = RunnableWithMessageHistory(\n",
" chain,\n",
" lambda session_id: ElCarroChatMessageHistory(\n",
" elcarro_engine,\n",
" session_id=session_id,\n",
" table_name=TABLE_NAME,\n",
" ),\n",
" input_messages_key=\"question\",\n",
" history_messages_key=\"history\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# This is where we configure the session id\n",
"config = {\"configurable\": {\"session_id\": \"test_session\"}}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chain_with_history.invoke({\"question\": \"Hi! I'm bob\"}, config=config)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chain_with_history.invoke({\"question\": \"Whats my name\"}, config=config)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,259 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google Firestore (Native Mode)\n",
"\n",
"> [Firestore](https://cloud.google.com/firestore) is a serverless document-oriented database that scales to meet any demand. Extend your database application to build AI-powered experiences leveraging Firestore's Langchain integrations.\n",
"\n",
"This notebook goes over how to use [Firestore](https://cloud.google.com/firestore) to to store chat message history with the `FirestoreChatMessageHistory` class.\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-firestore-python/blob/main/docs/chat_message_history.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
"* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
"* [Create a Firestore database](https://cloud.google.com/firestore/docs/manage-databases)\n",
"\n",
"After confirmed access to database in the runtime environment of this notebook, filling the following values and run the cell before running example scripts."
]
},
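{
"cell_type": "markdown",
"metadata": {},
"source": [
"As referenced above, the optional cell below sketches one way to create the Firestore database with `gcloud`. It assumes `gcloud` is authenticated with your project set; the location is a placeholder."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Optional: create a Firestore (Native Mode) database with gcloud.\n",
"# # The location below is a placeholder; choose one close to your users.\n",
"# !gcloud firestore databases create --location=nam5"
]
},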
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🦜🔗 Library Installation\n",
"\n",
"The integration lives in its own `langchain-google-firestore` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%pip install -upgrade --quiet langchain-google-firestore"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🔐 Authentication\n",
"\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"- If you are using Colab to run this notebook, use the cell below and continue.\n",
"- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### API Enablement\n",
"The `langchain-google-firestore` package requires that you [enable the Firestore Admin API](https://console.cloud.google.com/flows/enableapi?apiid=firestore.googleapis.com) in your Google Cloud Project."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# enable Firestore Admin API\n",
"!gcloud services enable firestore.googleapis.com"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### FirestoreChatMessageHistory\n",
"\n",
"To initialize the `FirestoreChatMessageHistory` class you need to provide only 3 things:\n",
"\n",
"1. `session_id` - A unique identifier string that specifies an id for the session.\n",
"1. `collection` : The single `/`-delimited path to a Firestore collection."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_firestore import FirestoreChatMessageHistory\n",
"\n",
"chat_history = FirestoreChatMessageHistory(\n",
" session_id=\"user-session-id\", collection=\"HistoryMessages\"\n",
")\n",
"\n",
"chat_history.add_user_message(\"Hi!\")\n",
"chat_history.add_ai_message(\"How can I help you?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chat_history.messages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Cleaning up\n",
"When the history of a specific session is obsolete and can be deleted from the database and memory, it can be done the following way.\n",
"\n",
"**Note:** Once deleted, the data is no longer stored in Firestore and is gone forever."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"chat_history.clear()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Custom Client\n",
"\n",
"The client is created by default using the available environment variables. A [custom client](https://cloud.google.com/python/docs/reference/firestore/latest/client) can be passed to the constructor."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.auth import compute_engine\n",
"from google.cloud import firestore\n",
"\n",
"client = firestore.Client(\n",
" project=\"project-custom\",\n",
" database=\"non-default-database\",\n",
" credentials=compute_engine.Credentials(),\n",
")\n",
"\n",
"history = FirestoreChatMessageHistory(\n",
" session_id=\"session-id\", collection=\"History\", client=client\n",
")\n",
"\n",
"history.add_user_message(\"New message\")\n",
"\n",
"history.messages\n",
"\n",
"history.clear()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,233 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "6-0_o3DxsFGi"
},
"source": [
"# Google Memorystore for Redis\n",
"\n",
"> [Google Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) is a fully-managed service that is powered by the Redis in-memory data store to build application caches that provide sub-millisecond data access. Extend your database application to build AI-powered experiences leveraging Memorystore for Redis's Langchain integrations.\n",
"\n",
"This notebook goes over how to use [Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) to store chat message history with the `MemorystoreChatMessageHistory` class.\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-memorystore-redis-python/blob/main/docs/chat_message_history.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
"* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
"* [Create a Memorystore for Redis instance](https://cloud.google.com/memorystore/docs/redis/create-instance-console). Ensure that the version is greater than or equal to 5.0.\n",
"\n",
"After confirmed access to database in the runtime environment of this notebook, filling the following values and run the cell before running example scripts."
]
},
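{
"cell_type": "markdown",
"metadata": {},
"source": [
"As referenced above, the optional cell below sketches one way to create the instance with `gcloud`; name, size, region, and Redis version are placeholders, and it assumes `gcloud` is authenticated with your project set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Optional: create a Memorystore for Redis instance with gcloud.\n",
"# # Name, size, region, and Redis version below are placeholders.\n",
"# !gcloud redis instances create my-redis-instance --size=1 --region=us-central1 --redis-version=redis_6_x"
]
},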
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please specify an endpoint associated with the instance or demo purpose.\n",
"ENDPOINT = \"redis://127.0.0.1:6379\" # @param {type:\"string\"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🦜🔗 Library Installation\n",
"\n",
"The integration lives in its own `langchain-google-memorystore-redis` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "iLwVMVkYsFGk",
"tags": []
},
"outputs": [],
"source": [
"%pip install -upgrade --quiet langchain-google-memorystore-redis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🔐 Authentication\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"* If you are using Colab to run this notebook, use the cell below and continue.\n",
"* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2L7kMu__sFGl"
},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "A2fT1iEhsFGl"
},
"source": [
"### MemorystoreChatMessageHistory\n",
"\n",
"To initialize the `MemorystoreMessageHistory` class you need to provide only 2 things:\n",
"\n",
"1. `redis_client` - An instance of a Memorystore Redis.\n",
"1. `session_id` - Each chat message history object must have a unique session ID. If the session ID already has messages stored in Redis, they will can be retrieved."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YEDKWR6asFGl"
},
"outputs": [],
"source": [
"import redis\n",
"from langchain_google_memorystore_redis import MemorystoreChatMessageHistory\n",
"\n",
"# Connect to a Memorystore for Redis instance\n",
"redis_client = redis.from_url(\"redis://127.0.0.1:6379\")\n",
"\n",
"message_history = MemorystoreChatMessageHistory(redis_client, session_id=\"session1\")"
]
},
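{
"cell_type": "markdown",
"metadata": {},
"source": [
"The history starts out empty, so as a quick sanity check the cell below writes a small example exchange; the wording is arbitrary and only there so the next cell has messages to read back."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Store an example exchange in the session\n",
"message_history.add_user_message(\"hi!\")\n",
"message_history.add_ai_message(\"whats up?\")"
]
},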
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "BvS3UFsysFGm"
},
"outputs": [],
"source": [
"message_history.messages"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "sFJdt3ubsFGo"
},
"source": [
"#### Cleaning up\n",
"\n",
"When the history of a specific session is obsolete and can be deleted, it can be done the following way.\n",
"\n",
"**Note:** Once deleted, the data is no longer stored in Memorystore for Redis and is gone forever."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "H5I7K3MTsFGo"
},
"outputs": [],
"source": [
"message_history.clear()"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

View File

@@ -0,0 +1,332 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"# Google Spanner\n",
"> [Cloud Spanner](https://cloud.google.com/spanner) is a highly scalable database that combines unlimited scalability with relational semantics, such as secondary indexes, strong consistency, schemas, and SQL providing 99.999% availability in one easy solution.\n",
"\n",
"This notebook goes over how to use `Spanner` to store chat message history with the `SpannerChatMessageHistory` class."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
" * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
" * [Create a Spanner instance](https://cloud.google.com/spanner/docs/create-manage-instances)\n",
" * [Create a Spanner database](https://cloud.google.com/spanner/docs/create-manage-databases)"
]
},
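{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the instance and database do not exist yet, the optional cell below sketches one way to create them with `gcloud`; the names, config, and node count are placeholders, and it assumes `gcloud` is authenticated with your project set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Optional: create the Spanner instance and database with gcloud.\n",
"# # Instance name, config, and node count below are placeholders.\n",
"# !gcloud spanner instances create my-instance --config=regional-us-central1 --nodes=1 --description=\"LangChain demo\"\n",
"# !gcloud spanner databases create my-database --instance=my-instance"
]
},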
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🦜🔗 Library Installation\n",
"The integration lives in its own `langchain-google-spanner` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-google-spanner"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"id": "yygMe6rPWxHS",
"metadata": {
"id": "yygMe6rPWxHS"
},
"source": [
"### 🔐 Authentication\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"* If you are using Colab to run this notebook, use the cell below and continue.\n",
"* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "PTXN1_DSXj2b",
"metadata": {
"id": "PTXN1_DSXj2b"
},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"id": "NEvB9BoLEulY",
"metadata": {
"id": "NEvB9BoLEulY"
},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "gfkS3yVRE4_W",
"metadata": {
"cellView": "form",
"id": "gfkS3yVRE4_W"
},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"id": "rEWWNoNnKOgq",
"metadata": {
"id": "rEWWNoNnKOgq"
},
"source": [
"### 💡 API Enablement\n",
"The `langchain-google-spanner` package requires that you [enable the Spanner API](https://console.cloud.google.com/flows/enableapi?apiid=spanner.googleapis.com) in your Google Cloud Project."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "5utKIdq7KYi5",
"metadata": {
"id": "5utKIdq7KYi5"
},
"outputs": [],
"source": [
"# enable Spanner API\n",
"!gcloud services enable spanner.googleapis.com"
]
},
{
"cell_type": "markdown",
"id": "f8f2830ee9ca1e01",
"metadata": {
"id": "f8f2830ee9ca1e01"
},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"id": "OMvzMWRrR6n7",
"metadata": {
"id": "OMvzMWRrR6n7"
},
"source": [
"### Set Spanner database values\n",
"Find your database values, in the [Spanner Instances page](https://console.cloud.google.com/spanner/instances)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "irl7eMFnSPZr",
"metadata": {
"id": "irl7eMFnSPZr"
},
"outputs": [],
"source": [
"# @title Set Your Values Here { display-mode: \"form\" }\n",
"INSTANCE = \"my-instance\" # @param {type: \"string\"}\n",
"DATABASE = \"my-database\" # @param {type: \"string\"}\n",
"TABLE_NAME = \"message_store\" # @param {type: \"string\"}"
]
},
{
"cell_type": "markdown",
"id": "qPV8WfWr7O54",
"metadata": {
"id": "qPV8WfWr7O54"
},
"source": [
"### Initialize a table\n",
"The `SpannerChatMessageHistory` class requires a database table with a specific schema in order to store the chat message history.\n",
"\n",
"The helper method `init_chat_history_table()` that can be used to create a table with the proper schema for you."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "TEu4VHArRttE",
"metadata": {
"id": "TEu4VHArRttE"
},
"outputs": [],
"source": [
"from langchain_google_spanner import (\n",
" SpannerChatMessageHistory,\n",
")\n",
"\n",
"SpannerChatMessageHistory.init_chat_history_table(table_name=TABLE_NAME)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### SpannerChatMessageHistory\n",
"\n",
"To initialize the `SpannerChatMessageHistory` class you need to provide only 3 things:\n",
"\n",
"1. `instance_id` - The name of the Spanner instance\n",
"1. `database_id` - The name of the Spanner database\n",
"1. `session_id` - A unique identifier string that specifies an id for the session.\n",
"1. `table_name` - The name of the table within the database to store the chat message history."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"message_history = SpannerChatMessageHistory(\n",
" instance_id=INSTANCE,\n",
" database_id=DATABASE,\n",
" table_name=TABLE_NAME,\n",
" session_id=\"user-session-id\",\n",
")\n",
"\n",
"message_history.add_user_message(\"hi!\")\n",
"message_history.add_ai_message(\"whats up?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"message_history.messages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Custom client\n",
"The client created by default is the default client. To use a non-default, a [custom client](https://cloud.google.com/spanner/docs/samples/spanner-create-client-with-query-options#spanner_create_client_with_query_options-python) can be passed to the constructor."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.cloud import spanner\n",
"\n",
"custom_client_message_history = SpannerChatMessageHistory(\n",
" instance_id=\"my-instance\",\n",
" database_id=\"my-database\",\n",
" client=spanner.Client(...),\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cleaning up\n",
"\n",
"When the history of a specific session is obsolete and can be deleted, it can be done the following way.\n",
"Note: Once deleted, the data is no longer stored in Cloud Spanner and is gone forever."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"message_history = SpannerChatMessageHistory(\n",
" instance_id=INSTANCE,\n",
" database_id=DATABASE,\n",
" table_name=TABLE_NAME,\n",
" session_id=\"user-session-id\",\n",
")\n",
"\n",
"message_history.clear()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -180,7 +180,7 @@
"chain_with_history = RunnableWithMessageHistory(\n",
" chain,\n",
" lambda session_id: MongoDBChatMessageHistory(\n",
" session_id=\"test_session\",\n",
" session_id=session_id,\n",
" connection_string=\"mongodb://mongo_user:password123@mongo:27017\",\n",
" database_name=\"my_db\",\n",
" collection_name=\"chat_histories\",\n",

View File

@@ -5,169 +5,39 @@ All functionality related to Anthropic models.
[Anthropic](https://www.anthropic.com/) is an AI safety and research company, and is the creator of Claude.
This page covers all integrations between Anthropic models and LangChain.
-## Prompting Overview
+## Installation
-Claude is chat-based model, meaning it is trained on conversation data.
-However, it is a text based API, meaning it takes in single string.
-It expects this string to be in a particular format.
-This means that it is up the user to ensure that is the case.
-LangChain provides several utilities and helper functions to make sure prompts that you write -
-whether formatted as a string or as a list of messages - end up formatted correctly.
+To use Anthropic models, you will need to install the `langchain-anthropic` package.
+You can do this with the following command:
-Specifically, Claude is trained to fill in text for the Assistant role as part of an ongoing dialogue
-between a human user (`Human:`) and an AI assistant (`Assistant:`). Prompts sent via the API must contain
-`\n\nHuman:` and `\n\nAssistant:` as the signals of who's speaking.
-The final turn must always be `\n\nAssistant:` - the input string cannot have `\n\nHuman:` as the final role.
+```
+pip install langchain-anthropic
+```
-Because Claude is chat-based but accepts a string as input, it can be treated as either a LangChain `ChatModel` or `LLM`.
-This means there are two wrappers in LangChain - `ChatAnthropic` and `Anthropic`.
-It is generally recommended to use the `ChatAnthropic` wrapper, and format your prompts as `ChatMessage`s (we will show examples of this below).
-This is because it keeps your prompt in a general format that you can easily then also use with other models (should you want to).
-However, if you want more fine-grained control over the prompt, you can use the `Anthropic` wrapper - we will show an example of this as well.
-The `Anthropic` wrapper however is deprecated, as all functionality can be achieved in a more generic way using `ChatAnthropic`.
-## Prompting Best Practices
-Anthropic models have several prompting best practices compared to OpenAI models.
-**No System Messages**
-Anthropic models are not trained on the concept of a "system message".
-We have worked with the Anthropic team to handle them somewhat appropriately (a Human message with an `admin` tag)
-but this is largely a hack and it is recommended that you do not use system messages.
-**AI Messages Can Continue**
-A completion from Claude is a continuation of the last text in the string which allows you further control over Claude's output.
-For example, putting words in Claude's mouth in a prompt like this:
-`\n\nHuman: Tell me a joke about bears\n\nAssistant: What do you call a bear with no teeth?`
-This will return a completion like this `A gummy bear!` instead of a whole new assistant message with a different random bear joke.
## Environment Setup
To use Anthropic models, you will need to set the `ANTHROPIC_API_KEY` environment variable.
You can get an Anthropic API key [here](https://console.anthropic.com/settings/keys)
## `ChatAnthropic`
-`ChatAnthropic` is a subclass of LangChain's `ChatModel`, meaning it works best with `ChatPromptTemplate`.
+`ChatAnthropic` is a subclass of LangChain's `ChatModel`.
You can import this wrapper with the following code:
```
-from langchain_community.chat_models import ChatAnthropic
-model = ChatAnthropic()
+from langchain_anthropic import ChatAnthropic
+model = ChatAnthropic(model='claude-2.1')
```
-When working with ChatModels, it is preferred that you design your prompts as `ChatPromptTemplate`s.
-Here is an example below of doing that:
+Read more in the [ChatAnthropic documentation](/docs/integrations/chat/anthropic).
-```
-from langchain_core.prompts import ChatPromptTemplate
+## `AnthropicLLM`
-prompt = ChatPromptTemplate.from_messages([
-("system", "You are a helpful chatbot"),
-("human", "Tell me a joke about {topic}"),
-])
-```
+`AnthropicLLM` is a subclass of LangChain's `LLM`. It is a wrapper around Anthropic's
+text-based completion endpoints.
-You can then use this in a chain as follows:
+```python
+from langchain_anthropic import AnthropicLLM
-```
-chain = prompt | model
-chain.invoke({"topic": "bears"})
-```
-How is the prompt actually being formatted under the hood? We can see that by running the following code
-```
-prompt_value = prompt.format_prompt(topic="bears")
-model.convert_prompt(prompt_value)
-```
-This produces the following formatted string:
-```
-'\n\nYou are a helpful chatbot\n\nHuman: Tell me a joke about bears\n\nAssistant:'
-```
-We can see that under the hood LangChain is not appending any prefix/suffix to `SystemMessage`'s. This is because Anthropic has no concept of `SystemMessage`.
-Anthropic requires all prompts to end with assistant messages. This means if the last message is not an assistant message, the suffix `Assistant:` will automatically be inserted.
-If you decide instead to use a normal PromptTemplate (one that just works on a single string) let's take a look at
-what happens:
-```
-from langchain.prompts import PromptTemplate
-prompt = PromptTemplate.from_template("Tell me a joke about {topic}")
-prompt_value = prompt.format_prompt(topic="bears")
-model.convert_prompt(prompt_value)
-```
-This produces the following formatted string:
-```
-'\n\nHuman: Tell me a joke about bears\n\nAssistant:'
-```
-We can see that it automatically adds the Human and Assistant tags.
-What is happening under the hood?
-First: the string gets converted to a single human message. This happens generically (because we are using a subclass of `ChatModel`).
-Then, similarly to the above example, an empty Assistant message is getting appended.
-This is Anthropic specific.
-## [Deprecated] `Anthropic`
-This `Anthropic` wrapper is subclassed from `LLM`.
-We can import it with:
-```
-from langchain_community.llms import Anthropic
-model = Anthropic()
-```
-This model class is designed to work with normal PromptTemplates. An example of that is below:
-```
-prompt = PromptTemplate.from_template("Tell me a joke about {topic}")
-chain = prompt | model
-chain.invoke({"topic": "bears"})
-```
-Let's see what is going on with the prompt templating under the hood!
-```
-prompt_value = prompt.format_prompt(topic="bears")
-model.convert_prompt(prompt_value)
-```
-This outputs the following
-```
-'\n\nHuman: Tell me a joke about bears\n\nAssistant: Sure, here you go:\n'
-```
-Notice that it adds the Human tag at the start of the string, and then finishes it with `\n\nAssistant: Sure, here you go:`.
-The extra `Sure, here you go` was added on purpose by the Anthropic team.
-What happens if we have those symbols in the prompt directly?
-```
-prompt = PromptTemplate.from_template("Human: Tell me a joke about {topic}")
-prompt_value = prompt.format_prompt(topic="bears")
-model.convert_prompt(prompt_value)
-```
-This outputs:
-```
-'\n\nHuman: Tell me a joke about bears'
-```
-We can see that we detect that the user is trying to use the special tokens, and so we don't do any formatting.
-## `ChatAnthropicMessages` (Beta)
-`ChatAnthropicMessages` uses the beta release of Anthropic's new Messages API.
-You can use it from the `langchain-anthropic` package, which you can install with `pip install langchain-anthropic`.
-For more information, see the [ChatAnthropicMessages docs](../chat/anthropic#chatanthropicmessages)
+model = AnthropicLLM(model='claude-2.1')
+```

View File

@@ -2,6 +2,56 @@
All functionality related to [Google Cloud Platform](https://cloud.google.com/) and other `Google` products.
+## LLMs
+### Google Generative AI
+Access GoogleAI `Gemini` models such as `gemini-pro` and `gemini-pro-vision` through the `GoogleGenerativeAI` class.
+Install python package.
+```bash
+pip install langchain-google-genai
+```
+See a [usage example](/docs/integrations/llms/google_ai).
+```python
+from langchain_google_genai import GoogleGenerativeAI
+```
+### Vertex AI
+Access to `Gemini` and `PaLM` LLMs (like `text-bison` and `code-bison`) via `Vertex AI` on Google Cloud.
+We need to install `langchain-google-vertexai` python package.
+```bash
+pip install langchain-google-vertexai
+```
+See a [usage example](/docs/integrations/llms/google_vertex_ai_palm).
+```python
+from langchain_google_vertexai import VertexAI
+```
+### Model Garden
+Access PaLM and hundreds of OSS models via `Vertex AI Model Garden` on Google Cloud.
+We need to install `langchain-google-vertexai` python package.
+```bash
+pip install langchain-google-vertexai
+```
+See a [usage example](/docs/integrations/llms/google_vertex_ai_palm#vertex-model-garden).
+```python
+from langchain_google_vertexai import VertexAIModelGarden
+```
## Chat models
### Google Generative AI
@@ -69,61 +119,372 @@ See a [usage example](/docs/integrations/chat/google_vertex_ai_palm).
from langchain_google_vertexai import ChatVertexAI
```
## Document Loaders
### AlloyDB for PostgreSQL
> [Google Cloud AlloyDB](https://cloud.google.com/alloydb) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability on Google Cloud. AlloyDB is 100% compatible with PostgreSQL.
Install the python package:
```bash
pip install langchain-google-alloydb-pg
```
See [usage example](/docs/integrations/document_loaders/google_alloydb).
```python
from langchain_google_alloydb_pg import AlloyDBEngine, AlloyDBLoader
```
### BigQuery
> [Google Cloud BigQuery](https://cloud.google.com/bigquery) is a serverless and cost-effective enterprise data warehouse that works across clouds and scales with your data in Google Cloud.
We need to install the `google-cloud-bigquery` python package.
```bash
pip install google-cloud-bigquery
```
See a [usage example](/docs/integrations/document_loaders/google_bigquery).
```python
from langchain_community.document_loaders import BigQueryLoader
```
### Bigtable
> [Google Cloud Bigtable](https://cloud.google.com/bigtable/docs) is Google's fully managed NoSQL Big Data database service in Google Cloud.
Install the python package:
```bash
pip install langchain-google-bigtable
```
See [Google Cloud usage example](/docs/integrations/document_loaders/google_bigtable).
```python
from langchain_google_bigtable import BigtableLoader
```
### Cloud SQL for MySQL
> [Google Cloud SQL for MySQL](https://cloud.google.com/sql) is a fully-managed database service that helps you set up, maintain, manage, and administer your MySQL relational databases on Google Cloud.
Install the python package:
```bash
pip install langchain-google-cloud-sql-mysql
```
See [usage example](/docs/integrations/document_loaders/google_cloud_sql_mysql).
```python
from langchain_google_cloud_sql_mysql import MySQLEngine, MySQLDocumentLoader
```
### Cloud SQL for SQL Server
> [Google Cloud SQL for SQL Server](https://cloud.google.com/sql) is a fully-managed database service that helps you set up, maintain, manage, and administer your SQL Server databases on Google Cloud.
Install the python package:
```bash
pip install langchain-google-cloud-sql-mssql
```
See [usage example](/docs/integrations/document_loaders/google_cloud_sql_mssql).
```python
from langchain_google_cloud_sql_mssql import MSSQLEngine, MSSQLLoader
```
### Cloud SQL for PostgreSQL
> [Google Cloud SQL for PostgreSQL](https://cloud.google.com/sql) is a fully-managed database service that helps you set up, maintain, manage, and administer your PostgreSQL relational databases on Google Cloud.
Install the python package:
```bash
pip install langchain-google-cloud-sql-pg
```
See [usage example](/docs/integrations/document_loaders/google_cloud_sql_pg).
```python
from langchain_google_cloud_sql_pg import PostgreSQLEngine, PostgreSQLLoader
```
### Cloud Storage
>[Cloud Storage](https://en.wikipedia.org/wiki/Google_Cloud_Storage) is a managed service for storing unstructured data in Google Cloud.
We need to install the `google-cloud-storage` python package.
```bash
pip install google-cloud-storage
```
There are two loaders for `Google Cloud Storage`: the `Directory` and the `File` loaders.
See a [usage example](/docs/integrations/document_loaders/google_cloud_storage_directory).
```python
from langchain_community.document_loaders import GCSDirectoryLoader
```
See a [usage example](/docs/integrations/document_loaders/google_cloud_storage_file).
```python
from langchain_community.document_loaders import GCSFileLoader
```
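For illustration, a minimal, untested sketch of loading a single file (the project, bucket, and blob names are placeholders):
```python
from langchain_community.document_loaders import GCSFileLoader

# Load one object from a GCS bucket as a Document
loader = GCSFileLoader(project_name="my-project", bucket="my-bucket", blob="my-file.txt")
docs = loader.load()
```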
### Google Drive
>[Google Drive](https://en.wikipedia.org/wiki/Google_Drive) is a file storage and synchronization service developed by Google.
Currently, only `Google Docs` are supported.
We need to install several python packages.
```bash
pip install google-api-python-client google-auth-httplib2 google-auth-oauthlib
```
See a [usage example and authorization instructions](/docs/integrations/document_loaders/google_drive).
```python
from langchain_community.document_loaders import GoogleDriveLoader
```
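A minimal, untested sketch (the folder id is a placeholder, and OAuth credentials must be configured as described in the linked instructions):
```python
from langchain_community.document_loaders import GoogleDriveLoader

# Load all Google Docs found in a Drive folder
loader = GoogleDriveLoader(folder_id="your-folder-id")
docs = loader.load()
```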
### Firestore (Native Mode)
> [Google Cloud Firestore](https://cloud.google.com/firestore/docs/) is a NoSQL document database built for automatic scaling, high performance, and ease of application development.
Install the python package:
```bash
pip install langchain-google-firestore
```
See [usage example](/docs/integrations/document_loaders/google_firestore).
```python
from langchain_google_firestore import FirestoreLoader
```
### Firestore (Datastore Mode)
> [Google Cloud Firestore in Datastore mode](https://cloud.google.com/datastore/docs) is a NoSQL document database built for automatic scaling, high performance, and ease of application development.
> Firestore is the newest version of Datastore and introduces several improvements over Datastore.
Install the python package:
```bash
pip install langchain-google-datastore
```
See [usage example](/docs/integrations/document_loaders/google_datastore).
```python
from langchain_google_datastore import DatastoreLoader
```
### Memorystore for Redis
> [Google Cloud Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis) is a fully managed Redis service for Google Cloud. Applications running on Google Cloud can achieve extreme performance by leveraging the highly scalable, available, secure Redis service without the burden of managing complex Redis deployments.
Install the python package:
```bash
pip install langchain-google-memorystore-redis
```
See [usage example](/docs/integrations/document_loaders/google_memorystore_redis).
```python
from langchain_google_memorystore_redis import MemorystoreLoader
```
### Spanner
> [Google Cloud Spanner](https://cloud.google.com/spanner/docs) is a fully managed, mission-critical, relational database service on Google Cloud that offers transactional consistency at global scale, automatic, synchronous replication for high availability, and support for two SQL dialects: GoogleSQL (ANSI 2011 with extensions) and PostgreSQL.
Install the python package:
```bash
pip install langchain-google-spanner
```
See [usage example](/docs/integrations/document_loaders/google_spanner).
```python
from langchain_google_spanner import SpannerLoader
```
### El Carro Oracle Operator
> Google [El Carro Oracle Operator](https://github.com/GoogleCloudPlatform/elcarro-oracle-operator)
> offers a way to run Oracle databases in Kubernetes as a portable, open source,
> community-driven, no vendor lock-in container orchestration system.
Install the python package:
```bash
pip install langchain-google-el-carro
```
See [usage example](/docs/integrations/document_loaders/google_el_carro).
```python
from langchain_google_el_carro import ElCarroLoader
```
### Speech-to-Text
> [Google Cloud Speech-to-Text](https://cloud.google.com/speech-to-text) is an audio transcription API powered by Google's speech recognition models in Google Cloud.
This document loader transcribes audio files and outputs the text results as Documents.
First, we need to install the python package.
```bash
pip install google-cloud-speech
```
See a [usage example and authorization instructions](/docs/integrations/document_loaders/google_speech_to_text).
```python
from langchain_community.document_loaders import GoogleSpeechToTextLoader
```
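A minimal, untested sketch (the project id and audio path are placeholders):
```python
from langchain_community.document_loaders import GoogleSpeechToTextLoader

# Transcribe an audio file stored in GCS into Documents
loader = GoogleSpeechToTextLoader(project_id="my-project", file_path="gs://my-bucket/audio.wav")
docs = loader.load()
```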
## Document Transformers
### Document AI
>[Google Cloud Document AI](https://cloud.google.com/document-ai/docs/overview) is a Google Cloud
> service that transforms unstructured data from documents into structured data, making it easier
> to understand, analyze, and consume.
We need to set up a [`GCS` bucket and create your own OCR processor](https://cloud.google.com/document-ai/docs/create-processor).
The `GCS_OUTPUT_PATH` should be a path to a folder on GCS (starting with `gs://`)
and a processor name should look like `projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR_ID`.
We can get it either programmatically or copy it from the `Prediction endpoint` section of the `Processor details`
tab in the Google Cloud Console.
```bash
pip install google-cloud-documentai
pip install google-cloud-documentai-toolbox
```
See a [usage example](/docs/integrations/document_transformers/google_docai).
```python
from langchain_community.document_loaders.blob_loaders import Blob
from langchain_community.document_loaders.parsers import DocAIParser
```
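Putting the pieces above together, a minimal, untested sketch (the processor name and GCS paths are placeholders):
```python
from langchain_community.document_loaders.blob_loaders import Blob
from langchain_community.document_loaders.parsers import DocAIParser

# Parse a PDF stored on GCS with your Document AI OCR processor
parser = DocAIParser(
    location="us",
    processor_name="projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR_ID",
    gcs_output_path="gs://BUCKET_NAME/FOLDER_PATH",
)
blob = Blob(path="gs://BUCKET_NAME/FOLDER_PATH/document.pdf")
docs = list(parser.lazy_parse(blob))
```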
### Google Translate
> [Google Translate](https://translate.google.com/) is a multilingual neural machine
> translation service developed by Google to translate text, documents and websites
> from one language into another.
The `GoogleTranslateTransformer` allows you to translate text and HTML with the [Google Cloud Translation API](https://cloud.google.com/translate).
To use it, you should have the `google-cloud-translate` python package installed, and a Google Cloud project with the [Translation API enabled](https://cloud.google.com/translate/docs/setup). This transformer uses the [Advanced edition (v3)](https://cloud.google.com/translate/docs/intro-to-v3).
First, we need to install the python package.
```bash
pip install google-cloud-translate
```
See a [usage example and authorization instructions](/docs/integrations/document_transformers/google_translate).
```python
from langchain_community.document_transformers import GoogleTranslateTransformer
```
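A minimal, untested sketch (the project id is a placeholder):
```python
from langchain_community.document_transformers import GoogleTranslateTransformer
from langchain_core.documents import Document

translator = GoogleTranslateTransformer(project_id="my-gcp-project")
docs = [Document(page_content="¡Hola, mundo!")]
# Translate the documents into English
translated = translator.transform_documents(docs, target_language_code="en")
```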
## Vector Stores
### AlloyDB for PostgreSQL
> [Google Cloud AlloyDB](https://cloud.google.com/alloydb) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability on Google Cloud. AlloyDB is 100% compatible with PostgreSQL.
Install the python package:
```bash
pip install langchain-google-alloydb-pg
```
See [usage example](/docs/integrations/vectorstores/google_alloydb).
```python
from langchain_google_alloydb_pg import AlloyDBEngine, AlloyDBVectorStore
```
### BigQuery Vector Search
> [Google Cloud BigQuery](https://cloud.google.com/bigquery) is a serverless and cost-effective enterprise data warehouse in Google Cloud.
>
> [Google Cloud BigQuery Vector Search](https://cloud.google.com/bigquery/docs/vector-search-intro)
> lets you use GoogleSQL to do semantic search, using vector indexes for fast but approximate results, or using brute force for exact results.
> It can calculate Euclidean or Cosine distance. With LangChain, we default to Euclidean distance.
We need to install the `google-cloud-bigquery` python package.
```bash
pip install google-cloud-bigquery
```
See a [usage example](/docs/integrations/vectorstores/google_bigquery_vector_search).
```python
from langchain.vectorstores import BigQueryVectorSearch
```
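A minimal, untested sketch (the project, dataset, and table names are placeholders, and an embeddings model such as `OpenAIEmbeddings` is assumed):
```python
from langchain.vectorstores import BigQueryVectorSearch
from langchain_openai import OpenAIEmbeddings

store = BigQueryVectorSearch(
    project_id="my-project",
    dataset_name="my_dataset",
    table_name="my_table",
    embedding=OpenAIEmbeddings(),
)
store.add_texts(["BigQuery vector search supports Euclidean and Cosine distance."])
results = store.similarity_search("Which distance metrics are supported?")
```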
### Memorystore for Redis
> [Google Cloud Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis) is a fully managed Redis service for Google Cloud. Applications running on Google Cloud can achieve extreme performance by leveraging the highly scalable, available, secure Redis service without the burden of managing complex Redis deployments.
Install the python package:
```bash
pip install langchain-google-memorystore-redis
```
See [usage example](/docs/integrations/vectorstores/google_memorystore_redis).
```python
from langchain_google_memorystore_redis import RedisVectorStore
```
### Spanner
> [Google Cloud Spanner](https://cloud.google.com/spanner/docs) is a fully managed, mission-critical, relational database service on Google Cloud that offers transactional consistency at global scale, automatic, synchronous replication for high availability, and support for two SQL dialects: GoogleSQL (ANSI 2011 with extensions) and PostgreSQL.
Install the python package:
```bash
pip install langchain-google-spanner
```
See [usage example](/docs/integrations/vectorstores/google_spanner).
```python
from langchain_google_spanner import SpannerVectorStore
```
### Cloud SQL for PostgreSQL
> [Google Cloud SQL for PostgreSQL](https://cloud.google.com/sql) is a fully-managed database service that helps you set up, maintain, manage, and administer your PostgreSQL relational databases on Google Cloud.
Install the python package:
```bash
pip install langchain-google-cloud-sql-pg
```
See [usage example](/docs/integrations/vectorstores/google_sql_pg).
```python
from langchain_google_cloud_sql_pg import PostgreSQLEngine, PostgresVectorStore
```
### Vertex AI Vector Search
> [Google Cloud Vertex AI Vector Search](https://cloud.google.com/vertex-ai/docs/matching-engine/overview) from Google Cloud,
> formerly known as `Vertex AI Matching Engine`, provides the industry's leading high-scale
> low latency vector database. These vector databases are commonly
> referred to as vector similarity-matching or an approximate nearest neighbor (ANN) service.
@@ -140,29 +501,7 @@ See a [usage example](/docs/integrations/vectorstores/google_vertex_ai_vector_se
from langchain_community.vectorstores import MatchingEngine
```
### ScaNN
>[Google ScaNN](https://github.com/google-research/google-research/tree/master/scann)
> (Scalable Nearest Neighbors) is a python package.
@@ -237,136 +576,11 @@ documents = docai_wh_retriever.get_relevant_documents(
)
```
## Tools
### Text-to-Speech
>[Google Cloud Text-to-Speech](https://cloud.google.com/text-to-speech) is a Google Cloud service that enables developers to
> synthesize natural-sounding speech with 100+ voices, available in multiple languages and variants.
> It applies DeepMind's groundbreaking research in WaveNet and Google's powerful neural networks
> to deliver the highest fidelity possible.
@@ -398,39 +612,6 @@ from langchain_community.utilities.google_drive import GoogleDriveAPIWrapper
from langchain_community.tools.google_drive.tool import GoogleDriveSearchTool
```
### Google Finance
We need to install a python package.
@@ -470,6 +651,20 @@ from langchain_community.tools.google_lens import GoogleLensQueryRun
from langchain_community.utilities.google_lens import GoogleLensAPIWrapper
```
### Google Places
We need to install a python package.
```bash
pip install googlemaps
```
See a [usage example and authorization instructions](/docs/integrations/tools/google_places).
```python
from langchain.tools import GooglePlacesTool
```
### Google Scholar
We need to install a python package.
@@ -485,6 +680,25 @@ from langchain_community.tools.google_scholar import GoogleScholarQueryRun
from langchain_community.utilities.google_scholar import GoogleScholarAPIWrapper
```
### Google Search
- Set up a Custom Search Engine, following [these instructions](https://stackoverflow.com/questions/37083058/programmatically-searching-google-in-python-using-custom-search)
- Get an API Key and Custom Search Engine ID from the previous step, and set them as environment variables
`GOOGLE_API_KEY` and `GOOGLE_CSE_ID` respectively.
```python
from langchain_community.utilities import GoogleSearchAPIWrapper
```
For a more detailed walkthrough of this wrapper, see [this notebook](/docs/integrations/tools/google_search).
We can easily load this wrapper as a Tool (to use with an Agent). We can do this with:
```python
from langchain.agents import load_tools
tools = load_tools(["google-search"])
```
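A minimal sketch of using the wrapper directly (assuming both environment variables above are set):
```python
from langchain_community.utilities import GoogleSearchAPIWrapper

search = GoogleSearchAPIWrapper()
print(search.run("LangChain"))
```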
### Google Trends
We need to install a python package.
@@ -504,7 +718,7 @@ from langchain_community.utilities.google_trends import GoogleTrendsAPIWrapper
### Gmail
> [Google Gmail](https://en.wikipedia.org/wiki/Gmail) is a free email service provided by Google.
This toolkit works with emails through the `Gmail API`.
We need to install several python packages.
@@ -521,20 +735,158 @@ from langchain_community.agent_toolkits import GmailToolkit
## Memory
### AlloyDB for PostgreSQL
> [AlloyDB](https://cloud.google.com/alloydb) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability on Google Cloud. AlloyDB is 100% compatible with PostgreSQL.
Install the python package:
```bash
pip install langchain-google-alloydb-pg
```
See [usage example](/docs/integrations/memory/google_alloydb).
```python
from langchain_google_alloydb_pg import AlloyDBEngine, AlloyDBChatMessageHistory
```
### Cloud SQL for PostgreSQL
> [Cloud SQL for PostgreSQL](https://cloud.google.com/sql) is a fully-managed database service that helps you set up, maintain, manage, and administer your PostgreSQL relational databases on Google Cloud.
Install the python package:
```bash
pip install langchain-google-cloud-sql-pg
```
See [usage example](/docs/integrations/memory/google_cloud_sql_pg).
```python
from langchain_google_cloud_sql_pg import PostgreSQLEngine, PostgreSQLChatMessageHistory
```
### Cloud SQL for MySQL
> [Cloud SQL for MySQL](https://cloud.google.com/sql) is a fully-managed database service that helps you set up, maintain, manage, and administer your MySQL relational databases on Google Cloud.
Install the python package:
```bash
pip install langchain-google-cloud-sql-mysql
```
See [usage example](/docs/integrations/memory/google_cloud_sql_mysql).
```python
from langchain_google_cloud_sql_mysql import MySQLEngine, MySQLChatMessageHistory
```
### Cloud SQL for SQL Server
> [Cloud SQL for SQL Server](https://cloud.google.com/sql) is a fully-managed database service that helps you set up, maintain, manage, and administer your SQL Server databases on Google Cloud.
Install the python package:
```bash
pip install langchain-google-cloud-sql-mssql
```
See [usage example](/docs/integrations/memory/google_cloud_sql_mssql).
```python
from langchain_google_cloud_sql_mssql import MSSQLEngine, MSSQLChatMessageHistory
```
### Spanner
> [Google Cloud Spanner](https://cloud.google.com/spanner/docs) is a fully managed, mission-critical, relational database service on Google Cloud that offers transactional consistency at global scale, automatic, synchronous replication for high availability, and support for two SQL dialects: GoogleSQL (ANSI 2011 with extensions) and PostgreSQL.
Install the python package:
```bash
pip install langchain-google-spanner
```
See [usage example](/docs/integrations/memory/google_spanner).
```python
from langchain_google_spanner import SpannerChatMessageHistory
```
### Memorystore for Redis
> [Google Cloud Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis) is a fully managed Redis service for Google Cloud. Applications running on Google Cloud can achieve extreme performance by leveraging the highly scalable, available, secure Redis service without the burden of managing complex Redis deployments.
Install the python package:
```bash
pip install langchain-google-memorystore-redis
```
See [usage example](/docs/integrations/memory/google_memorystore_redis).
```python
from langchain_google_memorystore_redis import MemorystoreChatMessageHistory
```
### Bigtable
> [Google Cloud Bigtable](https://cloud.google.com/bigtable/docs) is Google's fully managed NoSQL Big Data database service in Google Cloud.
Install the python package:
```bash
pip install langchain-google-bigtable
```
See [usage example](/docs/integrations/memory/google_bigtable).
```python
from langchain_google_bigtable import BigtableChatMessageHistory
```
### Firestore (Native Mode)
> [Google Cloud Firestore](https://cloud.google.com/firestore/docs/) is a NoSQL document database built for automatic scaling, high performance, and ease of application development.
Install the python package:
```bash
pip install langchain-google-firestore
```
See [usage example](/docs/integrations/memory/google_firestore).
```python
from langchain_google_firestore import FirestoreChatMessageHistory
```
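A minimal, untested sketch (the session id and collection name are placeholders):
```python
from langchain_google_firestore import FirestoreChatMessageHistory

history = FirestoreChatMessageHistory(session_id="user-session", collection="HistoryMessages")
history.add_user_message("Hi!")
history.add_ai_message("How can I help you?")
```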
### Firestore (Datastore Mode)
> [Google Cloud Firestore in Datastore mode](https://cloud.google.com/datastore/docs) is a NoSQL document database built for automatic scaling, high performance, and ease of application development.
> Firestore is the newest version of Datastore and introduces several improvements over Datastore.
Install the python package:
```bash
pip install langchain-google-datastore
```
See [usage example](/docs/integrations/memory/google_datastore).
```python
from langchain_google_datastore import DatastoreChatMessageHistory
```
### El Carro Oracle Operator
> Google [El Carro Oracle Operator](https://github.com/GoogleCloudPlatform/elcarro-oracle-operator)
> offers a way to run Oracle databases in Kubernetes as a portable, open source,
> community-driven, no vendor lock-in container orchestration system.
Install the python package:
```bash
pip install langchain-google-el-carro
```
See [usage example](/docs/integrations/memory/google_el_carro).
```python
from langchain_google_el_carro import ElCarroChatMessageHistory
```
## Chat Loaders


@@ -15,13 +15,11 @@ These providers have standalone `langchain-{provider}` packages for improved ver
- [Anthropic](/docs/integrations/platforms/anthropic)
- [Astra DB](/docs/integrations/providers/astradb)
- [Exa Search](/docs/integrations/providers/exa_search)
- [Google](/docs/integrations/platforms/google)
- [IBM](/docs/integrations/providers/ibm)
- [MistralAI](/docs/integrations/providers/mistralai)
- [Nomic](/docs/integrations/providers/nomic)
- [Nvidia](/docs/integrations/providers/nvidia)
- [OpenAI](/docs/integrations/platforms/openai)
- [Pinecone](/docs/integrations/providers/pinecone)
- [Robocorp](/docs/integrations/providers/robocorp)


@@ -1,35 +1,38 @@
# Activeloop Deep Lake
>[Activeloop Deep Lake](https://docs.activeloop.ai/) is a data lake for Deep Learning applications, allowing you to use it
> as a vector store.
## Why Deep Lake?
- More than just a (multi-modal) vector store. You can later use the dataset to fine-tune your own LLM models.
- Not only stores embeddings, but also the original data with automatic version control.
- Truly serverless. Doesn't require another service and can be used with major cloud providers (`AWS S3`, `GCS`, etc.)
`Activeloop Deep Lake` supports `SelfQuery Retrieval`:
[Activeloop Deep Lake Self Query Retrieval](/docs/integrations/retrievers/self_query/activeloop_deeplake_self_query)
## More Resources
1. [Ultimate Guide to LangChain & Deep Lake: Build ChatGPT to Answer Questions on Your Financial Data](https://www.activeloop.ai/resources/ultimate-guide-to-lang-chain-deep-lake-build-chat-gpt-to-answer-questions-on-your-financial-data/)
2. [Twitter the-algorithm codebase analysis with Deep Lake](https://github.com/langchain-ai/langchain/blob/master/cookbook/twitter-the-algorithm-analysis-deeplake.ipynb)
3. Here are the [whitepaper](https://www.deeplake.ai/whitepaper) and [academic paper](https://arxiv.org/pdf/2209.10785.pdf) for Deep Lake
4. Here is a set of additional resources available for review: [Deep Lake](https://github.com/activeloopai/deeplake), [Get started](https://docs.activeloop.ai/getting-started) and [Tutorials](https://docs.activeloop.ai/hub-tutorials)
## Installation and Setup
Install the Python package:
```bash
pip install deeplake
```
## VectorStore
```python
from langchain_community.vectorstores import DeepLake
```
See a [usage example](/docs/integrations/vectorstores/activeloop_deeplake).
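A minimal, untested sketch (the dataset path is a placeholder, an embeddings model such as `OpenAIEmbeddings` is assumed, and the embedding parameter name can differ across versions):
```python
from langchain_community.vectorstores import DeepLake
from langchain_openai import OpenAIEmbeddings

# Create (or open) a local Deep Lake dataset and use it as a vector store
db = DeepLake(dataset_path="./my_deeplake/", embedding=OpenAIEmbeddings())
db.add_texts(["Deep Lake stores embeddings together with the original data."])
docs = db.similarity_search("What does Deep Lake store?")
```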


@@ -1,16 +1,42 @@
# AI21 Labs
>[AI21 Labs](https://www.ai21.com/about) is a company specializing in Natural
> Language Processing (NLP), which develops AI systems
> that can understand and generate natural language.
This page covers how to use the `AI21` ecosystem within `LangChain`.
## Installation and Setup
- Get an AI21 API key and set it as an environment variable (`AI21_API_KEY`)
- Install the Python package:
```bash
pip install langchain-ai21
```
## LLMs
See a [usage example](/docs/integrations/llms/ai21).
```python
from langchain_community.llms import AI21
```
## Chat models
See a [usage example](/docs/integrations/chat/ai21).
```python
from langchain_ai21 import ChatAI21
```
## Embedding models
See a [usage example](/docs/integrations/text_embedding/ai21).
```python
from langchain_ai21 import AI21Embeddings
```
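A minimal, untested sketch (assuming `AI21_API_KEY` is set; the model name is illustrative):
```python
from langchain_ai21 import ChatAI21

chat = ChatAI21(model="j2-ultra")
chat.invoke("Tell me a fun fact about whales.")
```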


@@ -3,7 +3,41 @@
>[Airbyte](https://github.com/airbytehq/airbyte) is a data integration platform for ELT pipelines from APIs,
> databases & files to warehouses & lakes. It has the largest catalog of ELT connectors to data warehouses and databases.
## [AirbyteLoader](/docs/integrations/document_loaders/airbyte)
This loader is built on top of [PyAirbyte](https://pypi.org/project/airbyte/) for easy setup and use.
### Installation and Setup
```bash
pip install -U langchain-airbyte
```
The integration package doesn't have any global environment variables that need to be
set, but some integrations (e.g. `source-github`) may need credentials passed in.
### Document Loader
The `AirbyteLoader` class exposes a single document loader for Airbyte sources.
```python
from langchain_airbyte import AirbyteLoader
loader = AirbyteLoader(
source="source-faker",
stream="users",
config={"count": 100},
)
docs = loader.load()
```
For more information, see the full [AirbyteLoader docs](/docs/integrations/document_loaders/airbyte).
## AirbyteJSONLoader (Deprecated)
This loader is deprecated and should be swapped out for `AirbyteLoader`, which doesn't require any of the docker setup!
### Installation and Setup
These instructions show how to load any source from `Airbyte` into a local `JSON` file that can be read in as a document.
@@ -20,7 +54,7 @@ Have `docker desktop` installed.
7. Run the connection.
8. To see what files are created, navigate to: `file:///tmp/airbyte_local/`.
### Document Loader
See a [usage example](/docs/integrations/document_loaders/airbyte_json).


@@ -1,15 +1,31 @@
# AnalyticDB
>[AnalyticDB for PostgreSQL](https://www.alibabacloud.com/help/en/analyticdb-for-postgresql/latest/product-introduction-overview)
> is a massively parallel processing (MPP) data warehousing service
> from [Alibaba Cloud](https://www.alibabacloud.com/)
>that is designed to analyze large volumes of data online.
>`AnalyticDB for PostgreSQL` is developed based on the open-source `Greenplum Database`
> project and is enhanced with in-depth extensions by `Alibaba Cloud`. AnalyticDB
> for PostgreSQL is compatible with the ANSI SQL 2003 syntax and the PostgreSQL and
> Oracle database ecosystems. AnalyticDB for PostgreSQL also supports row store and
> column store. AnalyticDB for PostgreSQL processes petabytes of data offline at a
> high performance level and supports high concurrency.
## Installation and Setup
You need to install the `sqlalchemy` python package.
```bash
pip install sqlalchemy
```
## VectorStore
See a [usage example](/docs/integrations/vectorstores/analyticdb).
```python
from langchain_community.vectorstores import AnalyticDB
```
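A minimal, untested sketch (the connection string is a placeholder, and an embeddings model such as `OpenAIEmbeddings` is assumed):
```python
from langchain_community.vectorstores import AnalyticDB
from langchain_openai import OpenAIEmbeddings

vector_db = AnalyticDB.from_texts(
    texts=["AnalyticDB for PostgreSQL can serve as a vector store."],
    embedding=OpenAIEmbeddings(),
    connection_string="postgresql+psycopg2://user:password@host:5432/db",
)
docs = vector_db.similarity_search("What can AnalyticDB be used for?")
```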


@@ -1,8 +1,11 @@
# Annoy
> [Annoy](https://github.com/spotify/annoy) (`Approximate Nearest Neighbors Oh Yeah`)
> is a C++ library with Python bindings to search for points in space that are
> close to a given query point. It also creates large read-only file-based data
> structures that are mapped into memory so that many processes may share the same data.
## Installation and Setup
```bash
pip install annoy
```
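A minimal, untested sketch (an embeddings model such as `OpenAIEmbeddings` is assumed):
```python
from langchain_community.vectorstores import Annoy
from langchain_openai import OpenAIEmbeddings

vectorstore = Annoy.from_texts(["harrison worked at kensho"], OpenAIEmbeddings())
docs = vectorstore.similarity_search("Where did harrison work?")
```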


@@ -3,11 +3,12 @@
>[Apache Doris](https://doris.apache.org/) is a modern data warehouse for real-time analytics.
It delivers lightning-fast analytics on real-time data at scale.
>Usually `Apache Doris` is categorized as OLAP, and it has shown excellent performance
> in [ClickBench — a Benchmark For Analytical DBMS](https://benchmark.clickhouse.com/).
> Since it has a super-fast vectorized execution engine, it could also be used as a fast vectordb.
## Installation and Setup
```bash
pip install pymysql
```


@@ -1,16 +1,13 @@
# Apify
>[Apify](https://apify.com) is a cloud platform for web scraping and data extraction,
>which provides an [ecosystem](https://apify.com/store) of more than a thousand
>ready-made apps called *Actors* for various scraping, crawling, and extraction use cases.
[![Apify Actors](/img/ApifyActors.png)](https://apify.com/store)
This integration enables you to run Actors on the `Apify` platform and load their results into LangChain to feed your vector
indexes with documents and data from the web, e.g. to generate answers from websites with documentation,
blogs, or knowledge bases.
@@ -22,9 +19,7 @@ blogs, or knowledge bases.
an environment variable (`APIFY_API_TOKEN`) or pass it to the `ApifyWrapper` as `apify_api_token` in the constructor.
## Utility
You can use the `ApifyWrapper` to run Actors on the Apify platform.
@@ -35,7 +30,7 @@ from langchain_community.utilities import ApifyWrapper
For a more detailed walkthrough of this wrapper, see [this notebook](/docs/integrations/tools/apify).
## Document loader
You can also use our `ApifyDatasetLoader` to get data from an Apify dataset.
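A minimal, untested sketch (the dataset id is a placeholder; the mapping function converts each dataset item into a `Document`):
```python
from langchain_community.document_loaders import ApifyDatasetLoader
from langchain_core.documents import Document

loader = ApifyDatasetLoader(
    dataset_id="your-dataset-id",
    dataset_mapping_function=lambda item: Document(
        page_content=item.get("text", ""), metadata={"source": item.get("url")}
    ),
)
docs = loader.load()
```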


@@ -1,17 +1,19 @@
# ArangoDB
>[ArangoDB](https://github.com/arangodb/arangodb) is a scalable graph database system to
> drive value from connected data, faster. Native graphs, an integrated search engine, and JSON support, via a single query language. ArangoDB runs on-prem or in the cloud, anywhere.
## Installation and Setup
Install the [ArangoDB Python Driver](https://github.com/ArangoDB-Community/python-arango) package with
```bash
pip install python-arango
```
## Graph QA Chain
Connect your `ArangoDB` Database with a chat model to get insights on your data.
See the notebook example [here](/docs/use_cases/graph/graph_arangodb_qa).
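A minimal, untested sketch (connection details are placeholders, and `ChatOpenAI` stands in for any chat model):
```python
from arango import ArangoClient
from langchain.chains import ArangoGraphQAChain
from langchain_community.graphs import ArangoGraph
from langchain_openai import ChatOpenAI

# Connect to a local ArangoDB instance and wrap it as a graph
db = ArangoClient(hosts="http://localhost:8529").db("_system", username="root", password="")
graph = ArangoGraph(db)
chain = ArangoGraphQAChain.from_llm(ChatOpenAI(temperature=0), graph=graph, verbose=True)
chain.run("Who starred in Pulp Fiction?")
```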


@@ -11,31 +11,19 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"[Arthur](https://arthur.ai) is a model monitoring and observability platform.\n",
">[Arthur](https://arthur.ai) is a model monitoring and observability platform.\n",
"\n",
"The following guide shows how to run a registered chat LLM with the Arthur callback handler to automatically log model inferences to Arthur.\n",
"\n",
"If you do not have a model currently onboarded to Arthur, visit our [onboarding guide for generative text models](https://docs.arthur.ai/user-guide/walkthroughs/model-onboarding/generative_text_onboarding.html). For more information about how to use the Arthur SDK, visit our [docs](https://docs.arthur.ai/)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "y8ku6X96sebl"
},
"outputs": [],
"source": [
"from langchain.callbacks import ArthurCallbackHandler\n",
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
"from langchain_core.messages import HumanMessage\n",
"from langchain_openai import ChatOpenAI"
"If you do not have a model currently onboarded to Arthur, visit our [onboarding guide for generative text models](https://docs.arthur.ai/user-guide/walkthroughs/model-onboarding/generative_text_onboarding.html). For more information about how to use the `Arthur SDK`, visit our [docs](https://docs.arthur.ai/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Installation and Setup\n",
"\n",
"Place Arthur credentials here"
]
},
@@ -52,6 +40,27 @@
"arthur_model_id = \"your-arthur-model-id-here\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Callback handler"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "y8ku6X96sebl"
},
"outputs": [],
"source": [
"from langchain.callbacks import ArthurCallbackHandler\n",
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
"from langchain_core.messages import HumanMessage\n",
"from langchain_openai import ChatOpenAI"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -191,9 +200,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 1
"nbformat_minor": 4
}


@@ -28,7 +28,7 @@ Learn more in the [example notebook](/docs/integrations/vectorstores/astradb).
## Chat message history
```python
from langchain_astradb import AstraDBChatMessageHistory
message_history = AstraDBChatMessageHistory(
session_id="test-session",
api_endpoint="...",


@@ -23,7 +23,7 @@ Elastic Cloud is a managed Elasticsearch service. Signup for a [free trial](http
### Install Client
```bash
pip install langchain-elasticsearch
```
## Vector Store
@@ -31,7 +31,7 @@ pip install elasticsearch
The vector store is a simple wrapper around Elasticsearch. It provides a simple interface to store and retrieve vectors.
```python
from langchain_elasticsearch import ElasticsearchStore
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
```
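Continuing the snippet above, a minimal, untested sketch (assuming an Elasticsearch instance at `localhost:9200` and an embeddings model such as `OpenAIEmbeddings`):
```python
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_elasticsearch import ElasticsearchStore
from langchain_openai import OpenAIEmbeddings

# Split a text file into chunks and index them in Elasticsearch
documents = TextLoader("state_of_the_union.txt").load()
docs = CharacterTextSplitter(chunk_size=500, chunk_overlap=0).split_documents(documents)
db = ElasticsearchStore.from_documents(
    docs, OpenAIEmbeddings(), es_url="http://localhost:9200", index_name="test-index"
)
```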


@@ -1,17 +1,17 @@
# Fireworks
This page covers how to use [Fireworks](https://fireworks.ai/) models within
LangChain.
## Installation and setup
- Install the Fireworks integration package.
```
pip install langchain-fireworks
```
- Get a Fireworks API key by signing up at [fireworks.ai](https://fireworks.ai).
- Authenticate by setting the `FIREWORKS_API_KEY` environment variable.
## Authentication
@@ -33,14 +33,14 @@ There are two ways to authenticate using your Fireworks API key:
## Using the Fireworks LLM module
Fireworks integrates with LangChain through the LLM module. In this example, we
will work with the mixtral-8x7b-instruct model.
```python
from langchain_fireworks import Fireworks

llm = Fireworks(
    fireworks_api_key="<KEY>",
    model="accounts/fireworks/models/mixtral-8x7b-instruct",
    max_tokens=256,
)
llm("Name 3 sports.")
```


@@ -0,0 +1,435 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "fc0db1bc",
"metadata": {},
"source": [
"# LLMLingua Document Compressor\n",
"\n",
">[LLMLingua](https://github.com/microsoft/LLMLingua) utilizes a compact, well-trained language model (e.g., GPT2-small, LLaMA-7B) to identify and remove non-essential tokens in prompts. This approach enables efficient inference with large language models (LLMs), achieving up to 20x compression with minimal performance loss.\n",
"\n",
"This notebook shows how to use LLMLingua as a document compressor."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4f5973bb-7897-4340-a8ce-c3365ee73b2f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.3.2\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.0\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython -m pip install --upgrade pip\u001b[0m\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install --upgrade --quiet llmlingua accelerate"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6fa3d916",
"metadata": {
"jp-MarkdownHeadingCollapsed": true,
"tags": []
},
"outputs": [],
"source": [
"# Helper function for printing docs\n",
"\n",
"\n",
"def pretty_print_docs(docs):\n",
" print(\n",
" f\"\\n{'-' * 100}\\n\".join(\n",
" [f\"Document {i+1}:\\n\\n\" + d.page_content for i, d in enumerate(docs)]\n",
" )\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "4521f235",
"metadata": {
"jp-MarkdownHeadingCollapsed": true,
"tags": []
},
"source": [
"## Set up the base vector store retriever\n",
"Let's start by initializing a simple vector store retriever and storing the 2023 State of the Union speech (in chunks). We can set up the retriever to retrieve a high number (20) of docs."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "b7648612",
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Document 1:\n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 2:\n",
"\n",
"As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \n",
"\n",
"While it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 3:\n",
"\n",
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
"\n",
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 4:\n",
"\n",
"He met the Ukrainian people. \n",
"\n",
"From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. \n",
"\n",
"Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. \n",
"\n",
"In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 5:\n",
"\n",
"But that trickle-down theory led to weaker economic growth, lower wages, bigger deficits, and the widest gap between those at the top and everyone else in nearly a century. \n",
"\n",
"Vice President Harris and I ran for office with a new economic vision for America. \n",
"\n",
"Invest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up \n",
"and the middle out, not from the top down.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 6:\n",
"\n",
"And tonight, Im announcing that the Justice Department will name a chief prosecutor for pandemic fraud. \n",
"\n",
"By the end of this year, the deficit will be down to less than half what it was before I took office. \n",
"\n",
"The only president ever to cut the deficit by more than one trillion dollars in a single year. \n",
"\n",
"Lowering your costs also means demanding more competition. \n",
"\n",
"Im a capitalist, but capitalism without competition isnt capitalism. \n",
"\n",
"Its exploitation—and it drives up prices.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 7:\n",
"\n",
"I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n",
"\n",
"Ive worked on these issues a long time. \n",
"\n",
"I know what works: Investing in crime prevention and community police officers wholl walk the beat, wholl know the neighborhood, and who can restore trust and safety. \n",
"\n",
"So lets not abandon our streets. Or choose between safety and equal justice.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 8:\n",
"\n",
"As Ive told Xi Jinping, it is never a good bet to bet against the American people. \n",
"\n",
"Well create good jobs for millions of Americans, modernizing roads, airports, ports, and waterways all across America. \n",
"\n",
"And well do it all to withstand the devastating effects of the climate crisis and promote environmental justice.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 9:\n",
"\n",
"Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \n",
"\n",
"Last year COVID-19 kept us apart. This year we are finally together again. \n",
"\n",
"Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n",
"\n",
"With a duty to one another to the American people to the Constitution. \n",
"\n",
"And with an unwavering resolve that freedom will always triumph over tyranny.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 10:\n",
"\n",
"As Ohio Senator Sherrod Brown says, “Its time to bury the label “Rust Belt.” \n",
"\n",
"Its time. \n",
"\n",
"But with all the bright spots in our economy, record job growth and higher wages, too many families are struggling to keep up with the bills. \n",
"\n",
"Inflation is robbing them of the gains they might otherwise feel. \n",
"\n",
"I get it. Thats why my top priority is getting prices under control.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 11:\n",
"\n",
"Im also calling on Congress: pass a law to make sure veterans devastated by toxic exposures in Iraq and Afghanistan finally get the benefits and comprehensive health care they deserve. \n",
"\n",
"And fourth, lets end cancer as we know it. \n",
"\n",
"This is personal to me and Jill, to Kamala, and to so many of you. \n",
"\n",
"Cancer is the #2 cause of death in Americasecond only to heart disease.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 12:\n",
"\n",
"Headaches. Numbness. Dizziness. \n",
"\n",
"A cancer that would put them in a flag-draped coffin. \n",
"\n",
"I know. \n",
"\n",
"One of those soldiers was my son Major Beau Biden. \n",
"\n",
"We dont know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. \n",
"\n",
"But Im committed to finding out everything we can. \n",
"\n",
"Committed to military families like Danielle Robinson from Ohio. \n",
"\n",
"The widow of Sergeant First Class Heath Robinson.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 13:\n",
"\n",
"He will never extinguish their love of freedom. He will never weaken the resolve of the free world. \n",
"\n",
"We meet tonight in an America that has lived through two of the hardest years this nation has ever faced. \n",
"\n",
"The pandemic has been punishing. \n",
"\n",
"And so many families are living paycheck to paycheck, struggling to keep up with the rising cost of food, gas, housing, and so much more. \n",
"\n",
"I understand.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 14:\n",
"\n",
"When we invest in our workers, when we build the economy from the bottom up and the middle out together, we can do something we havent done in a long time: build a better America. \n",
"\n",
"For more than two years, COVID-19 has impacted every decision in our lives and the life of the nation. \n",
"\n",
"And I know youre tired, frustrated, and exhausted. \n",
"\n",
"But I also know this.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 15:\n",
"\n",
"My plan to fight inflation will lower your costs and lower the deficit. \n",
"\n",
"17 Nobel laureates in economics say my plan will ease long-term inflationary pressures. Top business leaders and most Americans support my plan. And heres the plan: \n",
"\n",
"First cut the cost of prescription drugs. Just look at insulin. One in ten Americans has diabetes. In Virginia, I met a 13-year-old boy named Joshua Davis.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 16:\n",
"\n",
"And soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \n",
"\n",
"So tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \n",
"\n",
"First, beat the opioid epidemic. \n",
"\n",
"There is so much we can do. Increase funding for prevention, treatment, harm reduction, and recovery.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 17:\n",
"\n",
"My plan will not only lower costs to give families a fair shot, it will lower the deficit. \n",
"\n",
"The previous Administration not only ballooned the deficit with tax cuts for the very wealthy and corporations, it undermined the watchdogs whose job was to keep pandemic relief funds from being wasted. \n",
"\n",
"But in my administration, the watchdogs have been welcomed back. \n",
"\n",
"Were going after the criminals who stole billions in relief money meant for small businesses and millions of Americans.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 18:\n",
"\n",
"So lets not abandon our streets. Or choose between safety and equal justice. \n",
"\n",
"Lets come together to protect our communities, restore trust, and hold law enforcement accountable. \n",
"\n",
"Thats why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 19:\n",
"\n",
"I understand. \n",
"\n",
"I remember when my Dad had to leave our home in Scranton, Pennsylvania to find work. I grew up in a family where if the price of food went up, you felt it. \n",
"\n",
"Thats why one of the first things I did as President was fight to pass the American Rescue Plan. \n",
"\n",
"Because people were hurting. We needed to act, and we did. \n",
"\n",
"Few pieces of legislation have done more in a critical moment in our history to lift us out of crisis.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 20:\n",
"\n",
"And we will, as one people. \n",
"\n",
"One America. \n",
"\n",
"The United States of America. \n",
"\n",
"May God bless you all. May God protect our troops.\n"
]
}
],
"source": [
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain_community.document_loaders import TextLoader\n",
"from langchain_community.vectorstores import FAISS\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"documents = TextLoader(\n",
" \"../../modules/state_of_the_union.txt\",\n",
").load()\n",
"\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)\n",
"texts = text_splitter.split_documents(documents)\n",
"\n",
"embedding = OpenAIEmbeddings(model=\"text-embedding-ada-002\")\n",
"retriever = FAISS.from_documents(texts, embedding).as_retriever(search_kwargs={\"k\": 20})\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = retriever.get_relevant_documents(query)\n",
"pretty_print_docs(docs)"
]
},
{
"cell_type": "markdown",
"id": "0303c7ba",
"metadata": {},
"source": [
"## Doing compression with LLMLingua\n",
"Now lets wrap our base retriever with a `ContextualCompressionRetriever`, using `LLMLinguaCompressor` as a compressor."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b83dfedb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Document 1:\n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 2:\n",
"\n",
". Numbness. Dizziness.A that would them in a-draped coffin. I One of those soldiers was my Biden We dont know for sure if a burn pit the cause of brain, or the diseases of so many of our troops But Im committed to finding out everything we can Committed to military families like Danielle Robinson from Ohio The widow of First Robinson.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 3:\n",
"\n",
"<ref#> let<65> Or between equal Let to protect, restore law accountable why the Justice Department cameras bannedhold and restricted its officers. <\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 4:\n",
"\n",
"<# The Sergeant Class Combat froms widow us toBut burn pits ravaged Heaths lungs and body. \n",
"Danielle says Heath was a fighter to the very end.\n"
]
}
],
"source": [
"from langchain.retrievers import ContextualCompressionRetriever\n",
"from langchain_community.retrievers.document_compressors import LLMLinguaCompressor\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(temperature=0)\n",
"\n",
"compressor = LLMLinguaCompressor(model_name=\"openai-community/gpt2\", device_map=\"cpu\")\n",
"compression_retriever = ContextualCompressionRetriever(\n",
" base_compressor=compressor, base_retriever=retriever\n",
")\n",
"\n",
"compressed_docs = compression_retriever.get_relevant_documents(\n",
" \"What did the president say about Ketanji Jackson Brown\"\n",
")\n",
"pretty_print_docs(compressed_docs)"
]
},
{
"cell_type": "markdown",
"id": "529f72d3",
"metadata": {},
"source": [
"## QA generation with LLMLingua\n",
"\n",
"We can see what it looks like to use this in the generation step now"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "367dafe0",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import RetrievalQA\n",
"\n",
"chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "46ee62fc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'query': 'What did the president say about Ketanji Brown Jackson',\n",
" 'result': \"The President mentioned that Ketanji Brown Jackson is one of the nation's top legal minds and will continue Justice Breyer's legacy of excellence.\"}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke({\"query\": query})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a7bf3985",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -60,8 +60,8 @@
"import getpass\n",
"import os\n",
"\n",
"from langchain_community.vectorstores import ElasticsearchStore\n",
"from langchain_core.documents import Document\n",
"from langchain_elasticsearch import ElasticsearchStore\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")\n",

View File

@@ -77,7 +77,6 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.vectorstores import Pinecone\n",
"from langchain_core.documents import Document\n",
"from langchain_openai import OpenAIEmbeddings\n",
"from langchain_pinecone import PineconeVectorStore\n",

View File

@@ -24,7 +24,7 @@
},
"outputs": [],
"source": [
"!pip -q install elasticsearch langchain"
"!pip -q install langchain-elasticsearch"
]
},
{
@@ -36,7 +36,7 @@
},
"outputs": [],
"source": [
"from langchain_community.embeddings.elasticsearch import ElasticsearchEmbeddings"
"from langchain_elasticsearch import ElasticsearchEmbeddings"
]
},
{

View File

@@ -0,0 +1,149 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "900fbd04-f6aa-4813-868f-1c54e3265385",
"metadata": {},
"source": [
"# LASER Language-Agnostic SEntence Representations Embeddings by Meta AI\n",
"\n",
">[LASER](https://github.com/facebookresearch/LASER/) is a Python library developed by the Meta AI Research team for creating multilingual sentence embeddings for over 147 languages (as of 2/25/2024).\n",
">- The list of supported languages is available at https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "2a773d8d",
"metadata": {},
"source": [
"## Dependencies\n",
"\n",
"To use LaserEmbed with LangChain, install the `laser_encoders` Python package."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "91ea14ce-831d-409a-a88f-30353acdabd1",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%pip install laser_encoders"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "426f1156",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "3f5dc9d7-65e3-4b5b-9086-3327d016cfe0",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain_community.embeddings.laser import LaserEmbeddings"
]
},
{
"cell_type": "markdown",
"id": "8c77b0bb-2613-4167-a204-14d424b59105",
"metadata": {},
"source": [
"## Instantiating Laser\n",
" \n",
"### Parameters\n",
"- `lang: Optional[str]`\n",
If empty will">
"  >If empty, it will default to using a multilingual LASER encoder model (called \"laser2\").\n",
"  You can find the list of supported languages and lang_codes [here](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200)\n",
"  and [here](https://github.com/facebookresearch/LASER/blob/main/laser_encoders/language_list.py)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6fb585dd",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Ex Instantiationz\n",
"embeddings = LaserEmbeddings(lang=\"eng_Latn\")"
]
},
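{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sketch (an addition to this page, not original notebook content): omitting `lang` falls back to the multilingual \"laser2\" encoder described in the parameter notes above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: no `lang` argument, so the multilingual \"laser2\" model is used\n",
"multilingual_embeddings = LaserEmbeddings()"
]
},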
{
"cell_type": "markdown",
"id": "119fbaad-9442-4fff-8214-c5f597bc8e77",
"metadata": {},
"source": [
"## Usage\n",
"\n",
"### Generating document embeddings"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "62920051-cbd2-460d-ba24-0424c1ed395d",
"metadata": {},
"outputs": [],
"source": [
"document_embeddings = embeddings.embed_documents(\n",
" [\"This is a sentence\", \"This is some other sentence\"]\n",
")"
]
},
{
"cell_type": "markdown",
"id": "7fd10d96-baee-468f-a532-b70b16b78d1f",
"metadata": {},
"source": [
"### Generating query embeddings"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f793bb6-609a-4a4a-a5c7-8e8597228915",
"metadata": {},
"outputs": [],
"source": [
"query_embeddings = embeddings.embed_query(\"This is a query\")"
]
}
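,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small usage sketch (an addition, not from the original notebook), you can score the documents against the query with cosine similarity using `numpy`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Cosine similarity between the query vector and each document vector;\n",
"# higher scores indicate closer semantic matches.\n",
"q = np.asarray(query_embeddings)\n",
"D = np.asarray(document_embeddings)\n",
"scores = D @ q / (np.linalg.norm(D, axis=1) * np.linalg.norm(q))\n",
"print(scores)"
]
}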
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

File diff suppressed because one or more lines are too long

View File

@@ -8,15 +8,16 @@
"source": [
"# Azure AI Search\n",
"\n",
"[Azure AI Search](https://learn.microsoft.com/azure/search/search-what-is-azure-search) (formerly known as `Azure Search` and `Azure Cognitive Search`) is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.\n",
"\n"
"[Azure AI Search](https://learn.microsoft.com/azure/search/search-what-is-azure-search) (formerly known as `Azure Search` and `Azure Cognitive Search`) is a cloud search service that gives developers infrastructure, APIs, and tools for information retrieval of vector, keyword, and hybrid queries at scale.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Install Azure AI Search SDK"
"## Install Azure AI Search SDK\n",
"\n",
"Use azure-search-documents package version 11.4.0 or later."
]
},
{
@@ -25,7 +26,7 @@
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet azure-search-documents==11.4.0\n",
"%pip install --upgrade --quiet azure-search-documents\n",
"%pip install --upgrade --quiet azure-identity"
]
},
@@ -34,19 +35,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import required libraries"
"## Import required libraries\n",
"\n",
"`OpenAIEmbeddings` is assumed, but if you're using Azure OpenAI, import `AzureOpenAIEmbeddings` instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain_community.vectorstores.azuresearch import AzureSearch\n",
"from langchain_openai import AzureOpenAIEmbeddings"
"from langchain_openai import AzureOpenAIEmbeddings, OpenAIEmbeddings"
]
},
{
@@ -55,21 +58,32 @@
"metadata": {},
"source": [
"## Configure OpenAI settings\n",
"Configure the OpenAI settings to use Azure OpenAI or OpenAI"
"Set variables for your OpenAI provider. You need either an [OpenAI account](https://platform.openai.com/docs/quickstart?context=python) or an [Azure OpenAI account](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource) to generate the embeddings. "
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"embeddings: AzureOpenAIEmbeddings = AzureOpenAIEmbeddings(\n",
" azure_deployment=\"text-embedding-ada-002\",\n",
" openai_api_version=\"2023-05-15\",\n",
" azure_endpoint=\"YOUR_AZURE_OPENAI_ENDPOINT\",\n",
" api_key=\"YOUR_AZURE_OPENAI_KEY\",\n",
")"
"# Option 1: use an OpenAI account\n",
"openai_api_key: str = \"PLACEHOLDER FOR YOUR API KEY\"\n",
"openai_api_version: str = \"2023-05-15\"\n",
"model: str = \"text-embedding-ada-002\""
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"# Option 2: use an Azure OpenAI account with a deployment of an embedding model\n",
"azure_endpoint: str = \"PLACEHOLDER FOR YOUR AZURE OPENAI ENDPOINT\"\n",
"azure_openai_api_key: str = \"PLACEHOLDER FOR YOUR AZURE OPENAI KEY\"\n",
"azure_openai_api_version: str = \"2023-05-15\"\n",
"azure_deployment: str = \"text-embedding-ada-002\""
]
},
{
@@ -77,13 +91,15 @@
"metadata": {},
"source": [
"## Configure vector store settings\n",
"\n",
"You need an [Azure subscription](https://azure.microsoft.com/en-us/free/search) and [Azure AI Search service](https://learn.microsoft.com/azure/search/search-create-service-portal) to use this vector store integration. No-cost versions are available for small and limited workloads.\n",
" \n",
"Set up the vector store settings using environment variables:"
"Set variables for your Azure AI Search URL and admin API key. You can get these variables from the [Azure portal](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices)."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
@@ -97,12 +113,48 @@
"source": [
"## Create embeddings and vector store instances\n",
" \n",
"Create instances of the OpenAIEmbeddings and AzureSearch classes:"
"Create instances of the OpenAIEmbeddings and AzureSearch classes. When you complete this step, you should have an empty search index on your Azure AI Search resource. The integration module provides a default schema."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Option 1: Use OpenAIEmbeddings with OpenAI account\n",
"embeddings: OpenAIEmbeddings = OpenAIEmbeddings(\n",
" openai_api_key=openai_api_key, openai_api_version=openai_api_version, model=model\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"# Option 2: Use AzureOpenAIEmbeddings with an Azure account\n",
"embeddings: AzureOpenAIEmbeddings = AzureOpenAIEmbeddings(\n",
" azure_deployment=azure_deployment,\n",
" openai_api_version=azure_openai_api_version,\n",
" azure_endpoint=azure_endpoint,\n",
" api_key=azure_openai_api_key,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create vector store instance\n",
" \n",
"Create instance of the AzureSearch class using the embeddings from above"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
@@ -121,14 +173,66 @@
"source": [
"## Insert text and embeddings into vector store\n",
" \n",
"Add texts and metadata from the JSON data to the vector store:"
"This step loads, chunks, and vectorizes the sample document, and then indexes the content into a search index on Azure AI Search."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 31,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"['M2U1OGM4YzAtYjMxYS00Nzk5LTlhNDgtZTc3MGVkNTg1Mjc0',\n",
" 'N2I2MGNiZDEtNDdmZS00YWNiLWJhYTYtYWEzMmFiYzU1ZjZm',\n",
" 'YWFmNDViNTQtZTc4MS00MTdjLTkzZjQtYTJkNmY1MDU4Yzll',\n",
" 'MjgwY2ExZDctYTUxYi00NjE4LTkxMjctZDA1NDQ1MzU4NmY1',\n",
" 'NGE4NzhkNTAtZWYxOC00ZmI5LTg0MTItZDQ1NzMxMWVmMTIz',\n",
" 'MTYwMWU3YjAtZDIzOC00NTYwLTgwMmEtNDI1NzA2MWVhMDYz',\n",
" 'NGM5N2NlZjgtMTc5Ny00OGEzLWI5YTgtNDFiZWE2MjBlMzA0',\n",
" 'OWQ4M2MyMTYtMmRkNi00ZDUxLWI0MDktOGE2NjMxNDFhYzFm',\n",
" 'YWZmZGJkOTAtOGM3My00MmNiLTg5OWUtZGMwMDQwYTk1N2Vj',\n",
" 'YTc3MTI2OTktYmVkMi00ZGU4LTgyNmUtNTY1YzZjMDg2YWI3',\n",
" 'MTQwMmVlYjEtNDI0MS00N2E0LWEyN2ItZjhhYWU0YjllMjRk',\n",
" 'NjJjYWY4ZjctMzgyNi00Y2I5LTkwY2UtZjRkMjJhNDQxYTFk',\n",
" 'M2ZiM2NiYTMtM2ZiMS00YWJkLWE3ZmQtNDZiODcyOTMyYWYx',\n",
" 'MzNmZTNkMWYtMjNmYS00Y2NmLTg3ZjQtYTZjOWM1YmJhZTRk',\n",
" 'ZDY3MDc1NzYtY2YzZS00ZjExLWEyMjAtODhiYTRmNDUzMTBi',\n",
" 'ZGIyYzA4NzUtZGM2Ni00MDUwLWEzZjYtNTg3MDYyOWQ5MWQy',\n",
" 'NTA0MjBhMzYtOTYzMi00MDQ2LWExYWQtMzNiN2I4ODM4ZGZl',\n",
" 'OTdjYzU2NGUtNWZjNC00N2ZmLWExMjQtNjhkYmZkODg4MTY3',\n",
" 'OThhMWZmMjgtM2EzYS00OWZkLTk1NGEtZTdkNmRjNWYxYmVh',\n",
" 'ZGVjMTQ0NzctNDVmZC00ZWY4LTg4N2EtMDQ1NWYxNWM5NDVh',\n",
" 'MjRlYzE4YzItZTMxNy00OGY3LThmM2YtMjM0YmRhYTVmOGY3',\n",
" 'MWU0NDA3ZDQtZDE4MS00OWMyLTlmMzktZjdkYzZhZmUwYWM3',\n",
" 'ZGM2ZDhhY2MtM2NkNi00MzZhLWJmNTEtMmYzNjEwMzE3NmZl',\n",
" 'YjBmMjkyZTItYTNlZC00MmY2LThiMzYtMmUxY2MyNDlhNGUw',\n",
" 'OThmYTQ0YzEtNjk0MC00NWIyLWE1ZDQtNTI2MTZjN2NlODcw',\n",
" 'NDdlOGU1ZGQtZTVkMi00M2MyLWExN2YtOTc2ODk3OWJmNmQw',\n",
" 'MDVmZGNkYTUtNWI2OS00YjllLTk0YTItZDRmNWQxMWU3OTVj',\n",
" 'YWFlNTVmNjMtMDZlNy00NmE5LWI0ODUtZTI3ZTFmZWRmNzU0',\n",
" 'MmIzOTkxODQtODYxMi00YWM2LWFjY2YtNjRmMmEyM2JlNzMw',\n",
" 'ZmI1NDhhNWItZWY0ZS00NTNhLWEyNDEtMTE2OWYyMjc4YTU2',\n",
" 'YTllYTc5OTgtMzJiNC00ZjZjLWJiMzUtNWVhYzFjYzgxMjU2',\n",
" 'ODZlZWUyOTctOGY4OS00ZjA3LWIyYTUtNDVlNDUyN2E4ZDFk',\n",
" 'Y2M0MWRlM2YtZDU4Ny00MjZkLWE5NzgtZmRkMTNhZDg2YjEy',\n",
" 'MDNjZWQ2ODEtMWZiMy00OTZjLTk3MzAtZjE4YjIzNWVhNTE1',\n",
" 'OTE1NDY0NzMtODNkZS00MTk4LTk4NWQtZGVmYjQ2YjFlY2Q0',\n",
" 'ZTgwYWQwMjEtN2ZlOS00NDk2LWIxNzUtNjk2ODE3N2U0Yzlj',\n",
" 'ZDkxOTgzMGUtZGExMC00Yzg0LWJjMGItOWQ2ZmUwNWUwOGJj',\n",
" 'ZGViMGI2NDEtZDdlNC00YjhiLTk0MDUtYjEyOTVlMGU1Y2I2',\n",
" 'ODliZTYzZTctZjdlZS00YjBjLWFiZmYtMDJmNjQ0YjU3ZDcy',\n",
" 'MDFjZGI1NzUtOTc0Ni00NWNmLThhYzYtYzRlZThkZjMwM2Vl',\n",
" 'ZjY2ZmRiN2EtZWVhNS00ODViLTk4YjYtYjQ2Zjc4MDdkYjhk',\n",
" 'ZTQ3NDMwODEtMTQwMy00NDFkLWJhZDQtM2UxN2RkOTU1MTdl']"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain_community.document_loaders import TextLoader\n",
@@ -154,9 +258,23 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 11,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"# Perform a similarity search\n",
"docs = vector_store.similarity_search(\n",
@@ -173,14 +291,29 @@
"source": [
"## Perform a vector similarity search with relevance scores\n",
" \n",
"Execute a pure vector similarity search using the similarity_search_with_relevance_scores() method:"
"Execute a pure vector similarity search using the similarity_search_with_relevance_scores() method. Queries that don't meet the threshold requirements are exluded."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 12,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': '../../modules/state_of_the_union.txt'}),\n",
" 0.84402436),\n",
" (Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../modules/state_of_the_union.txt'}),\n",
" 0.82128483),\n",
" (Document(page_content='And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \\n\\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \\n\\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \\n\\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \\n\\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \\n\\nFirst, beat the opioid epidemic.', metadata={'source': '../../modules/state_of_the_union.txt'}),\n",
" 0.8151042),\n",
" (Document(page_content='Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers. \\n\\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \\n\\nThat ends on my watch. \\n\\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \\n\\nWell also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \\n\\nLets pass the Paycheck Fairness Act and paid leave. \\n\\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \\n\\nLets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.', metadata={'source': '../../modules/state_of_the_union.txt'}),\n",
" 0.8148832)]\n"
]
}
],
"source": [
"docs_and_scores = vector_store.similarity_search_with_relevance_scores(\n",
" query=\"What did the president say about Ketanji Brown Jackson\",\n",
@@ -197,18 +330,32 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Perform a Hybrid Search\n",
"## Perform a hybrid search\n",
"\n",
"Execute hybrid search using the search_type or hybrid_search() method:"
"Execute hybrid search using the search_type or hybrid_search() method. Vector and nonvector text fields are queried in parallel, results are merged, and top matches of the unified result set are returned."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 13,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"# Perform a hybrid search\n",
"# Perform a hybrid search using the search_type parameter\n",
"docs = vector_store.similarity_search(\n",
" query=\"What did the president say about Ketanji Brown Jackson\",\n",
" k=3,\n",
@@ -219,11 +366,25 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 14,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"# Perform a hybrid search\n",
"# Perform a hybrid search using the hybrid_search method\n",
"docs = vector_store.hybrid_search(\n",
" query=\"What did the president say about Ketanji Brown Jackson\", k=3\n",
")\n",
@@ -234,12 +395,25 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create a new index with custom filterable fields "
"## Custom schemas and queries\n",
"\n",
"This section shows you how to replace the default schema with a custom schema.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a new index with custom filterable fields \n",
"\n",
"This schema shows field definitions. It's the default schema, plus several new fields attributed as filterable. Because it's using the default vector configuration, you won't see vector configuration or vector profile overrides here. The name of the default vector profile is \"myHnswProfile\" and it's using a vector configuration of Hierarchical Navigable Small World (HNSW) for indexing and queries against the content_vector field.\n",
"\n",
"There's no data for this schema in this step. When you execute the cell, you should get an empty index on Azure AI Search."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
@@ -252,11 +426,9 @@
" TextWeights,\n",
")\n",
"\n",
"embeddings: AzureOpenAIEmbeddings = AzureOpenAIEmbeddings(\n",
" azure_deployment=\"text-embedding-ada-002\",\n",
" openai_api_version=\"2023-05-15\",\n",
" azure_endpoint=\"YOUR_AZURE_OPENAI_ENDPOINT\",\n",
" api_key=\"YOUR_AZURE_OPENAI_KEY\",\n",
"# Replace OpenAIEmbeddings with AzureOpenAIEmbeddings if Azure OpenAI is your provider.\n",
"embeddings: OpenAIEmbeddings = OpenAIEmbeddings(\n",
" openai_api_key=openai_api_key, openai_api_version=openai_api_version, model=model\n",
")\n",
"embedding_function = embeddings.embed_query\n",
"\n",
@@ -313,19 +485,34 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Perform a query with a custom filter"
"### Add data and perform a query that includes a filter\n",
"\n",
"This example adds data to the vector store based on the custom schema. It loads text into the title and source fields. The source field is filterable. The sample query in this section filters the results based on content in the source field."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 16,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"['ZjhmMTg0NTEtMjgwNC00N2M0LWFiZGEtMDllMGU1Mzk1NWRm',\n",
" 'MzQwYWUwZDEtNDJkZC00MzgzLWIwMzItYzMwOGZkYTRiZGRi',\n",
" 'ZjFmOWVlYTQtODRiMC00YTY3LTk2YjUtMzY1NDBjNjY5ZmQ2']"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Data in the metadata dictionary with a corresponding field in the index will be added to the index\n",
"# In this example, the metadata dictionary contains a title, a source and a random field\n",
"# The title and the source will be added to the index as separate fields, but the random won't. (as it is not defined in the fields list)\n",
"# The random field will be only stored in the metadata field\n",
"# Data in the metadata dictionary with a corresponding field in the index will be added to the index.\n",
"# In this example, the metadata dictionary contains a title, a source, and a random field.\n",
"# The title and the source are added to the index as separate fields, but the random value is ignored because it's not defined in the schema.\n",
"# The random field is only stored in the metadata field.\n",
"vector_store.add_texts(\n",
" [\"Test 1\", \"Test 2\", \"Test 3\"],\n",
" [\n",
@@ -338,9 +525,22 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 17,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Test 3', metadata={'title': 'Title 3', 'source': 'B', 'random': '32893'}),\n",
" Document(page_content='Test 1', metadata={'title': 'Title 1', 'source': 'A', 'random': '10290'}),\n",
" Document(page_content='Test 2', metadata={'title': 'Title 2', 'source': 'A', 'random': '48392'})]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"res = vector_store.similarity_search(query=\"Test 3 source1\", k=3, search_type=\"hybrid\")\n",
"res"
@@ -348,9 +548,21 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 18,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Test 1', metadata={'title': 'Title 1', 'source': 'A', 'random': '10290'}),\n",
" Document(page_content='Test 2', metadata={'title': 'Title 2', 'source': 'A', 'random': '48392'})]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"res = vector_store.similarity_search(\n",
" query=\"Test 3 source1\", k=3, search_type=\"hybrid\", filters=\"source eq 'A'\"\n",
@@ -362,12 +574,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create a new index with a Scoring Profile"
"### Create a new index with a scoring profile\n",
"\n",
"Here's another custom schema that includes a scoring profile definition. A scoring profile is used for relevance tuning of nonvector content, which is helpful in hybrid search scenarios."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
@@ -382,11 +596,9 @@
" TextWeights,\n",
")\n",
"\n",
"embeddings: AzureOpenAIEmbeddings = AzureOpenAIEmbeddings(\n",
" azure_deployment=\"text-embedding-ada-002\",\n",
" openai_api_version=\"2023-05-15\",\n",
" azure_endpoint=\"YOUR_AZURE_OPENAI_ENDPOINT\",\n",
" api_key=\"YOUR_AZURE_OPENAI_KEY\",\n",
"# Replace OpenAIEmbeddings with AzureOpenAIEmbeddings if Azure OpenAI is your provider.\n",
"embeddings: OpenAIEmbeddings = OpenAIEmbeddings(\n",
" openai_api_key=openai_api_key, openai_api_version=openai_api_version, model=model\n",
")\n",
"embedding_function = embeddings.embed_query\n",
"\n",
@@ -465,9 +677,22 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 20,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"['NjUwNGQ5ZDUtMGVmMy00OGM4LWIxMGYtY2Y2MDFmMTQ0MjE5',\n",
" 'NWFjN2YwY2UtOWQ4Yi00OTNhLTg2MGEtOWE0NGViZTVjOGRh',\n",
" 'N2Y2NWUyZjctMDBjZC00OGY4LWJlZDEtNTcxYjQ1MmI1NjYx']"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Adding same data with different last_update to show Scoring Profile effect\n",
"from datetime import datetime, timedelta\n",
@@ -505,9 +730,22 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 21,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Test 1', metadata={'title': 'Title 1', 'source': 'source1', 'random': '32893', 'last_update': '2024-01-24T22:18:51-00:00'}),\n",
" Document(page_content='Test 1', metadata={'title': 'Title 1', 'source': 'source1', 'random': '48392', 'last_update': '2024-02-22T22:18:51-00:00'}),\n",
" Document(page_content='Test 1', metadata={'title': 'Title 1', 'source': 'source1', 'random': '10290', 'last_update': '2024-02-23T22:18:51-00:00'})]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"res = vector_store.similarity_search(query=\"Test 1\", k=3, search_type=\"similarity\")\n",
"res"

View File

@@ -21,7 +21,7 @@
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet elasticsearch langchain-openai tiktoken langchain"
"%pip install --upgrade --quiet langchain-elasticsearch langchain-openai tiktoken langchain"
]
},
{
@@ -64,7 +64,7 @@
"\n",
"Example:\n",
"```python\n",
" from langchain_community.vectorstores.elasticsearch import ElasticsearchStore\n",
" from langchain_elasticsearch import ElasticsearchStore\n",
" from langchain_openai import OpenAIEmbeddings\n",
"\n",
" embedding = OpenAIEmbeddings()\n",
@@ -79,7 +79,7 @@
"\n",
"Example:\n",
"```python\n",
" from langchain_community.vectorstores import ElasticsearchStore\n",
" from langchain_elasticsearch import ElasticsearchStore\n",
" from langchain_openai import OpenAIEmbeddings\n",
"\n",
" embedding = OpenAIEmbeddings()\n",
@@ -97,7 +97,7 @@
"Example:\n",
"```python\n",
" import elasticsearch\n",
" from langchain_community.vectorstores import ElasticsearchStore\n",
" from langchain_elasticsearch import ElasticsearchStore\n",
"\n",
" es_client= elasticsearch.Elasticsearch(\n",
" hosts=[\"http://localhost:9200\"],\n",
@@ -137,7 +137,7 @@
"\n",
"Example:\n",
"```python\n",
" from langchain_community.vectorstores.elasticsearch import ElasticsearchStore\n",
" from langchain_elasticsearch import ElasticsearchStore\n",
" from langchain_openai import OpenAIEmbeddings\n",
"\n",
" embedding = OpenAIEmbeddings()\n",
@@ -202,7 +202,7 @@
},
"outputs": [],
"source": [
"from langchain_community.vectorstores import ElasticsearchStore\n",
"from langchain_elasticsearch import ElasticsearchStore\n",
"from langchain_openai import OpenAIEmbeddings"
]
},
@@ -817,7 +817,7 @@
"source": [
"from typing import Dict\n",
"\n",
"from langchain.docstore.document import Document\n",
"from langchain_core.documents import Document\n",
"\n",
"\n",
"def custom_document_builder(hit: Dict) -> Document:\n",
@@ -902,7 +902,7 @@
"\n",
"```python\n",
"\n",
"from langchain_community.vectorstores.elasticsearch import ElasticsearchStore\n",
"from langchain_elasticsearch import ElasticsearchStore\n",
"\n",
"db = ElasticsearchStore(\n",
" es_url=\"http://localhost:9200\",\n",
@@ -936,7 +936,7 @@
"\n",
"```python\n",
"\n",
"from langchain_community.vectorstores.elasticsearch import ElasticsearchStore\n",
"from langchain_elasticsearch import ElasticsearchStore\n",
"\n",
"db = ElasticsearchStore(\n",
" es_url=\"http://localhost:9200\",\n",

View File

@@ -0,0 +1,556 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google AlloyDB for PostgreSQL\n",
"\n",
 [Google Cloud AlloyDB]">
"> [Google Cloud AlloyDB](https://cloud.google.com/alloydb) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. AlloyDB is 100% compatible with PostgreSQL. Extend your database application to build AI-powered experiences leveraging AlloyDB's LangChain integrations.\n",
"\n",
"This notebook goes over how to use `AlloyDB for PostgreSQL` to store vector embeddings with the `AlloyDBVectorStore` class."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before you begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
" * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
" * [Enable the AlloyDB Admin API.](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com)\n",
" * [Create a AlloyDB cluster and instance.](https://cloud.google.com/alloydb/docs/cluster-create)\n",
" * [Create a AlloyDB database.](https://cloud.google.com/alloydb/docs/quickstart/create-and-connect)\n",
" * [Add a User to the database.](https://cloud.google.com/alloydb/docs/database-users/about)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IR54BmgvdHT_"
},
"source": [
"### 🦜🔗 Library Installation\n",
"Install the integration library, `langchain-google-alloydb-pg`, and the library for the embedding service, `langchain-google-vertexai`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "0ZITIDE160OD",
"outputId": "e184bc0d-6541-4e0a-82d2-1e216db00a2d"
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-google-alloydb-pg langchain-google-vertexai"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "v40bB_GMcr9f"
},
"source": [
"**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "v6jBDnYnNM08"
},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "yygMe6rPWxHS"
},
"source": [
"### 🔐 Authentication\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"* If you are using Colab to run this notebook, use the cell below and continue.\n",
"* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "PTXN1_DSXj2b"
},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "NEvB9BoLEulY"
},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "gfkS3yVRE4_W"
},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rEWWNoNnKOgq"
},
"source": [
"### 💡 API Enablement\n",
"The `langchain-google-alloydb-pg` package requires that you [enable the AlloyDB Admin API](https://console.cloud.google.com/flows/enableapi?apiid=alloydb.googleapis.com) in your Google Cloud Project."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "5utKIdq7KYi5"
},
"outputs": [],
"source": [
"# enable AlloyDB Admin API\n",
"!gcloud services enable alloydb.googleapis.com"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "f8f2830ee9ca1e01"
},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OMvzMWRrR6n7"
},
"source": [
"### Set AlloyDB database values\n",
"Find your database values, in the [AlloyDB Instances page](https://console.cloud.google.com/alloydb/clusters)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "irl7eMFnSPZr"
},
"outputs": [],
"source": [
"# @title Set Your Values Here { display-mode: \"form\" }\n",
"REGION = \"us-central1\" # @param {type: \"string\"}\n",
"CLUSTER = \"my-cluster\" # @param {type: \"string\"}\n",
"INSTANCE = \"my-primary\" # @param {type: \"string\"}\n",
"DATABASE = \"my-database\" # @param {type: \"string\"}\n",
"TABLE_NAME = \"vector_store\" # @param {type: \"string\"}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QuQigs4UoFQ2"
},
"source": [
"### AlloyDBEngine Connection Pool\n",
"\n",
"One of the requirements and arguments to establish AlloyDB as a vector store is a `AlloyDBEngine` object. The `AlloyDBEngine` configures a connection pool to your AlloyDB database, enabling successful connections from your application and following industry best practices.\n",
"\n",
"To create a `AlloyDBEngine` using `AlloyDBEngine.from_instance()` you need to provide only 4 things:\n",
"\n",
"1. `project_id` : Project ID of the Google Cloud Project where the AlloyDB instance is located.\n",
"1. `region` : Region where the AlloyDB instance is located.\n",
"1. `cluster`: The name of the AlloyDB cluster.\n",
"1. `instance` : The name of the AlloyDB instance.\n",
"1. `database` : The name of the database to connect to on the AlloyDB instance.\n",
"\n",
"By default, [IAM database authentication](https://cloud.google.com/alloydb/docs/connect-iam) will be used as the method of database authentication. This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the environment.\n",
"\n",
"Optionally, [built-in database authentication](https://cloud.google.com/alloydb/docs/database-users/about) using a username and password to access the AlloyDB database can also be used. Just provide the optional `user` and `password` arguments to `AlloyDBEngine.from_instance()`:\n",
"\n",
"* `user` : Database user to use for built-in database authentication and login\n",
"* `password` : Database password to use for built-in database authentication and login.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:** This tutorial demonstrates the async interface. All async methods have corresponding sync methods."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_alloydb_pg import AlloyDBEngine\n",
"\n",
"engine = await AlloyDBEngine.afrom_instance(\n",
" project_id=PROJECT_ID,\n",
" region=REGION,\n",
" cluster=CLUSTER,\n",
" instance=INSTANCE,\n",
" database=DATABASE,\n",
")"
]
},
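{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch (an addition to this guide, not original notebook content): the synchronous `AlloyDBEngine.from_instance()` takes the same arguments, and the optional `user` and `password` arguments switch from IAM to built-in database authentication. The credential values below are placeholders."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Sketch: sync engine creation with optional built-in authentication.\n",
"# # \"my-db-user\" and \"my-db-password\" are placeholder credentials.\n",
"# engine = AlloyDBEngine.from_instance(\n",
"#     project_id=PROJECT_ID,\n",
"#     region=REGION,\n",
"#     cluster=CLUSTER,\n",
"#     instance=INSTANCE,\n",
"#     database=DATABASE,\n",
"#     user=\"my-db-user\",  # optional\n",
"#     password=\"my-db-password\",  # optional\n",
"# )"
]
},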
{
"cell_type": "markdown",
"metadata": {
"id": "D9Xs2qhm6X56"
},
"source": [
"### Initialize a table\n",
"The `AlloyDBVectorStore` class requires a database table. The `AlloyDBEngine` engine has a helper method `init_vectorstore_table()` that can be used to create a table with the proper schema for you."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"id": "avlyHEMn6gzU"
},
"outputs": [],
"source": [
"await engine.ainit_vectorstore_table(\n",
" table_name=TABLE_NAME,\n",
" vector_size=768, # Vector size for VertexAI model(textembedding-gecko@latest)\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an embedding class instance\n",
"\n",
"You can use any [LangChain embeddings model](https://python.langchain.com/docs/integrations/text_embedding/).\n",
"You may need to enable Vertex AI API to use `VertexAIEmbeddings`. We recommend setting the embedding model's version for production, learn more about the [Text embeddings models](https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/text-embeddings)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "5utKIdq7KYi5"
},
"outputs": [],
"source": [
"# enable Vertex AI API\n",
"!gcloud services enable aiplatform.googleapis.com"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Vb2RJocV9_LQ",
"outputId": "37f5dc74-2512-47b2-c135-f34c10afdcf4"
},
"outputs": [],
"source": [
"from langchain_google_vertexai import VertexAIEmbeddings\n",
"\n",
"embedding = VertexAIEmbeddings(\n",
" model_name=\"textembedding-gecko@latest\", project=PROJECT_ID\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "e1tl0aNx7SWy"
},
"source": [
"### Initialize a default AlloyDBVectorStore"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "z-AZyzAQ7bsf"
},
"outputs": [],
"source": [
"from langchain_google_alloydb_pg import AlloyDBVectorStore\n",
"\n",
"store = await AlloyDBVectorStore.create(\n",
" engine=engine,\n",
" table_name=TABLE_NAME,\n",
" embedding_service=embedding,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add texts"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import uuid\n",
"\n",
"all_texts = [\"Apples and oranges\", \"Cars and airplanes\", \"Pineapple\", \"Train\", \"Banana\"]\n",
"metadatas = [{\"len\": len(t)} for t in all_texts]\n",
"ids = [str(uuid.uuid4()) for _ in all_texts]\n",
"\n",
"await store.aadd_texts(all_texts, metadatas=metadatas, ids=ids)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete texts"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"await store.adelete([ids[1]])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Search for documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"query = \"I'd like a fruit.\"\n",
"docs = await store.asimilarity_search(query)\n",
"print(docs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Search for documents by vector"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"query_vector = embedding.embed_query(query)\n",
"docs = await store.asimilarity_search_by_vector(query_vector, k=2)\n",
"print(docs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Add a Index\n",
"Speed up vector search queries by applying a vector index. Learn more about [vector indexes](https://cloud.google.com/blog/products/databases/faster-similarity-search-performance-with-pgvector-indexes)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_alloydb_pg.indexes import IVFFlatIndex\n",
"\n",
"index = IVFFlatIndex()\n",
"await store.aapply_vector_index(index)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Re-index"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"await store.areindex() # Re-index using default index name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Remove an index"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"await store.adrop_vector_index() # Delete index using default name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a custom Vector Store\n",
"A Vector Store can take advantage of relational data to filter similarity searches.\n",
"\n",
"Create a table with custom metadata columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_alloydb_pg import Column\n",
"\n",
"# Set table name\n",
"TABLE_NAME = \"vectorstore_custom\"\n",
"\n",
"await engine.ainit_vectorstore_table(\n",
" table_name=TABLE_NAME,\n",
" vector_size=768, # VertexAI model: textembedding-gecko@latest\n",
" metadata_columns=[Column(\"len\", \"INTEGER\")],\n",
")\n",
"\n",
"\n",
"# Initialize AlloyDBVectorStore\n",
"custom_store = await AlloyDBVectorStore.create(\n",
" engine=engine,\n",
" table_name=TABLE_NAME,\n",
" embedding_service=embedding,\n",
" metadata_columns=[\"len\"],\n",
" # Connect to a existing VectorStore by customizing the table schema:\n",
" # id_column=\"uuid\",\n",
" # content_column=\"documents\",\n",
" # embedding_column=\"vectors\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Search for documents with metadata filter"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import uuid\n",
"\n",
"# Add texts to the Vector Store\n",
"all_texts = [\"Apples and oranges\", \"Cars and airplanes\", \"Pineapple\", \"Train\", \"Banana\"]\n",
"metadatas = [{\"len\": len(t)} for t in all_texts]\n",
"ids = [str(uuid.uuid4()) for _ in all_texts]\n",
"await store.aadd_texts(all_texts, metadatas=metadatas, ids=ids)\n",
"\n",
"# Use filter on search\n",
"docs = await custom_store.asimilarity_search_by_vector(query_vector, filter=\"len >= 6\")\n",
"\n",
"print(docs)"
]
}
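,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a final usage sketch (an addition, not from the original notebook): `AlloyDBVectorStore` implements the standard LangChain `VectorStore` interface, so it can be exposed as a retriever for use in chains."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: wrap the custom store in the standard retriever interface\n",
"retriever = custom_store.as_retriever(search_kwargs={\"k\": 2})\n",
"docs = await retriever.aget_relevant_documents(\"I'd like a fruit.\")\n",
"print(docs)"
]
}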
],
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -6,8 +6,9 @@
"id": "E_RJy7C1bpCT"
},
"source": [
"# BigQuery Vector Search\n",
"> [**BigQuery Vector Search**](https://cloud.google.com/bigquery/docs/vector-search-intro) lets you use GoogleSQL to do semantic search, using vector indexes for fast approximate results, or using brute force for exact results.\n",
"# Google BigQuery Vector Search\n",
"\n",
"> [Google Cloud BigQuery Vector Search](https://cloud.google.com/bigquery/docs/vector-search-intro) lets you use GoogleSQL to do semantic search, using vector indexes for fast approximate results, or using brute force for exact results.\n",
"\n",
"\n",
"This tutorial illustrates how to work with an end-to-end data and embedding management system in LangChain, and provide scalable semantic search in BigQuery."
@@ -349,7 +350,8 @@
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
@@ -362,9 +364,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.0"
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}

View File

@@ -0,0 +1,557 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google Cloud SQL for PostgreSQL\n",
"\n",
 [Google Cloud SQL]">
"> [Google Cloud SQL](https://cloud.google.com/sql) is a fully managed relational database service that offers high performance, seamless integration, and impressive scalability. It offers MySQL, PostgreSQL, and SQL Server database engines. Extend your database application to build AI-powered experiences leveraging Cloud SQL's LangChain integrations.\n",
"\n",
"This notebook goes over how to use `Cloud SQL for PostgreSQL` to store vector embeddings with the `PostgresVectorStore` class."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before you begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
" * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
" * [Enable the Cloud SQL Admin API.](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com)\n",
" * [Create a Cloud SQL instance.](https://cloud.google.com/sql/docs/postgres/connect-instance-auth-proxy#create-instance)\n",
" * [Create a Cloud SQL database.](https://cloud.google.com/sql/docs/postgres/create-manage-databases)\n",
" * [Add a User to the database.](https://cloud.google.com/sql/docs/postgres/create-manage-users)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "IR54BmgvdHT_"
},
"source": [
"### 🦜🔗 Library Installation\n",
"Install the integration library, `langchain-google-cloud-sql-pg`, and the library for the embedding service, `langchain-google-vertexai`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "0ZITIDE160OD",
"outputId": "e184bc0d-6541-4e0a-82d2-1e216db00a2d"
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-google-cloud-sql-pg langchain-google-vertexai"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "v40bB_GMcr9f"
},
"source": [
"**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "v6jBDnYnNM08"
},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "yygMe6rPWxHS"
},
"source": [
"### 🔐 Authentication\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"* If you are using Colab to run this notebook, use the cell below and continue.\n",
"* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "PTXN1_DSXj2b"
},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "NEvB9BoLEulY"
},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "gfkS3yVRE4_W"
},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rEWWNoNnKOgq"
},
"source": [
"### 💡 API Enablement\n",
"The `langchain-google-cloud-sql-pg` package requires that you [enable the Cloud SQL Admin API](https://console.cloud.google.com/flows/enableapi?apiid=sqladmin.googleapis.com) in your Google Cloud Project."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "5utKIdq7KYi5"
},
"outputs": [],
"source": [
"# enable Cloud SQL Admin API\n",
"!gcloud services enable sqladmin.googleapis.com"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "f8f2830ee9ca1e01"
},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OMvzMWRrR6n7"
},
"source": [
"### Set Cloud SQL database values\n",
"Find your database values, in the [Cloud SQL Instances page](https://console.cloud.google.com/sql?_ga=2.223735448.2062268965.1707700487-2088871159.1707257687)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "irl7eMFnSPZr"
},
"outputs": [],
"source": [
"# @title Set Your Values Here { display-mode: \"form\" }\n",
"REGION = \"us-central1\" # @param {type: \"string\"}\n",
"INSTANCE = \"my-pg-instance\" # @param {type: \"string\"}\n",
"DATABASE = \"my-database\" # @param {type: \"string\"}\n",
"TABLE_NAME = \"vector_store\" # @param {type: \"string\"}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QuQigs4UoFQ2"
},
"source": [
"### PostgreSQLEngine Connection Pool\n",
"\n",
"One of the requirements and arguments to establish Cloud SQL as a vector store is a `PostgreSQLEngine` object. The `PostgreSQLEngine` configures a connection pool to your Cloud SQL database, enabling successful connections from your application and following industry best practices.\n",
"\n",
"To create a `PostgreSQLEngine` using `PostgreSQLEngine.from_instance()` you need to provide only 4 things:\n",
"\n",
"1. `project_id` : Project ID of the Google Cloud Project where the Cloud SQL instance is located.\n",
"1. `region` : Region where the Cloud SQL instance is located.\n",
"1. `instance` : The name of the Cloud SQL instance.\n",
"1. `database` : The name of the database to connect to on the Cloud SQL instance.\n",
"\n",
"By default, [IAM database authentication](https://cloud.google.com/sql/docs/postgres/iam-authentication#iam-db-auth) will be used as the method of database authentication. This library uses the IAM principal belonging to the [Application Default Credentials (ADC)](https://cloud.google.com/docs/authentication/application-default-credentials) sourced from the envionment.\n",
"\n",
"For more informatin on IAM database authentication please see:\n",
"\n",
"* [Configure an instance for IAM database authentication](https://cloud.google.com/sql/docs/postgres/create-edit-iam-instances)\n",
"* [Manage users with IAM database authentication](https://cloud.google.com/sql/docs/postgres/add-manage-iam-users)\n",
"\n",
"Optionally, [built-in database authentication](https://cloud.google.com/sql/docs/postgres/built-in-authentication) using a username and password to access the Cloud SQL database can also be used. Just provide the optional `user` and `password` arguments to `PostgreSQLEngine.from_instance()`:\n",
"\n",
"* `user` : Database user to use for built-in database authentication and login\n",
"* `password` : Database password to use for built-in database authentication and login.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\"**Note**: This tutorial demonstrates the async interface. All async methods have corresponding sync methods.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_pg import PostgreSQLEngine\n",
"\n",
"engine = await PostgreSQLEngine.afrom_instance(\n",
" project_id=PROJECT_ID, region=REGION, instance=INSTANCE, database=DATABASE\n",
")"
]
},
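{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch, the cell below shows the sync counterpart of the call above, plus an engine created with built-in database authentication via the `user` and `password` arguments described earlier. The credential values are placeholders for illustration only; substitute a database user you have created."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sync counterpart of the async call above (async methods carry an \"a\" prefix)\n",
"sync_engine = PostgreSQLEngine.from_instance(\n",
" project_id=PROJECT_ID, region=REGION, instance=INSTANCE, database=DATABASE\n",
")\n",
"\n",
"# Built-in database authentication instead of IAM (placeholder credentials)\n",
"login_engine = PostgreSQLEngine.from_instance(\n",
" project_id=PROJECT_ID,\n",
" region=REGION,\n",
" instance=INSTANCE,\n",
" database=DATABASE,\n",
" user=\"my-db-user\",  # placeholder\n",
" password=\"my-db-password\",  # placeholder\n",
")"
]
},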
{
"cell_type": "markdown",
"metadata": {
"id": "D9Xs2qhm6X56"
},
"source": [
"### Initialize a table\n",
"The `PostgresVectorStore` class requires a database table. The `PostgreSQLEngine` engine has a helper method `init_vectorstore_table()` that can be used to create a table with the proper schema for you."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"id": "avlyHEMn6gzU"
},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_pg import PostgreSQLEngine\n",
"\n",
"await engine.ainit_vectorstore_table(\n",
" table_name=TABLE_NAME,\n",
" vector_size=768, # Vector size for VertexAI model(textembedding-gecko@latest)\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an embedding class instance\n",
"\n",
"You can use any [LangChain embeddings model](https://python.langchain.com/docs/integrations/text_embedding/).\n",
"You may need to enable Vertex AI API to use `VertexAIEmbeddings`. We recommend setting the embedding model's version for production, learn more about the [Text embeddings models](https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/text-embeddings)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "5utKIdq7KYi5"
},
"outputs": [],
"source": [
"# enable Vertex AI API\n",
"!gcloud services enable aiplatform.googleapis.com"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Vb2RJocV9_LQ",
"outputId": "37f5dc74-2512-47b2-c135-f34c10afdcf4"
},
"outputs": [],
"source": [
"from langchain_google_vertexai import VertexAIEmbeddings\n",
"\n",
"embedding = VertexAIEmbeddings(\n",
" model_name=\"textembedding-gecko@latest\", project=PROJECT_ID\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "e1tl0aNx7SWy"
},
"source": [
"### Initialize a default PostgresVectorStore"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "z-AZyzAQ7bsf"
},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_pg import PostgresVectorStore\n",
"\n",
"store = await PostgresVectorStore.create( # Use .create() to initialize an async vector store\n",
" engine=engine,\n",
" table_name=TABLE_NAME,\n",
" embedding_service=embedding,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add texts"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import uuid\n",
"\n",
"all_texts = [\"Apples and oranges\", \"Cars and airplanes\", \"Pineapple\", \"Train\", \"Banana\"]\n",
"metadatas = [{\"len\": len(t)} for t in all_texts]\n",
"ids = [str(uuid.uuid4()) for _ in all_texts]\n",
"\n",
"await store.aadd_texts(all_texts, metadatas=metadatas, ids=ids)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete texts"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"await store.adelete([ids[1]])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Search for documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"query = \"I'd like a fruit.\"\n",
"docs = await store.asimilarity_search(query)\n",
"print(docs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Search for documents by vector"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"query_vector = embedding.embed_query(query)\n",
"docs = await store.asimilarity_search_by_vector(query_vector, k=2)\n",
"print(docs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Add a Index\n",
"Speed up vector search queries by applying a vector index. Learn more about [vector indexes](https://cloud.google.com/blog/products/databases/faster-similarity-search-performance-with-pgvector-indexes)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_pg.indexes import IVFFlatIndex\n",
"\n",
"index = IVFFlatIndex()\n",
"await store.aapply_vector_index(index)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Re-index"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"await store.areindex() # Re-index using default index name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Remove an index"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"await store.aadrop_vector_index() # Delete index using default name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a custom Vector Store\n",
"A Vector Store can take advantage of relational data to filter similarity searches.\n",
"\n",
"Create a table with custom metadata columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_cloud_sql_pg import Column\n",
"\n",
"# Set table name\n",
"TABLE_NAME = \"vectorstore_custom\"\n",
"\n",
"await engine.ainit_vectorstore_table(\n",
" table_name=TABLE_NAME,\n",
" vector_size=768, # VertexAI model: textembedding-gecko@latest\n",
" metadata_columns=[Column(\"len\", \"INTEGER\")],\n",
")\n",
"\n",
"\n",
"# Initialize PostgresVectorStore\n",
"custom_store = await PostgresVectorStore.create(\n",
" engine=engine,\n",
" table_name=TABLE_NAME,\n",
" embedding_service=embedding,\n",
" metadata_columns=[\"len\"],\n",
" # Connect to a existing VectorStore by customizing the table schema:\n",
" # id_column=\"uuid\",\n",
" # content_column=\"documents\",\n",
" # embedding_column=\"vectors\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Search for documents with metadata filter"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import uuid\n",
"\n",
"# Add texts to the Vector Store\n",
"all_texts = [\"Apples and oranges\", \"Cars and airplanes\", \"Pineapple\", \"Train\", \"Banana\"]\n",
"metadatas = [{\"len\": len(t)} for t in all_texts]\n",
"ids = [str(uuid.uuid4()) for _ in all_texts]\n",
"await store.aadd_texts(all_texts, metadatas=metadatas, ids=ids)\n",
"\n",
"# Use filter on search\n",
"docs = await custom_store.asimilarity_search_by_vector(query_vector, filter=\"len >= 6\")\n",
"\n",
"print(docs)"
]
}
],
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,429 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google Memorystore for Redis\n",
"\n",
"> [Google Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) is a fully-managed service that is powered by the Redis in-memory data store to build application caches that provide sub-millisecond data access. Extend your database application to build AI-powered experiences leveraging Memorystore for Redis's Langchain integrations.\n",
"\n",
"This notebook goes over how to use [Memorystore for Redis](https://cloud.google.com/memorystore/docs/redis/memorystore-for-redis-overview) to store vector embeddings with the `MemorystoreVectorStore` class.\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-memorystore-redis-python/blob/main/docs/vector_store.ipynb)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
"* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
"* [Create a Memorystore for Redis instance](https://cloud.google.com/memorystore/docs/redis/create-instance-console). Ensure that the version is greater than or equal to 7.2."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🦜🔗 Library Installation\n",
"\n",
"The integration lives in its own `langchain-google-memorystore-redis` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%pip install -upgrade --quiet langchain-google-memorystore-redis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🔐 Authentication\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"* If you are using Colab to run this notebook, use the cell below and continue.\n",
"* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Usage"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initialize a Vector Index"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"import redis\n",
"from langchain_google_memorystore_redis import (\n",
" DistanceStrategy,\n",
" HNSWConfig,\n",
" RedisVectorStore,\n",
")\n",
"\n",
"# Connect to a Memorystore for Redis instance\n",
"redis_client = redis.from_url(\"redis://127.0.0.1:6379\")\n",
"\n",
"# Configure HNSW index with descriptive parameters\n",
"index_config = HNSWConfig(\n",
" name=\"my_vector_index\", distance_strategy=DistanceStrategy.COSINE, vector_size=128\n",
")\n",
"\n",
"# Initialize/create the vector store index\n",
"RedisVectorStore.init_index(client=redis_client, index_config=index_config)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare Documents\n",
"\n",
"Text needs processing and numerical representation before interacting with a vector store. This involves:\n",
"\n",
"* Loading Text: The TextLoader obtains text data from a file (e.g., \"state_of_the_union.txt\").\n",
"* Text Splitting: The CharacterTextSplitter breaks the text into smaller chunks for embedding models."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain_community.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader(\"./state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add Documents to the Vector Store\n",
"\n",
"After text preparation and embedding generation, the following methods insert them into the Redis vector store."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Method 1: Classmethod for Direct Insertion\n",
"\n",
"This approach combines embedding creation and insertion into a single step using the from_documents classmethod:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.embeddings.fake import FakeEmbeddings\n",
"\n",
"embeddings = FakeEmbeddings(size=128)\n",
"redis_client = redis.from_url(\"redis://127.0.0.1:6379\")\n",
"rvs = RedisVectorStore.from_documents(\n",
" docs, embedding=embeddings, client=redis_client, index_name=\"my_vector_index\"\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Method 2: Instance-Based Insertion\n",
"This approach offers flexibility when working with a new or existing RedisVectorStore:\n",
"\n",
"* [Optional] Create a RedisVectorStore Instance: Instantiate a RedisVectorStore object for customization. If you already have an instance, proceed to the next step.\n",
"* Add Text with Metadata: Provide raw text and metadata to the instance. Embedding generation and insertion into the vector store are handled automatically."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rvs = RedisVectorStore(\n",
" client=redis_client, index_name=\"my_vector_index\", embeddings=embeddings\n",
")\n",
"ids = rvs.add_texts(\n",
" texts=[d.page_content for d in docs], metadatas=[d.metadata for d in docs]\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Perform a Similarity Search (KNN)\n",
"\n",
"With the vector store populated, it's possible to search for text semantically similar to a query. Here's how to use KNN (K-Nearest Neighbors) with default settings:\n",
"\n",
"* Formulate the Query: A natural language question expresses the search intent (e.g., \"What did the president say about Ketanji Brown Jackson\").\n",
"* Retrieve Similar Results: The `similarity_search` method finds items in the vector store closest to the query in meaning."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pprint\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"knn_results = rvs.similarity_search(query=query)\n",
"pprint.pprint(knn_results)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Perform a Range-Based Similarity Search\n",
"\n",
"Range queries provide more control by specifying a desired similarity threshold along with the query text:\n",
"\n",
"* Formulate the Query: A natural language question defines the search intent.\n",
"* Set Similarity Threshold: The distance_threshold parameter determines how close a match must be considered relevant.\n",
"* Retrieve Results: The `similarity_search_with_score` method finds items from the vector store that fall within the specified similarity threshold."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rq_results = rvs.similarity_search_with_score(query=query, distance_threshold=0.8)\n",
"pprint.pprint(rq_results)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Perform a Maximal Marginal Relevance (MMR) Search\n",
"\n",
"MMR queries aim to find results that are both relevant to the query and diverse from each other, reducing redundancy in search results.\n",
"\n",
"* Formulate the Query: A natural language question defines the search intent.\n",
"* Balance Relevance and Diversity: The lambda_mult parameter controls the trade-off between strict relevance and promoting variety in the results.\n",
"* Retrieve MMR Results: The `max_marginal_relevance_search` method returns items that optimize the combination of relevance and diversity based on the lambda setting."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mmr_results = rvs.max_marginal_relevance_search(query=query, lambda_mult=0.90)\n",
"pprint.pprint(mmr_results)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use the Vector Store as a Retriever\n",
"\n",
"For seamless integration with other LangChain components, a vector store can be converted into a Retriever. This offers several advantages:\n",
"\n",
"* LangChain Compatibility: Many LangChain tools and methods are designed to directly interact with retrievers.\n",
"* Ease of Use: The `as_retriever()` method converts the vector store into a format that simplifies querying."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"retriever = rvs.as_retriever()\n",
"results = retriever.invoke(query)\n",
"pprint.pprint(results)"
]
},
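{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch, the retriever can also be configured through the generic LangChain `VectorStore.as_retriever()` options; the `search_type` and `search_kwargs` parameters below come from the base API and are assumed to apply here, not from Memorystore-specific documentation:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: an MMR-backed retriever that returns the top 2 diverse documents\n",
"mmr_retriever = rvs.as_retriever(search_type=\"mmr\", search_kwargs={\"k\": 2})\n",
"pprint.pprint(mmr_retriever.invoke(query))"
]
},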
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clean up"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete Documents from the Vector Store\n",
"\n",
"Occasionally, it's necessary to remove documents (and their associated vectors) from the vector store. The `delete` method provides this functionality."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rvs.delete(ids)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete a Vector Index\n",
"\n",
"There might be circumstances where the deletion of an existing vector index is necessary. Common reasons include:\n",
"\n",
"* Index Configuration Changes: If index parameters need modification, it's often required to delete and recreate the index.\n",
"* Storage Management: Removing unused indices can help free up space within the Redis instance.\n",
"\n",
"Caution: Vector index deletion is an irreversible operation. Be certain that the stored vectors and search functionality are no longer required before proceeding."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"# Delete the vector index\n",
"RedisVectorStore.drop_index(client=redis_client, index_name=\"my_vector_index\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,380 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google Spanner\n",
"> [Cloud Spanner](https://cloud.google.com/spanner) is a highly scalable database that combines unlimited scalability with relational semantics, such as secondary indexes, strong consistency, schemas, and SQL providing 99.999% availability in one easy solution.\n",
"\n",
"This notebook goes over how to use `Spanner` for Vector Search with `SpannerVectorStore` class."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before You Begin\n",
"\n",
"To run this notebook, you will need to do the following:\n",
"\n",
" * [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
" * [Create a Spanner instance](https://cloud.google.com/spanner/docs/create-manage-instances)\n",
" * [Create a Spanner database](https://cloud.google.com/spanner/docs/create-manage-databases)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🦜🔗 Library Installation\n",
"The integration lives in its own `langchain-google-spanner` package, so we need to install it."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install --upgrade --quiet langchain-google-spanner"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Colab only:** Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# # Automatically restart kernel after installs so that your environment can access the new packages\n",
"# import IPython\n",
"\n",
"# app = IPython.Application.instance()\n",
"# app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 🔐 Authentication\n",
"Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
"\n",
"* If you are using Colab to run this notebook, use the cell below and continue.\n",
"* If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import auth\n",
"\n",
"auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ☁ Set Your Google Cloud Project\n",
"Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
"\n",
"If you don't know your project ID, try the following:\n",
"\n",
"* Run `gcloud config list`.\n",
"* Run `gcloud projects list`.\n",
"* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
"\n",
"PROJECT_ID = \"my-project-id\" # @param {type:\"string\"}\n",
"\n",
"# Set the project id\n",
"!gcloud config set project {PROJECT_ID}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 💡 API Enablement\n",
"The `langchain-google-spanner` package requires that you [enable the Spanner API](https://console.cloud.google.com/flows/enableapi?apiid=spanner.googleapis.com) in your Google Cloud Project."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# enable Spanner API\n",
"!gcloud services enable spanner.googleapis.com"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Usage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set Cloud Spanner database values\n",
"Find your database values, in the [Cloud Spanner Instances page](https://console.cloud.google.com/spanner?_ga=2.223735448.2062268965.1707700487-2088871159.1707257687)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# @title Set Your Values Here { display-mode: \"form\" }\n",
"INSTANCE = \"my-instance\" # @param {type: \"string\"}\n",
"DATABASE = \"my-database\" # @param {type: \"string\"}\n",
"TABLE_NAME = \"vectors_search_data\" # @param {type: \"string\"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initialize a table\n",
"The `SpannerVectorStore` class instance requires a database table with id, content and embeddings columns. \n",
"\n",
"The helper method `init_vector_store_table()` that can be used to create a table with the proper schema for you."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_spanner import SecondaryIndex, SpannerVectorStore, TableColumn\n",
"\n",
"SpannerVectorStore.init_vector_store_table(\n",
" instance_id=INSTANCE,\n",
" database_id=DATABASE,\n",
" table_name=TABLE_NAME,\n",
" id_column=\"row_id\",\n",
" metadata_columns=[\n",
" TableColumn(name=\"metadata\", type=\"JSON\", is_null=True),\n",
" TableColumn(name=\"title\", type=\"STRING(MAX)\", is_null=False),\n",
" ],\n",
" secondary_indexes=[\n",
" SecondaryIndex(index_name=\"row_id_and_title\", columns=[\"row_id\", \"title\"])\n",
" ],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an embedding class instance\n",
"\n",
"You can use any [LangChain embeddings model](https://python.langchain.com/docs/integrations/text_embedding/).\n",
"You may need to enable Vertex AI API to use `VertexAIEmbeddings`. We recommend setting the embedding model's version for production, learn more about the [Text embeddings models](https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/text-embeddings)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# enable Vertex AI API\n",
"!gcloud services enable aiplatform.googleapis.com"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_vertexai import VertexAIEmbeddings\n",
"\n",
"embeddings = VertexAIEmbeddings(\n",
" model_name=\"textembedding-gecko@latest\", project=PROJECT_ID\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### SpannerVectorStore\n",
"\n",
"To initialize the `SpannerVectorStore` class you need to provide 4 required arguments and other arguments are optional and only need to pass if it's different from default ones\n",
"\n",
"1. `instance_id` - The name of the Spanner instance\n",
"1. `database_id` - The name of the Spanner database\n",
"1. `table_name` - The name of the table within the database to store the documents & their embeddings.\n",
"1. `embedding_service` - The Embeddings implementation which is used to generate the embeddings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"db = SpannerVectorStore(\n",
" instance_id=INSTANCE,\n",
" database_id=DATABASE,\n",
" table_name=TABLE_NAME,\n",
" ignore_metadata_columns=[],\n",
" embedding_service=embeddings,\n",
" metadata_json_column=\"metadata\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 🔐 Add Documents\n",
"To add documents in the vector store."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import uuid\n",
"\n",
"from langchain_community.document_loaders import HNLoader\n",
"\n",
"loader = HNLoader(\"https://news.ycombinator.com/item?id=34817881\")\n",
"\n",
"documents = loader.load()\n",
"ids = [str(uuid.uuid4()) for _ in range(len(documents))]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 🔐 Search Documents\n",
"To search documents in the vector store with similarity search."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"db.similarity_search(query=\"Explain me vector store?\", k=3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 🔐 Search Documents\n",
"To search documents in the vector store with max marginal relevance search."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"db.max_marginal_relevance_search(\"Testing the langchain integration with spanner\", k=3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 🔐 Delete Documents\n",
"To remove documents from the vector store, use the IDs that correspond to the values in the `row_id`` column when initializing the VectorStore."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"db.delete(ids=[\"id1\", \"id2\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 🔐 Delete Documents\n",
"To remove documents from the vector store, you can utilize the documents themselves. The content column and metadata columns provided during VectorStore initialization will be used to find out the rows corresponding to the documents. Any matching rows will then be deleted."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"db.delete(documents=[documents[0], documents[1]])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,581 @@
{
"cells": [
{
"cell_type": "raw",
"metadata": {},
"source": [
"---\n",
"sidebar_label: Kinetica\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Kinetica Vectorstore API\n",
"\n",
">[Kinetica](https://www.kinetica.com/) is a database with integrated support for vector similarity search\n",
"\n",
"It supports:\n",
"- exact and approximate nearest neighbor search\n",
"- L2 distance, inner product, and cosine distance\n",
"\n",
"This notebook shows how to use the Kinetica vector store (`Kinetica`)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This needs an instance of Kinetica which can easily be setup using the instructions given here - [installation instruction](https://www.kinetica.com/developer-edition/)."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.0\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"Requirement already satisfied: gpudb==7.2.0.0b in /home/anindyam/kinetica/kinetica-github/langchain/libs/langchain/.venv/lib/python3.8/site-packages (7.2.0.0b0)\n",
"Requirement already satisfied: future in /home/anindyam/kinetica/kinetica-github/langchain/libs/langchain/.venv/lib/python3.8/site-packages (from gpudb==7.2.0.0b) (0.18.3)\n",
"Requirement already satisfied: pyzmq in /home/anindyam/kinetica/kinetica-github/langchain/libs/langchain/.venv/lib/python3.8/site-packages (from gpudb==7.2.0.0b) (25.1.2)\n",
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.0\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
"Note: you may need to restart the kernel to use updated packages.\n",
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.0\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"# Pip install necessary package\n",
"%pip install --upgrade --quiet langchain-openai\n",
"%pip install gpudb==7.2.0.1\n",
"%pip install --upgrade --quiet tiktoken"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"## Loading Environment Variables\n",
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv()"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"from langchain.docstore.document import Document\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain_community.document_loaders import TextLoader\n",
"from langchain_community.vectorstores import (\n",
" DistanceStrategy,\n",
" Kinetica,\n",
" KineticaSettings,\n",
")\n",
"from langchain_openai import OpenAIEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"loader = TextLoader(\"../../modules/state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"# Kinetica needs the connection to the database.\n",
"# This is how to set it up.\n",
"HOST = os.getenv(\"KINETICA_HOST\", \"http://127.0.0.1:9191\")\n",
"USERNAME = os.getenv(\"KINETICA_USERNAME\", \"\")\n",
"PASSWORD = os.getenv(\"KINETICA_PASSWORD\", \"\")\n",
"OPENAI_API_KEY = os.getenv(\"OPENAI_API_KEY\", \"\")\n",
"\n",
"\n",
"def create_config() -> KineticaSettings:\n",
" return KineticaSettings(host=HOST, username=USERNAME, password=PASSWORD)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Similarity Search with Euclidean Distance (Default)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"# The Kinetica Module will try to create a table with the name of the collection.\n",
"# So, make sure that the collection name is unique and the user has the permission to create a table.\n",
"\n",
"COLLECTION_NAME = \"state_of_the_union_test\"\n",
"connection = create_config()\n",
"\n",
"db = Kinetica.from_documents(\n",
" embedding=embeddings,\n",
" documents=docs,\n",
" collection_name=COLLECTION_NAME,\n",
" config=connection,\n",
")"
]
},
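{
"cell_type": "markdown",
"metadata": {},
"source": [
"Euclidean (L2) distance is the default. Since `DistanceStrategy` is imported above and Kinetica supports cosine distance, here is a minimal sketch of selecting a different metric; the `distance_strategy` parameter and the collection name below are assumptions made for illustration, mirroring other LangChain vector stores rather than Kinetica-specific documentation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: a second collection ranked by cosine distance\n",
"# (the collection name is a placeholder)\n",
"cosine_db = Kinetica.from_documents(\n",
" embedding=embeddings,\n",
" documents=docs,\n",
" collection_name=\"state_of_the_union_cosine\",\n",
" config=connection,\n",
" distance_strategy=DistanceStrategy.COSINE,\n",
")"
]
},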
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs_with_score = db.similarity_search_with_score(query)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------------------------------------------\n",
"Score: 0.6077010035514832\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"Score: 0.6077010035514832\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"Score: 0.6596046090126038\n",
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
"\n",
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n",
"\n",
"We can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \n",
"\n",
"Weve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n",
"\n",
"Were putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n",
"\n",
"Were securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"Score: 0.6597143411636353\n",
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
"\n",
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n",
"\n",
"We can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \n",
"\n",
"Weve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n",
"\n",
"Were putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n",
"\n",
"Were securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n",
"--------------------------------------------------------------------------------\n"
]
}
],
"source": [
"for doc, score in docs_with_score:\n",
" print(\"-\" * 80)\n",
" print(\"Score: \", score)\n",
" print(doc.page_content)\n",
" print(\"-\" * 80)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Maximal Marginal Relevance Search (MMR)\n",
"Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"docs_with_score = db.max_marginal_relevance_search_with_score(query)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------------------------------------------\n",
"Score: 0.6077010035514832\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"Score: 0.6852865219116211\n",
"It is going to transform America and put us on a path to win the economic competition of the 21st Century that we face with the rest of the world—particularly with China. \n",
"\n",
"As Ive told Xi Jinping, it is never a good bet to bet against the American people. \n",
"\n",
"Well create good jobs for millions of Americans, modernizing roads, airports, ports, and waterways all across America. \n",
"\n",
"And well do it all to withstand the devastating effects of the climate crisis and promote environmental justice. \n",
"\n",
"Well build a national network of 500,000 electric vehicle charging stations, begin to replace poisonous lead pipes—so every child—and every American—has clean water to drink at home and at school, provide affordable high-speed internet for every American—urban, suburban, rural, and tribal communities. \n",
"\n",
"4,000 projects have already been announced. \n",
"\n",
"And tonight, Im announcing that this year we will start fixing over 65,000 miles of highway and 1,500 bridges in disrepair.\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"Score: 0.6866700053215027\n",
"We cant change how divided weve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n",
"\n",
"I recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n",
"\n",
"They were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n",
"\n",
"Officer Mora was 27 years old. \n",
"\n",
"Officer Rivera was 22. \n",
"\n",
"Both Dominican Americans whod grown up on the same streets they later chose to patrol as police officers. \n",
"\n",
"I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n",
"\n",
"Ive worked on these issues a long time. \n",
"\n",
"I know what works: Investing in crime prevention and community police officers wholl walk the beat, wholl know the neighborhood, and who can restore trust and safety.\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"Score: 0.6936529278755188\n",
"But cancer from prolonged exposure to burn pits ravaged Heaths lungs and body. \n",
"\n",
"Danielle says Heath was a fighter to the very end. \n",
"\n",
"He didnt know how to stop fighting, and neither did she. \n",
"\n",
"Through her pain she found purpose to demand we do better. \n",
"\n",
"Tonight, Danielle—we are. \n",
"\n",
"The VA is pioneering new ways of linking toxic exposures to diseases, already helping more veterans get benefits. \n",
"\n",
"And tonight, Im announcing were expanding eligibility to veterans suffering from nine respiratory cancers. \n",
"\n",
"Im also calling on Congress: pass a law to make sure veterans devastated by toxic exposures in Iraq and Afghanistan finally get the benefits and comprehensive health care they deserve. \n",
"\n",
"And fourth, lets end cancer as we know it. \n",
"\n",
"This is personal to me and Jill, to Kamala, and to so many of you. \n",
"\n",
"Cancer is the #2 cause of death in Americasecond only to heart disease.\n",
"--------------------------------------------------------------------------------\n"
]
}
],
"source": [
"for doc, score in docs_with_score:\n",
" print(\"-\" * 80)\n",
" print(\"Score: \", score)\n",
" print(doc.page_content)\n",
" print(\"-\" * 80)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Working with vectorstore\n",
"\n",
"Above, we created a vectorstore from scratch. However, often times we want to work with an existing vectorstore.\n",
"In order to do that, we can initialize it directly."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"store = Kinetica(\n",
" collection_name=COLLECTION_NAME,\n",
" config=connection,\n",
" embedding_function=embeddings,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add documents\n",
"We can add documents to the existing vectorstore."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['b94dc67c-ce7e-11ee-b8cb-b940b0e45762']"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"store.add_documents([Document(page_content=\"foo\")])"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"docs_with_score = db.similarity_search_with_score(\"foo\")"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Document(page_content='foo'), 0.0)"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs_with_score[0]"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../modules/state_of_the_union.txt'}),\n",
" 0.6946534514427185)"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs_with_score[1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Overriding a vectorstore\n",
"\n",
"If you have an existing collection, you override it by doing `from_documents` and setting `pre_delete_collection` = True"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"db = Kinetica.from_documents(\n",
" documents=docs,\n",
" embedding=embeddings,\n",
" collection_name=COLLECTION_NAME,\n",
" config=connection,\n",
" pre_delete_collection=True,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"docs_with_score = db.similarity_search_with_score(\"foo\")"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../modules/state_of_the_union.txt'}),\n",
" 0.6946534514427185)"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs_with_score[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using a VectorStore as a Retriever"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"retriever = store.as_retriever()"
]
},
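{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal usage sketch, reusing the `query` defined earlier in this notebook:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Retrieve the documents most relevant to the query\n",
"retrieved_docs = retriever.invoke(query)\n",
"print(retrieved_docs[0].page_content)"
]
},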
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tags=['Kinetica', 'OpenAIEmbeddings'] vectorstore=<langchain_community.vectorstores.kinetica.Kinetica object at 0x7f1644375e20>\n"
]
}
],
"source": [
"print(retriever)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -17,6 +17,19 @@
"source": [
"# OpenAI functions\n",
"\n",
":::{.callout-caution}\n",
"\n",
"OpenAI API has deprecated `functions` in favor of `tools`. The difference between the two is that the `tools` API allows the model to request that multiple functions be invoked at once, which can reduce response times in some architectures. It's recommended to use the tools agent for OpenAI models.\n",
"\n",
"See the following links for more information:\n",
"\n",
"[OpenAI Tools](./openai_tools)\n",
"\n",
"[OpenAI chat create](https://platform.openai.com/docs/api-reference/chat/create)\n",
"\n",
"[OpenAI function calling](https://platform.openai.com/docs/guides/function-calling)\n",
":::\n",
"\n",
"Certain OpenAI models (like gpt-3.5-turbo-0613 and gpt-4-0613) have been fine-tuned to detect when a function should be called and respond with the inputs that should be passed to the function. In an API call, you can describe functions and have the model intelligently choose to output a JSON object containing arguments to call those functions. The goal of the OpenAI Function APIs is to more reliably return valid and useful function calls than a generic text completion or chat API.\n",
"\n",
"A number of open source models have adopted the same format for function calls and have also fine-tuned the model to detect when a function should be called.\n",
@@ -25,19 +38,7 @@
"\n",
"Install `openai`, `tavily-python` packages which are required as the LangChain packages call them internally.\n",
"\n",
"\n",
":::info\n",
"\n",
"OpenAI API has deprecated `functions` in favor of `tools`. The difference between the two is that the `tools` API allows the model to request that multiple functions be invoked at once, which can reduce response times in some architectures. It's recommended to use the tools agent for OpenAI models.\n",
"\n",
"See the following links for more information:\n",
"\n",
"[OpenAI chat create](https://platform.openai.com/docs/api-reference/chat/create)\n",
"\n",
"[OpenAI function calling](https://platform.openai.com/docs/guides/function-calling)\n",
":::\n",
"\n",
":::tip\n",
":::{.callout-tip}\n",
"The `functions` format remains relevant for open source models and providers that have adopted it, and this agent is expected to work for such models.\n",
":::\n"
]

File diff suppressed because one or more lines are too long

View File

@@ -91,8 +91,8 @@
"outputs": [],
"source": [
"from langchain.indexes import SQLRecordManager, index\n",
"from langchain_community.vectorstores import ElasticsearchStore\n",
"from langchain_core.documents import Document\n",
"from langchain_elasticsearch import ElasticsearchStore\n",
"from langchain_openai import OpenAIEmbeddings"
]
},

View File

@@ -17,6 +17,11 @@ The base Embeddings class in LangChain provides two methods: one for embedding d
### Setup
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
<Tabs>
<TabItem value="openai" label="OpenAI" default>
To start we'll need to install the OpenAI partner package:
```bash
@@ -44,6 +49,39 @@ from langchain_openai import OpenAIEmbeddings
embeddings_model = OpenAIEmbeddings()
```
</TabItem>
<TabItem value="cohere" label="Cohere">
To start we'll need to install the Cohere SDK package:
```bash
pip install cohere
```
Accessing the API requires an API key, which you can get by creating an account and heading [here](https://dashboard.cohere.com/api-keys). Once we have a key we'll want to set it as an environment variable by running:
```shell
export COHERE_API_KEY="..."
```
If you'd prefer not to set an environment variable, you can pass the key in directly via the `cohere_api_key` named parameter when initializing the `CohereEmbeddings` class:
```python
from langchain_community.embeddings import CohereEmbeddings
embeddings_model = CohereEmbeddings(cohere_api_key="...")
```
Otherwise you can initialize without any params:
```python
from langchain_community.embeddings import CohereEmbeddings
embeddings_model = CohereEmbeddings()
```
</TabItem>
</Tabs>
### `embed_documents`
#### Embed list of texts
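A hedged sketch of the two base methods described above (the sample texts are made up; any embeddings model from the tabs works the same way):

```python
# Hedged sketch of the base Embeddings interface; sample texts are made up.
from langchain_openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings()
doc_vectors = embeddings_model.embed_documents(["Hi there!", "Oh, hello!"])
query_vector = embeddings_model.embed_query("What was said?")
print(len(doc_vectors), len(query_vector))
```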

View File

@@ -1,492 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "dae8d4ed-9150-45da-b494-7717ab0a2960",
"metadata": {},
"source": [
"# Function calling\n",
"\n",
"Certain chat models, like [OpenAI's](https://platform.openai.com/docs/guides/function-calling), have a function-calling API that lets you describe functions and their arguments, and have the model return a JSON object with a function to invoke and the inputs to that function. Function-calling is extremely useful for building [tool-using chains and agents](/docs/use_cases/tool_use/), and for getting structured outputs from models more generally.\n",
"\n",
"LangChain comes with a number of utilities to make function-calling easy. Namely, it comes with\n",
"\n",
"* simple syntax for binding functions to models\n",
"* converters for formatting various types of objects to the expected function schemas\n",
"* output parsers for extracting the function invocations from API responses\n",
"\n",
"We'll focus here on the first two bullets. To see how output parsing works as well check out the [OpenAI Tools output parsers](/docs/modules/model_io/output_parsers/types/openai_tools)."
]
},
{
"cell_type": "markdown",
"id": "a177c64b-7c99-495c-b362-5ed3b40aa26a",
"metadata": {},
"source": [
"## Defining functions\n",
"\n",
"We'll focus on the [OpenAI function format](https://platform.openai.com/docs/api-reference/chat/create#chat-create-tools) here since as of this writing that is the main model provider that supports function calling. LangChain has a built-in converter that can turn Python functions, Pydantic classes, and LangChain Tools into the OpenAI function format:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "f6d1dc0c-6170-4977-809f-365099f628ea",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.3.2\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip install -qU langchain-core langchain-openai"
]
},
{
"cell_type": "markdown",
"id": "6bd290bd-7621-466b-a73e-fc8480f879ec",
"metadata": {},
"source": [
"### Python function"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "41ebab5c-0e9f-4b49-86ee-9290ced2fe96",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"type\": \"function\",\n",
" \"function\": {\n",
" \"name\": \"multiply\",\n",
" \"description\": \"Multiply two integers together.\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"a\": {\n",
" \"type\": \"integer\",\n",
" \"description\": \"First integer\"\n",
" },\n",
" \"b\": {\n",
" \"type\": \"integer\",\n",
" \"description\": \"Second integer\"\n",
" }\n",
" },\n",
" \"required\": [\n",
" \"a\",\n",
" \"b\"\n",
" ]\n",
" }\n",
" }\n",
"}\n"
]
}
],
"source": [
"import json\n",
"\n",
"from langchain_core.utils.function_calling import convert_to_openai_tool\n",
"\n",
"\n",
"def multiply(a: int, b: int) -> int:\n",
" \"\"\"Multiply two integers together.\n",
"\n",
" Args:\n",
" a: First integer\n",
" b: Second integer\n",
" \"\"\"\n",
" return a * b\n",
"\n",
"\n",
"print(json.dumps(convert_to_openai_tool(multiply), indent=2))"
]
},
{
"cell_type": "markdown",
"id": "ecf22577-38ab-48f1-ba0b-371aaba1bacc",
"metadata": {},
"source": [
"### Pydantic class"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "ecc8ffd4-aed3-4f47-892d-1896cc1ca4dc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"type\": \"function\",\n",
" \"function\": {\n",
" \"name\": \"multiply\",\n",
" \"description\": \"Multiply two integers together.\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"a\": {\n",
" \"description\": \"First integer\",\n",
" \"type\": \"integer\"\n",
" },\n",
" \"b\": {\n",
" \"description\": \"Second integer\",\n",
" \"type\": \"integer\"\n",
" }\n",
" },\n",
" \"required\": [\n",
" \"a\",\n",
" \"b\"\n",
" ]\n",
" }\n",
" }\n",
"}\n"
]
}
],
"source": [
"from langchain_core.pydantic_v1 import BaseModel, Field\n",
"\n",
"\n",
"class multiply(BaseModel):\n",
" \"\"\"Multiply two integers together.\"\"\"\n",
"\n",
" a: int = Field(..., description=\"First integer\")\n",
" b: int = Field(..., description=\"Second integer\")\n",
"\n",
"\n",
"print(json.dumps(convert_to_openai_tool(multiply), indent=2))"
]
},
{
"cell_type": "markdown",
"id": "b83d5a88-50ed-4ae4-85cf-8b895617496f",
"metadata": {},
"source": [
"### LangChain Tool"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "696c7dd6-660c-4797-909f-bf878b3acf93",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"type\": \"function\",\n",
" \"function\": {\n",
" \"name\": \"multiply\",\n",
" \"description\": \"Multiply two integers together.\",\n",
" \"parameters\": {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"a\": {\n",
" \"description\": \"First integer\",\n",
" \"type\": \"integer\"\n",
" },\n",
" \"b\": {\n",
" \"description\": \"Second integer\",\n",
" \"type\": \"integer\"\n",
" }\n",
" },\n",
" \"required\": [\n",
" \"a\",\n",
" \"b\"\n",
" ]\n",
" }\n",
" }\n",
"}\n"
]
}
],
"source": [
"from typing import Any, Type\n",
"\n",
"from langchain_core.tools import BaseTool\n",
"\n",
"\n",
"class MultiplySchema(BaseModel):\n",
" \"\"\"Multiply tool schema.\"\"\"\n",
"\n",
" a: int = Field(..., description=\"First integer\")\n",
" b: int = Field(..., description=\"Second integer\")\n",
"\n",
"\n",
"class Multiply(BaseTool):\n",
" args_schema: Type[BaseModel] = MultiplySchema\n",
" name: str = \"multiply\"\n",
" description: str = \"Multiply two integers together.\"\n",
"\n",
" def _run(self, a: int, b: int, **kwargs: Any) -> Any:\n",
" return a * b\n",
"\n",
"\n",
"# Note: we're passing in a Multiply object not the class itself.\n",
"print(json.dumps(convert_to_openai_tool(Multiply()), indent=2))"
]
},
{
"cell_type": "markdown",
"id": "04bda177-202f-4811-bb74-f3fa7094a15b",
"metadata": {},
"source": [
"## Binding functions\n",
"\n",
"Now that we've defined a function, we'll want to pass it in to our model."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "a5aa93a7-6859-43e8-be85-619d975b908c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_JvOu9oUwMrQHiDekZTbpNCHY', 'function': {'arguments': '{\\n \"a\": 5,\\n \"b\": 3\\n}', 'name': 'multiply'}, 'type': 'function'}]})"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo\")\n",
"llm.invoke(\"what's 5 times three\", tools=[convert_to_openai_tool(multiply)])"
]
},
{
"cell_type": "markdown",
"id": "dd0e7365-32d0-46a3-b8f2-caf27d5d9262",
"metadata": {},
"source": [
"And if we wanted this function to be passed in every time we call the tool, we could bind it to the tool:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "87165d64-31a7-4332-965e-18fa939fda50",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_cwRoTnD1ux1SnWXLrTj2KlWH', 'function': {'arguments': '{\\n \"a\": 5,\\n \"b\": 3\\n}', 'name': 'multiply'}, 'type': 'function'}]})"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm_with_tool = llm.bind(tools=[convert_to_openai_tool(multiply)])\n",
"llm_with_tool.invoke(\"what's 5 times three\")"
]
},
{
"cell_type": "markdown",
"id": "21b4d000-3828-4e32-9226-55119f47ee67",
"metadata": {},
"source": [
"We can also enforce that a tool is called using the [tool_choice](https://platform.openai.com/docs/api-reference/chat/create#chat-create-tools) parameter."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "2daa354c-cc85-4a60-a9b2-b681ec22ca33",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_sWjLyioSZAtYMQRLMTzncz1v', 'function': {'arguments': '{\\n \"a\": 5,\\n \"b\": 4\\n}', 'name': 'multiply'}, 'type': 'function'}]})"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm_with_tool = llm.bind(\n",
" tools=[convert_to_openai_tool(multiply)],\n",
" tool_choice={\"type\": \"function\", \"function\": {\"name\": \"multiply\"}},\n",
")\n",
"llm_with_tool.invoke(\n",
" \"don't answer my question. no do answer my question. no don't. what's five times four\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "ce013d11-49ea-4de9-8bbc-bc9ae203002c",
"metadata": {},
"source": [
"The [ChatOpenAI](https://api.python.langchain.com/en/latest/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html#langchain_openai.chat_models.base.ChatOpenAI) class even comes with a `bind_tools` helper function that handles converting function-like objects to the OpenAI format and binding them for you:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "842c9914-ac28-428f-9fcc-556177e8e715",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_LCdBa4cbhMJPRdtkhDzpRh7x', 'function': {'arguments': '{\\n \"a\": 5,\\n \"b\": 3\\n}', 'name': 'multiply'}, 'type': 'function'}]})"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm_with_tool = llm.bind_tools([multiply], tool_choice=\"multiply\")\n",
"llm_with_tool.invoke(\"what's 5 times three\")"
]
},
{
"cell_type": "markdown",
"id": "7d6e22d8-9f33-4845-9364-0d276df35ff5",
"metadata": {},
"source": [
"## Legacy args `functions` and `function_call`\n",
"\n",
"Until Fall of 2023 the OpenAI API expected arguments `functions` and `funtion_call` instead of `tools` and `tool_choice`, and they had a slightly different format than `tools` and `tool_choice`. LangChain maintains utilities for using the old API if you need to use that as well:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "a317f71e-177e-404b-b09c-8fb365a4d8a2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'name': 'multiply',\n",
" 'description': 'Multiply two integers together.',\n",
" 'parameters': {'type': 'object',\n",
" 'properties': {'a': {'description': 'First integer', 'type': 'integer'},\n",
" 'b': {'description': 'Second integer', 'type': 'integer'}},\n",
" 'required': ['a', 'b']}}"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.utils.function_calling import convert_to_openai_function\n",
"\n",
"convert_to_openai_function(multiply)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "dd124259-75e2-4704-9f57-824d3e463bfa",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\\n \"a\": 3,\\n \"b\": 1000000\\n}', 'name': 'multiply'}})"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm_with_functions = llm.bind(\n",
" functions=[convert_to_openai_function(multiply)], function_call={\"name\": \"multiply\"}\n",
")\n",
"llm_with_functions.invoke(\"what's 3 times a million\")"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "d9a90af9-1c81-4ace-b155-1589f7308a1c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\\n \"a\": 3,\\n \"b\": 1000000\\n}', 'name': 'multiply'}})"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm_with_functions = llm.bind_functions([multiply], function_call=\"multiply\")\n",
"llm_with_functions.invoke(\"what's 3 times a million\")"
]
},
{
"cell_type": "markdown",
"id": "7779808d-d75c-4d76-890d-ba8c6c571514",
"metadata": {},
"source": [
"## Next steps\n",
"\n",
"* **Output parsing**: See [OpenAI Tools output parsers](/docs/modules/model_io/output_parsers/types/openai_tools) and [OpenAI Functions output parsers](/docs/modules/model_io/output_parsers/types/openai_functions) to learn about extracting the function calling API responses into various formats.\n",
"* **Tool use**: See how to construct chains and agents that actually call the invoked tools in [these guides](/docs/use_cases/tool_use/)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "poetry-venv",
"language": "python",
"name": "poetry-venv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,535 @@
---
sidebar_position: 1
title: Function calling
---
# Function calling
A growing number of chat models, like
[OpenAI](https://platform.openai.com/docs/guides/function-calling),
[Gemini](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/function-calling),
etc., have a function-calling API that lets you describe functions and
their arguments, and have the model return a JSON object with a function
to invoke and the inputs to that function. Function-calling is extremely
useful for building [tool-using chains and
agents](../../../../docs/use_cases/tool_use/), and for getting
structured outputs from models more generally.
LangChain comes with a number of utilities to make function-calling
easy. Namely, it comes with:
- simple syntax for binding functions to models
- converters for formatting various types of objects to the expected
function schemas
- output parsers for extracting the function invocations from API
responses
- chains for getting structured outputs from a model, built on top of
function calling
We'll focus here on the first two points. For a detailed guide on output
parsing, check out the [OpenAI Tools output
parsers](../../../../docs/modules/model_io/output_parsers/types/openai_tools),
and to see the structured output chains, check out the [Structured output
guide](../../../../docs/guides/structured_output).
Before getting started make sure you have `langchain-core` installed.
```python
%pip install -qU langchain-core langchain-openai
```
```python
import getpass
import os
```
## Binding functions
A number of models implement helper methods that will take care of
formatting and binding different function-like objects to the model.
Let's take a look at how we might take the following Pydantic function
schema and get different models to invoke it:
```python
from langchain_core.pydantic_v1 import BaseModel, Field
# Note that the docstrings here are crucial, as they will be passed along
# to the model along with the class name.
class Multiply(BaseModel):
"""Multiply two integers together."""
a: int = Field(..., description="First integer")
b: int = Field(..., description="Second integer")
```
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
<Tabs>
<TabItem value="openai" label="OpenAI" default>
Set up dependencies and API keys:
```python
%pip install -qU langchain-openai
```
```python
os.environ["OPENAI_API_KEY"] = getpass.getpass()
```
We can use the `ChatOpenAI.bind_tools()` method to handle converting
`Multiply` to an OpenAI function and binding it to the model (i.e.,
passing it in each time the model is invoked).
```python
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
llm_with_tools = llm.bind_tools([Multiply])
llm_with_tools.invoke("what's 3 * 12")
```
``` text
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_Q8ZQ97Qrj5zalugSkYMGV1Uo', 'function': {'arguments': '{"a":3,"b":12}', 'name': 'Multiply'}, 'type': 'function'}]})
```
We can add a tool parser to extract the tool calls from the generated
message to JSON:
```python
from langchain_core.output_parsers.openai_tools import JsonOutputToolsParser
tool_chain = llm_with_tools | JsonOutputToolsParser()
tool_chain.invoke("what's 3 * 12")
```
``` text
[{'type': 'Multiply', 'args': {'a': 3, 'b': 12}}]
```
Or back to the original Pydantic class:
```python
from langchain_core.output_parsers.openai_tools import PydanticToolsParser
tool_chain = llm_with_tools | PydanticToolsParser(tools=[Multiply])
tool_chain.invoke("what's 3 * 12")
```
``` text
[Multiply(a=3, b=12)]
```
If we want to force the model to use a tool (and to use it only
once), we can set the `tool_choice` argument:
```python
llm_with_multiply = llm.bind_tools([Multiply], tool_choice="Multiply")
llm_with_multiply.invoke(
"make up some numbers if you really want but I'm not forcing you"
)
```
``` text
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_f3DApOzb60iYjTfOhVFhDRMI', 'function': {'arguments': '{"a":5,"b":10}', 'name': 'Multiply'}, 'type': 'function'}]})
```
For more see the [ChatOpenAI API
reference](https://api.python.langchain.com/en/latest/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html#langchain_openai.chat_models.base.ChatOpenAI.bind_tools).
</TabItem>
<TabItem value="fireworks" label="Fireworks">
Install dependencies and set API keys:
```python
%pip install -qU langchain-fireworks
```
```python
os.environ["FIREWORKS_API_KEY"] = getpass.getpass()
```
We can use the `ChatFireworks.bind_tools()` method to handle converting
`Multiply` to a valid function schema and binding it to the model (i.e.,
passing it in each time the model is invoked).
```python
from langchain_fireworks import ChatFireworks
llm = ChatFireworks(model="accounts/fireworks/models/firefunction-v1", temperature=0)
llm_with_tools = llm.bind_tools([Multiply])
llm_with_tools.invoke("what's 3 * 12")
```
``` text
AIMessage(content='Three multiplied by twelve is 36.')
```
If our model isn't using the tool, as is the case here, we can force
tool usage by specifying `tool_choice="any"` or by naming the
specific tool we want used:
```python
llm_with_tools = llm.bind_tools([Multiply], tool_choice="Multiply")
llm_with_tools.invoke("what's 3 * 12")
```
``` text
AIMessage(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_qIP2bJugb67LGvc6Zhwkvfqc', 'type': 'function', 'function': {'name': 'Multiply', 'arguments': '{"a": 3, "b": 12}'}}]})
```
We can add a tool parser to extract the tool calls from the generated
message to JSON:
```python
from langchain_core.output_parsers.openai_tools import JsonOutputToolsParser
tool_chain = llm_with_tools | JsonOutputToolsParser()
tool_chain.invoke("what's 3 * 12")
```
``` text
[{'type': 'Multiply', 'args': {'a': 3, 'b': 12}}]
```
Or back to the original Pydantic class:
```python
from langchain_core.output_parsers.openai_tools import PydanticToolsParser
tool_chain = llm_with_tools | PydanticToolsParser(tools=[Multiply])
tool_chain.invoke("what's 3 * 12")
```
``` text
[Multiply(a=3, b=12)]
```
For more see the [ChatFireworks](https://api.python.langchain.com/en/latest/chat_models/langchain_fireworks.chat_models.ChatFireworks.html#langchain_fireworks.chat_models.ChatFireworks.bind_tools) reference.
</TabItem>
<TabItem value="mistral" label="Mistral">
Install dependencies and set API keys:
```python
%pip install -qU langchain-mistralai
```
```python
os.environ["MISTRAL_API_KEY"] = getpass.getpass()
```
We can use the `ChatMistralAI.bind_tools()` method to handle converting
`Multiply` to a valid function schema and binding it to the model (i.e.,
passing it in each time the model is invoked).
```python
from langchain_mistralai import ChatMistralAI
llm = ChatMistralAI(model="mistral-large-latest", temperature=0)
llm_with_tools = llm.bind_tools([Multiply])
llm_with_tools.invoke("what's 3 * 12")
```
``` text
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'null', 'type': <ToolType.function: 'function'>, 'function': {'name': 'Multiply', 'arguments': '{"a": 3, "b": 12}'}}]})
```
We can add a tool parser to extract the tool calls from the generated
message to JSON:
```python
from langchain_core.output_parsers.openai_tools import JsonOutputToolsParser
tool_chain = llm_with_tools | JsonOutputToolsParser()
tool_chain.invoke("what's 3 * 12")
```
``` text
[{'type': 'Multiply', 'args': {'a': 3, 'b': 12}}]
```
Or back to the original Pydantic class:
```python
from langchain_core.output_parsers.openai_tools import PydanticToolsParser
tool_chain = llm_with_tools | PydanticToolsParser(tools=[Multiply])
tool_chain.invoke("what's 3 * 12")
```
``` text
[Multiply(a=3, b=12)]
```
We can force tool usage by specifying `tool_choice="any"`:
```python
llm_with_tools = llm.bind_tools([Multiply], tool_choice="any")
llm_with_tools.invoke("I don't even want you to use the tool")
```
``` text
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'null', 'type': <ToolType.function: 'function'>, 'function': {'name': 'Multiply', 'arguments': '{"a": 5, "b": 7}'}}]})
```
For more see the [ChatMistralAI API reference](https://api.python.langchain.com/en/latest/chat_models/langchain_mistralai.chat_models.ChatMistralAI.html#langchain_mistralai.chat_models.ChatMistralAI).
</TabItem>
<TabItem value="together" label="Together">
Since TogetherAI is a drop-in replacement for OpenAI, we can just use
the OpenAI integration.
Install dependencies and set API keys:
```python
%pip install -qU langchain-openai
```
```python
os.environ["TOGETHER_API_KEY"] = getpass.getpass()
```
We can use the `ChatOpenAI.bind_tools()` method to handle converting
`Multiply` to a valid function schema and binding it to the model (i.e.,
passing it in each time the model is invoked).
```python
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
base_url="https://api.together.xyz/v1",
api_key=os.environ["TOGETHER_API_KEY"],
model="mistralai/Mixtral-8x7B-Instruct-v0.1",
)
llm_with_tools = llm.bind_tools([Multiply])
llm_with_tools.invoke("what's 3 * 12")
```
``` text
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_4tc61dp0478zafqe33hfriee', 'function': {'arguments': '{"a":3,"b":12}', 'name': 'Multiply'}, 'type': 'function'}]})
```
We can add a tool parser to extract the tool calls from the generated
message to JSON:
```python
from langchain_core.output_parsers.openai_tools import JsonOutputToolsParser
tool_chain = llm_with_tools | JsonOutputToolsParser()
tool_chain.invoke("what's 3 * 12")
```
``` text
[{'type': 'Multiply', 'args': {'a': 3, 'b': 12}}]
```
Or back to the original Pydantic class:
```python
from langchain_core.output_parsers.openai_tools import PydanticToolsParser
tool_chain = llm_with_tools | PydanticToolsParser(tools=[Multiply])
tool_chain.invoke("what's 3 * 12")
```
``` text
[Multiply(a=3, b=12)]
```
If we want to force the model to use a tool (and to use it only
once), we can set the `tool_choice` argument:
```python
llm_with_multiply = llm.bind_tools([Multiply], tool_choice="Multiply")
llm_with_multiply.invoke(
"make up some numbers if you really want but I'm not forcing you"
)
```
``` text
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_6k6d0gr3jhqil2kqf7sgeusl', 'function': {'arguments': '{"a":5,"b":7}', 'name': 'Multiply'}, 'type': 'function'}]})
```
For more see the [ChatOpenAI API
reference](https://api.python.langchain.com/en/latest/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html#langchain_openai.chat_models.base.ChatOpenAI.bind_tools).
</TabItem>
</Tabs>
## Defining function schemas
If you need to access function schemas directly, LangChain has a built-in converter that can turn
Python functions, Pydantic classes, and LangChain Tools into the OpenAI-format JSON schema:
### Python function
```python
import json
from langchain_core.utils.function_calling import convert_to_openai_tool
def multiply(a: int, b: int) -> int:
"""Multiply two integers together.
Args:
a: First integer
b: Second integer
"""
return a * b
print(json.dumps(convert_to_openai_tool(multiply), indent=2))
```
``` text
{
"type": "function",
"function": {
"name": "multiply",
"description": "Multiply two integers together.",
"parameters": {
"type": "object",
"properties": {
"a": {
"type": "integer",
"description": "First integer"
},
"b": {
"type": "integer",
"description": "Second integer"
}
},
"required": [
"a",
"b"
]
}
}
}
```
### Pydantic class
```python
from langchain_core.pydantic_v1 import BaseModel, Field
class multiply(BaseModel):
"""Multiply two integers together."""
a: int = Field(..., description="First integer")
b: int = Field(..., description="Second integer")
print(json.dumps(convert_to_openai_tool(multiply), indent=2))
```
``` text
{
"type": "function",
"function": {
"name": "multiply",
"description": "Multiply two integers together.",
"parameters": {
"type": "object",
"properties": {
"a": {
"description": "First integer",
"type": "integer"
},
"b": {
"description": "Second integer",
"type": "integer"
}
},
"required": [
"a",
"b"
]
}
}
}
```
### LangChain Tool
```python
from typing import Any, Type
from langchain_core.tools import BaseTool
class MultiplySchema(BaseModel):
"""Multiply tool schema."""
a: int = Field(..., description="First integer")
b: int = Field(..., description="Second integer")
class Multiply(BaseTool):
args_schema: Type[BaseModel] = MultiplySchema
name: str = "multiply"
description: str = "Multiply two integers together."
def _run(self, a: int, b: int, **kwargs: Any) -> Any:
return a * b
# Note: we're passing in a Multiply object not the class itself.
print(json.dumps(convert_to_openai_tool(Multiply()), indent=2))
```
``` text
{
"type": "function",
"function": {
"name": "multiply",
"description": "Multiply two integers together.",
"parameters": {
"type": "object",
"properties": {
"a": {
"description": "First integer",
"type": "integer"
},
"b": {
"description": "Second integer",
"type": "integer"
}
},
"required": [
"a",
"b"
]
}
}
}
```
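As a bridge to the tool-use guides linked below, a hedged sketch of executing a parsed call (assumes the `llm_with_tools` binding from the OpenAI tab above):

```python
# Hedged sketch: execute a parsed tool call; assumes `llm_with_tools` was
# bound with the Multiply schema as in the OpenAI tab above.
from langchain_core.output_parsers.openai_tools import JsonOutputToolsParser

calls = (llm_with_tools | JsonOutputToolsParser()).invoke("what's 3 * 12")
for call in calls:
    if call["type"] == "Multiply":
        print(call["args"]["a"] * call["args"]["b"])  # -> 36
```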
## Next steps
- **Output parsing**: See [OpenAI Tools output
parsers](../../../../docs/modules/model_io/output_parsers/types/openai_tools)
and [OpenAI Functions output
parsers](../../../../docs/modules/model_io/output_parsers/types/openai_functions)
to learn about extracting the function calling API responses into
various formats.
- **Structured output chains**: [Some models have constructors](../../../../docs/guides/structured_output) that
handle creating a structured output chain for you.
- **Tool use**: See how to construct chains and agents that actually
call the invoked tools in [these
guides](../../../../docs/use_cases/tool_use/).

View File

@@ -46,7 +46,7 @@ llm = ChatOpenAI(openai_api_key="...")
```
</TabItem>
<TabItem value="local" label="Local">
<TabItem value="local" label="Local (using Ollama)">
[Ollama](https://ollama.ai/) allows you to run open-source large language models, such as Llama 2, locally.
@@ -62,6 +62,37 @@ from langchain_community.chat_models import ChatOllama
from langchain_community.llms import Ollama

llm = Ollama(model="llama2")
chat_model = ChatOllama()
```
</TabItem>
<TabItem value="cohere" label="Cohere">
First we'll need to install their partner package:
```shell
pip install cohere
```
Accessing the API requires an API key, which you can get by creating an account and heading [here](https://dashboard.cohere.com/api-keys). Once we have a key we'll want to set it as an environment variable by running:
```shell
export COHERE_API_KEY="..."
```
We can then initialize the model:
```python
from langchain_community.chat_models import ChatCohere
llm = ChatCohere()
```
If you'd prefer not to set an environment variable, you can pass the key in directly via the `cohere_api_key` named parameter when initializing the `ChatCohere` class:
```python
from langchain_community.chat_models import ChatCohere
llm = ChatCohere(cohere_api_key="...")
```
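A short usage sketch for the model initialized above (the message is illustrative):

```python
# Illustrative usage of the ChatCohere model configured above.
from langchain_core.messages import HumanMessage

llm.invoke([HumanMessage(content="Hello, how are you?")])
```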
</TabItem>

View File

@@ -19,9 +19,7 @@
"\n",
"## Use case\n",
"\n",
"Getting structured output from raw LLM generations is hard.\n",
"\n",
"For example, suppose you need the model output formatted with a specific schema for:\n",
"LLMs can be used to generate text that is structured according to a specific schema. This can be useful in a number of scenarios, including:\n",
"\n",
"- Extracting a structured row to insert into a database \n",
"- Extracting API parameters\n",
@@ -43,17 +41,23 @@
"source": [
"## Overview \n",
"\n",
"There are two primary approaches for this:\n",
"There are two broad approaches for this:\n",
"\n",
"- `Functions`: Some LLMs can call [functions](https://openai.com/blog/function-calling-and-other-api-updates) to extract arbitrary entities from LLM responses.\n",
"- `Tools and JSON mode`: Some LLMs specifically support structured output generation in certain contexts. Examples include OpenAI's [function and tool calling](https://platform.openai.com/docs/guides/function-calling) or [JSON mode](https://platform.openai.com/docs/guides/text-generation/json-mode).\n",
"\n",
"- `Parsing`: [Output parsers](/docs/modules/model_io/output_parsers/) are classes that structure LLM responses. \n",
"\n",
"Only some LLMs support functions (e.g., OpenAI), and they are more general than parsers. \n",
"- `Parsing`: LLMs can often be instructed to output their response in a dseired format. [Output parsers](/docs/modules/model_io/output_parsers/) will parse text generations into a structured form.\n",
"\n",
"Parsers extract precisely what is enumerated in a provided schema (e.g., specific attributes of a person).\n",
"\n",
"Functions can infer things beyond of a provided schema (e.g., attributes about a person that you did not ask for)."
"Functions and tools can infer things beyond of a provided schema (e.g., attributes about a person that you did not ask for)."
]
},
{
"cell_type": "markdown",
"id": "fbea06b5-66b6-4958-936d-23212061e4c8",
"metadata": {},
"source": [
"## Option 1: Leveraging tools and JSON mode"
]
},
{
@@ -61,13 +65,16 @@
"id": "25d89f21",
"metadata": {},
"source": [
"## Quickstart\n",
"### Quickstart\n",
"\n",
"OpenAI functions are one way to get started with extraction.\n",
"`create_structured_output_runnable` will create Runnables to support structured data extraction via OpenAI tool use and JSON mode.\n",
"\n",
"Define a schema that specifies the properties we want to extract from the LLM output.\n",
"The desired output schema can be expressed either via a Pydantic model or a Python dict representing valid [JsonSchema](https://json-schema.org/).\n",
"\n",
"Then, we can use `create_extraction_chain` to extract our desired schema using an OpenAI function call."
"This function supports three modes for structured data extraction:\n",
"- `\"openai-functions\"` will define OpenAI functions and bind them to the given LLM;\n",
"- `\"openai-tools\"` will define OpenAI tools and bind them to the given LLM;\n",
"- `\"openai-json\"` will bind `response_format={\"type\": \"json_object\"}` to the given LLM.\n"
]
},
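A construction-only sketch of the three modes listed above (the `Person` schema mirrors the one defined later in this notebook; `"openai-json"` additionally needs a prompt that mentions JSON, shown further down):

```python
# Construction-only sketch of the three extraction modes; Person mirrors
# the schema defined later in this notebook.
from typing import Optional

from langchain.chains import create_structured_output_runnable
from langchain_core.pydantic_v1 import BaseModel
from langchain_openai import ChatOpenAI


class Person(BaseModel):
    person_name: str
    person_height: int
    person_hair_color: str
    dog_breed: Optional[str]
    dog_name: Optional[str]


llm = ChatOpenAI(model="gpt-4-0125-preview", temperature=0)

functions_runnable = create_structured_output_runnable(Person, llm, mode="openai-functions")
tools_runnable = create_structured_output_runnable(Person, llm, mode="openai-tools")
# mode="openai-json" also needs a prompt containing the word "json";
# see the JSON Mode section below.
```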
{
@@ -86,28 +93,131 @@
},
{
"cell_type": "code",
"execution_count": 12,
"id": "3e017ba0",
"execution_count": 1,
"id": "4c2bc413-eacd-44bd-9fcb-bbbe1f97ca6c",
"metadata": {},
"outputs": [],
"source": [
"from typing import Optional\n",
"\n",
"from langchain.chains import create_structured_output_runnable\n",
"from langchain_core.pydantic_v1 import BaseModel\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"\n",
"class Person(BaseModel):\n",
" person_name: str\n",
" person_height: int\n",
" person_hair_color: str\n",
" dog_breed: Optional[str]\n",
" dog_name: Optional[str]\n",
"\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-4-0125-preview\", temperature=0)\n",
"runnable = create_structured_output_runnable(Person, llm)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "de8c9d7b-bb7b-45bc-9794-a355ed0d1508",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},\n",
" {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]"
"Person(person_name='Alex', person_height=60, person_hair_color='blond', dog_breed=None, dog_name=None)"
]
},
"execution_count": 12,
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains import create_extraction_chain\n",
"from langchain_openai import ChatOpenAI\n",
"inp = \"Alex is 5 feet tall and has blond hair.\"\n",
"runnable.invoke(inp)"
]
},
{
"cell_type": "markdown",
"id": "02fd21ff-27a8-4890-bb18-fc852cafb18a",
"metadata": {},
"source": [
"### Specifying schemas"
]
},
{
"cell_type": "markdown",
"id": "a5a74f3e-92aa-4ac7-96f2-ea89b8740ba8",
"metadata": {},
"source": [
"A convenient way to express desired output schemas is via Pydantic. The above example specified the desired output schema via `Person`, a Pydantic model. Such schemas can be easily combined together to generate richer output formats:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c1c8fe71-0ae4-466a-b32f-001c59b62bb3",
"metadata": {},
"outputs": [],
"source": [
"from typing import Sequence\n",
"\n",
"# Schema\n",
"\n",
"class People(BaseModel):\n",
" \"\"\"Identifying information about all people in a text.\"\"\"\n",
"\n",
" people: Sequence[Person]\n",
"\n",
"\n",
"runnable = create_structured_output_runnable(People, llm)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c5aa9e43-9202-4b2d-a767-e596296b3a81",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"People(people=[Person(person_name='Alex', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None), Person(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed='beagle', dog_name='Harry')])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inp = \"\"\"Alex is 5 feet tall and has blond hair.\n",
"Claudia is 1 feet taller Alex and jumps higher than him.\n",
"Claudia is a brunette and has a beagle named Harry.\"\"\"\n",
"\n",
"runnable.invoke(inp)"
]
},
{
"cell_type": "markdown",
"id": "53e316ea-b74a-4512-a9ab-c5d01ff583fe",
"metadata": {},
"source": [
"Note that `dog_breed` and `dog_name` are optional attributes, such that here they are extracted for Claudia and not for Alex.\n",
"\n",
"One can also specify the desired output format with a Python dict representing valid JsonSchema:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "3e017ba0",
"metadata": {},
"outputs": [],
"source": [
"schema = {\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"name\": {\"type\": \"string\"},\n",
" \"height\": {\"type\": \"integer\"},\n",
@@ -116,167 +226,51 @@
" \"required\": [\"name\", \"height\"],\n",
"}\n",
"\n",
"# Input\n",
"inp = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\"\"\"\n",
"\n",
"# Run chain\n",
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo\")\n",
"chain = create_extraction_chain(schema, llm)\n",
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "6f7eb826",
"metadata": {},
"source": [
"## Option 1: OpenAI functions\n",
"\n",
"### Looking under the hood\n",
"\n",
"Let's dig into what is happening when we call `create_extraction_chain`.\n",
"\n",
"The [LangSmith trace](https://smith.langchain.com/public/72bc3205-7743-4ca6-929a-966a9d4c2a77/r) shows that we call the function `information_extraction` on the input string, `inp`.\n",
"\n",
"![Image description](../../static/img/extraction_trace_function.png)\n",
"\n",
"This `information_extraction` function is defined [here](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/openai_functions/extraction.py) and returns a dict.\n",
"\n",
"We can see the `dict` in the model output:\n",
"```\n",
" {\n",
" \"info\": [\n",
" {\n",
" \"name\": \"Alex\",\n",
" \"height\": 5,\n",
" \"hair_color\": \"blonde\"\n",
" },\n",
" {\n",
" \"name\": \"Claudia\",\n",
" \"height\": 6,\n",
" \"hair_color\": \"brunette\"\n",
" }\n",
" ]\n",
" }\n",
"```\n",
"\n",
"The `create_extraction_chain` then parses the raw LLM output for us using [`JsonKeyOutputFunctionsParser`](https://github.com/langchain-ai/langchain/blob/f81e613086d211327b67b0fb591fd4d5f9a85860/libs/langchain/langchain/chains/openai_functions/extraction.py#L62).\n",
"\n",
"This results in the list of JSON objects returned by the chain above:\n",
"```\n",
"[{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},\n",
" {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]\n",
" ```"
]
},
{
"cell_type": "markdown",
"id": "dcb03138",
"metadata": {},
"source": [
"### Multiple entity types\n",
"\n",
"We can extend this further.\n",
"\n",
"Let's say we want to differentiate between dogs and people.\n",
"\n",
"We can add `person_` and `dog_` prefixes for each property"
"runnable = create_structured_output_runnable(schema, llm)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "01eae733",
"execution_count": 6,
"id": "fb525991-643d-4d47-9111-a3d4364c03d7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'person_name': 'Alex',\n",
" 'person_height': 5,\n",
" 'person_hair_color': 'blonde',\n",
" 'dog_name': 'Frosty',\n",
" 'dog_breed': 'labrador'},\n",
" {'person_name': 'Claudia',\n",
" 'person_height': 6,\n",
" 'person_hair_color': 'brunette'}]"
"{'name': 'Alex', 'height': 60}"
]
},
"execution_count": 8,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"person_name\": {\"type\": \"string\"},\n",
" \"person_height\": {\"type\": \"integer\"},\n",
" \"person_hair_color\": {\"type\": \"string\"},\n",
" \"dog_name\": {\"type\": \"string\"},\n",
" \"dog_breed\": {\"type\": \"string\"},\n",
" },\n",
" \"required\": [\"person_name\", \"person_height\"],\n",
"}\n",
"\n",
"chain = create_extraction_chain(schema, llm)\n",
"\n",
"inp = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
"Alex's dog Frosty is a labrador and likes to play hide and seek.\"\"\"\n",
"\n",
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "f205905c",
"metadata": {},
"source": [
"### Unrelated entities\n",
"\n",
"If we use `required: []`, we allow the model to return **only** person attributes or **only** dog attributes for a single entity (person or dog)."
"inp = \"Alex is 5 feet tall. I don't know his hair color.\"\n",
"runnable.invoke(inp)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "6ff4ac7e",
"execution_count": 7,
"id": "a3d3f0d2-c9d4-4ab8-9a5a-1ddda62db6ec",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'person_name': 'Alex', 'person_height': 5, 'person_hair_color': 'blonde'},\n",
" {'person_name': 'Claudia',\n",
" 'person_height': 6,\n",
" 'person_hair_color': 'brunette'},\n",
" {'dog_name': 'Willow', 'dog_breed': 'German Shepherd'},\n",
" {'dog_name': 'Milo', 'dog_breed': 'border collie'}]"
"{'name': 'Alex', 'height': 60, 'hair_color': 'blond'}"
]
},
"execution_count": 14,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"person_name\": {\"type\": \"string\"},\n",
" \"person_height\": {\"type\": \"integer\"},\n",
" \"person_hair_color\": {\"type\": \"string\"},\n",
" \"dog_name\": {\"type\": \"string\"},\n",
" \"dog_breed\": {\"type\": \"string\"},\n",
" },\n",
" \"required\": [],\n",
"}\n",
"\n",
"chain = create_extraction_chain(schema, llm)\n",
"\n",
"inp = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
"Willow is a German Shepherd that likes to play with other dogs and can always be found playing with Milo, a border collie that lives close by.\"\"\"\n",
"\n",
"chain.run(inp)"
"inp = \"Alex is 5 feet tall. He is blond.\"\n",
"runnable.invoke(inp)"
]
},
{
@@ -284,11 +278,9 @@
"id": "34f3b958",
"metadata": {},
"source": [
"### Extra information\n",
"#### Extra information\n",
"\n",
"The power of functions (relative to using parsers alone) lies in the ability to perform semantic extraction.\n",
"\n",
"In particular, `we can ask for things that are not explicitly enumerated in the schema`.\n",
"Runnables constructed via `create_structured_output_runnable` generally are capable of semantic extraction, such that they can populate information that is not explicitly enumerated in the schema.\n",
"\n",
"Suppose we want unspecified additional information about dogs. \n",
"\n",
@@ -297,44 +289,53 @@
},
{
"cell_type": "code",
"execution_count": 10,
"id": "40c7b26f",
"execution_count": 8,
"id": "0ed3b5e6-a7f3-453e-be61-d94fc665c16b",
"metadata": {},
"outputs": [],
"source": [
"inp = \"\"\"Alex is 5 feet tall and has blond hair.\n",
"Claudia is 1 feet taller Alex and jumps higher than him.\n",
"Claudia is a brunette and has a beagle named Harry.\n",
"Harry likes to play with other dogs and can always be found\n",
"playing with Milo, a border collie that lives close by.\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "be07928a-8022-4963-a15e-eb3097beef9f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'person_name': 'Alex', 'person_height': 5, 'person_hair_color': 'blonde'},\n",
" {'person_name': 'Claudia',\n",
" 'person_height': 6,\n",
" 'person_hair_color': 'brunette'},\n",
" {'dog_name': 'Willow',\n",
" 'dog_breed': 'German Shepherd',\n",
" 'dog_extra_info': 'likes to play with other dogs'},\n",
" {'dog_name': 'Milo',\n",
" 'dog_breed': 'border collie',\n",
" 'dog_extra_info': 'lives close by'}]"
"People(people=[Person(person_name='Alex', person_height=60, person_hair_color='blond', dog_breed=None, dog_name=None, dog_extra_info=None), Person(person_name='Claudia', person_height=72, person_hair_color='brunette', dog_breed='beagle', dog_name='Harry', dog_extra_info='likes to play with other dogs and can always be found playing with Milo, a border collie that lives close by.')])"
]
},
"execution_count": 10,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"person_name\": {\"type\": \"string\"},\n",
" \"person_height\": {\"type\": \"integer\"},\n",
" \"person_hair_color\": {\"type\": \"string\"},\n",
" \"dog_name\": {\"type\": \"string\"},\n",
" \"dog_breed\": {\"type\": \"string\"},\n",
" \"dog_extra_info\": {\"type\": \"string\"},\n",
" },\n",
"}\n",
"class Person(BaseModel):\n",
" person_name: str\n",
" person_height: int\n",
" person_hair_color: str\n",
" dog_breed: Optional[str]\n",
" dog_name: Optional[str]\n",
" dog_extra_info: Optional[str]\n",
"\n",
"chain = create_extraction_chain(schema, llm)\n",
"chain.run(inp)"
"\n",
"class People(BaseModel):\n",
" \"\"\"Identifying information about all people in a text.\"\"\"\n",
"\n",
" people: Sequence[Person]\n",
"\n",
"\n",
"runnable = create_structured_output_runnable(People, llm)\n",
"runnable.invoke(inp)"
]
},
{
@@ -347,66 +348,289 @@
},
{
"cell_type": "markdown",
"id": "bf71ddce",
"id": "97ed9f5e-33be-4667-aa82-af49cc874e1d",
"metadata": {},
"source": [
"### Pydantic \n",
"### Specifying extraction mode\n",
"\n",
"Pydantic is a data validation and settings management library for Python. \n",
"`create_structured_output_runnable` supports varying implementations of the underlying extraction under the hood, which are configured via the `mode` parameter. This parameter can be one of `\"openai-functions\"`, `\"openai-tools\"`, or `\"openai-json\"`."
]
},
{
"cell_type": "markdown",
"id": "7c8e0b00-d6e6-432d-b9b0-8d0a3c0c6572",
"metadata": {},
"source": [
"#### OpenAI Functions and Tools"
]
},
{
"cell_type": "markdown",
"id": "07ccdbb1-cbe5-45af-87e4-dde42baee5eb",
"metadata": {},
"source": [
"Some LLMs are fine-tuned to support the invocation of functions or tools. If they are given an input schema for a tool and recognize an occasion to use it, they may emit JSON output conforming to that schema. We can leverage this to drive structured data extraction from natural language.\n",
"\n",
"It allows you to create data classes with attributes that are automatically validated when you instantiate an object.\n",
"\n",
"Lets define a class with attributes annotated with types."
"OpenAI originally released this via a [`functions` parameter in its chat completions API](https://openai.com/blog/function-calling-and-other-api-updates). This has since been deprecated in favor of a [`tools` parameter](https://platform.openai.com/docs/guides/function-calling), which can include (multiple) functions."
]
},
{
"cell_type": "markdown",
"id": "e6b02442-2884-4b45-a5a0-4fdac729fdb3",
"metadata": {},
"source": [
"Using OpenAI Functions:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "d36a743b",
"execution_count": 10,
"id": "7b1c2266-b04b-4a23-83a9-da3cd2f88137",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Properties(person_name='Alex', person_height=5, person_hair_color='blonde', dog_breed=None, dog_name=None),\n",
" Properties(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed=None, dog_name=None)]"
"Person(person_name='Alex', person_height=60, person_hair_color='blond', dog_breed=None, dog_name=None, dog_extra_info=None)"
]
},
"execution_count": 4,
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from typing import Optional\n",
"runnable = create_structured_output_runnable(Person, llm, mode=\"openai-functions\")\n",
"\n",
"from langchain.chains import create_extraction_chain_pydantic\n",
"from langchain_core.pydantic_v1 import BaseModel\n",
"\n",
"\n",
"# Pydantic data class\n",
"class Properties(BaseModel):\n",
" person_name: str\n",
" person_height: int\n",
" person_hair_color: str\n",
" dog_breed: Optional[str]\n",
" dog_name: Optional[str]\n",
"\n",
"\n",
"# Extraction\n",
"chain = create_extraction_chain_pydantic(pydantic_schema=Properties, llm=llm)\n",
"\n",
"# Run\n",
"inp = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\"\"\"\n",
"chain.run(inp)"
"inp = \"Alex is 5 feet tall and has blond hair.\"\n",
"runnable.invoke(inp)"
]
},
{
"cell_type": "markdown",
"id": "07a0351a",
"id": "1c07427b-a582-4489-a486-4c24a6c3165f",
"metadata": {},
"source": [
"As we can see from the [trace](https://smith.langchain.com/public/fed50ae6-26bb-4235-a254-e0b7a229d10f/r), we use the function `information_extraction`, as above, with the Pydantic schema. "
"Using OpenAI Tools:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "0b1ca93a-ffd9-4d37-8baa-377757405357",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Person(person_name='Alex', person_height=152, person_hair_color='blond', dog_breed=None, dog_name=None)"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"runnable = create_structured_output_runnable(Person, llm, mode=\"openai-tools\")\n",
"\n",
"runnable.invoke(inp)"
]
},
{
"cell_type": "markdown",
"id": "4018a8fc-1799-4c9d-b655-a66f618204b3",
"metadata": {},
"source": [
"The corresponding [LangSmith trace](https://smith.langchain.com/public/04cc37a7-7a1c-4bae-b972-1cb1a642568c/r) illustrates the tool call that generated our structured output.\n",
"\n",
"![Image description](../../static/img/extraction_trace_tool.png)"
]
},
{
"cell_type": "markdown",
"id": "fb2662d5-9492-4acc-935b-eb8fccebbe0f",
"metadata": {},
"source": [
"#### JSON Mode"
]
},
{
"cell_type": "markdown",
"id": "c0fd98ba-c887-4c30-8c9e-896ae90ac56a",
"metadata": {},
"source": [
"Some LLMs support generating JSON more generally. OpenAI implements this via a [`response_format` parameter](https://platform.openai.com/docs/guides/text-generation/json-mode) in its chat completions API.\n",
"\n",
"Note that this method may require explicit prompting (e.g., OpenAI requires that input messages contain the word \"json\" in some form when using this parameter)."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "6b3e4679-eadc-42c8-b882-92a600083f2f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Person(person_name='Alex', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None, dog_extra_info=None)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.prompts import ChatPromptTemplate\n",
"\n",
"system_prompt = \"\"\"You extract information in structured JSON formats.\n",
"\n",
"Extract a valid JSON blob from the user input that matches the following JSON Schema:\n",
"\n",
"{output_schema}\"\"\"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", system_prompt),\n",
" (\"human\", \"{input}\"),\n",
" ]\n",
")\n",
"runnable = create_structured_output_runnable(\n",
" Person,\n",
" llm,\n",
" mode=\"openai-json\",\n",
" prompt=prompt,\n",
" enforce_function_usage=False,\n",
")\n",
"\n",
"runnable.invoke({\"input\": inp})"
]
},
{
"cell_type": "markdown",
"id": "b22d8262-a9b8-415c-a142-d0ee4db7ec2b",
"metadata": {},
"source": [
"### Few-shot examples"
]
},
{
"cell_type": "markdown",
"id": "a01c75f6-99d7-4d7b-a58f-b0ea7e8f338a",
"metadata": {},
"source": [
"Suppose we want to tune the behavior of our extractor. There are a few options available. For example, if we want to redact names but retain other information, we could adjust the system prompt:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "c5d16ad6-824e-434a-906a-d94e78259d4f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Person(person_name='REDACTED', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"system_prompt = \"\"\"You extract information in structured JSON formats.\n",
"\n",
"Extract a valid JSON blob from the user input that matches the following JSON Schema:\n",
"\n",
"{output_schema}\n",
"\n",
"Redact all names.\n",
"\"\"\"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [(\"system\", system_prompt), (\"human\", \"{input}\")]\n",
")\n",
"runnable = create_structured_output_runnable(\n",
" Person,\n",
" llm,\n",
" mode=\"openai-json\",\n",
" prompt=prompt,\n",
" enforce_function_usage=False,\n",
")\n",
"\n",
"runnable.invoke({\"input\": inp})"
]
},
{
"cell_type": "markdown",
"id": "be611688-1224-4d5a-9e34-a158b3c04296",
"metadata": {},
"source": [
"Few-shot examples are another, effective way to illustrate intended behavior. For instance, if we want to redact names with a specific character string, a one-shot example will convey this. We can use a `FewShotChatMessagePromptTemplate` to easily accommodate both a fixed set of examples as well as the dynamic selection of examples based on the input."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "0aeee951-7f73-4e24-9033-c81a08af14dc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Person(person_name='#####', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None)"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.prompts import FewShotChatMessagePromptTemplate\n",
"\n",
"examples = [\n",
" {\n",
" \"input\": \"Samus is 6 ft tall and blonde.\",\n",
" \"output\": Person(\n",
" person_name=\"######\",\n",
" person_height=5,\n",
" person_hair_color=\"blonde\",\n",
" ).dict(),\n",
" }\n",
"]\n",
"\n",
"example_prompt = ChatPromptTemplate.from_messages(\n",
" [(\"human\", \"{input}\"), (\"ai\", \"{output}\")]\n",
")\n",
"few_shot_prompt = FewShotChatMessagePromptTemplate(\n",
" examples=examples,\n",
" example_prompt=example_prompt,\n",
")\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [(\"system\", system_prompt), few_shot_prompt, (\"human\", \"{input}\")]\n",
")\n",
"runnable = create_structured_output_runnable(\n",
" Person,\n",
" llm,\n",
" mode=\"openai-json\",\n",
" prompt=prompt,\n",
" enforce_function_usage=False,\n",
")\n",
"\n",
"runnable.invoke({\"input\": inp})"
]
},
{
"cell_type": "markdown",
"id": "51846211-e86b-4807-9348-eb263999f7f7",
"metadata": {},
"source": [
"Here, the [LangSmith trace](https://smith.langchain.com/public/6fe5e694-9c04-48f7-83ff-e541da764781/r) for the chat model call shows how the one-shot example is formatted into the prompt.\n",
"\n",
"![Image description](../../static/img/extraction_trace_few_shot.png)"
]
},
{
@@ -418,41 +642,26 @@
"\n",
"[Output parsers](/docs/modules/model_io/output_parsers/) are classes that help structure language model responses. \n",
"\n",
"As shown above, they are used to parse the output of the OpenAI function calls in `create_extraction_chain`.\n",
"As shown above, they are used to parse the output of the runnable created by `create_structured_output_runnable`.\n",
"\n",
"But, they can be used independent of functions.\n",
"They can also be used more generally, if a LLM is instructed to emit its output in a certain format. Parsers include convenience methods for generating formatting instructions for use in prompts.\n",
"\n",
"### Pydantic\n",
"\n",
"Just as a above, let's parse a generation based on a Pydantic data class."
"Below we implement an example."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 16,
"id": "64650362",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"People(people=[Person(person_name='Alex', person_height=5, person_hair_color='blonde', dog_breed=None, dog_name=None), Person(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed=None, dog_name=None)])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"from typing import Optional, Sequence\n",
"\n",
"from langchain.output_parsers import PydanticOutputParser\n",
"from langchain.prompts import (\n",
" PromptTemplate,\n",
")\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain_core.pydantic_v1 import BaseModel, Field, validator\n",
"from langchain_openai import OpenAI\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"\n",
"class Person(BaseModel):\n",
@@ -470,7 +679,7 @@
"\n",
"\n",
"# Run\n",
"query = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\"\"\"\n",
"query = \"\"\"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blond.\"\"\"\n",
"\n",
"# Set up a parser + inject instructions into the prompt template.\n",
"parser = PydanticOutputParser(pydantic_object=People)\n",
@@ -484,9 +693,30 @@
"\n",
"# Run\n",
"_input = prompt.format_prompt(query=query)\n",
"model = OpenAI(temperature=0)\n",
"output = model(_input.to_string())\n",
"parser.parse(output)"
"model = ChatOpenAI()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "727f3bf2-31b1-4b07-94f5-9568acf3ffdf",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"People(people=[Person(person_name='Alex', person_height=5, person_hair_color='blond', dog_breed=None, dog_name=None), Person(person_name='Claudia', person_height=6, person_hair_color='brunette', dog_breed=None, dog_name=None)])"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"output = model.invoke(_input.to_string())\n",
"\n",
"parser.parse(output.content)"
]
},
{
@@ -494,46 +724,31 @@
"id": "826899df",
"metadata": {},
"source": [
"We can see from the [LangSmith trace](https://smith.langchain.com/public/8e3aa858-467e-46a5-aa49-5db65f0a2b9a/r) that we get the same output as above.\n",
"We can see from the [LangSmith trace](https://smith.langchain.com/public/aec42dd3-d471-4d34-801b-20dd88444931/r) that we get the same output as above.\n",
"\n",
"![Image description](../../static/img/extraction_trace_function_2.png)\n",
"![Image description](../../static/img/extraction_trace_parsing.png)\n",
"\n",
"We can see that we provide a two-shot prompt in order to instruct the LLM to output in our desired format.\n",
"\n",
"And, we need to do a bit more work:\n",
"\n",
"* Define a class that holds multiple instances of `Person`\n",
"* Explicitly parse the output of the LLM to the Pydantic class\n",
"\n",
"We can see this for other cases, too."
"We can see that we provide a two-shot prompt in order to instruct the LLM to output in our desired format."
]
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 21,
"id": "837c350e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')"
"Joke(setup=\"Why couldn't the bicycle find its way home?\", punchline='Because it lost its bearings!')"
]
},
"execution_count": 11,
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.output_parsers import PydanticOutputParser\n",
"from langchain.prompts import (\n",
" PromptTemplate,\n",
")\n",
"from langchain_core.pydantic_v1 import BaseModel, Field, validator\n",
"from langchain_openai import OpenAI\n",
"\n",
"\n",
"# Define your desired data structure.\n",
"class Joke(BaseModel):\n",
" setup: str = Field(description=\"question to set up a joke\")\n",
@@ -562,9 +777,9 @@
"\n",
"# Run\n",
"_input = prompt.format_prompt(query=joke_query)\n",
"model = OpenAI(temperature=0)\n",
"output = model(_input.to_string())\n",
"parser.parse(output)"
"model = ChatOpenAI(temperature=0)\n",
"output = model.invoke(_input.to_string())\n",
"parser.parse(output.content)"
]
},
{
@@ -574,9 +789,7 @@
"source": [
"As we can see, we get an output of the `Joke` class, which respects our originally desired schema: 'setup' and 'punchline'.\n",
"\n",
"We can look at the [LangSmith trace](https://smith.langchain.com/public/69f11d41-41be-4319-93b0-6d0eda66e969/r) to see exactly what is going on under the hood.\n",
"\n",
"![Image description](../../static/img/extraction_trace_joke.png)\n",
"We can look at the [LangSmith trace](https://smith.langchain.com/public/557ad630-af35-43e9-b043-93800539025f/r) to see exactly what is going on under the hood.\n",
"\n",
"### Going deeper\n",
"\n",
@@ -610,7 +823,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.4"
}
},
"nbformat": 4,

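Taken together, the notebook changes above move the parsing cells from the legacy `OpenAI` completion call to `ChatOpenAI` with `.invoke()`, parsing `.content` from the returned message. A self-contained sketch of the updated pattern (assuming `OPENAI_API_KEY` is set in the environment; the `Joke` schema mirrors the notebook's):

```python
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import ChatOpenAI


class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")


parser = PydanticOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

model = ChatOpenAI(temperature=0)

# Chat models return a message object, so parse `.content` rather than the
# raw string a completion model would return.
output = model.invoke(prompt.format_prompt(query="Tell me a joke.").to_string())
joke = parser.parse(output.content)
```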
[Binary image diffs not shown: three images added and two removed under `static/img` — the trace screenshots referenced by the notebook above.]

@@ -2,7 +2,11 @@ from typing import List
from langchain_community.agent_toolkits.base import BaseToolkit
from langchain_community.tools import BaseTool
from langchain_community.tools.polygon import PolygonLastQuote, PolygonTickerNews
from langchain_community.tools.polygon import (
PolygonFinancials,
PolygonLastQuote,
PolygonTickerNews,
)
from langchain_community.utilities.polygon import PolygonAPIWrapper
@@ -22,6 +26,9 @@ class PolygonToolkit(BaseToolkit):
PolygonTickerNews(
api_wrapper=polygon_api_wrapper,
),
PolygonFinancials(
api_wrapper=polygon_api_wrapper,
),
]
return cls(tools=tools)
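With the new tool registered, the toolkit exposes financials alongside the existing quote and news tools. A quick usage sketch (assuming a `POLYGON_API_KEY` environment variable; the import path follows the toolkit's location in `langchain_community`):

```python
from langchain_community.agent_toolkits.polygon.toolkit import PolygonToolkit
from langchain_community.utilities.polygon import PolygonAPIWrapper

# PolygonAPIWrapper reads POLYGON_API_KEY from the environment.
polygon = PolygonAPIWrapper()
toolkit = PolygonToolkit.from_polygon_api_wrapper(polygon)

# Expect three tools now: last quote, ticker news, and financials.
for tool in toolkit.get_tools():
    print(tool.name)
```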

Some files were not shown because too many files have changed in this diff.