Compare commits

135 Commits

Author SHA1 Message Date
Bagatur
0495ca0d10 Merge branch 'master' into bagatur/0.2 2024-03-20 10:17:19 -07:00
Eugene Yurtsev
aa9ccca775 langchain[patch]: Add tests for indexing (#19342)
This PR adds tests for the indexing API
2024-03-20 13:00:22 -04:00
William FH
68298cdc82 [Feat] Accept non-dict if only 1 prompt input variable (#19156)
For prompt templates with only 1 variable (common in e.g.,
MessageGraph), it's convenient to wrap the incoming object in the
variable before formatting.


The downside of this, of course, would be that some number of
invocations will successfully format when the user may have intended to
format it properly before
2024-03-20 09:59:32 -07:00
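The wrapping behavior described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the LangChain implementation; `coerce_prompt_input` is a hypothetical helper name introduced here.

```python
from typing import Any, Dict, List


def coerce_prompt_input(input_variables: List[str], value: Any) -> Dict[str, Any]:
    """Wrap a non-dict input when the template has exactly one variable (hypothetical helper)."""
    if isinstance(value, dict):
        # Already keyed by variable name: pass through unchanged.
        return value
    if len(input_variables) == 1:
        # Exactly one variable: wrap the bare object under that name.
        return {input_variables[0]: value}
    raise TypeError(
        f"Expected a dict with keys {input_variables}, got {type(value).__name__}"
    )


template = "Summarize: {text}"
variables = coerce_prompt_input(["text"], "hello world")
print(template.format(**variables))
```

The downside mentioned in the message is visible here: any bare object is accepted for a single-variable template, so a caller who passed the wrong thing gets a formatted prompt instead of an error.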
mackong
d9396bdec1 langchain[patch]: add stop for various non-openai agents (#19333)
* Description: add stop for various non-openai agents.
* Issue: N/A
* Dependencies: N/A

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-03-20 11:34:10 -04:00
Yudhajit Sinha
7d216ad1e1 community[patch]: Invoke callback prior to yielding token (titan_takeoff_pro) (#18624)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream_
method in llms/titan_takeoff_pro.
- Issue: #16913 
- Dependencies: None
2024-03-20 07:58:18 -07:00
Yudhajit Sinha
455a74486b community[patch]: Invoke callback prior to yielding token (sparkllm) (#18625)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream_
method in llms/sparkllm.
- Issue: #16913 
- Dependencies: None
2024-03-20 07:57:53 -07:00
Yudhajit Sinha
5ac1860484 community[patch]: Invoke callback prior to yielding token (replicate) (#18626)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream_
method in llms/replicate.
- Issue: #16913 
- Dependencies: None
2024-03-20 07:57:27 -07:00
Yudhajit Sinha
9525e392de community[patch]: Invoke callback prior to yielding token (pai_eas_endpoint) (#18627)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream_
method in llms/pai_eas_endpoint.
- Issue: #16913 
- Dependencies: None
2024-03-20 07:56:58 -07:00
Yudhajit Sinha
140f06e59a community[patch]: Invoke callback prior to yielding token (openai) (#18628)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream_
method in llms/openai.
- Issue: #16913 
- Dependencies: None
2024-03-20 07:56:30 -07:00
Yudhajit Sinha
280a914920 community[patch]: Invoke callback prior to yielding token (ollama) (#18629)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream_ &
_astream_ methods in llms/ollama.
- Issue: #16913 
- Dependencies: None
2024-03-20 07:56:09 -07:00
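The ordering that this series of commits describes (callback first, then yield) can be sketched with a plain generator. The names `stream_tokens` and `on_new_token` are illustrative assumptions, not the signatures used in the community LLM classes.

```python
from typing import Callable, Iterable, Iterator, List


def stream_tokens(
    raw_tokens: Iterable[str], on_new_token: Callable[[str], None]
) -> Iterator[str]:
    """Yield tokens, notifying the callback before each yield (hypothetical sketch)."""
    for token in raw_tokens:
        # Notify first: once we yield, the consumer decides when (or whether)
        # this generator resumes, so a callback placed after the yield may run
        # late or never.
        on_new_token(token)
        yield token


seen: List[str] = []
stream = stream_tokens(["Hel", "lo"], seen.append)
first = next(stream)
# At this point the callback has already observed the first token.
```

With the callback placed after the `yield`, `seen` would still be empty after the first `next()` call.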
老阿張
9dfce56b31 docs: Fix typo in infino.ipynb (#18640)
Description: "conquerer should be conqueror "? 🤔
Issue: Typo
Dependencies: Nope
Twitter handle: laoazhang
2024-03-20 07:51:58 -07:00
Christophe Bornet
00614f332a community[minor]: Add InMemoryVectorStore (#19326)
This is a basic VectorStore implementation using an in-memory dict to
store the documents.
It doesn't need any extra/optional dependency as it uses numpy which is
already a dependency of langchain.
This is useful for quick testing, demos, examples.
It also makes it possible to write vendor-neutral tutorials, guides, etc.
2024-03-20 10:21:07 -04:00
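The idea of a dict-backed vector store can be sketched in a few lines. This is not the `InMemoryVectorStore` class from the commit: `TinyVectorStore` and its methods are hypothetical, and the sketch uses the standard library's `math` module rather than numpy so that it runs without dependencies.

```python
import math
from typing import Dict, List, Tuple


class TinyVectorStore:
    """Hypothetical dict-backed store for illustration only."""

    def __init__(self) -> None:
        # Maps document id -> (embedding vector, document text).
        self._store: Dict[str, Tuple[List[float], str]] = {}

    def add(self, doc_id: str, vector: List[float], text: str) -> None:
        self._store[doc_id] = (vector, text)

    @staticmethod
    def _cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query: List[float], k: int = 1) -> List[str]:
        # Rank all stored documents by cosine similarity to the query vector.
        ranked = sorted(
            self._store.values(),
            key=lambda item: self._cosine(query, item[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]


store = TinyVectorStore()
store.add("a", [1.0, 0.0], "cats")
store.add("b", [0.0, 1.0], "dogs")
print(store.search([0.9, 0.1]))
```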
Devesh Rahatekar
3c4529ac69 core: Updated docstring for RunnablePick (#18832)
**Description:** Updated the docstring for RunnablePick. Added an
Overview and an Example for the RunnablePick class.
**Issue:** #18803
2024-03-20 13:54:42 +00:00
aditya thomas
e46419c851 docs: contribute / integrations code examples update (#19319)
**Description:** Update to make the code examples consistent with the
actual use
**Issue:** Code examples were different from actual use in the LangChain
code
**Dependencies:** Changes on top of
https://github.com/langchain-ai/langchain/pull/19294

Note: If these changes are acceptable, please merge them after
https://github.com/langchain-ai/langchain/pull/19294.
2024-03-20 09:27:53 -04:00
Leonid Ganeline
8609afbd10 core[patch]: Update messages namespace to fix API reference docs (#19161)
Classes and functions defined in __init__.py are not parsed into the API
Reference.
For example:
- libs/core/langchain_core/messages/__init__.py : AnyMessage,
MessageLikeRepresentation, get_buffer_string(), messages_from_dict(),
...

Opinionated: __init__.py is not a typical place to define artifacts.

Moved artifacts from __init__ into utils.py. 
Added `MessageLikeRepresentation` to __all__ since it is used outside of
`messages`, for example, in
`libs/core/langchain_core/language_models/base.py`
Added `_message_from_dict` to __all__ since it is used outside of
`messages`(???) I would add `message_from_dict` (without underscore) as
an alias. Please, advise.
2024-03-20 09:25:09 -04:00
Christophe Bornet
4c2e887276 core: Simplify astream logic in BaseChatModel and BaseLLM (#19332)
Covered by tests in
`libs/core/tests/unit_tests/language_models/chat_models/test_base.py`,
`libs/core/tests/unit_tests/language_models/llms/test_base.py` and
`libs/core/tests/unit_tests/runnables/test_runnable_events.py`
2024-03-20 09:05:51 -04:00
Bagatur
a84310cdcb graph chains 2024-03-20 01:20:29 -07:00
Bagatur
58b8747c44 fmt 2024-03-19 21:45:42 -07:00
Bagatur
c57e506f9c fmt 2024-03-19 18:49:38 -07:00
Bagatur
068620a871 fmt 2024-03-19 18:33:04 -07:00
Brace Sproul
40f846e65d docs[minor]: Add chat model selection tabs component (#19296)
2024-03-19 18:12:46 -07:00
Bagatur
4812403b48 fmt 2024-03-19 16:32:04 -07:00
Erick Friis
69e9610f62 openai[patch]: pass message name (#17537) 2024-03-19 19:57:27 +00:00
Guangdong Liu
e5d7e455dc splitters: Add ensure_ascii parameter (#18485)
- **Description:** Add ensure_ascii parameter
2024-03-19 12:51:16 -07:00
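The commit message does not say where the parameter is used. Assuming it is forwarded to `json.dumps`, the effect of `ensure_ascii` on non-ASCII text looks like this (the sample data is made up):

```python
import json

data = {"animal": "猫"}

# Default (ensure_ascii=True): non-ASCII characters are escaped.
escaped = json.dumps(data)

# ensure_ascii=False: characters are written as-is.
readable = json.dumps(data, ensure_ascii=False)

print(escaped)
print(readable)
```

The two strings have different lengths, which matters when chunk sizes are measured on the serialized text.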
Nithish Raghunandanan
7ad0a3f2a7 community: add Couchbase Vector Store (#18994)
- **Description:** Added support for Couchbase Vector Search to
LangChain.
- **Dependencies:** couchbase>=4.1.12
- **Twitter handle:** @nithishr

---------

Co-authored-by: Nithish Raghunandanan <nithishr@users.noreply.github.com>
2024-03-19 12:39:51 -07:00
Bagatur
ed75bccda8 fmt 2024-03-19 12:12:15 -07:00
Bagatur
5c194ee224 fmt 2024-03-19 12:08:54 -07:00
Chris Papademetrious
305d74c67a core: implement a batch_size parameter for CacheBackedEmbeddings (#18070)
**Description:**

Currently, `CacheBackedEmbeddings` computes vectors for *all* uncached
documents before updating the store. This pull request updates the
embedding computation loop to compute embeddings in batches, updating
the store after each batch.

I noticed this when I tried `CacheBackedEmbeddings` on our 30k document
set and the cache directory hadn't appeared on disk after 30 minutes.

The motivation is to minimize compute/data loss when problems occur:

* If there is a transient embedding failure (e.g. a network outage at
the embedding endpoint triggers an exception), at least the completed
vectors are written to the store instead of being discarded.
* If there is an issue with the store (e.g. no write permissions), the
condition is detected early without computing (and discarding!) all the
vectors.

**Issue:**
Implements enhancement #18026.

**Testing:**
I was unable to run unit tests; details in [this
post](https://github.com/langchain-ai/langchain/discussions/15019#discussioncomment-8576684).

---------

Signed-off-by: chrispy <chrispy@synopsys.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-03-19 18:55:43 +00:00
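The batching idea in the commit above can be sketched as follows. This is a simplified illustration under stated assumptions: `embed_with_cache` and `batched` are hypothetical helpers, the store is a plain dict, and the real `CacheBackedEmbeddings` works against a byte store with key encoding that is omitted here.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start : start + size]


def embed_with_cache(
    texts: List[str],
    embed: Callable[[List[str]], List[List[float]]],
    store: Dict[str, List[float]],
    batch_size: int,
) -> List[List[float]]:
    """Embed uncached texts in batches, writing to the store after each batch."""
    missing = [t for t in texts if t not in store]
    for batch in batched(missing, batch_size):
        vectors = embed(list(batch))
        # Persist immediately so a later failure does not discard this work.
        store.update(zip(batch, vectors))
    return [store[t] for t in texts]


def flaky_embed(batch: List[str]) -> List[List[float]]:
    # Simulates a transient failure on the second batch.
    if "c" in batch:
        raise RuntimeError("embedding endpoint unavailable")
    return [[float(len(t))] for t in batch]


cache: Dict[str, List[float]] = {}
try:
    embed_with_cache(["a", "b", "c"], flaky_embed, cache, batch_size=2)
except RuntimeError:
    pass
print(cache)
```

After the simulated failure, the vectors from the first batch are still in the store, which is the compute-loss argument made in the commit message.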
William FH
89af30807b Permit function eval on llm data type (#19287) 2024-03-19 11:53:50 -07:00
Jib
f8078e41e5 mongodb[patch]: Added scoring threshold to caching (#19286)
## Description
Semantic Cache can retrieve noisy information if the score threshold for
the value is too low. Adding the ability to set a `score_threshold` on
cache construction can allow for less noisy scores to appear.


- [x] **Add tests and docs**
  1. Added tests that confirm the `score_threshold` query is valid.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-19 11:30:02 -07:00
Christophe Bornet
30e4a35d7a community: Use langchain-astradb for AstraDB caches (#18419)
- [x] Needs https://github.com/langchain-ai/langchain-datastax/pull/4
- [x] Needs a new release of langchain-astradb
2024-03-19 14:04:36 -04:00
Brace Sproul
17c62e0f3a ci[minor]: Bump LC scripts package, add retry option (#19285)
The `retryFailed` option will retry all failed links, one at a time,
with the goal of not triggering bot protection

`microsoft.com` is now hard coded into the whitelist
2024-03-19 10:42:59 -07:00
Erick Friis
7eb376d5fc docs: integration deprecation docs (#19283) 2024-03-19 17:11:15 +00:00
Guangdong Liu
2c835baae4 code[patch]: Add in code documentation to core Runnable with_retry method (docs only) (#19192)
- **Description:** Add in-code documentation to the core Runnable `with_retry`
method (docs only)
- **Issue:** #18804 
@baskaryan @eyurtsev PTAL

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-03-19 12:52:29 -04:00
Eugene Yurtsev
4b3dd34544 core[patch]: Pass sync run manager for sync stream fallback in astream (#19280)
This PR patches the fallback in chat models and language models to pass
in the appropriate version of the run manager (sync vs. async)
2024-03-19 16:32:33 +00:00
Leonid Ganeline
d314acb2d5 core[patch]: Move globals to a module instead of a package (non breaking change) (#19159)
Classes and functions defined in __init__.py are not parsed into the API
Reference.
For example: libs/core/langchain_core/globals/__init__.py :
`set_verbose` `get_llm_cache`, `set_llm_cache`, ...
And the whole `langchain_core.globals` namespace is not visible in the
API Reference. The refactoring is just file renaming.
2024-03-19 12:29:12 -04:00
Al-Ekram Elahee Hridoy
50f93d86ec core[minor]: Enhance cache flexibility in BaseChatModel (#17386)
- **Description:** Enhanced the `BaseChatModel` to support an
`Optional[Union[bool, BaseCache]]` type for the `cache` attribute,
allowing for both boolean flags and custom cache implementations.
Implemented logic within chat model methods to utilize the provided
custom cache implementation effectively. This change aims to provide
more flexibility in caching strategies for chat models.
  - **Issue:** Implements enhancement request #17242.
- **Dependencies:** No additional dependencies required for this change.

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-03-19 11:26:58 -04:00
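One plausible way to resolve an `Optional[Union[bool, BaseCache]]` value is sketched below. The resolution order is an assumption made for illustration (the commit message does not spell it out), `resolve_cache` is a hypothetical function, and a plain dict stands in for `BaseCache`.

```python
from typing import Dict, Optional, Union

Cache = Dict[str, str]  # stand-in for a BaseCache implementation


def resolve_cache(
    cache: Optional[Union[bool, Cache]], global_cache: Optional[Cache]
) -> Optional[Cache]:
    """Pick the cache to use for a call (assumed semantics, for illustration)."""
    if isinstance(cache, dict):
        # A concrete cache instance always wins.
        return cache
    if cache is False:
        # Caching explicitly disabled for this model.
        return None
    if cache is True and global_cache is None:
        raise ValueError("cache=True was requested but no global cache is set")
    # cache is True or None: fall back to the global cache (which may be None).
    return global_cache


shared: Cache = {}
custom: Cache = {}
print(resolve_cache(custom, shared) is custom)
print(resolve_cache(None, shared) is shared)
```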
HatsuneMK00
4761c09e94 docs: update slack toolkit ipynb in integration (#19219)
Thank you for contributing to LangChain!

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- **PR message**:
- **Description:** Update the slack toolkit doc to use an agent that
supports multiple inputs. Using the ReAct agent will cause a ValidationError
when invoking the slack tools. This is because the agent returns a string
like `'{"channel": "C05LDF54S21", "message": "Hello, world!"}'` but the
ReAct agent does not support multiple inputs.
- **Issue:** This is related to this
[Discussion#18083](https://github.com/langchain-ai/langchain/discussions/18083)
    - **Dependencies:** No dependencies required

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-03-19 10:39:09 -04:00
Zihong
ff31cc1648 experimental: update the notebook link of semantic chunk. (#19253)
update the notebook link of semantic chunk.
2024-03-19 07:24:51 -04:00
Frederico Wu
f36418a5b0 langchain: creating assistants with file_ids (#19199)
Changing OpenAIAssistantRunnable.create_assistant to send the `file_ids`
parameter to openai.beta.assistants.create

Co-authored-by: Frederico Wu <fred.diaswu@coxautoinc.com>
2024-03-18 21:34:03 -07:00
Vittorio Rigamonti
9b2f9ee952 community: VectorStore Infinispan, adding autoconfiguration (#18967)
**Description**:
This PR enables VectorStore autoconfiguration for Infinispan: if
metadata values are only of basic types, the protobuf
config will be automatically generated for the user.
2024-03-18 21:33:45 -07:00
Max Jakob
6f544a6a25 elasticsearch: check for deployed models (#18973)
When creating a new index, if we use a retrieval strategy that expects a
model to be deployed in Elasticsearch, check if a model with this name
is indeed deployed before creating an index. This lowers the probability
of getting into a state in which an index was created with a faulty model
ID, which cannot be overwritten anymore (the index has to be deleted
manually).
2024-03-18 21:32:00 -07:00
gonvee
b82644078e community: Add keep_alive parameter to control how long the model w… (#19005)
Add `keep_alive` parameter to control how long the model will stay
loaded into memory with Ollama.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-19 04:29:01 +00:00
Anthony Shaw
bb0dd8f82f docs: Embellish article on splitting by tokens with more examples and missing details (#18997)
**Description**

This PR adds some missing details from the "Split by tokens" page in the
documentation. Specifically:

- The `.from_tiktoken_encoder()` class methods for both the
`CharacterTextSplitter` and `RecursiveCharacterTextSplitter` default to
the old `gpt-2` encoding. I've added a comment to suggest specifying
`model_name` or `encoding`
- The docs didn't mention that the `from_tiktoken_encoder()` class
method passes additional kwargs down to the constructor of the splitter.
I only discovered this by reading the source code
- Added an example of using the `.from_tiktoken_encoder()` class method
with `RecursiveCharacterTextSplitter` which is the recommended approach
for most scenarios above `CharacterTextSplitter`
- Added a warning that `TokenTextSplitter` can split characters which
have multiple tokens (e.g. 猫 has 3 cl100k_base tokens) between multiple
chunks which creates malformed Unicode strings and should not be used in
these situations.

Side note: I think the default argument of `gpt2` for
`.from_tiktoken_encoder()` should be updated?

**Twitter handle** anthonypjshaw

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-18 21:28:17 -07:00
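The malformed-string warning in the commit above can be illustrated without a tokenizer. The sketch below uses raw UTF-8 bytes as a stand-in, on the assumption that token boundaries can fall inside a character's byte sequence; it does not call tiktoken or `TokenTextSplitter`.

```python
def decodes_cleanly(chunk: bytes) -> bool:
    """Return True if the byte chunk is valid UTF-8 on its own."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


encoded = "猫".encode("utf-8")  # a single character, three bytes in UTF-8

# Splitting inside the character leaves two pieces that are both invalid text.
left, right = encoded[:2], encoded[2:]
print(len(encoded), decodes_cleanly(left), decodes_cleanly(right))
```

A splitter that only cuts on character boundaries avoids this, which is one reason the message recommends `RecursiveCharacterTextSplitter` for most scenarios.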
Roshan Santhosh
7afecec280 core: update _rm_titles to account for title argument name bug (#19036)
Issue: For functions that have an argument named 'title',
convert_pydantic_to_openai_function generates incorrect output and
omits the argument altogether. This is because the _rm_titles function
removes all instances of the key 'title' from the output.



Description: Updates the _rm_titles function to also check for the presence of
the 'type' key before removing the 'title' key, since the title key
that we wish to omit always has a type key along with it.

Potential gap if there is a function defined which has both title and
key as argument names, in which case this would fail. Maybe we could set
a filter on the function argument names and reject those with keyword
argument names.


No dependencies. Passed all tests. 



---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-03-18 21:25:06 -07:00
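The rule described in the commit above (drop a `title` key only when a sibling `type` key is present) can be sketched as a small recursive function. `rm_titles` here is an illustrative reimplementation under that assumption, not the code from `langchain_core`.

```python
from typing import Any, Dict


def rm_titles(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Remove schema-level 'title' keys while keeping an argument named 'title'."""
    cleaned: Dict[str, Any] = {}
    for key, value in schema.items():
        if key == "title" and "type" in schema:
            # A 'title' sitting next to 'type' is schema metadata: drop it.
            continue
        cleaned[key] = rm_titles(value) if isinstance(value, dict) else value
    return cleaned


schema = {
    "title": "Book",
    "type": "object",
    "properties": {"title": {"title": "Title", "type": "string"}},
}
print(rm_titles(schema))
```

In this sketch the `properties` dict has no `type` key, so the argument named `title` survives. If a function had arguments named both `title` and `type`, the `properties` dict would contain both keys and `title` would still be dropped.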
Harrison Chase
efcdf54edd Josha91 fix docstring (#19249)
Co-authored-by: Josha van Houdt <josha.van.houdt@sap.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-18 21:19:56 -07:00
Simon Stone
58c7687174 langchain: preserve document metadata in FlashrankRerank (#19148)
**Description:** Preserves document metadata in `FlashrankRerank`
    - **Issue:** #19142
    - **Dependencies:** None
    - **Twitter handle:** n/a

---------

Co-authored-by: Simon Stone <simon.stone@dartmouth.edu>
2024-03-19 04:15:18 +00:00
Aaron Jimenez
bc648f6cfc core: Updated docstring for Context class (#19079)
- **Description:** Improves the docstring for `class Context` by
providing an overview and an example.
- **Issue:** #18803
2024-03-18 21:15:14 -07:00
Taqi Jaffri
044bc22acc Community: Add mistral oss model support to azureml endpoints, plus configurable timeout (#19123)
- **Description:** There was no formatter for mistral models for Azure
ML endpoints. Adding that, plus a configurable timeout (it was hard
coded before)
- **Dependencies:** none
- **Twitter handle:** @tjaffri @docugami
2024-03-18 21:10:42 -07:00
Kangmoon Seo
07de4abe70 core: Fix Exception handling in XMLOutputParser (#19126)
- **Description:** 
  - Exception handling in `XMLOutputParser`
    1. Add exception handling at `root = ET.fromstring(text)`, which raises
       `ET.ParseError`
    2. Fix the exception class (use the one commonly used in the `BaseOutputParser` class)
  - AS-IS: raises `ValueError` or `ET.ParseError` without handling
    ```python
    # langchain_core/output_parsers/xml.py

        text = text.strip()
        if (text.startswith("<") or text.startswith("\n<")) and (
            text.endswith(">") or text.endswith(">\n")
        ):
            root = ET.fromstring(text)
            return self._root_to_dict(root)
        else:
            raise ValueError(f"Could not parse output: {text}")
    ```
  - TO-BE: raise `OutputParserException`
    ```python
    # langchain_core/output_parsers/xml.py

        text = text.strip()
        if (text.startswith("<") or text.startswith("\n<")) and (
            text.endswith(">") or text.endswith(">\n")
        ):
            try:
                root = ET.fromstring(text)
                return self._root_to_dict(root)

            except ET.ParseError:
                raise OutputParserException(f"Could not parse output: {text}")

        else:
            raise OutputParserException(f"Could not parse output: {text}")

    ``` 
- **Issue:** #19107  
- **Dependencies:** None
2024-03-18 21:08:32 -07:00
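A standalone version of the TO-BE behavior might look like the sketch below. `OutputParserException` is defined locally as a stand-in for the LangChain class, and `parse_xml` is a hypothetical function that returns a simplified result instead of calling `_root_to_dict`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


class OutputParserException(ValueError):
    """Local stand-in for the LangChain exception of the same name."""


def parse_xml(text: str) -> Dict[str, Optional[str]]:
    """Parse a single XML element, raising one exception type for all failures."""
    text = text.strip()
    if not (text.startswith("<") and text.endswith(">")):
        raise OutputParserException(f"Could not parse output: {text}")
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        # Malformed XML surfaces as the same exception type as non-XML input.
        raise OutputParserException(f"Could not parse output: {text}") from exc
    return {root.tag: root.text}


print(parse_xml("<answer>42</answer>"))
```

Callers then need to catch only one exception type, whether the text is not XML at all or is XML that fails to parse.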
Hamza Muhammad Farooqi
24a0a4472a Add docstrings for Clickhouse class methods (#19195)
2024-03-19 04:03:12 +00:00
Simon Stone
dc4ce82ddd docs: fix import path for FlashrankRerank example notebook (#19146)
**Description:** Fixes the import paths for the `FlashrankRerank`
example notebook.
 **Issue:** #19139 
 **Dependencies:** None
 **Twitter handle:** n/a

---------

Co-authored-by: Simon Stone <simon.stone@dartmouth.edu>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-18 21:03:00 -07:00
Saurav Kumar
bde199d128 Updating format of pip install (#19198)
Thank you for contributing to LangChain!

- [x] **PR title**: "Updating format of pip install in two files of
docs/cookbook"
- pip install is not reflecting properly in some of the files in
cookbook
- Example:
[docs/expression_language/cookbook/sql_db](https://python.langchain.com/docs/expression_language/cookbook/sql_db)


- [x] **PR message**: Updating format of pip install in two files of
docs/cookbook
    - **Issue:** #19197 

- Note - let's do squash merge for the PR

2024-03-19 04:01:24 +00:00
Rohit Gupta
785f8ab174 [langchain_community] milvus vectorstores upsert: add **kwargs to make it use for other argument also (#19193)
Add `**kwargs` to `add_documents` for upsert so that it can be used for other
arguments as well.
Let's use this; it was unused until now.


Co-authored-by: Rohit Gupta <rohit.gupta2@walmart.com>
2024-03-18 21:01:12 -07:00
Cycle
77868b1974 experimental: add buffer_size hyperparameter to SemanticChunker as in source video (#19208)
Add the `buffer_size` hyperparameter, which is used in the `combine_sentences` function.
2024-03-19 03:54:20 +00:00
HowardChan
ae3c7f702c docs:Make url as a markdown link (#19212)
**Description**: same as the title

Co-authored-by: ChenZhengHao <chenzhenghao@mail.teletraan.io>
2024-03-19 03:47:52 +00:00
Shotaro Sano
ca9c8c58ea text-splitters, infra: fix libs/langchain/dev.Dockerfile so that the text-splitter directory is copied before poetry installation (#19214)
## Description
This PR modifies the settings in `libs/langchain/dev.Dockerfile` to
ensure that the `text-splitters` directory is copied before the poetry
installation process begins.

Without this modification, the `docker build` command fails for
`dev.Dockerfile`, preventing the setup of some development environments,
including `.devcontainer`.

## Bug Details

### Repro
Run the following command:

```bash
docker build -f libs/langchain/dev.Dockerfile .
```

### Current Behavior
The docker build command fails, raising the following error:

```
...
 => [langchain-dev-dependencies 4/5] COPY libs/community/ ../community/                                                                                0.4s
 => ERROR [langchain-dev-dependencies 5/5] RUN poetry install --no-interaction --no-ansi --with dev,test,docs                                          1.1s
------                                                                                                                                                      
 > [langchain-dev-dependencies 5/5] RUN poetry install --no-interaction --no-ansi --with dev,test,docs:
#13 0.970 
#13 0.970 Directory ../text-splitters does not exist
------
executor failed running [/bin/sh -c poetry install --no-interaction --no-ansi --with dev,test,docs]: exit code: 1
```

### Expected Behavior
The `docker build` command successfully completes without the poetry
error.

### Analysis
The error occurs because the `text-splitters` directory is not copied
into the build environment, unlike the other packages under the `libs`
directory. I suspect that the `COPY` setting was overlooked since
`text-splitters` was separated in a recent PR.

## Fix
Add the following lines to the `libs/langchain/dev.Dockerfile`:

```dockerfile
# Copy the text-splitters library for installation
COPY libs/text-splitters/ ../text-splitters/
```
2024-03-18 20:45:35 -07:00
Guangdong Liu
c3310c5e7f community: Fix Milvus got multiple values for keyword argument 'timeout' (#19232)
- **Description:** Fix Milvus got multiple values for keyword argument
'timeout'
- **Issue:**  fix #18580
- @baskaryan @eyurtsev PTAL
2024-03-18 20:44:25 -07:00
Erick Friis
95904fe443 langchain[patch]: update base imports to core (#19248)
still deprecated, but was misleading before
2024-03-19 03:17:07 +00:00
Asaf Joseph Gardin
21c45475c5 ai21[patch]: AI21 Labs bump SDK version (#19114)
Description: Added support for AI21 SDK version 2.1.2
Twitter handle: https://github.com/AI21Labs

---------

Co-authored-by: Asaf Gardin <asafg@ai21.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-18 19:47:08 -07:00
daniel ung
edf9d1c905 templates: Added template for JaguarDB (#16757)
- **Description:** Added a LangChain template for JaguarDB

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-19 02:36:24 +00:00
gustavo-yt
7c26ef88a1 templates: Add rag lantern template (#16523)
  - **Description:** Added a template for lantern rag usage.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-19 02:34:46 +00:00
Bagatur
408bdd5604 fmt 2024-03-18 19:22:02 -07:00
Bagatur
6a93ff2a4b wip 2024-03-18 18:17:46 -07:00
Bagatur
7e96a7eaea wip 2024-03-18 16:34:00 -07:00
Jib
516cc44b3f langchain-mongodb: [test-fix] add explicit index_name setting on test vector creation (#19245)
- **Description:** Tests fail to do value lookup because they do not
specify the index name
  - **Issue:** Failing integration test
 

- [x] **Add tests and docs**: Tests now pass


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
2024-03-18 15:52:28 -07:00
Estephania Calvo Carvajal
94e58dd827 docs:Fix links to LangSmith docs on Evaluation page (#19210) (#19216)
- **Description:** Same as the title
- **Issue:** #19210
2024-03-18 22:27:43 +00:00
William FH
780337488e [Enhancement] Add support for directly providing a run_id (#18990)
The root run ID (~trace ID) is useful for assigning feedback, but the
current recommended approach is to use callbacks to retrieve it, which
has some drawbacks:
1. It doesn't work for streaming until after the first event.
2. It doesn't let you call other endpoints with the same trace ID in
parallel (since you have to wait until the call is completed/started to
use it).

This PR lets you provide a "run_id" in the runnable config.

Couple considerations:

1. For batch calls, we split the trace up into separate trees (to permit
better rendering). We keep the provided run ID for the first one and
generate a unique one for other elements of the batch.
2. For nested calls, the provided ID is ONLY used on the top root/trace.



### Example Usage


```
chain.invoke("foo", {"run_id": uuid.uuid4()})
```
2024-03-18 15:03:04 -07:00
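The batch rule from the considerations above (the first element keeps the provided ID, the others get fresh ones) can be sketched as follows. `assign_batch_run_ids` is a hypothetical helper written for illustration, not a LangChain function.

```python
import uuid
from typing import List, Optional


def assign_batch_run_ids(
    provided: Optional[uuid.UUID], batch_size: int
) -> List[uuid.UUID]:
    """Give each batch element its own run ID, reusing the provided one first."""
    ids = [uuid.uuid4() for _ in range(batch_size)]
    if provided is not None and ids:
        # Only the first element keeps the caller-supplied ID.
        ids[0] = provided
    return ids


my_id = uuid.uuid4()
run_ids = assign_batch_run_ids(my_id, 3)
print(run_ids[0] == my_id, len(set(run_ids)))
```

Because the caller generates `my_id` up front, it can be used to attach feedback or call other endpoints before the run has produced any events.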
Jacob Lee
bd329e9aad core[patch]: Add LLM output to message response_metadata (#19158)
This will more easily expose token usage information.

CC @baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-18 13:58:32 -07:00
Erick Friis
6fa1438334 mongodb[patch]: release 0.1.2 (#19243) 2024-03-18 13:35:45 -07:00
Leonid Ganeline
7de1d9acfd community: llms imports fixes (#18943)
Classes are missing from `__all__` and from various places in `__init__.py`:
- BaichuanLLM 
- ChatDatabricks
- ChatMlflow
- Llamafile
- Mlflow
- Together
Added classes to __all__. I also sorted __all__ list.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-18 20:24:40 +00:00
Anush
aee5138930 templates: update qdrant self query (#19218)
## Description

This PR
- Updates the Qdrant self-query template to reflect the recent updates.
- Enables reading config values from `env` files as the README [mentions
it](https://github.com/Anush008/langchain/tree/self-query-qdrant/templates/self-query-qdrant#environment-setup).

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-18 19:59:08 +00:00
Kenzie Mihardja
21f75991d4 deprecate community docugami loader (#19230)
- [x] **PR title**: "community: deprecate DocugamiLoader"

- [x] **PR message**: Deprecate the `langchain_community` DocugamiLoader in
favor of the `docugami_langchain` DocugamiLoader

---------

Co-authored-by: Kenzie Mihardja <kenzie28@cs.washington.edu>
2024-03-18 12:56:47 -07:00
Jib
ec026004cb mongodb[patch]: Remove in-memory cache from cache abstractions (#18987)
## Description
* In memory cache easily gets out of sync with the server cache, so we
will remove it entirely to reduce the issues around invalidated caches.

## Dependencies
None

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-18 19:44:34 +00:00
Jib
866d6408af mongodb[patch]: Remove embedding retrieval from mongodb payload (#19035)
## Description
Returning the embedding is not necessary in the vector search
functionality unless specified as a debugging step. This change defaults
the behavior such that the server _only_ returns the embedding key if
explicitly requested, such as in the case of
`max_marginal_relevance_search`.


- [x] **Add tests and docs**: Added `test_from_documents_no_embedding_return`.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-18 19:43:50 +00:00
Leonid Kuligin
366ba77459 core[minor]: moved fake llms and embeddings to core (#19226)
- [ ] **PR title**: "core: moved fake llms and embeddings to core"


- [ ] **PR message**:
 - **Description:** moved fake llms and embeddings to core
2024-03-18 10:01:26 -07:00
Pengfei Jiang
514fe80778 community[patch]: add stop parameter support to volcengine maas (#19052)
- **Description:** add stop parameter to volcengine maas model
- **Dependencies:** no

---------

Co-authored-by: 江鹏飞 <jiangpengfei.jiangpf@bytedance.com>
2024-03-17 01:58:50 +00:00
htaoruan
bcc771e37c docs: ChatTongyi example error (#19013) 2024-03-17 01:55:56 +00:00
Anubhav Madhav
9235dade90 docs: provided hyperlinks to text and fixed grammar (#19092)
1) Provided links to text in the prompt (Refer Page Link 1, Page Link 2
and Page Link 3)
2) Fixed Grammar in Considerations of Model I/O Concepts documentation
page - Update concepts.mdx (Page Link 4)

*Issues are on the following pages:*
Page Link 1:
https://python.langchain.com/docs/modules/model_io/concepts#prompttemplate
Page Link 2:
https://python.langchain.com/docs/modules/model_io/concepts#messageprompttemplate
Page Link 3:
https://python.langchain.com/docs/modules/model_io/concepts#chatprompttemplate
Page Link 4:
https://python.langchain.com/docs/modules/model_io/concepts#considerations


**Fix 1**:
Description: Fixed Grammar in Considerations of Model I/O Documentation
Page
Issue: "to work well with the model are you using" # "to work well with
the model you are using"
Dependencies: None
Twitter handle: @Anubhav_Madhav (https://twitter.com/Anubhav_Madhav)

**Fix 2**:
Description: Provided links to text in the prompt (Refer Page Link 1,
Page Link 2 and Page Link 3)
Issue: links not provided # links have been provided to the text
Dependencies: None
Twitter handle: @Anubhav_Madhav (https://twitter.com/Anubhav_Madhav)


*For Fix 1*
Refer to the first word, 'This', in the image attached to this PR.
PFA
<img width="839" alt="Screenshot 2024-03-15 at 3 04 17 AM"
src="https://github.com/langchain-ai/langchain/assets/42323737/94e8db16-249f-48c3-a1d1-dee8d36067fa">



---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-17 01:37:42 +00:00
primate88
5aa68936e0 community: Fix import path for StreamingStdOutCallbackHandler example (#19170)
- Description:
- Updated the import path for `StreamingStdOutCallbackHandler` in the
streaming response example within `huggingface_endpoint.py`. This change
corrects the import statement to reflect the actual location of
`StreamingStdOutCallbackHandler` in
`langchain_core.callbacks.streaming_stdout`.
- Issue:
  - None
- Dependencies:
  - No additional dependencies are required for this change.
- Twitter handle:
  - None

## Note:
I have tested this change locally and confirmed that the
`StreamingStdOutCallbackHandler` works as expected with the updated
import path. This PR does not require the addition of new tests since it
is a correction to documentation/examples rather than functional code.
2024-03-17 00:50:37 +00:00
Bagatur
611d5a1618 openai[patch]: fix async http client (#19164)
Fix #19116
2024-03-16 17:50:22 -07:00
Nikhil Kumar
635b3372bd community[minor]: Add support for translation in HuggingFacePipeline (#19190)
- [x] **Support for translation**: "community: Add support for
translation in `HuggingFacePipeline`"


- [x] **Add support for translation in `HuggingFacePipeline`**:
- **Description:** Add support for translation in `HuggingFacePipeline`,
which earlier used to support only text summarization and generation.
    - **Issue:** N/A
    - **Dependencies:** N/A
    - **Twitter handle:** None
2024-03-17 00:48:13 +00:00
Nikhil Kumar
a1b26dd9b6 docs: Add docs for RouterRunnable (#19191)
- [x] **Docs for `RouterRunnable`**: core: Add docs for `RouterRunnable`

- [x] **Add docs for `RouterRunnable`**:
- **Description:** Add docs for `RouterRunnable`, which was previously
missing documentation
    - **Issue:** #18803 
    - **Dependencies:** N/A
    - **Twitter handle:** None
2024-03-17 00:48:00 +00:00
k.muto
8d2c34e655 community: Fix all page numbers were the same for _BaseGoogleVertexAISearchRetriever (#19175)
- Description:
- This pull request is to fix a bug where page numbers were not set
correctly. In the current code, all chunks share the same metadata
object doc_metadata, so the page number is set with the same value for
all documents. To fix this, I changed to using separate metadata objects
for each chunk.
- Issue:
  - None
- Dependencies:
  - No additional dependencies are required for this change.
- Twitter handle:
  - @eycjur

- Test
- Even without this bug, there are cases where every chunk legitimately
ends up with the same page number, so it is very difficult for me to
write integration tests.
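
The shared-metadata bug described above can be reproduced in a few lines. This is a minimal sketch with made-up data, not the retriever's actual code; the variable names other than `doc_metadata` are illustrative assumptions.

```python
import copy
from typing import Any, Dict, List

doc_metadata: Dict[str, Any] = {"source": "report.pdf"}  # made-up example data
pages = [1, 2, 3]

# Buggy pattern: every chunk references the same dict, so the last write wins.
shared: List[Dict[str, Any]] = []
for page in pages:
    doc_metadata["page"] = page
    shared.append(doc_metadata)

# Fixed pattern: copy the base metadata for each chunk before setting the page.
base: Dict[str, Any] = {"source": "report.pdf"}
separate: List[Dict[str, Any]] = []
for page in pages:
    chunk_metadata = copy.deepcopy(base)
    chunk_metadata["page"] = page
    separate.append(chunk_metadata)

print([m["page"] for m in shared])    # [3, 3, 3]
print([m["page"] for m in separate])  # [1, 2, 3]
```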
2024-03-16 22:28:56 +00:00
Matt Frediani
160a7077b0 Update README.md (#19172)
2024-03-16 15:23:25 -07:00
inpyeong
7c092f479f docs: Update why.ipynb (#19173)
I think the cell type for the pip command should be 'code'.
Please check, thank you :)

2024-03-16 22:21:51 +00:00
Vitalii Korsakov
d96e0b2de7 docs: Remove duplicated line in Get Started section (#19182)
Line `from langchain_openai import ChatOpenAI` is put twice in Get
Started / Serving with LangServe section.
Imports on lines 559 and 566 are identical

Co-authored-by: Vitalii <vitalii@localhost>
2024-03-16 22:21:25 +00:00
Cailin Wang
7cd87d2f6a community: Add partition parameter to DashVector (#19023)
**Description**: Add a partition parameter to DashVector
**Twitter handle**: @CailinWang_

---------

Co-authored-by: root <root@Bluedot-AI>
2024-03-16 15:20:30 -07:00
Rodrigo Nogueira
e64cf1aba4 community: Add model argument for maritalk models and better error handling (#19187) 2024-03-16 15:18:56 -07:00
samanhappy
ff94f86ce1 docs: fix link to interface TextSplitter (#19177) 2024-03-16 15:16:34 -07:00
Sergey Kozlov
1a55e950aa community[patch]: support fastembed v1 and v2 (#19125)
**Description:**
#18040 forces `fastembed>2.0`, which causes dependency conflicts with
the new `unstructured` package (different `onnxruntime`). There may be
other dependency conflicts. The only way to use
`langchain-community>=0.0.28` is to roll back to `unstructured 0.10.X`, but
newer `unstructured` releases contain many fixes.

This PR allows using both `fastembed` `v1` and `v2`.

How to reproduce:

`pyproject.toml`:
```toml
[tool.poetry]
name = "depstest"
version = "0.0.0"
description = "test"
authors = ["<dev@example.org>"]

[tool.poetry.dependencies]
python = ">=3.10,<3.12"
langchain-community = "^0.0.28"
fastembed = "^0.2.0"
unstructured = {extras = ["pdf"], version = "^0.12"}
```

```bash
$ poetry lock
```

Co-authored-by: Sergey Kozlov <sergey.kozlov@ludditelabs.io>
2024-03-15 18:33:51 -07:00
six17
fd4f536c77 text-splitters[patch]: fix json split of RecursiveJsonSplitter (#19119)
- **Description:** This modification addresses the issue of mutable
default parameters in functions. In the original code, the `chunks`
parameter is defaulted to a list containing an empty dictionary, which
is mutable. Since default parameters in Python are evaluated only once
at function definition time, modifications to the parameter would
persist across future calls. By changing the default to `None` and
checking/initializing within the function, a new list is created for
each call, thus avoiding potential issues.
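
The pitfall and the fix can be shown with a small stand-alone sketch. The function names `collect_buggy` and `collect_fixed` are hypothetical and only mirror the shape of the `chunks` parameter described above; this is not the splitter's code.

```python
from typing import Any, Dict, List, Optional


def collect_buggy(
    item: str, chunks: List[Dict[str, Any]] = [{}]
) -> List[Dict[str, Any]]:
    # BUG: the default list is created once, at definition time, and is
    # shared by every call that omits `chunks`.
    chunks[-1][item] = True
    return chunks


def collect_fixed(
    item: str, chunks: Optional[List[Dict[str, Any]]] = None
) -> List[Dict[str, Any]]:
    # FIX: default to None and build a fresh list on each call.
    if chunks is None:
        chunks = [{}]
    chunks[-1][item] = True
    return chunks


collect_buggy("a")
leaked = collect_buggy("b")  # still contains "a" from the previous call
collect_fixed("a")
clean = collect_fixed("b")   # contains only "b"
print(leaked)  # [{'a': True, 'b': True}]
print(clean)   # [{'b': True}]
```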

---------

Co-authored-by: sixiang <sixiang@lixiang.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-15 16:46:49 -07:00
aditya thomas
05008c4f94 docs: update stale links in Together AI documentation (#19011)
**Description:** Update stale links in Together AI documentation
**Issue:** Some links pointed to legacy webpages on the Together AI
website
**Dependencies:** None
**Lint and test**: `make format`, `make lint` were run
2024-03-15 16:38:04 -07:00
aditya thomas
80eb510a7b docs: update docstring of Together class (#19008)
**Description:** Update docstring of Together class to show example and
update API URL
**Issue:** Improves usability
**Dependencies:** None
**Lint and test**: `make format`, `make lint` and `make test` were run
2024-03-15 16:30:45 -07:00
高远
ef9813dae6 docs: add vikingdb docstrings(#19016)
Co-authored-by: gaoyuan <gaoyuan.20001218@bytedance.com>
2024-03-15 16:29:29 -07:00
wulixuan
0e0030f494 community[patch]: fix yuan2 chat model errors while invoke. (#19015)
1. Fix yuan2 chat model errors during `invoke`.
2. Update related tests.
3. Fix some `DeprecationWarning`s.
2024-03-15 16:28:36 -07:00
Shuai Liu
c244e1a50b community[patch]: Fixed bug in merging generation_info during chunk concatenation in Tongyi and ChatTongyi (#19014)
- **Description:** 

In #16218 , during the `GenerationChunk` and `ChatGenerationChunk`
concatenation, the `generation_info` merging changed from simple keys &
values replacement to using the util method
[`merge_dicts`](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/utils/_merge.py):


![image](https://github.com/langchain-ai/langchain/assets/2098020/10f315bf-7fe0-43a7-a0ce-6a3834b99a15)

The `merge_dicts` method could not handle merging values of `int` or
some other types, and would raise a
[`TypeError`](https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/utils/_merge.py#L55).

This PR fixes this issue in the **Tongyi and ChatTongyi Model** by
adopting the `generation_info` of the last chunk
and discarding the `generation_info` of the intermediate chunks,
ensuring that `stream` and `astream` function correctly.

- **Issue:**  
    - Related issues or PRs about Tongyi & ChatTongyi: #16605, #17105 
    - Other models or cases: #18441, #17376
- **Dependencies:** No new dependencies
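
The failure mode and the workaround can be sketched without LangChain. Note the assumptions: `strict_merge` is a hypothetical stand-in that only imitates the behavior described above (it is not the real `merge_dicts`), and the chunk dictionaries are made-up data.

```python
from typing import Any, Dict, List, Optional


def strict_merge(left: Dict[str, Any], right: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical strict merge: strings concatenate, nested dicts merge
    recursively, and any other conflicting value type is rejected."""
    merged = dict(left)
    for key, value in right.items():
        if key not in merged or merged[key] is None:
            merged[key] = value
        elif isinstance(merged[key], str) and isinstance(value, str):
            merged[key] += value
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = strict_merge(merged[key], value)
        else:
            raise TypeError(
                f"cannot merge {key!r}: unsupported type {type(value).__name__}"
            )
    return merged


def last_generation_info(chunks: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Keep only the final chunk's generation_info and drop the rest."""
    return chunks[-1].get("generation_info") if chunks else None


# Made-up stream: each chunk carries a running integer token count.
stream = [
    {"text": "Hel", "generation_info": {"token_usage": 1}},
    {"text": "lo", "generation_info": {"token_usage": 2}},
]

try:
    strict_merge(stream[0]["generation_info"], stream[1]["generation_info"])
    merge_failed = False
except TypeError:
    merge_failed = True

final_info = last_generation_info(stream)
print(merge_failed)  # True
print(final_info)    # {'token_usage': 2}
```

Taking the last chunk's info trades completeness for robustness: anything reported only by intermediate chunks is lost, which matches the behavior the description states for Tongyi and ChatTongyi.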
2024-03-15 16:27:53 -07:00
wulixuan
f79d0cb9fb docs: update docs for yuan2 in LLMs and Chat models integration. (#19028)
update yuan2.0 notebook in LLMs and Chat models.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2024-03-15 16:03:18 -07:00
Taraka Nithin Vankala
eec023766e docs: Corrected error (#19030)
- [ ] **PR title**: "docs: correction in
"https://github.com/langchain-ai/langchain/blob/master/docs/docs/get_started/quickstart.mdx",
line 289".


- [ ] **PR message**: 
    - Corrected the spelling mistake
    - #18981
2024-03-15 16:02:33 -07:00
Christophe Bornet
f2a7dda4bd community[patch]: Use langchain-astradb for AstraDB doc loader (#19071)
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-15 22:57:25 +00:00
Leonid Ganeline
a49ac55964 docs: providers update 8 (#19053)
Added missed providers. Added missed integrations. Fixed format.
2024-03-15 15:49:14 -07:00
Holt Skinner
cee03630d9 community[patch]: Add Blended Search Support to GoogleVertexAISearchRetriever (#19082)
https://cloud.google.com/generative-ai-app-builder/docs/create-data-store-es#multi-data-stores

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-15 22:39:31 +00:00
Eugene Yurtsev
0ddfe7fc9d langchain[patch]: make hub work with older langchainhub versions (#19076)
Make it work with older clients
2024-03-15 15:37:52 -07:00
William W Wang
0a784074d1 docs: Update llm_caching.ipynb (#19085) 2024-03-15 22:35:48 +00:00
William W Wang
6327be9048 docsUpdate azure_cosmos_db.ipynb (#19087)
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-15 22:33:26 +00:00
Anubhav Madhav
553a520ab6 docs: Fixed Grammar in Considerations of Model I/O Concepts (#19091)
Fixed Grammar in Considerations of Model I/O Concepts documentation page
- Update concepts.mdx

Page Link:
https://python.langchain.com/docs/modules/model_io/concepts#considerations

- **Description:** Fixed Grammar in Considerations of Model I/O
Documentation Page
- **Issue:** "to work well with the model are you using" # "to work well
with the model you are using"
- **Dependencies:** None
- **Twitter handle:** @Anubhav_Madhav
(https://twitter.com/Anubhav_Madhav)



Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-15 22:31:39 +00:00
Shotaro Sano
d647ff1a9a docs: Fix execution results of docs/docs/modules/data_connection/indexing.ipynb (#19112)
## Description
This PR addresses a documentation issue in the
[Indexing](https://python.langchain.com/docs/modules/data_connection/indexing)
page. Specifically, it corrects the execution results of the Jupyter
notebook under the
[Source](https://python.langchain.com/docs/modules/data_connection/indexing#source)
section, which were broken as detailed below.

## Problem
The execution results following the statement, `This should delete the
old versions of documents associated with doggy.txt source and replace
them with the new versions.`, appear to be incorrect, as described
below.

### Current Behavior
- For some reason, the `index` function fails to add the new content of
`doggy.txt`. Although it deletes the document objects associated with
the `doggy.txt` source, it does not add the objects in
`changed_doggy_docs`. Consequently, the execution result displays
`num_added: 0`.
- This unexpected behavior also impacts the results of
`vectorstore.similarity_search("dog", k=30)`, showing only the contents
of `kitty.txt`. It appears as though the contents of `doggy.txt` have
been completely removed from the index:

```
 Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),
 Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),
 Document(page_content='kitty kit', metadata={'source': 'kitty.txt'})]
```

### Expected Behavior
- The `index` function should successfully add the objects in
`changed_doggy_docs` after removing the old content of `doggy.txt`. The
anticipated execution result is `num_added: 2`.
- Subsequently, the modified content of `doggy.txt` should appear in the
results of `vectorstore.similarity_search("dog", k=30)` as follows:

```
[Document(page_content='woof woof', metadata={'source': 'doggy.txt'}),
 Document(page_content='woof woof woof', metadata={'source': 'doggy.txt'}),
 Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),
 Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),
 Document(page_content='kitty kit', metadata={'source': 'kitty.txt'})]
```

## Fix
I reran `docs/docs/modules/data_connection/indexing.ipynb` and have
included the diff in this PR.
2024-03-15 22:27:15 +00:00
case-k
ebc4a64f9e docs: fix databricks document url (#19096)
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-03-15 22:25:11 +00:00
Guangdong Liu
4468e5bdbe docs: Add in code documentation to core Runnable with_fallbacks method (docs only) (#19104)
- Description: Add in-code documentation to the core Runnable
`with_fallbacks` method (docs only)
- Issue: #18804
@eyurtsev PTAL
2024-03-15 15:21:10 -07:00
Guangdong Liu
cced3eb9bc community[patch]: Fix sparkllm embeddings api bug. (#19122)
- **Description:** Fix sparkllm embeddings api bug.
@baskaryan PTAL
2024-03-15 15:08:49 -07:00
samanhappy
b9c62fb905 docs: fix API link for BaseLoader (#19128)
The link to the BaseLoader API requires an update as it has been moved
into the `langchain_core` package.
2024-03-15 14:46:05 -07:00
kaijietti
c20aeef79a community[patch]: implement qdrant _aembed_query and use it in other async funcs (#19155)
`amax_marginal_relevance_search` and `asimilarity_search_with_score`
should use an async version of `_embed_query`.
2024-03-15 21:20:12 +00:00
Kostas Botsas
527676a753 docs: Fix source column xata.ipynb (#19137)
Docs fix: replace column name search with source.

The Xata integration expects metadata column named "source".

The docs suggest the name "search", which if used, yields the following
error:

```
File "/usr/local/lib/python3.11/site-packages/langchain_community/vectorstores/xata.py", line 95, in _add_vectors
    raise Exception(f"Error adding vectors to Xata: {r.status_code} {r}")
Exception: Error adding vectors to Xata: 400 {'errors': [{'status': 400, 'message': 'invalid record: column [source]: column not found'}]}
```
2024-03-15 14:06:18 -07:00
Barun Amalkumar Halder
34d6f0557d community[patch] : publishes duration as milliseconds to Fiddler (#19166)
**Description:** Many LLM steps complete in sub-second duration, which
can lead to non-collection of duration field for Fiddler. This PR
updates duration from seconds to milliseconds.
**Issue:** [INTERNAL] FDL-17568
**Dependencies:** NA
**Twitter handle:** behalder
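
A small sketch of why the unit matters. The description does not say how sub-second values get dropped, so this assumes, purely for illustration, that a consumer keeps only whole-number durations; `to_milliseconds` is a hypothetical helper, not part of the Fiddler handler.

```python
def to_milliseconds(duration_seconds: float) -> float:
    """Convert a duration measured in seconds to milliseconds."""
    return duration_seconds * 1000.0


step_seconds = 0.25  # made-up duration of a fast LLM step

print(int(step_seconds))                   # 0: the step vanishes in whole seconds
print(int(to_milliseconds(step_seconds)))  # 250: the step survives in milliseconds
```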

Co-authored-by: Barun Halder <barun@fiddler.ai>
2024-03-15 14:04:56 -07:00
Eugene Yurtsev
745d2476a2 langchain: upgrade mypy (#19163)
Update mypy in langchain
2024-03-15 16:37:09 -04:00
Maxime Perrin
aa785fa6ec core[minor]: allow LLMs async streaming to fallback on sync streaming (#18960)
- **Description:** Handling fallbacks when calling async streaming for a
LLM that doesn't support it.
- **Issue:** #18920 
- **Twitter handle:** @maximeperrin_
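
The fallback idea can be sketched with plain `asyncio`. This is an illustration under stated assumptions rather than the code in this PR: `SyncOnlyModel` and `astream_with_fallback` are hypothetical names, and a production version would likely avoid blocking the event loop while the sync generator produces each token.

```python
import asyncio
from typing import Any, AsyncIterator, Iterator, List


class SyncOnlyModel:
    """Hypothetical model that only implements synchronous streaming."""

    def stream(self, prompt: str) -> Iterator[str]:
        for token in prompt.split():
            yield token


async def astream_with_fallback(model: Any, prompt: str) -> AsyncIterator[str]:
    """Use the model's native `astream` if present, else wrap the sync `stream`."""
    native = getattr(model, "astream", None)
    if native is not None:
        async for token in native(prompt):
            yield token
        return
    for token in model.stream(prompt):
        yield token
        await asyncio.sleep(0)  # let other tasks run between tokens


async def collect(prompt: str) -> List[str]:
    return [token async for token in astream_with_fallback(SyncOnlyModel(), prompt)]


tokens = asyncio.run(collect("hello async world"))
print(tokens)  # ['hello', 'async', 'world']
```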

---------

Co-authored-by: Maxime Perrin <mperrin@doing.fr>
2024-03-15 16:06:50 -04:00
Erick Friis
caf47ab666 infra: run min version ci before integration tests (#18945) 2024-03-15 12:14:44 -07:00
Barun Amalkumar Halder
b551d49cf5 community[patch] : adds feedback and status for Fiddler callback handler events (#19157)
**Description:** This PR updates the Fiddler events schema to also
pass user feedback and LLM status to Fiddler.
   **Tickets:** [INTERNAL] FDL-17559 
   **Dependencies:**  NA
   **Twitter handle:** behalder

Co-authored-by: Barun Halder <barun@fiddler.ai>
2024-03-15 12:03:49 -07:00
Juan Felipe Arias
f5b9aedc48 community[patch]: add args_schema to sql_database tools for langGraph integration (#18595)
- **Description:** This modification adds a pydantic input definition for
the sql_database tools, which helps with function calling in
LangGraph. Since action nodes usually check for the `args_schema`
attribute on tools, this update should make these tools compatible with
it (only implemented on the InfoSQLDatabaseTool).
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** juanfe8881
2024-03-15 19:03:36 +00:00
fengjial
c922ea36cb community[minor]: Add Baidu VectorDB as vector store (#17997)
Co-authored-by: fengjialin <fengjialin@MacBook-Pro.local>
2024-03-15 19:01:58 +00:00
aditya thomas
190887c5cd docs: update the list of providers (#19012)
**Description:** Update the list of LangChain providers
**Issue:** Make the list of LangChain providers current
**Dependencies:** None
2024-03-15 12:00:24 -07:00
Erick Friis
bbe164ad28 docs: voyageai as provider (#19154) 2024-03-15 10:12:37 -07:00
Erick Friis
781aee0068 community, langchain, infra: revert store extended test deps outside of poetry (#19153)
Reverts langchain-ai/langchain#18995

Because it makes installing dependencies in python 3.11 extended testing
take 80 minutes
2024-03-15 17:10:47 +00:00
Leonid Kuligin
e3ff107e4f docs: updated google integration related imports in the documentation (#19131)
updated imports in the documentation for google vertex
2024-03-15 09:30:50 -04:00
Erick Friis
9e569d85a4 community, langchain, infra: store extended test deps outside of poetry (#18995)
poetry can't reliably handle resolving the number of optional "extended
test" dependencies we have. If we instead just rely on pip to install
extended test deps in CI, this isn't an issue.
2024-03-15 05:55:30 +00:00
Bagatur
191ddbc77e core[patch]: rc release 0.1.33-rc.1 (#19103) 2024-03-14 20:21:54 -07:00
Nuno Campos
508f75853c core[patch]: Change structured prompt lc id to match js (#19099) 2024-03-14 20:02:52 -07:00
Erick Friis
7ce81eb6f4 voyageai[patch]: init package (#19098)
Co-authored-by: fodizoltan <zoltan@conway.expert>
Co-authored-by: Yujie Qian <thomasq0809@gmail.com>
Co-authored-by: fzowl <160063452+fzowl@users.noreply.github.com>
2024-03-15 00:56:10 +00:00
Brace Sproul
5157b15446 ci[patch]: Set root dir to ./docs (#19102) 2024-03-14 17:55:04 -07:00
Brace Sproul
98cd8f673b docs[minor]ci[minor]: Add script & CI to check recurring links daily (#19100) 2024-03-14 17:42:22 -07:00
Asaf Joseph Gardin
4d7f6fa968 ai21[patch]: AI21 Labs Batch Support in Embeddings (#18633)
Description: Added support for batching when using AI21 Embeddings model
Twitter handle: https://github.com/AI21Labs

---------

Co-authored-by: Asaf Gardin <asafg@ai21.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-14 23:10:23 +00:00
Tomaz Bratanic
321db89e87 templates: Switch neo4j generation template to LLMGraphTransformer (#19024) 2024-03-14 16:00:42 -07:00
Erick Friis
d5cf360329 ibm[patch]: release 0.1.3 (#19094) 2024-03-14 15:59:42 -07:00
Mateusz Szewczyk
b15d150d22 ibm[patch]: add async tests, add tokenize support (#18898)
- **Description:** add async tests, add tokenize support
- **Dependencies:** [ibm-watsonx-ai](https://pypi.org/project/ibm-watsonx-ai/)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-14 22:57:05 +00:00
billytrend-cohere
7253b816cc community: Add support for cohere SDK v5 (keeps v4 backwards compatibility) (#19084)
- **Description:** Add support for cohere SDK v5 (keeps v4 backwards
compatibility)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-03-14 15:53:24 -07:00
2604 changed files with 30779 additions and 309641 deletions


@@ -4,7 +4,12 @@ import tomllib
from packaging.version import parse as parse_version
import re
MIN_VERSION_LIBS = ["langchain-core", "langchain-community", "langchain", "langchain-text-splitters"]
MIN_VERSION_LIBS = [
"langchain-core",
"langchain-community",
"langchain",
"langchain-text-splitters",
]
def get_min_version(version: str) -> str:
@@ -56,12 +61,13 @@ def get_min_version_from_toml(toml_path: str):
return min_versions
# Get the TOML file path from the command line argument
toml_file = sys.argv[1]
if __name__ == "__main__":
# Get the TOML file path from the command line argument
toml_file = sys.argv[1]
# Call the function to get the minimum versions
min_versions = get_min_version_from_toml(toml_file)
# Call the function to get the minimum versions
min_versions = get_min_version_from_toml(toml_file)
print(
" ".join([f"{lib}=={version}" for lib, version in min_versions.items()])
) # noqa: T201
print(
" ".join([f"{lib}=={version}" for lib, version in min_versions.items()])
) # noqa: T201


@@ -75,6 +75,7 @@ jobs:
ES_API_KEY: ${{ secrets.ES_API_KEY }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # for airbyte
MONGODB_ATLAS_URI: ${{ secrets.MONGODB_ATLAS_URI }}
VOYAGE_API_KEY: ${{ secrets.VOYAGE_API_KEY }}
run: |
make integration_tests


@@ -157,6 +157,24 @@ jobs:
run: make tests
working-directory: ${{ inputs.working-directory }}
- name: Get minimum versions
working-directory: ${{ inputs.working-directory }}
id: min-version
run: |
poetry run pip install packaging
min_versions="$(poetry run python $GITHUB_WORKSPACE/.github/scripts/get_min_versions.py pyproject.toml)"
echo "min-versions=$min_versions" >> "$GITHUB_OUTPUT"
echo "min-versions=$min_versions"
- name: Run unit tests with minimum dependency versions
if: ${{ steps.min-version.outputs.min-versions != '' }}
env:
MIN_VERSIONS: ${{ steps.min-version.outputs.min-versions }}
run: |
poetry run pip install $MIN_VERSIONS
make tests
working-directory: ${{ inputs.working-directory }}
- name: 'Authenticate to Google Cloud'
id: 'auth'
uses: google-github-actions/auth@v2
@@ -196,27 +214,10 @@ jobs:
ES_API_KEY: ${{ secrets.ES_API_KEY }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # for airbyte
MONGODB_ATLAS_URI: ${{ secrets.MONGODB_ATLAS_URI }}
VOYAGE_API_KEY: ${{ secrets.VOYAGE_API_KEY }}
run: make integration_tests
working-directory: ${{ inputs.working-directory }}
- name: Get minimum versions
working-directory: ${{ inputs.working-directory }}
id: min-version
run: |
poetry run pip install packaging
min_versions="$(poetry run python $GITHUB_WORKSPACE/.github/scripts/get_min_versions.py pyproject.toml)"
echo "min-versions=$min_versions" >> "$GITHUB_OUTPUT"
echo "min-versions=$min_versions"
- name: Run unit tests with minimum dependency versions
if: ${{ steps.min-version.outputs.min-versions != '' }}
env:
MIN_VERSIONS: ${{ steps.min-version.outputs.min-versions }}
run: |
poetry run pip install $MIN_VERSIONS
make tests
working-directory: ${{ inputs.working-directory }}
publish:
needs:
- build


@@ -0,0 +1,24 @@
name: Check Broken Links
on:
workflow_dispatch:
schedule:
- cron: '0 13 * * *'
jobs:
check-links:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Use Node.js 18.x
uses: actions/setup-node@v3
with:
node-version: 18.x
cache: "yarn"
cache-dependency-path: ./docs/yarn.lock
- name: Install dependencies
run: yarn install --immutable --mode=skip-build
working-directory: ./docs
- name: Check broken links
run: yarn check-broken-links
working-directory: ./docs

.gitignore

@@ -116,6 +116,7 @@ celerybeat.pid
.env
.envrc
.venv*
venv*
env/
ENV/
env.bak/


@@ -9,7 +9,7 @@
" \n",
"[Together AI](https://python.langchain.com/docs/integrations/llms/together) has a broad set of OSS LLMs via inference API.\n",
"\n",
"See [here](https://api.together.xyz/playground). We use `\"mistralai/Mixtral-8x7B-Instruct-v0.1` for RAG on the Mixtral paper.\n",
"See [here](https://docs.together.ai/docs/inference-models). We use `\"mistralai/Mixtral-8x7B-Instruct-v0.1` for RAG on the Mixtral paper.\n",
"\n",
"Download the paper:\n",
"https://arxiv.org/pdf/2401.04088.pdf"
@@ -148,7 +148,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.9.6"
}
},
"nbformat": 4,


@@ -14,19 +14,20 @@ For the most part, new integrations should be added to the Community package. Pa
In the following sections, we'll walk through how to contribute to each of these packages from a fake company, `Parrot Link AI`.
## Community Package
## Community package
The `langchain-community` package is in `libs/community` and contains most integrations.
It is installed by users with `pip install langchain-community`, and exported members can be imported with code like
It can be installed with `pip install langchain-community`, and exported members can be imported with code like
```python
from langchain_community.chat_models import ParrotLinkLLM
from langchain_community.llms import ChatParrotLink
from langchain_community.chat_models import ChatParrotLink
from langchain_community.llms import ParrotLinkLLM
from langchain_community.vectorstores import ParrotLinkVectorStore
```
The community package relies on manually-installed dependent packages, so you will see errors if you try to import a package that is not installed. In our fake example, if you tried to import `ParrotLinkLLM` without installing `parrot-link-sdk`, you will see an `ImportError` telling you to install it when trying to use it.
The `community` package relies on manually-installed dependent packages, so you will see errors
if you try to import a package that is not installed. In our fake example, if you tried to import `ParrotLinkLLM` without installing `parrot-link-sdk`, you will see an `ImportError` telling you to install it when trying to use it.
Let's say we wanted to implement a chat model for Parrot Link AI. We would create a new file in `libs/community/langchain_community/chat_models/parrot_link.py` with the following code:
@@ -39,7 +40,7 @@ class ChatParrotLink(BaseChatModel):
Example:
.. code-block:: python
from langchain_parrot_link import ChatParrotLink
from langchain_community.chat_models import ChatParrotLink
model = ChatParrotLink()
"""
@@ -56,9 +57,16 @@ And add documentation to:
- `docs/docs/integrations/chat/parrot_link.ipynb`
## Partner Packages
## Partner package in LangChain repo
Partner packages are in `libs/partners/*` and are installed by users with `pip install langchain-{partner}`, and exported members can be imported with code like
Partner packages can be hosted in the `LangChain` monorepo or in an external repo.
A partner package in the `LangChain` repo is placed in `libs/partners/{partner}`,
and the package source code is in `libs/partners/{partner}/langchain_{partner}`.
Users install the package
with `pip install langchain-{partner}`, and the package members
can be imported with code like:
```python
from langchain_{partner} import X
@@ -123,13 +131,49 @@ By default, this will include stubs for a Chat Model, an LLM, and/or a Vector St
### Write Unit and Integration Tests
Some basic tests are generated in the tests/ directory. You should add more tests to cover your package's functionality.
Some basic tests are presented in the `tests/` directory. You should add more tests to cover your package's functionality.
For information on running and implementing tests, see the [Testing guide](./testing).
### Write documentation
Documentation is generated from Jupyter notebooks in the `docs/` directory. You should move the generated notebooks to the relevant `docs/docs/integrations` directory in the monorepo root.
Documentation is generated from Jupyter notebooks in the `docs/` directory. You should place the notebooks with examples
in the relevant `docs/docs/integrations` directory in the monorepo root.
### (If Necessary) Deprecate community integration
Note: this is only necessary if you're migrating an existing community integration into
a partner package. If the component you're integrating is net-new to LangChain (i.e.
not already in the `community` package), you can skip this step.
Let's pretend we migrated our `ChatParrotLink` chat model from the community package to
the partner package. We would need to deprecate the old model in the community package.
We would do that by adding a `@deprecated` decorator to the old model as follows, in
`libs/community/langchain_community/chat_models/parrot_link.py`.
Before our change, our chat model might look like this:
```python
class ChatParrotLink(BaseChatModel):
...
```
After our change, it would look like this:
```python
from langchain_core._api.deprecation import deprecated
@deprecated(
since="0.0.<next community version>",
removal="0.2.0",
alternative_import="langchain_parrot_link.ChatParrotLink"
)
class ChatParrotLink(BaseChatModel):
...
```
You should do this for *each* component that you're migrating to the partner package.
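To illustrate what users of the old class would experience, here is a simplified stand-in for a deprecation decorator built only on the standard library. It is a sketch under stated assumptions, not the implementation in `langchain_core._api.deprecation`: the name `deprecated_sketch`, the warning text, and the version `0.0.30` are all made up for illustration.

```python
import functools
import warnings


def deprecated_sketch(since: str, removal: str, alternative_import: str):
    """Simplified stand-in for a deprecation decorator (not LangChain's implementation)."""

    def decorator(cls):
        original_init = cls.__init__

        @functools.wraps(original_init)
        def wrapped_init(self, *args, **kwargs):
            # Warn every time the deprecated class is instantiated.
            warnings.warn(
                f"{cls.__name__} was deprecated in {since} and will be removed in "
                f"{removal}. Use {alternative_import} instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            original_init(self, *args, **kwargs)

        cls.__init__ = wrapped_init
        return cls

    return decorator


@deprecated_sketch(
    since="0.0.30",  # placeholder version, for illustration only
    removal="0.2.0",
    alternative_import="langchain_parrot_link.ChatParrotLink",
)
class ChatParrotLink:
    """Placeholder class; the real one would subclass BaseChatModel."""


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ChatParrotLink()

print(caught[0].message)
```

The old class keeps working, so existing code does not break, while the warning points users to the new import path.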
### Additional steps
@@ -143,3 +187,15 @@ Maintainer steps (Contributors should **not** do these):
- [ ] set up pypi and test pypi projects
- [ ] add credential secrets to Github Actions
- [ ] add package to conda-forge
## Partner package in external repo
If you are creating a partner package in an external repo, you should follow the same steps as above,
but you will need to set up your own CI/CD and package management.
Name your package as `langchain-{partner}-{integration}`.
Still, you have to create the `libs/partners/{partner}-{integration}` folder in the `LangChain` monorepo
and add a `README.md` file with a link to the external repo.
See this [example](https://github.com/langchain-ai/langchain/tree/master/libs/partners/google-genai).
This allows keeping track of all the partner packages in the `LangChain` documentation.


@@ -20,9 +20,11 @@
]
},
{
"cell_type": "raw",
"cell_type": "code",
"id": "0f316b5c",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain langchain-openai"
]


@@ -20,9 +20,11 @@
]
},
{
"cell_type": "raw",
"cell_type": "code",
"id": "b3121aa8",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain langchain-openai"
]


@@ -36,9 +36,11 @@
]
},
{
"cell_type": "raw",
"cell_type": "code",
"execution_count": null,
"id": "b99b47ec",
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain-core langchain-openai langchain-anthropic"
]


@@ -286,7 +286,7 @@ embeddings = OllamaEmbeddings()
</TabItem>
<TabItem value="cohere" label="Cohere (API)" default>
Make sure you have the `cohere` package installed an the appropriate environment variables set (these are the same as needed for the LLM).
Make sure you have the `cohere` package installed and the appropriate environment variables set (these are the same as needed for the LLM).
```python
from langchain_community.embeddings import CohereEmbeddings
@@ -563,7 +563,6 @@ from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.tools.retriever import create_retriever_tool
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_openai import ChatOpenAI
from langchain import hub
from langchain.agents import create_openai_functions_agent
from langchain.agents import AgentExecutor


@@ -23,7 +23,7 @@ We also are working to share guides and cookbooks that demonstrate how to use th
## LangSmith Evaluation
LangSmith provides an integrated evaluation and tracing framework that allows you to check for regressions, compare systems, and easily identify and fix any sources of errors and performance issues. Check out the docs on [LangSmith Evaluation](https://docs.smith.langchain.com/category/testing--evaluation) and additional [cookbooks](https://docs.smith.langchain.com/category/langsmith-cookbook) for more detailed information on evaluating your applications.
LangSmith provides an integrated evaluation and tracing framework that allows you to check for regressions, compare systems, and easily identify and fix any sources of errors and performance issues. Check out the docs on [LangSmith Evaluation](https://docs.smith.langchain.com/evaluation) and additional [cookbooks](https://docs.smith.langchain.com/cookbook) for more detailed information on evaluating your applications.
## LangChain benchmarks


@@ -129,7 +129,7 @@
"Who was famed for their Christian spirit?\n",
"Who assimilted the Roman language?\n",
"Who ruled the country of Normandy?\n",
"What principality did William the conquerer found?\n",
"What principality did William the conqueror found?\n",
"What is the original meaning of the word Norman?\n",
"When was the Latin version of the word Norman first recorded?\n",
"What name comes from the English words Normans/Normanz?\"\"\"\n",


@@ -40,18 +40,10 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 1,
"id": "2108b517-1e8d-473d-92fa-4f930e8072a7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"········\n"
]
}
],
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
@@ -90,7 +82,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 3,
"id": "d4a7c55d-b235-4ca4-a579-c90cc9570da9",
"metadata": {
"tags": []
@@ -103,7 +95,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 4,
"id": "70cf04e8-423a-4ff6-8b09-f11fb711c817",
"metadata": {
"tags": []
@@ -115,7 +107,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 5,
"id": "8199ef8f-eb8b-4253-9ea0-6c24a013ca4c",
"metadata": {
"tags": []
@@ -124,22 +116,22 @@
{
"data": {
"text/plain": [
"AIMessage(content=\"Who's there?\")"
"AIMessage(content=\"4! That's one, two, three, four. Keep adding and we'll reach new heights!\", response_metadata={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'token_count': {'prompt_tokens': 73, 'response_tokens': 21, 'total_tokens': 94, 'billed_tokens': 25}})"
]
},
"execution_count": 3,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"messages = [HumanMessage(content=\"knock knock\")]\n",
"messages = [HumanMessage(content=\"1\"), HumanMessage(content=\"2 3\")]\n",
"chat.invoke(messages)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 6,
"id": "c5fac0e9-05a4-4fc1-a3b3-e5bbb24b971b",
"metadata": {
"tags": []
@@ -148,10 +140,10 @@
{
"data": {
"text/plain": [
"AIMessage(content=\"Who's there?\")"
"AIMessage(content='4! According to the rules of addition, 1 + 2 equals 3, and 3 + 3 equals 6.', response_metadata={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'token_count': {'prompt_tokens': 73, 'response_tokens': 28, 'total_tokens': 101, 'billed_tokens': 32}})"
]
},
"execution_count": 4,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -162,7 +154,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 7,
"id": "025be980-e50d-4a68-93dc-c9c7b500ce34",
"metadata": {
"tags": []
@@ -172,7 +164,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Who's there?"
"4! It's a pleasure to be of service in this mathematical game."
]
}
],
@@ -183,17 +175,17 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 8,
"id": "064288e4-f184-4496-9427-bcf148fa055e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[AIMessage(content=\"Who's there?\")]"
"[AIMessage(content='4! According to the rules of addition, 1 + 2 equals 3, and 3 + 3 equals 6.', response_metadata={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'token_count': {'prompt_tokens': 73, 'response_tokens': 28, 'total_tokens': 101, 'billed_tokens': 32}})]"
]
},
"execution_count": 6,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
@@ -214,7 +206,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 9,
"id": "0851b103",
"metadata": {},
"outputs": [],
@@ -227,17 +219,17 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 10,
"id": "ae950c0f-1691-47f1-b609-273033cae707",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=\"Why did the bear go to the chiropractor?\\n\\nBecause she was feeling a bit grizzly!\\n\\nHope you found that joke about bears to be a little bit amusing! If you'd like to hear another one, just let me know. In the meantime, if you have any other questions or need assistance with a different topic, feel free to let me know. \\n\\nJust remember, even if you have a sore back like the bear, it's always best to consult a licensed professional for injuries or pain you may be experiencing. \\n\\nWould you like me to tell you another joke?\")"
"AIMessage(content='What do you call a bear with no teeth? A gummy bear!', response_metadata={'documents': None, 'citations': None, 'search_results': None, 'search_queries': None, 'token_count': {'prompt_tokens': 72, 'response_tokens': 14, 'total_tokens': 86, 'billed_tokens': 20}})"
]
},
"execution_count": 8,
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
@@ -263,7 +255,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.11.7"
}
},
"nbformat": 4,


@@ -65,6 +65,7 @@
"from langchain_core.output_parsers import StrOutputParser\n",
"\n",
"llm = ChatMaritalk(\n",
" model=\"sabia-2-medium\", # Available models: sabia-2-small and sabia-2-medium\n",
" api_key=\"\", # Insert your API key here\n",
" temperature=0.7,\n",
" max_tokens=100,\n",


@@ -4,7 +4,7 @@
"cell_type": "raw",
"source": [
"---\n",
"sidebar_label: YUAN2\n",
"sidebar_label: Yuan2.0\n",
"---"
],
"metadata": {
@@ -22,7 +22,7 @@
}
},
"source": [
"# YUAN2.0\n",
"# Yuan2.0\n",
"\n",
"This notebook shows how to use [YUAN2 API](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/docs/inference_server.md) in LangChain with the langchain.chat_models.ChatYuan2.\n",
"\n",
@@ -96,9 +96,9 @@
},
"source": [
"### Setting Up Your API server\n",
"Setting up your OpenAI compatible API server following [yuan2 openai api server](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/README-EN.md).\n",
"If you deployed api server locally, you can simply set `api_key=\"EMPTY\"` or anything you want.\n",
"Just make sure, the `api_base` is set correctly."
"Set up your OpenAI-compatible API server by following [yuan2 openai api server](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/docs/Yuan2_fastchat.md).\n",
"If you deployed the API server locally, you can simply set `yuan2_api_key=\"EMPTY\"` or anything you want.\n",
"Just make sure the `yuan2_api_base` is set correctly."
]
},
{
@@ -187,7 +187,7 @@
},
"outputs": [],
"source": [
"print(chat(messages))"
"print(chat.invoke(messages))"
]
},
{
@@ -247,7 +247,7 @@
},
"outputs": [],
"source": [
"chat(messages)"
"chat.invoke(messages)"
]
},
{


@@ -22,7 +22,7 @@
"outputs": [],
"source": [
"# You need the dgml-utils package to use the DocugamiLoader (run pip install directly without \"poetry run\" if you are not using poetry)\n",
"!poetry run pip install dgml-utils==0.3.0 --upgrade --quiet"
"!poetry run pip install docugami-langchain dgml-utils==0.3.0 --upgrade --quiet"
]
},
{
@@ -56,7 +56,7 @@
"source": [
"import os\n",
"\n",
"from langchain_community.document_loaders import DocugamiLoader"
"from docugami_langchain.document_loaders import DocugamiLoader"
]
},
{
@@ -470,7 +470,7 @@
"source": [
"from typing import Dict, List\n",
"\n",
"from langchain_community.document_loaders import DocugamiLoader\n",
"from docugami_langchain.document_loaders import DocugamiLoader\n",
"from langchain_core.documents import Document\n",
"\n",
"loader = DocugamiLoader(docset_id=\"zo954yqy53wp\")\n",
@@ -655,7 +655,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.1"
"version": "3.9.18"
}
},
"nbformat": 4,


@@ -1357,7 +1357,9 @@
{
"cell_type": "markdown",
"source": [
"## Azure Cosmos DB Semantic Cache"
"## Azure Cosmos DB Semantic Cache\n",
"\n",
"You can use this integrated [vector database](https://learn.microsoft.com/en-us/azure/cosmos-db/vector-database) for caching."
],
"metadata": {
"collapsed": false


@@ -13,7 +13,7 @@
"https://api.together.xyz/settings/api-keys. This can be passed in as init param\n",
"``together_api_key`` or set as environment variable ``TOGETHER_API_KEY``.\n",
"\n",
"Together API reference: https://docs.together.ai/reference/inference"
"Together API reference: https://docs.together.ai/reference"
]
},
{


@@ -45,7 +45,7 @@
"outputs": [],
"source": [
"# default infer_api for a local deployed Yuan2.0 inference server\n",
"infer_api = \"http://127.0.0.1:8000\"\n",
"infer_api = \"http://127.0.0.1:8000/yuan\"\n",
"\n",
"# direct access endpoint in a proxied environment\n",
"# import os\n",
@@ -56,7 +56,6 @@
" max_tokens=2048,\n",
" temp=1.0,\n",
" top_p=0.9,\n",
" top_k=40,\n",
" use_history=False,\n",
")\n",
"\n",
@@ -89,7 +88,7 @@
},
"outputs": [],
"source": [
"print(yuan_llm(question))"
"print(yuan_llm.invoke(question))"
]
}
],


@@ -503,21 +503,21 @@ from langchain_google_cloud_sql_pg import PostgreSQLEngine, PostgresVectorStore
### Vertex AI Vector Search
> [Google Cloud Vertex AI Vector Search](https://cloud.google.com/vertex-ai/docs/matching-engine/overview) from Google Cloud,
> [Google Cloud Vertex AI Vector Search](https://cloud.google.com/vertex-ai/docs/vector-search/overview) from Google Cloud,
> formerly known as `Vertex AI Matching Engine`, provides the industry's leading high-scale
> low latency vector database. These vector databases are commonly
> referred to as vector similarity-matching or an approximate nearest neighbor (ANN) service.
We need to install several python packages.
Install the python package:
```bash
pip install tensorflow langchain-google-vertexai tensorflow-hub tensorflow-text
pip install langchain-google-vertexai
```
See a [usage example](/docs/integrations/vectorstores/google_vertex_ai_vector_search).
```python
from langchain_community.vectorstores import MatchingEngine
from langchain_google_vertexai import VectorSearchVectorStore
```
### ScaNN


@@ -12,13 +12,17 @@ LangChain integrates with many providers.
These providers have standalone `langchain-{provider}` packages for improved versioning, dependency management and testing.
- [AI21](/docs/integrations/providers/ai21)
- [Airbyte](/docs/integrations/providers/airbyte)
- [Anthropic](/docs/integrations/platforms/anthropic)
- [Astra DB](/docs/integrations/providers/astradb)
- [Elasticsearch](/docs/integrations/providers/elasticsearch)
- [Exa Search](/docs/integrations/providers/exa_search)
- [Fireworks](/docs/integrations/providers/fireworks)
- [Google](/docs/integrations/platforms/google)
- [Groq](/docs/integrations/providers/groq)
- [IBM](/docs/integrations/providers/ibm)
- [MistralAI](/docs/integrations/providers/mistralai)
- [MongoDB](/docs/integrations/providers/mongodb_atlas)
- [Nomic](/docs/integrations/providers/nomic)
- [Nvidia](/docs/integrations/providers/nvidia)
- [OpenAI](/docs/integrations/platforms/openai)


@@ -3,6 +3,15 @@
All functionality related to `Microsoft Azure` and other `Microsoft` products.
## LLMs
### Azure ML
See a [usage example](/docs/integrations/llms/azure_ml).
```python
from langchain_community.llms.azureml_endpoint import AzureMLOnlineEndpoint
```
### Azure OpenAI
See a [usage example](/docs/integrations/llms/azure_openai).


@@ -0,0 +1,30 @@
# Arcee
>[Arcee](https://www.arcee.ai/about/about-us) enables the development and advancement
> of what we coin as SLMs—small, specialized, secure, and scalable language models.
> By offering a SLM Adaptation System and a seamless, secure integration,
> `Arcee` empowers enterprises to harness the full potential of
> domain-adapted language models, driving transformative
> innovation in operations.
## Installation and Setup
Get your `Arcee API` key.
## LLMs
See a [usage example](/docs/integrations/llms/arcee).
```python
from langchain_community.llms import Arcee
```
## Retrievers
See a [usage example](/docs/integrations/retrievers/arcee).
```python
from langchain_community.retrievers import ArceeRetriever
```


@@ -10,12 +10,7 @@ See a [tutorial provided by DataStax](https://docs.datastax.com/en/astra/astra-d
Install the following Python package:
```bash
pip install "langchain-astradb>=0.0.1"
```
Some old integrations require the `astrapy` package:
```bash
pip install "astrapy>=0.7.1"
pip install "langchain-astradb>=0.1.0"
```
Get the [connection secrets](https://docs.datastax.com/en/astra/astra-db-vector/get-started/quickstart.html).
@@ -61,7 +56,7 @@ See the [usage example](/docs/integrations/memory/astradb_chat_message_history#e
```python
from langchain.globals import set_llm_cache
from langchain_community.cache import AstraDBCache
from langchain_astradb import AstraDBCache
set_llm_cache(AstraDBCache(
api_endpoint=ASTRA_DB_API_ENDPOINT,
@@ -76,7 +71,7 @@ Learn more in the [example notebook](/docs/integrations/llms/llm_caching#astra-d
```python
from langchain.globals import set_llm_cache
from langchain_community.cache import
from langchain_astradb import AstraDBSemanticCache
set_llm_cache(AstraDBSemanticCache(
embedding=my_embedding,
@@ -92,7 +87,7 @@ Learn more in the [example notebook](/docs/integrations/memory/astradb_chat_mess
## Document loader
```python
from langchain_community.document_loaders import AstraDBLoader
from langchain_astradb import AstraDBLoader
loader = AstraDBLoader(
collection_name="my_collection",
@@ -129,7 +124,7 @@ Learn more in the [example notebook](/docs/integrations/retrievers/self_query/as
## Store
```python
from langchain_community.storage import AstraDBStore
from langchain_astradb import AstraDBStore
store = AstraDBStore(
collection_name="my_kv_store",
@@ -143,7 +138,7 @@ Learn more in the [example notebook](/docs/integrations/stores/astradb#astradbst
## Byte Store
```python
from langchain_community.storage import AstraDBByteStore
from langchain_astradb import AstraDBByteStore
store = AstraDBByteStore(
collection_name="my_kv_store",


@@ -0,0 +1,50 @@
# Baidu
>[Baidu Cloud](https://cloud.baidu.com/) is a cloud service provided by `Baidu, Inc.`,
> headquartered in Beijing. It offers a cloud storage service, client software,
> file management, resource sharing, and third-party integration.
## Installation and Setup
Register and get the `Qianfan` `AK` and `SK` keys [here](https://cloud.baidu.com/product/wenxinworkshop).
## LLMs
### Baidu Qianfan
See a [usage example](/docs/integrations/llms/baidu_qianfan_endpoint).
```python
from langchain_community.llms import QianfanLLMEndpoint
```
## Chat models
### Qianfan Chat Endpoint
See a [usage example](/docs/integrations/chat/baidu_qianfan_endpoint).
```python
from langchain_community.chat_models import QianfanChatEndpoint
```
## Embedding models
### Baidu Qianfan
See a [usage example](/docs/integrations/text_embedding/baidu_qianfan_endpoint).
```python
from langchain_community.embeddings import QianfanEmbeddingsEndpoint
```
## Vector stores
### Baidu Cloud ElasticSearch VectorSearch
See a [usage example](/docs/integrations/vectorstores/baiducloud_vector_search).
```python
from langchain_community.vectorstores import BESVectorStore
```


@@ -0,0 +1,30 @@
# CTranslate2
>[CTranslate2](https://opennmt.net/CTranslate2/quickstart.html) is a C++ and Python library
> for efficient inference with Transformer models.
>
>The project implements a custom runtime that applies many performance optimization
> techniques such as weights quantization, layers fusion, batch reordering, etc.,
> to accelerate and reduce the memory usage of Transformer models on CPU and GPU.
>
>A full list of features and supported models is included in the
> [project's repository](https://opennmt.net/CTranslate2/guides/transformers.html).
> To start, please check out the official [quickstart guide](https://opennmt.net/CTranslate2/quickstart.html).
## Installation and Setup
Install the Python package:
```bash
pip install ctranslate2
```
## LLMs
See a [usage example](/docs/integrations/llms/ctranslate2).
```python
from langchain_community.llms import CTranslate2
```


@@ -8,9 +8,8 @@ It is broken into two parts: installation and setup, and then examples of DeepSp
- Install the Python package with `pip install deepsparse`
- Choose a [SparseZoo model](https://sparsezoo.neuralmagic.com/?useCase=text_generation) or export a support model to ONNX [using Optimum](https://github.com/neuralmagic/notebooks/blob/main/notebooks/opt-text-generation-deepsparse-quickstart/OPT_Text_Generation_DeepSparse_Quickstart.ipynb)
## Wrappers
### LLM
## LLMs
There exists a DeepSparse LLM wrapper, which you can access with:


@@ -9,6 +9,7 @@
```bash
pip install dgml-utils
pip install docugami-langchain
```
## Document Loader
@@ -16,5 +17,5 @@ pip install dgml-utils
See a [usage example](/docs/integrations/document_loaders/docugami).
```python
from langchain_community.document_loaders import DocugamiLoader
from docugami_langchain.document_loaders import DocugamiLoader
```


@@ -0,0 +1,62 @@
# Eden AI
>[Eden AI](https://docs.edenai.co/docs/getting-started-with-eden-ai) user interface (UI)
> is designed for handling AI projects. With `Eden AI Portal`,
> you can perform no-code AI using the best engines on the market.
## Installation and Setup
Accessing the Eden AI API requires an API key, which you can get by
[creating an account](https://app.edenai.run/user/register) and
heading [here](https://app.edenai.run/admin/account/settings).
## LLMs
See a [usage example](/docs/integrations/llms/edenai).
```python
from langchain_community.llms import EdenAI
```
## Chat models
See a [usage example](/docs/integrations/chat/edenai).
```python
from langchain_community.chat_models.edenai import ChatEdenAI
```
## Embedding models
See a [usage example](/docs/integrations/text_embedding/edenai).
```python
from langchain_community.embeddings.edenai import EdenAiEmbeddings
```
## Tools
Eden AI provides a list of tools that grants your Agent the ability to do multiple tasks, such as:
* speech to text
* text to speech
* text explicit content detection
* image explicit content detection
* object detection
* OCR invoice parsing
* OCR ID parsing
See a [usage example](/docs/integrations/tools/edenai_tools).
```python
from langchain_community.tools.edenai import (
EdenAiExplicitImageTool,
EdenAiObjectDetectionTool,
EdenAiParsingIDTool,
EdenAiParsingInvoiceTool,
EdenAiSpeechToTextTool,
EdenAiTextModerationTool,
EdenAiTextToSpeechTool,
)
```


@@ -0,0 +1,27 @@
# ElevenLabs
>[ElevenLabs](https://elevenlabs.io/about) is a voice AI research & deployment company
> with a mission to make content universally accessible in any language & voice.
>
>`ElevenLabs` creates the most realistic, versatile and contextually-aware
> AI audio, providing the ability to generate speech in hundreds of
> new and existing voices in 29 languages.
## Installation and Setup
First, you need to set up an ElevenLabs account. You can follow the
[instructions here](https://docs.elevenlabs.io/welcome/introduction).
Install the Python package:
```bash
pip install elevenlabs
```
## Tools
See a [usage example](/docs/integrations/tools/eleven_labs_tts).
```python
from langchain_community.tools import ElevenLabsText2SpeechTool
```


@@ -0,0 +1,21 @@
# PygmalionAI
>[PygmalionAI](https://pygmalion.chat/) is a company supporting
> open-source models by serving an inference endpoint
> for the [Aphrodite Engine](https://github.com/PygmalionAI/aphrodite-engine).
## Installation and Setup
```bash
pip install aphrodite-engine
```
## LLMs
See a [usage example](/docs/integrations/llms/aphrodite).
```python
from langchain_community.llms import Aphrodite
```


@@ -12,7 +12,7 @@
"https://api.together.xyz/settings/api-keys. This can be passed in as init param\n",
"``together_api_key`` or set as environment variable ``TOGETHER_API_KEY``.\n",
"\n",
"Together API reference: https://docs.together.ai/reference/inference\n",
"Together API reference: https://docs.together.ai/reference\n",
"\n",
"You will also need to install the `langchain-together` integration package:"
]


@@ -0,0 +1,24 @@
# VoyageAI
All functionality related to VoyageAI
>[VoyageAI](https://www.voyageai.com/) builds embedding models,
> customized for your domain and company, for better retrieval quality.
## Installation and Setup
Install the integration package with
```bash
pip install langchain-voyageai
```
Get a VoyageAI API key and set it as an environment variable (`VOYAGE_API_KEY`).
## Text Embedding Model
See a [usage example](/docs/integrations/text_embedding/voyageai)
```python
from langchain_voyageai import VoyageAIEmbeddings
```
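Before constructing the embeddings object, it can help to check that the key is actually present in the environment. The snippet below is a minimal sketch using only the standard library. The helper `require_env` is hypothetical and not part of `langchain-voyageai`, and the key value is a placeholder.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a helpful error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set.")
    return value


# Placeholder value for illustration only; use your real key in practice.
os.environ.setdefault("VOYAGE_API_KEY", "my-placeholder-key")

api_key = require_env("VOYAGE_API_KEY")
print("VOYAGE_API_KEY is set:", bool(api_key))
```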


@@ -10,6 +10,19 @@
"This notebook covers how to get started with Cohere RAG retriever. This allows you to leverage the ability to search documents over various connectors or by supplying your own."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2c367be3",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"COHERE_API_KEY\"] = getpass.getpass()"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -218,7 +231,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.11.7"
}
},
"nbformat": 4,


@@ -28,17 +28,17 @@
},
"outputs": [],
"source": [
"% pip install --upgrade --quiet flashrank\n",
"% pip install --upgrade --quiet faiss\n",
"%pip install --upgrade --quiet flashrank\n",
"%pip install --upgrade --quiet faiss\n",
"\n",
"# OR (depending on Python version)\n",
"\n",
"% pip install --upgrade --quiet faiss_cpu"
"%pip install --upgrade --quiet faiss_cpu"
]
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 2,
"metadata": {
"collapsed": false,
"jupyter": {
@@ -53,7 +53,10 @@
"def pretty_print_docs(docs):\n",
" print(\n",
" f\"\\n{'-' * 100}\\n\".join(\n",
" [f\"Document {i+1}:\\n\\n\" + d.page_content for i, d in enumerate(docs)]\n",
" [\n",
" f\"Document {i+1}:\\n\\n{d.page_content}\\nMetadata: {d.metadata}\"\n",
" for i, d in enumerate(docs)\n",
" ]\n",
" )\n",
" )"
]
@@ -73,7 +76,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"metadata": {
"collapsed": false,
"jupyter": {
@@ -90,7 +93,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 4,
"metadata": {
"collapsed": false,
"jupyter": {
@@ -247,14 +250,6 @@
"----------------------------------------------------------------------------------------------------\n",
"Document 15:\n",
"\n",
"My plan to fight inflation will lower your costs and lower the deficit. \n",
"\n",
"17 Nobel laureates in economics say my plan will ease long-term inflationary pressures. Top business leaders and most Americans support my plan. And heres the plan: \n",
"\n",
"First cut the cost of prescription drugs. Just look at insulin. One in ten Americans has diabetes. In Virginia, I met a 13-year-old boy named Joshua Davis.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 16:\n",
"\n",
"And soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \n",
"\n",
"So tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \n",
@@ -263,15 +258,15 @@
"\n",
"There is so much we can do. Increase funding for prevention, treatment, harm reduction, and recovery.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 17:\n",
"Document 16:\n",
"\n",
"So lets not abandon our streets. Or choose between safety and equal justice. \n",
"My plan to fight inflation will lower your costs and lower the deficit. \n",
"\n",
"Lets come together to protect our communities, restore trust, and hold law enforcement accountable. \n",
"17 Nobel laureates in economics say my plan will ease long-term inflationary pressures. Top business leaders and most Americans support my plan. And heres the plan: \n",
"\n",
"Thats why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers.\n",
"First cut the cost of prescription drugs. Just look at insulin. One in ten Americans has diabetes. In Virginia, I met a 13-year-old boy named Joshua Davis.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 18:\n",
"Document 17:\n",
"\n",
"My plan will not only lower costs to give families a fair shot, it will lower the deficit. \n",
"\n",
@@ -281,6 +276,14 @@
"\n",
"Were going after the criminals who stole billions in relief money meant for small businesses and millions of Americans.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 18:\n",
"\n",
"So lets not abandon our streets. Or choose between safety and equal justice. \n",
"\n",
"Lets come together to protect our communities, restore trust, and hold law enforcement accountable. \n",
"\n",
"Thats why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 19:\n",
"\n",
"I understand. \n",
@@ -316,6 +319,8 @@
").load()\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)\n",
"texts = text_splitter.split_documents(documents)\n",
"for idx, text in enumerate(texts):\n",
" text.metadata[\"id\"] = idx\n",
"\n",
"embedding = OpenAIEmbeddings(model=\"text-embedding-ada-002\")\n",
"retriever = FAISS.from_documents(texts, embedding).as_retriever(search_kwargs={\"k\": 20})\n",
@@ -340,16 +345,25 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 5,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0, 5, 3]\n"
]
}
],
"source": [
"from langchain.retrievers import ContextualCompressionRetriever, FlashrankRerank\n",
"from langchain.retrievers import ContextualCompressionRetriever\n",
"from langchain.retrievers.document_compressors import FlashrankRerank\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(temperature=0)\n",
@@ -379,7 +393,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 6,
"metadata": {
"collapsed": false,
"jupyter": {
@@ -399,6 +413,16 @@
"----------------------------------------------------------------------------------------------------\n",
"Document 2:\n",
"\n",
"He met the Ukrainian people. \n",
"\n",
"From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. \n",
"\n",
"Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. \n",
"\n",
"In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 3:\n",
"\n",
"And tonight, Im announcing that the Justice Department will name a chief prosecutor for pandemic fraud. \n",
"\n",
"By the end of this year, the deficit will be down to less than half what it was before I took office. \n",
@@ -409,19 +433,7 @@
"\n",
"Im a capitalist, but capitalism without competition isnt capitalism. \n",
"\n",
"Its exploitation—and it drives up prices.\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 3:\n",
"\n",
"As Ohio Senator Sherrod Brown says, “Its time to bury the label “Rust Belt.” \n",
"\n",
"Its time. \n",
"\n",
"But with all the bright spots in our economy, record job growth and higher wages, too many families are struggling to keep up with the bills. \n",
"\n",
"Inflation is robbing them of the gains they might otherwise feel. \n",
"\n",
"I get it. Thats why my top priority is getting prices under control.\n"
"Its exploitation—and it drives up prices.\n"
]
}
],
@@ -443,7 +455,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 7,
"metadata": {
"collapsed": false,
"jupyter": {
@@ -459,7 +471,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 8,
"metadata": {
"collapsed": false,
"jupyter": {
@@ -471,10 +483,10 @@
"data": {
"text/plain": [
"{'query': 'What did the president say about Ketanji Brown Jackson',\n",
" 'result': \"The President said that Ketanji Brown Jackson is one of our nation's top legal minds and will continue Justice Breyer's legacy of excellence.\"}"
" 'result': \"The President mentioned that Ketanji Brown Jackson is one of the nation's top legal minds and will continue Justice Breyer's legacy of excellence.\"}"
]
},
"execution_count": 19,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
@@ -500,7 +512,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.12.2"
}
},
"nbformat": 4,

View File

@@ -30,7 +30,7 @@
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet google-cloud-discoveryengine"
"%pip install --upgrade --quiet google-cloud-discoveryengine"
]
},
{
@@ -115,10 +115,12 @@
" - `global` (default)\n",
" - `us`\n",
" - `eu`\n",
"- `data_store_id` - The ID of the data store you want to use.\n",
" - Note: This was called `search_engine_id` in previous versions of the retriever.\n",
"\n",
"The `project_id` and `data_store_id` parameters can be provided explicitly in the retriever's constructor or through the environment variables - `PROJECT_ID` and `DATA_STORE_ID`.\n",
"One of:\n",
"- `search_engine_id` - The ID of the search app you want to use. (Required for Blended Search)\n",
"- `data_store_id` - The ID of the data store you want to use.\n",
"\n",
"The `project_id`, `search_engine_id` and `data_store_id` parameters can be provided explicitly in the retriever's constructor or through the environment variables - `PROJECT_ID`, `SEARCH_ENGINE_ID` and `DATA_STORE_ID`.\n",
"\n",
"You can also configure a number of optional parameters, including:\n",
"\n",
@@ -137,17 +139,17 @@
"- `engine_data_type` - Defines the Vertex AI Search data type\n",
" - `0` - Unstructured data\n",
" - `1` - Structured data\n",
" - `2` - Website data with [Advanced Website Indexing](https://cloud.google.com/generative-ai-app-builder/docs/about-advanced-features#advanced-website-indexing)\n",
" - `2` - Website data\n",
" - `3` - [Blended search](https://cloud.google.com/generative-ai-app-builder/docs/create-data-store-es#multi-data-stores)\n",
"\n",
"### Migration guide for `GoogleCloudEnterpriseSearchRetriever`\n",
"\n",
"In previous versions, this retriever was called `GoogleCloudEnterpriseSearchRetriever`. Some backwards-incompatible changes had to be made to the retriever after the General Availability launch due to changes in the product behavior.\n",
"In previous versions, this retriever was called `GoogleCloudEnterpriseSearchRetriever`.\n",
"\n",
"To update to the new retriever, make the following changes:\n",
"\n",
"- Change the import from: `from langchain.retrievers import GoogleCloudEnterpriseSearchRetriever` -> `from langchain.retrievers import GoogleVertexAISearchRetriever`.\n",
"- Change all class references from `GoogleCloudEnterpriseSearchRetriever` -> `GoogleVertexAISearchRetriever`.\n",
"- Upon class initialization, change the `search_engine_id` parameter name to `data_store_id`.\n"
"- Change all class references from `GoogleCloudEnterpriseSearchRetriever` -> `GoogleVertexAISearchRetriever`.\n"
]
},
{
@@ -170,6 +172,7 @@
"\n",
"PROJECT_ID = \"<YOUR PROJECT ID>\" # Set to your Project ID\n",
"LOCATION_ID = \"<YOUR LOCATION>\" # Set to your data store location\n",
"SEARCH_ENGINE_ID = \"<YOUR SEARCH APP ID>\" # Set to your search app ID\n",
"DATA_STORE_ID = \"<YOUR DATA STORE ID>\" # Set to your data store ID"
]
},
@@ -281,6 +284,32 @@
" print(doc)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure and use the retriever for **blended** data\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"retriever = GoogleVertexAISearchRetriever(\n",
" project_id=PROJECT_ID,\n",
" location_id=LOCATION_ID,\n",
" search_engine_id=SEARCH_ENGINE_ID,\n",
" max_documents=3,\n",
" engine_data_type=3,\n",
")\n",
"\n",
"result = retriever.get_relevant_documents(query)\n",
"for doc in result:\n",
" print(doc)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -322,7 +351,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.11.0"
}
},
"nbformat": 4,

File diff suppressed because one or more lines are too long

View File

@@ -25,14 +25,21 @@
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2024-03-15T09:36:13.753824100Z",
"start_time": "2024-03-15T09:36:13.225834400Z"
}
},
"outputs": [],
"source": [
"from langchain_community.embeddings import SparkLLMTextEmbeddings\n",
"\n",
"embeddings = SparkLLMTextEmbeddings(\n",
" spark_app_id=\"sk-*\", spark_api_key=\"\", spark_api_secret=\"\"\n",
" spark_app_id=\"<spark_app_id>\",\n",
" spark_api_key=\"<spark_api_key>\",\n",
" spark_api_secret=\"<spark_api_secret>\",\n",
")"
]
},
@@ -45,44 +52,67 @@
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2024-03-15T09:36:25.436201400Z",
"start_time": "2024-03-15T09:36:25.313456600Z"
}
},
"outputs": [
{
"data": {
"text/plain": "[-0.043609619140625,\n 0.2017822265625,\n 0.0270843505859375,\n -0.250244140625,\n -0.024993896484375,\n -0.0382080078125,\n 0.06207275390625,\n -0.0146331787109375]"
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import os\n",
"text_q = \"Introducing iFlytek\"\n",
"\n",
"os.environ[\"SPARK_APP_ID\"] = \"YOUR_APP_ID\"\n",
"os.environ[\"SPARK_API_KEY\"] = \"YOUR_API_KEY\"\n",
"os.environ[\"SPARK_API_SECRET\"] = \"YOUR_API_SECRET\""
"text_1 = \"Science and Technology Innovation Company Limited, commonly known as iFlytek, is a leading Chinese technology company specializing in speech recognition, natural language processing, and artificial intelligence. With a rich history and remarkable achievements, iFlytek has emerged as a frontrunner in the field of intelligent speech and language technologies.iFlytek has made significant contributions to the advancement of human-computer interaction through its cutting-edge innovations. Their advanced speech recognition technology has not only improved the accuracy and efficiency of voice input systems but has also enabled seamless integration of voice commands into various applications and devices.The company's commitment to research and development has been instrumental in its success. iFlytek invests heavily in fostering talent and collaboration with academic institutions, resulting in groundbreaking advancements in speech synthesis and machine translation. Their dedication to innovation has not only transformed the way we communicate but has also enhanced accessibility for individuals with disabilities.\"\n",
"\n",
"text_2 = \"Moreover, iFlytek's impact extends beyond domestic boundaries, as they actively promote international cooperation and collaboration in the field of artificial intelligence. They have consistently participated in global competitions and contributed to the development of international standards.In recognition of their achievements, iFlytek has received numerous accolades and awards both domestically and internationally. Their contributions have revolutionized the way we interact with technology and have paved the way for a future where voice-based interfaces play a vital role.Overall, iFlytek is a trailblazer in the field of intelligent speech and language technologies, and their commitment to innovation and excellence deserves commendation.\"\n",
"\n",
"query_result = embeddings.embed_query(text_q)\n",
"query_result[:8]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"text_1 = \"iFLYTEK is a well-known intelligent speech and artificial intelligence publicly listed company in the Asia-Pacific Region. Since its establishment, the company is devoted to cornerstone technological research in speech and languages, natural language understanding, machine learning, machine reasoning, adaptive learning, and has maintained the world-leading position in those domains. The company actively promotes the development of A.I. products and their sector-based applications, with visions of enabling machines to listen and speak, understand and think, creating a better world with artificial intelligence.\"\n",
"text_2 = \"iFLYTEK Open Platform was launched in 2010 by iFLYTEK as Chinas first Artificial Intelligence open platform for Mobile Internet and intelligent hardware developers.\"\n",
"\n",
"query_result = embeddings.embed_query(text_2)\n",
"query_result"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2024-03-15T09:36:54.657224Z",
"start_time": "2024-03-15T09:36:54.404690400Z"
}
},
"outputs": [
{
"data": {
"text/plain": "[-0.161865234375,\n 0.58984375,\n 0.998046875,\n 0.365966796875,\n 0.72900390625,\n 0.6015625,\n -0.8408203125,\n -0.2666015625]"
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"doc_result = embeddings.embed_documents([text_1, text_2])\n",
"doc_result"
"doc_result[0][:8]"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"kernelspec": {
"name": "python3",
"language": "python",
"display_name": "Python 3 (ipykernel)"
}
},
"nbformat": 4,

View File

@@ -9,7 +9,7 @@
"\n",
[Voyage AI](https://www.voyageai.com/)">
">[Voyage AI](https://www.voyageai.com/) provides cutting-edge embedding/vectorization models.\n",
"\n",
"Let's load the Voyage Embedding class."
"Let's load the Voyage Embedding class. (Install the LangChain partner package with `pip install langchain-voyageai`)"
]
},
{
@@ -19,7 +19,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.embeddings import VoyageEmbeddings"
"from langchain_voyageai import VoyageAIEmbeddings"
]
},
{
@@ -37,7 +37,7 @@
"metadata": {},
"outputs": [],
"source": [
"embeddings = VoyageEmbeddings(\n",
"embeddings = VoyageAIEmbeddings(\n",
" voyage_api_key=\"[ Your Voyage API key ]\", model=\"voyage-2\"\n",
")"
]

View File

@@ -124,7 +124,7 @@
"outputs": [],
"source": [
"from langchain import hub\n",
"from langchain.agents import AgentExecutor, create_react_agent\n",
"from langchain.agents import AgentExecutor, create_openai_tools_agent\n",
"from langchain_openai import ChatOpenAI"
]
},
@@ -135,8 +135,8 @@
"outputs": [],
"source": [
"llm = ChatOpenAI(temperature=0, model=\"gpt-4\")\n",
"prompt = hub.pull(\"hwchase17/react\")\n",
"agent = create_react_agent(\n",
"prompt = hub.pull(\"hwchase17/openai-tools-agent\")\n",
"agent = create_openai_tools_agent(\n",
" tools=toolkit.get_tools(),\n",
" llm=llm,\n",
" prompt=prompt,\n",
@@ -151,7 +151,9 @@
"outputs": [],
"source": [
"agent_executor.invoke(\n",
" {\"input\": \"Send a greeting to my coworkers in the #general channel.\"}\n",
" {\n",
" \"input\": \"Send a greeting to my coworkers in the #general channel. Note use `channel` as key of channel id, and `message` as key of content to sent in the channel.\"\n",
" }\n",
")"
]
},

View File

@@ -9,17 +9,13 @@
"source": [
"# Azure Cosmos DB\n",
"\n",
">[Azure Cosmos DB for MongoDB vCore](https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/) makes it easy to create a database with full native MongoDB support.\n",
"> You can apply your MongoDB experience and continue to use your favorite MongoDB drivers, SDKs, and tools by pointing your application to the API for MongoDB vCore account's connection string.\n",
"> Use vector search in Azure Cosmos DB for MongoDB vCore to seamlessly integrate your AI-based applications with your data that's stored in Azure Cosmos DB.\n",
"\n",
"This notebook shows you how to leverage the [Vector Search](https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search) capabilities within Azure Cosmos DB for Mongo vCore to store documents in collections, create indicies and perform vector search queries using approximate nearest neighbor algorithms such as COS (cosine distance), L2 (Euclidean distance), and IP (inner product) to locate documents close to the query vectors. \n",
"This notebook shows you how to leverage this integrated [vector database](https://learn.microsoft.com/en-us/azure/cosmos-db/vector-database) to store documents in collections, create indicies and perform vector search queries using approximate nearest neighbor algorithms such as COS (cosine distance), L2 (Euclidean distance), and IP (inner product) to locate documents close to the query vectors. \n",
" \n",
"Azure Cosmos DB for MongoDB vCore provides developers with a fully managed MongoDB-compatible database service for building modern applications with a familiar architecture.\n",
"Azure Cosmos DB is the database that powers OpenAI's ChatGPT service. It offers single-digit millisecond response times, automatic and instant scalability, along with guaranteed speed at any scale. \n",
"\n",
"With Cosmos DB for MongoDB vCore, developers can enjoy the benefits of native Azure integrations, low total cost of ownership (TCO), and the familiar vCore architecture when migrating existing applications or building new ones.\n",
"Azure Cosmos DB for MongoDB vCore(https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/) provides developers with a fully managed MongoDB-compatible database service for building modern applications with a familiar architecture. You can apply your MongoDB experience and continue to use your favorite MongoDB drivers, SDKs, and tools by pointing your application to the API for MongoDB vCore account's connection string.\n",
"\n",
"[Sign Up](https://azure.microsoft.com/en-us/free/) for free to get started today.\n",
"[Sign Up](https://azure.microsoft.com/en-us/free/) for lifetime free access to get started today.\n",
" "
]
},

View File

@@ -9,13 +9,15 @@
}
},
"source": [
"# Tencent Cloud VectorDB\n",
"# Baidu VectorDB\n",
"\n",
">[Tencent Cloud VectorDB](https://cloud.tencent.com/document/product/1709) is a fully managed, self-developed, enterprise-level distributed database service designed for storing, retrieving, and analyzing multi-dimensional vector data. The database supports multiple index types and similarity calculation methods. A single index can support a vector scale of up to 1 billion and can support millions of QPS and millisecond-level query latency. Tencent Cloud Vector Database can not only provide an external knowledge base for large models to improve the accuracy of large model responses but can also be widely used in AI fields such as recommendation systems, NLP services, computer vision, and intelligent customer service.\n",
">[Baidu VectorDB](https://cloud.baidu.com/product/vdb.html) is a robust, enterprise-level distributed database service, meticulously developed and fully managed by Baidu Intelligent Cloud. It stands out for its exceptional ability to store, retrieve, and analyze multi-dimensional vector data. At its core, VectorDB operates on Baidu's proprietary \"Mochow\" vector database kernel, which ensures high performance, availability, and security, alongside remarkable scalability and user-friendliness.\n",
"\n",
"This notebook shows how to use functionality related to the Tencent vector database.\n",
">This database service supports a diverse range of index types and similarity calculation methods, catering to various use cases. A standout feature of VectorDB is its capacity to manage an immense vector scale of up to 10 billion, while maintaining impressive query performance, supporting millions of queries per second (QPS) with millisecond-level query latency.\n",
"\n",
"To run, you should have a [Database instance.](https://cloud.tencent.com/document/product/1709/95101)."
"This notebook shows how to use functionality related to the Baidu VectorDB. \n",
"\n",
"To run, you should have a [Database instance.](https://cloud.baidu.com/doc/VDB/s/hlrsoazuf)."
]
},
{
@@ -24,20 +26,22 @@
"metadata": {},
"outputs": [],
"source": [
"!pip3 install tcvectordb"
"!pip3 install pymochow"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain_community.document_loaders import TextLoader\n",
"from langchain_community.embeddings.fake import FakeEmbeddings\n",
"from langchain_community.vectorstores import TencentVectorDB\n",
"from langchain_community.vectorstores.tencentvectordb import ConnectionParams\n",
"from langchain_text_splitters import CharacterTextSplitter"
"from langchain_community.vectorstores import BaiduVectorDB\n",
"from langchain_community.vectorstores.baiduvectordb import ConnectionParams"
]
},
{
@@ -60,17 +64,11 @@
"outputs": [],
"source": [
"conn_params = ConnectionParams(\n",
" url=\"http://10.0.X.X\",\n",
" key=\"eC4bLRy2va******************************\",\n",
" username=\"root\",\n",
" timeout=20,\n",
" endpoint=\"http://192.168.xx.xx:xxxx\", account=\"root\", api_key=\"****\"\n",
")\n",
"\n",
"vector_db = TencentVectorDB.from_documents(\n",
" docs,\n",
" embeddings,\n",
" connection_params=conn_params,\n",
" # drop_old=True,\n",
"vector_db = BaiduVectorDB.from_documents(\n",
" docs, embeddings, connection_params=conn_params, drop=True\n",
")"
]
},
@@ -91,8 +89,7 @@
"metadata": {},
"outputs": [],
"source": [
"vector_db = TencentVectorDB(embeddings, conn_params)\n",
"\n",
"vector_db = BaiduVectorDB(embeddings, conn_params)\n",
"vector_db.add_texts([\"Ankush went to Princeton\"])\n",
"query = \"Where did Ankush go to college?\"\n",
"docs = vector_db.max_marginal_relevance_search(query)\n",
@@ -116,7 +113,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.9.6"
}
},
"nbformat": 4,

View File

@@ -0,0 +1,787 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f63dfcf9-fd9d-4ac1-a0b3-c02d4dce7faf",
"metadata": {},
"source": [
"# Couchbase \n",
"[Couchbase](http://couchbase.com/) is an award-winning distributed NoSQL cloud database that delivers unmatched versatility, performance, scalability, and financial value for all of your cloud, mobile, AI, and edge computing applications. Couchbase embraces AI with coding assistance for developers and vector search for their applications.\n",
"\n",
"Vector Search is a part of the [Full Text Search Service](https://docs.couchbase.com/server/current/learn/services-and-indexes/services/search-service.html) (Search Service) in Couchbase.\n",
"\n",
"This tutorial explains how to use Vector Search in Couchbase. You can work with both [Couchbase Capella](https://www.couchbase.com/products/capella/) and your self-managed Couchbase Server."
]
},
{
"cell_type": "markdown",
"id": "43326be4-4433-4de2-ad42-6eb91a722bad",
"metadata": {},
"source": [
"## Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bec8d532-fec7-4dc7-9be3-020aa7bdb01f",
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet langchain langchain-openai couchbase"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a972cbc-bf59-46eb-9b50-e5dc3a69dcf0",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
"cell_type": "markdown",
"id": "acf1b168-622f-465c-a9a5-d27a6d7e7a8f",
"metadata": {},
"source": [
"## Import the Vector Store and Embeddings"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "23ce45ab-bfd2-42e1-b681-514a550f0232",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.vectorstores import CouchbaseVectorStore\n",
"from langchain_openai import OpenAIEmbeddings"
]
},
{
"cell_type": "markdown",
"id": "3144ba02-1eaa-4449-853e-f034ca5706bf",
"metadata": {},
"source": [
"## Create Couchbase Connection Object\n",
"We create a connection to the Couchbase cluster initially and then pass the cluster object to the Vector Store. \n",
"\n",
"Here, we are connecting using the username and password. You can also connect using any other supported way to your cluster. \n",
"\n",
"For more information on connecting to the Couchbase cluster, please check the [Python SDK documentation](https://docs.couchbase.com/python-sdk/current/hello-world/start-using-sdk.html#connect)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "52fe583a-12db-4dc2-9281-1174bf1d4e5c",
"metadata": {},
"outputs": [],
"source": [
"COUCHBASE_CONNECTION_STRING = (\n",
" \"couchbase://localhost\" # or \"couchbases://localhost\" if using TLS\n",
")\n",
"DB_USERNAME = \"Administrator\"\n",
"DB_PASSWORD = \"Password\""
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "9986c6b9",
"metadata": {},
"outputs": [],
"source": [
"from datetime import timedelta\n",
"\n",
"from couchbase.auth import PasswordAuthenticator\n",
"from couchbase.cluster import Cluster\n",
"from couchbase.options import ClusterOptions\n",
"\n",
"auth = PasswordAuthenticator(DB_USERNAME, DB_PASSWORD)\n",
"options = ClusterOptions(auth)\n",
"cluster = Cluster(COUCHBASE_CONNECTION_STRING, options)\n",
"\n",
"# Wait until the cluster is ready for use.\n",
"cluster.wait_until_ready(timedelta(seconds=5))"
]
},
{
"cell_type": "markdown",
"id": "90c5dec9-f6cb-41eb-9f30-13cab7b107db",
"metadata": {},
"source": [
"We will now set the bucket, scope, and collection names in the Couchbase cluster that we want to use for Vector Search. \n",
"\n",
"For this example, we are using the default scope & collections."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "1b1d0a26-e9d4-4823-9800-9549d24d3d16",
"metadata": {},
"outputs": [],
"source": [
"BUCKET_NAME = \"testing\"\n",
"SCOPE_NAME = \"_default\"\n",
"COLLECTION_NAME = \"_default\"\n",
"SEARCH_INDEX_NAME = \"vector-index\""
]
},
{
"cell_type": "markdown",
"id": "efbac6ff-c2ac-4443-9250-7cc88061346b",
"metadata": {},
"source": [
"For this tutorial, we will use OpenAI embeddings"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "87625579-86d7-4de4-8a4d-cee674a6b676",
"metadata": {},
"outputs": [],
"source": [
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "markdown",
"id": "3677b4b0-3711-419c-89ff-32ef4d3e3022",
"metadata": {},
"source": [
"## Create the Search Index\n",
"Currently, the Search index needs to be created from the Couchbase Capella or Server UI or using the REST interface. \n",
"\n",
"Let us define a Search index with the name `vector-index` on the testing bucket\n",
"\n",
"For this example, let us use the Import Index feature on the Search Service on the UI. \n",
"\n",
"We are defining an index on the `testing` bucket's `_default` scope on the `_default` collection with the vector field set to `embedding` with 1536 dimensions and the text field set to `text`. We are also indexing and storing all the fields under `metadata` in the document as a dynamic mapping to account for varying document structures. The similarity metric is set to `dot_product`."
]
},
{
"cell_type": "markdown",
"id": "655117ae-9b1f-4139-b437-ca7685975a54",
"metadata": {},
"source": [
"### How to Import an Index to the Full Text Search service?\n",
" - [Couchbase Server](https://docs.couchbase.com/server/current/search/import-search-index.html)\n",
" - Click on Search -> Add Index -> Import\n",
" - Copy the following Index definition in the Import screen\n",
" - Click on Create Index to create the index.\n",
" - [Couchbase Capella](https://docs.couchbase.com/cloud/search/import-search-index.html)\n",
" - Copy the index definition to a new file `index.json`\n",
" - Import the file in Capella using the instructions in the documentation.\n",
" - Click on Create Index to create the index.\n",
" \n"
]
},
{
"cell_type": "markdown",
"id": "f85bc468-d9b8-487d-999a-3b5d2fb78e41",
"metadata": {},
"source": [
"### Index Definition\n",
"```\n",
"{\n",
" \"name\": \"vector-index\",\n",
" \"type\": \"fulltext-index\",\n",
" \"params\": {\n",
" \"doc_config\": {\n",
" \"docid_prefix_delim\": \"\",\n",
" \"docid_regexp\": \"\",\n",
" \"mode\": \"type_field\",\n",
" \"type_field\": \"type\"\n",
" },\n",
" \"mapping\": {\n",
" \"default_analyzer\": \"standard\",\n",
" \"default_datetime_parser\": \"dateTimeOptional\",\n",
" \"default_field\": \"_all\",\n",
" \"default_mapping\": {\n",
" \"dynamic\": true,\n",
" \"enabled\": true,\n",
" \"properties\": {\n",
" \"metadata\": {\n",
" \"dynamic\": true,\n",
" \"enabled\": true\n",
" },\n",
" \"embedding\": {\n",
" \"enabled\": true,\n",
" \"dynamic\": false,\n",
" \"fields\": [\n",
" {\n",
" \"dims\": 1536,\n",
" \"index\": true,\n",
" \"name\": \"embedding\",\n",
" \"similarity\": \"dot_product\",\n",
" \"type\": \"vector\",\n",
" \"vector_index_optimized_for\": \"recall\"\n",
" }\n",
" ]\n",
" },\n",
" \"text\": {\n",
" \"enabled\": true,\n",
" \"dynamic\": false,\n",
" \"fields\": [\n",
" {\n",
" \"index\": true,\n",
" \"name\": \"text\",\n",
" \"store\": true,\n",
" \"type\": \"text\"\n",
" }\n",
" ]\n",
" }\n",
" }\n",
" },\n",
" \"default_type\": \"_default\",\n",
" \"docvalues_dynamic\": false,\n",
" \"index_dynamic\": true,\n",
" \"store_dynamic\": true,\n",
" \"type_field\": \"_type\"\n",
" },\n",
" \"store\": {\n",
" \"indexType\": \"scorch\",\n",
" \"segmentVersion\": 16\n",
" }\n",
" },\n",
" \"sourceType\": \"gocbcore\",\n",
" \"sourceName\": \"testing\",\n",
" \"sourceParams\": {},\n",
" \"planParams\": {\n",
" \"maxPartitionsPerPIndex\": 103,\n",
" \"indexPartitions\": 10,\n",
" \"numReplicas\": 0\n",
" }\n",
"}\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "556dc68c-9089-4390-8dc9-b77051e7fc34",
"metadata": {},
"source": [
"For more details on how to create a Search index with support for Vector fields, please refer to the documentation.\n",
"\n",
"- [Couchbase Capella](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html)\n",
" \n",
"- [Couchbase Server](https://docs.couchbase.com/server/current/vector-search/create-vector-search-index-ui.html)"
]
},
{
"cell_type": "markdown",
"id": "75f4037d-e509-4de7-a8d1-63a05de24e9d",
"metadata": {},
"source": [
"## Create Vector Store\n",
"We create the vector store object with the cluster information and the search index name."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "33db4670-76c5-49ba-94d6-a8fa35583058",
"metadata": {},
"outputs": [],
"source": [
"vector_store = CouchbaseVectorStore(\n",
" cluster=cluster,\n",
" bucket_name=BUCKET_NAME,\n",
" scope_name=SCOPE_NAME,\n",
" collection_name=COLLECTION_NAME,\n",
" embedding=embeddings,\n",
" index_name=SEARCH_INDEX_NAME,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "0aa98793-5ac2-4f76-bbba-2d40856c2d58",
"metadata": {},
"source": [
"### Specify the Text & Embeddings Field\n",
"You can optionally specify the text & embeddings field for the document using the `text_key` and `embedding_key` fields.\n",
"```\n",
"vector_store = CouchbaseVectorStore(\n",
" cluster=cluster,\n",
" bucket_name=BUCKET_NAME,\n",
" scope_name=SCOPE_NAME,\n",
" collection_name=COLLECTION_NAME,\n",
" embedding=embeddings,\n",
" index_name=SEARCH_INDEX_NAME,\n",
" text_key=\"text\",\n",
" embedding_key=\"embedding\",\n",
")\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "790dc1ac-0ab8-4cb5-989d-31ca7c241068",
"metadata": {},
"source": [
"## Basic Vector Search Example\n",
"For this example, we are going to load the \"state_of_the_union.txt\" file via the TextLoader, chunk the text into 500 character chunks with no overlaps and index all these chunks into Couchbase.\n",
"\n",
"After the data is indexed, we perform a simple query to find the top 4 chunks that are similar to the query \"What did president say about Ketanji Brown Jackson\".\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "440350df-cbc6-48f7-8009-2e783be18306",
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain_community.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader(\"../../modules/state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "9d3b4c7c-abd6-4dfa-ad63-470f16661319",
"metadata": {},
"outputs": [],
"source": [
"vector_store = CouchbaseVectorStore.from_documents(\n",
" documents=docs,\n",
" embedding=embeddings,\n",
" cluster=cluster,\n",
" bucket_name=BUCKET_NAME,\n",
" scope_name=SCOPE_NAME,\n",
" collection_name=COLLECTION_NAME,\n",
" index_name=SEARCH_INDEX_NAME,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "91fdce6c-8f7c-4060-865a-2fd742846664",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.' metadata={'source': '../../modules/state_of_the_union.txt'}\n"
]
}
],
"source": [
"query = \"What did president say about Ketanji Brown Jackson\"\n",
"results = vector_store.similarity_search(query)\n",
"print(results[0])"
]
},
{
"cell_type": "markdown",
"id": "d9b46c93-65f6-4e4f-87a2-5cebea3b7a6b",
"metadata": {},
"source": [
"## Similarity Search with Score\n",
"You can fetch the scores for the results by calling the `similarity_search_with_score` method."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "24b146b2-55a2-4fe8-8659-3649032f5dc7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.' metadata={'source': '../../modules/state_of_the_union.txt'}\n",
"Score: 0.8211871385574341\n"
]
}
],
"source": [
"query = \"What did president say about Ketanji Brown Jackson\"\n",
"results = vector_store.similarity_search_with_score(query)\n",
"document, score = results[0]\n",
"print(document)\n",
"print(f\"Score: {score}\")"
]
},
{
"cell_type": "markdown",
"id": "9983e83d-efd0-4b75-80db-150e0694e822",
"metadata": {},
"source": [
"## Specifying Fields to Return\n",
"You can specify the fields to return from the document using `fields` parameter in the searches. These fields are returned as part of the `metadata` object in the returned Document. You can fetch any field that is stored in the Search index. The `text_key` of the document is returned as part of the document's `page_content`.\n",
"\n",
"If you do not specify any fields to be fetched, all the fields stored in the index are returned.\n",
"\n",
"If you want to fetch one of the fields in the metadata, you need to specify it using `.`\n",
"\n",
"For example, to fetch the `source` field in the metadata, you need to specify `metadata.source`.\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "ffa743dc-4e89-405b-ad71-7390338889e6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.' metadata={'source': '../../modules/state_of_the_union.txt'}\n"
]
}
],
"source": [
"query = \"What did president say about Ketanji Brown Jackson\"\n",
"results = vector_store.similarity_search(query, fields=[\"metadata.source\"])\n",
"print(results[0])"
]
},
{
"cell_type": "markdown",
"id": "a5e45eb2-aa97-45df-bcc5-410e9626e506",
"metadata": {},
"source": [
"## Hybrid Search\n",
"Couchbase allows you to do hybrid searches by combining Vector Search results with searches on non-vector fields of the document like the `metadata` object. \n",
"\n",
"The results will be based on the combination of the results from both Vector Search and the searches supported by Search Service. The scores of each of the component searches are added up to get the total score of the result.\n",
"\n",
"To perform hybrid searches, there is an optional parameter, `search_options` that can be passed to all the similarity searches. \n",
"The different search/query possibilities for the `search_options` can be found [here](https://docs.couchbase.com/server/current/search/search-request-params.html#query-object)."
]
},
{
"cell_type": "markdown",
"id": "a5db3685-1918-4c63-8148-0bb3a71ea677",
"metadata": {},
"source": [
"### Create Diverse Metadata for Hybrid Search\n",
"In order to simulate hybrid search, let us create some random metadata from the existing documents. \n",
"We uniformly add three fields to the metadata, `date` between 2010 & 2020, `rating` between 1 & 5 and `author` set to either John Doe or Jane Doe. "
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "7d2e607d-6bbc-4cef-83e3-b6a28bb269ea",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'author': 'John Doe', 'date': '2016-01-01', 'rating': 2, 'source': '../../modules/state_of_the_union.txt'}\n"
]
}
],
"source": [
"# Adding metadata to documents\n",
"for i, doc in enumerate(docs):\n",
" doc.metadata[\"date\"] = f\"{range(2010, 2020)[i % 10]}-01-01\"\n",
" doc.metadata[\"rating\"] = range(1, 6)[i % 5]\n",
" doc.metadata[\"author\"] = [\"John Doe\", \"Jane Doe\"][i % 2]\n",
"\n",
"vector_store.add_documents(docs)\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"results = vector_store.similarity_search(query)\n",
"print(results[0].metadata)"
]
},
{
"cell_type": "markdown",
"id": "6cad893b-3977-4556-ab1d-d12bce68b306",
"metadata": {},
"source": [
"### Example: Search by Exact Value\n",
"We can search for exact matches on a textual field like the author in the `metadata` object."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "dc06ba4a-8a6b-4c55-bb69-95cd92db273f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='This is personal to me and Jill, to Kamala, and to so many of you. \\n\\nCancer is the #2 cause of death in Americasecond only to heart disease. \\n\\nLast month, I announced our plan to supercharge \\nthe Cancer Moonshot that President Obama asked me to lead six years ago. \\n\\nOur goal is to cut the cancer death rate by at least 50% over the next 25 years, turn more cancers from death sentences into treatable diseases. \\n\\nMore support for patients and families.' metadata={'author': 'John Doe'}\n"
]
}
],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"results = vector_store.similarity_search(\n",
" query,\n",
" search_options={\"query\": {\"field\": \"metadata.author\", \"match\": \"John Doe\"}},\n",
" fields=[\"metadata.author\"],\n",
")\n",
"print(results[0])"
]
},
{
"cell_type": "markdown",
"id": "9106b594-b41e-4329-b98c-9b9f8a34d6f7",
"metadata": {},
"source": [
"### Example: Search by Partial Match\n",
"We can search for partial matches by specifying a fuzziness for the search. This is useful when you want to search for slight variations or misspellings of a search query.\n",
"\n",
"Here, \"Jae\" is close (fuzziness of 1) to \"Jane\"."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "fd4749e6-ef4f-4cb5-95ff-37c4fa8283d8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.' metadata={'author': 'Jane Doe'}\n"
]
}
],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"results = vector_store.similarity_search(\n",
" query,\n",
" search_options={\n",
" \"query\": {\"field\": \"metadata.author\", \"match\": \"Jae\", \"fuzziness\": 1}\n",
" },\n",
" fields=[\"metadata.author\"],\n",
")\n",
"print(results[0])"
]
},
{
"cell_type": "markdown",
"id": "1bbf9449-6e30-4bd1-9eeb-f3b60952fcab",
"metadata": {},
"source": [
"### Example: Search by Date Range Query\n",
"We can search for documents that are within a date range query on a date field like `metadata.date`."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "b7b47e7d-c32f-4999-bce9-3c3c3cebffd0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='He will never extinguish their love of freedom. He will never weaken the resolve of the free world. \\n\\nWe meet tonight in an America that has lived through two of the hardest years this nation has ever faced. \\n\\nThe pandemic has been punishing. \\n\\nAnd so many families are living paycheck to paycheck, struggling to keep up with the rising cost of food, gas, housing, and so much more. \\n\\nI understand.' metadata={'author': 'Jane Doe', 'date': '2017-01-01', 'rating': 3, 'source': '../../modules/state_of_the_union.txt'}\n"
]
}
],
"source": [
"query = \"Any mention about independence?\"\n",
"results = vector_store.similarity_search(\n",
" query,\n",
" search_options={\n",
" \"query\": {\n",
" \"start\": \"2016-12-31\",\n",
" \"end\": \"2017-01-02\",\n",
" \"inclusive_start\": True,\n",
" \"inclusive_end\": False,\n",
" \"field\": \"metadata.date\",\n",
" }\n",
" },\n",
")\n",
"print(results[0])"
]
},
{
"cell_type": "markdown",
"id": "a18d4ea2-bfab-4f15-9839-674faf1c6f0d",
"metadata": {},
"source": [
"### Example: Search by Numeric Range Query\n",
"We can search for documents that are within a range for a numeric field like `metadata.rating`."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "7e8bf7c5-07d1-4c3f-86d7-1fa3a454dc7f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(Document(page_content='He will never extinguish their love of freedom. He will never weaken the resolve of the free world. \\n\\nWe meet tonight in an America that has lived through two of the hardest years this nation has ever faced. \\n\\nThe pandemic has been punishing. \\n\\nAnd so many families are living paycheck to paycheck, struggling to keep up with the rising cost of food, gas, housing, and so much more. \\n\\nI understand.', metadata={'author': 'Jane Doe', 'date': '2017-01-01', 'rating': 3, 'source': '../../modules/state_of_the_union.txt'}), 0.9000703597577832)\n"
]
}
],
"source": [
"query = \"Any mention about independence?\"\n",
"results = vector_store.similarity_search_with_score(\n",
" query,\n",
" search_options={\n",
" \"query\": {\n",
" \"min\": 3,\n",
" \"max\": 5,\n",
" \"inclusive_min\": True,\n",
" \"inclusive_max\": True,\n",
" \"field\": \"metadata.rating\",\n",
" }\n",
" },\n",
")\n",
"print(results[0])"
]
},
{
"cell_type": "markdown",
"id": "0f16bf86-f01c-4a77-8406-275f7313f493",
"metadata": {},
"source": [
"### Example: Combining Multiple Search Queries\n",
"Different search queries can be combined using AND (conjuncts) or OR (disjuncts) operators.\n",
"\n",
"In this example, we are checking for documents with a rating between 3 & 4 and dated between 2015 & 2018."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "dd0fe7f1-aa40-4c6f-889b-99ad5efcd88b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(Document(page_content='He will never extinguish their love of freedom. He will never weaken the resolve of the free world. \\n\\nWe meet tonight in an America that has lived through two of the hardest years this nation has ever faced. \\n\\nThe pandemic has been punishing. \\n\\nAnd so many families are living paycheck to paycheck, struggling to keep up with the rising cost of food, gas, housing, and so much more. \\n\\nI understand.', metadata={'author': 'Jane Doe', 'date': '2017-01-01', 'rating': 3, 'source': '../../modules/state_of_the_union.txt'}), 1.3598770370389914)\n"
]
}
],
"source": [
"query = \"Any mention about independence?\"\n",
"results = vector_store.similarity_search_with_score(\n",
" query,\n",
" search_options={\n",
" \"query\": {\n",
" \"conjuncts\": [\n",
" {\"min\": 3, \"max\": 4, \"inclusive_max\": True, \"field\": \"metadata.rating\"},\n",
" {\"start\": \"2016-12-31\", \"end\": \"2017-01-02\", \"field\": \"metadata.date\"},\n",
" ]\n",
" }\n",
" },\n",
")\n",
"print(results[0])"
]
},
{
"cell_type": "markdown",
"id": "39258571-3233-45c3-a6ad-5c3c90ea2b1c",
"metadata": {},
"source": [
"### Other Queries\n",
"Similarly, you can use any of the supported Query methods like Geo Distance, Polygon Search, Wildcard, Regular Expressions, etc in the `search_options` parameter. Please refer to the documentation for more details on the available query methods and their syntax.\n",
"\n",
"- [Couchbase Capella](https://docs.couchbase.com/cloud/search/search-request-params.html#query-object)\n",
"- [Couchbase Server](https://docs.couchbase.com/server/current/search/search-request-params.html#query-object)"
]
},
{
"cell_type": "markdown",
"id": "80958c2b-6a67-45e6-b7f0-fd2461d75e0f",
"metadata": {},
"source": [
"# Frequently Asked Questions"
]
},
{
"cell_type": "markdown",
"id": "4f7f9838-cc20-44bc-a72d-06f2cb6c3fca",
"metadata": {},
"source": [
"## Question: Should I create the Search index before creating the CouchbaseVectorStore object?\n",
"Yes, currently you need to create the Search index before creating the `CouchbaseVectoreStore` object.\n"
]
},
{
"cell_type": "markdown",
"id": "3f0dbc1b-9e82-4ec3-9330-6b54de00661e",
"metadata": {},
"source": [
"## Question: I am not seeing all the fields that I specified in my search results. \n",
"\n",
"In Couchbase, we can only return the fields stored in the Search index. Please ensure that the field that you are trying to access in the search results is part of the Search index.\n",
"\n",
"One way to handle this is to index and store a document's fields dynamically in the index. \n",
"\n",
"- In Capella, you need to go to \"Advanced Mode\" then under the chevron \"General Settings\" you can check \"[X] Store Dynamic Fields\" or \"[X] Index Dynamic Fields\"\n",
"- In Couchbase Server, in the Index Editor (not Quick Editor) under the chevron \"Advanced\" you can check \"[X] Store Dynamic Fields\" or \"[X] Index Dynamic Fields\"\n",
"\n",
"Note that these options will increase the size of the index.\n",
"\n",
"For more details on dynamic mappings, please refer to the [documentation](https://docs.couchbase.com/cloud/search/customize-index.html).\n"
]
},
{
"cell_type": "markdown",
"id": "3702977a-2e25-48b6-b662-edd5cb94cdec",
"metadata": {},
"source": [
"## Question: I am unable to see the metadata object in my search results. \n",
"This is most likely due to the `metadata` field in the document not being indexed and/or stored by the Couchbase Search index. In order to index the `metadata` field in the document, you need to add it to the index as a child mapping. \n",
"\n",
"If you select to map all the fields in the mapping, you will be able to search by all metadata fields. Alternatively, to optimize the index, you can select the specific fields inside `metadata` object to be indexed. You can refer to the [docs](https://docs.couchbase.com/cloud/search/customize-index.html) to learn more about indexing child mappings.\n",
"\n",
"Creating Child Mappings\n",
"\n",
"* [Couchbase Capella](https://docs.couchbase.com/cloud/search/create-child-mapping.html)\n",
"* [Couchbase Server](https://docs.couchbase.com/server/current/search/create-child-mapping.html)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
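
The hybrid search examples in the Couchbase notebook above all pass a plain dictionary as `search_options`. As a minimal sketch, compound queries could be assembled with small helper functions before being handed to `similarity_search`. The helper names (`numeric_range`, `date_range`, `all_of`, `any_of`) are hypothetical and are not part of LangChain or the Couchbase SDK; the `conjuncts` and `disjuncts` keys are the AND and OR operators the notebook mentions.

```python
from typing import Any, Dict

Query = Dict[str, Any]


def numeric_range(field: str, min_value: float, max_value: float) -> Query:
    """Numeric range query on `field`, inclusive on both ends (hypothetical helper)."""
    return {
        "field": field,
        "min": min_value,
        "max": max_value,
        "inclusive_min": True,
        "inclusive_max": True,
    }


def date_range(field: str, start: str, end: str) -> Query:
    """Date range query on `field` using ISO date strings (hypothetical helper)."""
    return {"field": field, "start": start, "end": end}


def all_of(*queries: Query) -> Query:
    """AND: every sub-query must match."""
    return {"conjuncts": list(queries)}


def any_of(*queries: Query) -> Query:
    """OR: at least one sub-query must match."""
    return {"disjuncts": list(queries)}


# Documents rated 4 or 5, OR dated around 2017-01-01.
search_options = {
    "query": any_of(
        numeric_range("metadata.rating", 4, 5),
        date_range("metadata.date", "2016-12-31", "2017-01-02"),
    )
}
print(search_options)
```

The resulting dictionary would be passed as `search_options=search_options` to `vector_store.similarity_search(...)`, in the same way as the conjuncts example in the notebook.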

View File

@@ -9,7 +9,7 @@
"\n",
"This notebook shows how to use functionality related to the `Google Cloud Vertex AI Vector Search` vector database.\n",
"\n",
"> [Google Vertex AI Vector Search](https://cloud.google.com/vertex-ai/docs/matching-engine/overview), formerly known as Vertex AI Matching Engine, provides the industry's leading high-scale low latency vector database. These vector databases are commonly referred to as vector similarity-matching or an approximate nearest neighbor (ANN) service.\n",
"> [Google Vertex AI Vector Search](https://cloud.google.com/vertex-ai/docs/vector-search/overview), formerly known as Vertex AI Matching Engine, provides the industry's leading high-scale low latency vector database. These vector databases are commonly referred to as vector similarity-matching or an approximate nearest neighbor (ANN) service.\n",
"\n",
"**Note**: This module expects an endpoint and deployed index already created as the creation time takes close to one hour. To see how to create an index refer to the section [Create Index and deploy it to an Endpoint](#create-index-and-deploy-it-to-an-endpoint)"
]
@@ -29,7 +29,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.vectorstores import MatchingEngine"
"from langchain_google_vertexai import VectorSearchVectorStore"
]
},
{
@@ -50,7 +50,7 @@
"]\n",
"\n",
"\n",
"vector_store = MatchingEngine.from_components(\n",
"vector_store = VectorSearchVectorStore.from_components(\n",
" texts=texts,\n",
" project_id=\"<my_project_id>\",\n",
" region=\"<my_region>\",\n",

View File

@@ -37,9 +37,21 @@
"\n",
"To run this demo we need a running Infinispan instance without authentication and a data file.\n",
"In the next three cells we're going to:\n",
"- download the data file\n",
"- create the configuration\n",
"- run Infinispan in docker\n",
"- download the data file"
"- run Infinispan in docker"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9678d5ce-894c-4e28-bf68-20d45507122f",
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"#get an archive of news\n",
"wget https://raw.githubusercontent.com/rigazilla/infinispan-vector/main/bbc_news.csv.gz"
]
},
{
@@ -76,18 +88,6 @@
"' > infinispan-noauth.yaml"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9678d5ce-894c-4e28-bf68-20d45507122f",
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"#get an archive of news\n",
"wget https://raw.githubusercontent.com/rigazilla/infinispan-vector/main/bbc_news.csv.gz"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -95,7 +95,8 @@
"metadata": {},
"outputs": [],
"source": [
"!docker run -d --name infinispanvs-demo -v $(pwd):/user-config -p 11222:11222 infinispan/server:15.0.0.Dev09 -c /user-config/infinispan-noauth.yaml "
"!docker rm --force infinispanvs-demo\n",
"!docker run -d --name infinispanvs-demo -v $(pwd):/user-config -p 11222:11222 infinispan/server:15.0 -c /user-config/infinispan-noauth.yaml"
]
},
{
@@ -133,80 +134,8 @@
"## Setup Infinispan cache\n",
"\n",
"Infinispan is a very flexible key-value store, it can store raw bits as well as complex data type.\n",
"We need to configure it to store data containing embedded vectors.\n",
"\n",
"In the next cells we're going to:\n",
"- create an empty Infinispan VectoreStore\n",
"- deploy a protobuf definition of our data\n",
"- create a cache"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "49668bf1-778b-466d-86fb-41747ed52b74",
"metadata": {},
"outputs": [],
"source": [
"# Creating a langchain_core.VectorStore\n",
"from langchain_community.vectorstores import InfinispanVS\n",
"\n",
"ispnvs = InfinispanVS.from_texts(\n",
" texts={}, embedding=hf, cache_name=\"demo_cache\", entity_name=\"demo_entity\"\n",
")\n",
"ispn = ispnvs.ispn"
]
},
{
"cell_type": "markdown",
"id": "0cedf066-aaab-4185-b049-93eea9b48329",
"metadata": {},
"source": [
"### Protobuf definition\n",
"\n",
"Below there's the protobuf definition of our data type that contains:\n",
"- embedded vector (field 1)\n",
"- text of the news (2)\n",
"- title of the news (3)\n",
"\n",
"As you can see, there are additional annotations in the comments that tell Infinispan that:\n",
"- data type must be indexed (`@Indexed`)\n",
"- field 1 is an embeddeded vector (`@Vector`)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1fa0add0-8317-4667-9b8c-5d91c47f752a",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"# Infinispan supports protobuf schemas\n",
"schema_vector = \"\"\"\n",
"/**\n",
" * @Indexed\n",
" */\n",
"message demo_entity {\n",
"/**\n",
" * @Vector(dimension=384)\n",
" */\n",
"repeated float vector = 1;\n",
"optional string text = 2;\n",
"optional string title = 3;\n",
"}\n",
"\"\"\"\n",
"# Cleanup before deploy a new schema\n",
"ispnvs.schema_delete()\n",
"output = ispnvs.schema_create(schema_vector)\n",
"assert output.status_code == 200\n",
"assert json.loads(output.text)[\"error\"] is None\n",
"# Create the cache\n",
"ispnvs.cache_create()\n",
"# Cleanup old data and index\n",
"ispnvs.cache_clear()\n",
"ispnvs.cache_index_reindex()"
"User has complete freedom in the datagrid configuration, but for simple data type everything is automatically\n",
"configured by the python layer. We take advantage of this feature so we can focus on our application."
]
},
{
@@ -216,8 +145,7 @@
"source": [
"## Prepare the data\n",
"\n",
"In this demo we choose to store text,vector and metadata in the same cache, but other options\n",
"are possible: i.e. content can be store somewhere else and vector store could contain only a reference to the actual content."
"In this demo we rely on the default configuration, thus texts, metadatas and vectors in the same cache, but other options are possible: i.e. content can be store somewhere else and vector store could contain only a reference to the actual content."
]
},
{
@@ -239,15 +167,12 @@
" metas = []\n",
" embeds = []\n",
" for row in spamreader:\n",
" # first and fifth value are joined to form the content\n",
" # first and fifth values are joined to form the content\n",
" # to be processed\n",
" text = row[0] + \".\" + row[4]\n",
" texts.append(text)\n",
" # Storing meta\n",
" # Store text and title as metadata\n",
" meta = {}\n",
" meta[\"text\"] = row[4]\n",
" meta[\"title\"] = row[0]\n",
" meta = {\"text\": row[4], \"title\": row[0]}\n",
" metas.append(meta)\n",
" i = i + 1\n",
" # Change this to change the number of news you want to load\n",
@@ -271,7 +196,10 @@
"outputs": [],
"source": [
"# add texts and fill vector db\n",
"keys = ispnvs.add_texts(texts, metas)"
"\n",
"from langchain_community.vectorstores import InfinispanVS\n",
"\n",
"ispnvs = InfinispanVS.from_texts(texts, hf, metas)"
]
},
{
@@ -361,18 +289,6 @@
"print_docs(ispnvs.similarity_search(\"How to stay young\", 5))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "862e4af2-9f8a-4985-90cb-997477901b1e",
"metadata": {},
"outputs": [],
"source": [
"# Clean up\n",
"ispnvs.schema_delete()\n",
"ispnvs.cache_delete()"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -400,7 +316,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.18"
"version": "3.9.18"
}
},
"nbformat": 4,

View File

@@ -30,7 +30,7 @@
"\n",
"* `content` of type \"Text\". This is used to store the `Document.pageContent` values.\n",
"* `embedding` of type \"Vector\". Use the dimension used by the model you plan to use. In this notebook we use OpenAI embeddings, which have 1536 dimensions.\n",
"* `search` of type \"Text\". This is used as a metadata column by this example.\n",
"* `source` of type \"Text\". This is used as a metadata column by this example.\n",
"* any other columns you want to use as metadata. They are populated from the `Document.metadata` object. For example, if in the `Document.metadata` object you have a `title` property, you can create a `title` column in the table and it will be populated.\n"
]
},

View File

@@ -10,7 +10,7 @@
"Splits the text based on semantic similarity.\n",
"\n",
"Taken from Greg Kamradt's wonderful notebook:\n",
"https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/5_Levels_Of_Text_Splitting.ipynb\n",
"[5_Levels_Of_Text_Splitting](https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/5_Levels_Of_Text_Splitting.ipynb)\n",
"\n",
"All credit to him.\n",
"\n",

View File

@@ -49,6 +49,14 @@
"from langchain_text_splitters import CharacterTextSplitter"
]
},
{
"cell_type": "markdown",
"id": "a3ba1d8a",
"metadata": {},
"source": [
"The `.from_tiktoken_encoder()` method takes either `encoding` as an argument (e.g. `cl100k_base`), or the `model_name` (e.g. `gpt-4`). All additional arguments like `chunk_size`, `chunk_overlap`, and `separators` are used to instantiate `CharacterTextSplitter`:"
]
},
{
"cell_type": "code",
"execution_count": 2,
@@ -57,7 +65,7 @@
"outputs": [],
"source": [
"text_splitter = CharacterTextSplitter.from_tiktoken_encoder(\n",
" chunk_size=100, chunk_overlap=0\n",
" encoding=\"cl100k_base\", chunk_size=100, chunk_overlap=0\n",
")\n",
"texts = text_splitter.split_text(state_of_the_union)"
]
@@ -91,9 +99,31 @@
"id": "de5b6a6e",
"metadata": {},
"source": [
"Note that if we use `CharacterTextSplitter.from_tiktoken_encoder`, text is only split by `CharacterTextSplitter` and `tiktoken` tokenizer is used to merge splits. It means that split can be larger than chunk size measured by `tiktoken` tokenizer. We can use `RecursiveCharacterTextSplitter.from_tiktoken_encoder` to make sure splits are not larger than chunk size of tokens allowed by the language model, where each split will be recursively split if it has a larger size.\n",
"Note that if we use `CharacterTextSplitter.from_tiktoken_encoder`, text is only split by `CharacterTextSplitter` and `tiktoken` tokenizer is used to merge splits. It means that split can be larger than chunk size measured by `tiktoken` tokenizer. We can use `RecursiveCharacterTextSplitter.from_tiktoken_encoder` to make sure splits are not larger than chunk size of tokens allowed by the language model, where each split will be recursively split if it has a larger size:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0262a991",
"metadata": {},
"outputs": [],
"source": [
"from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
"\n",
"We can also load a tiktoken splitter directly, which ensure each split is smaller than chunk size."
"text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(\n",
" model_name=\"gpt-4\",\n",
" chunk_size=100,\n",
" chunk_overlap=0,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "04457e3a",
"metadata": {},
"source": [
"We can also load a tiktoken splitter directly, which will ensure each split is smaller than chunk size."
]
},
{
@@ -111,6 +141,14 @@
"print(texts[0])"
]
},
{
"cell_type": "markdown",
"id": "3bc155d0",
"metadata": {},
"source": [
"Some written languages (e.g. Chinese and Japanese) have characters which encode to 2 or more tokens. Using the `TokenTextSplitter` directly can split the tokens for a character between two chunks causing malformed Unicode characters. Use `RecursiveCharacterTextSplitter.from_tiktoken_encoder` or `CharacterTextSplitter.from_tiktoken_encoder` to ensure chunks contain valid Unicode strings."
]
},
{
"cell_type": "markdown",
"id": "55f95f06",

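The note above about malformed Unicode can be illustrated without `tiktoken`. The sketch below uses UTF-8 bytes as a stand-in for tokens, which is an assumption made purely for illustration: a real tokenizer groups bytes differently, but the failure mode is the same when a chunk boundary falls inside the encoding of one character.

```python
text = "日本語のテキスト"
data = text.encode("utf-8")  # every character in this string encodes to 3 bytes

# Naive split: cut every 4 bytes, ignoring character boundaries.
chunk_size = 4
naive_chunks = [
    data[i : i + chunk_size].decode("utf-8", errors="replace")
    for i in range(0, len(data), chunk_size)
]

# Boundary-aware split: cut on whole characters instead.
chars_per_chunk = 2
safe_chunks = [
    text[i : i + chars_per_chunk] for i in range(0, len(text), chars_per_chunk)
]

# The naive chunks contain U+FFFD replacement characters; the safe ones do not.
print(any("\ufffd" in chunk for chunk in naive_chunks))
print("".join(safe_chunks) == text)
```

Splitting on character boundaries first and only then measuring length in tokens is what keeps each chunk a valid string.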
View File

@@ -60,7 +60,7 @@
" * document addition by id (`add_documents` method with `ids` argument)\n",
" * delete by id (`delete` method with `ids` argument)\n",
"\n",
"Compatible Vectorstores: `AnalyticDB`, `AstraDB`, `AwaDB`, `Bagel`, `Cassandra`, `Chroma`, `DashVector`, `DatabricksVectorSearch`, `DeepLake`, `Dingo`, `ElasticVectorSearch`, `ElasticsearchStore`, `FAISS`, `HanaDB`, `Milvus`, `MyScale`, `OpenSearchVectorSearch`, `PGVector`, `Pinecone`, `Qdrant`, `Redis`, `Rockset`, `ScaNN`, `SupabaseVectorStore`, `SurrealDBStore`, `TimescaleVector`, `Vald`, `Vearch`, `VespaStore`, `Weaviate`, `ZepVectorStore`.\n",
"Compatible Vectorstores: `AnalyticDB`, `AstraDB`, `AwaDB`, `Bagel`, `Cassandra`, `Chroma`, `CouchbaseVectorStore`, `DashVector`, `DatabricksVectorSearch`, `DeepLake`, `Dingo`, `ElasticVectorSearch`, `ElasticsearchStore`, `FAISS`, `HanaDB`, `Milvus`, `MyScale`, `OpenSearchVectorSearch`, `PGVector`, `Pinecone`, `Qdrant`, `Redis`, `Rockset`, `ScaNN`, `SupabaseVectorStore`, `SurrealDBStore`, `TimescaleVector`, `Vald`, `Vearch`, `VespaStore`, `Weaviate`, `ZepVectorStore`.\n",
" \n",
"## Caution\n",
"\n",
@@ -85,7 +85,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 1,
"id": "15f7263e-c82e-4914-874f-9699ea4de93e",
"metadata": {},
"outputs": [],
@@ -192,7 +192,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 6,
"id": "67d2a5c8-f2bd-489a-b58e-2c7ba7fefe6f",
"metadata": {},
"outputs": [],
@@ -724,7 +724,7 @@
{
"data": {
"text/plain": [
"{'num_added': 0, 'num_updated': 0, 'num_skipped': 2, 'num_deleted': 2}"
"{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 2}"
]
},
"execution_count": 30,
@@ -751,7 +751,9 @@
{
"data": {
"text/plain": [
"[Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),\n",
"[Document(page_content='woof woof', metadata={'source': 'doggy.txt'}),\n",
" Document(page_content='woof woof woof', metadata={'source': 'doggy.txt'}),\n",
" Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),\n",
" Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),\n",
" Document(page_content='kitty kit', metadata={'source': 'kitty.txt'})]"
]
@@ -904,7 +906,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.9.12"
}
},
"nbformat": 4,

View File

@@ -22,10 +22,11 @@
"Caching embeddings can be done using a `CacheBackedEmbeddings`. The cache backed embedder is a wrapper around an embedder that caches\n",
"embeddings in a key-value store. The text is hashed and the hash is used as the key in the cache.\n",
"\n",
"The main supported way to initialized a `CacheBackedEmbeddings` is `from_bytes_store`. This takes in the following parameters:\n",
"The main supported way to initialize a `CacheBackedEmbeddings` is `from_bytes_store`. It takes the following parameters:\n",
"\n",
"- underlying_embedder: The embedder to use for embedding.\n",
"- document_embedding_cache: Any [`ByteStore`](/docs/integrations/stores/) for caching document embeddings.\n",
"- batch_size: (optional, defaults to `None`) The number of documents to embed between store updates.\n",
"- namespace: (optional, defaults to `\"\"`) The namespace to use for document cache. This namespace is used to avoid collisions with other caches. For example, set it to the name of the embedding model used.\n",
"\n",
"**Attention**: Be sure to set the `namespace` parameter to avoid collisions of the same text embedded using different embeddings models."

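The caching scheme described above (hash the text, use the hash as the key, and prefix a namespace to avoid collisions between models) can be sketched with the standard library alone. This is not LangChain's implementation: `cache_key`, `embed_with_cache`, and `toy_embedder` are hypothetical names, a plain `dict` stands in for the `ByteStore`, and the real hashing and serialization may differ.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]


def cache_key(namespace: str, text: str) -> str:
    """Key = namespace + hash of the text (hypothetical key layout)."""
    return namespace + hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_with_cache(
    texts: List[str],
    embed_fn: Callable[[List[str]], List[Vector]],
    store: Dict[str, Vector],
    namespace: str = "",
) -> List[Vector]:
    """Embed only the texts that are not cached yet, then read all from the store."""
    missing = [t for t in texts if cache_key(namespace, t) not in store]
    if missing:
        for text, vector in zip(missing, embed_fn(missing)):
            store[cache_key(namespace, text)] = vector
    return [store[cache_key(namespace, t)] for t in texts]


calls: List[List[str]] = []


def toy_embedder(batch: List[str]) -> List[Vector]:
    """Stand-in embedder that records each batch it is asked to embed."""
    calls.append(list(batch))
    return [[float(len(t))] for t in batch]


store: Dict[str, Vector] = {}
embed_with_cache(["hello", "world"], toy_embedder, store, namespace="model-a")
embed_with_cache(["hello", "again"], toy_embedder, store, namespace="model-a")
embed_with_cache(["hello"], toy_embedder, store, namespace="model-b")
print(calls)  # [['hello', 'world'], ['again'], ['hello']]
```

The second call only embeds `"again"` because `"hello"` is already cached under `model-a`, while the third call embeds `"hello"` again because the `model-b` namespace produces a different key.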
View File

@@ -24,7 +24,7 @@ they take a list of chat messages as input and they return an AI message as outp
These two API types have pretty different input and output schemas. This means that best way to interact with them may be quite different. Although LangChain makes it possible to treat them interchangeably, that doesn't mean you **should**. In particular, the prompting strategies for LLMs vs ChatModels may be quite different. This means that you will want to make sure the prompt you are using is designed for the model type you are working with.
Additionally, not all models are the same. Different models have different prompting strategies that work best for them. For example, Anthropic's models work best with XML while OpenAI's work best with JSON. This means that the prompt you use for one model may not transfer to other ones. LangChain provides a lot of default prompts, however these are not guaranteed to work well with the model are you using. Historically speaking, most prompts work well with OpenAI but are not heavily tested on other models. This is something we are working to address, but it is something you should keep in mind.
Additionally, not all models are the same. Different models have different prompting strategies that work best for them. For example, Anthropic's models work best with XML while OpenAI's work best with JSON. This means that the prompt you use for one model may not transfer to other ones. LangChain provides a lot of default prompts, however these are not guaranteed to work well with the model you are using. Historically speaking, most prompts work well with OpenAI but are not heavily tested on other models. This is something we are working to address, but it is something you should keep in mind.
## Messages
@@ -68,11 +68,11 @@ ChatModels and LLMs take different input types. PromptValue is a class designed
### PromptTemplate
This is an example of a prompt template. This consists of a template string. This string is then formatted with user inputs to produce a final string.
[This](/docs/modules/model_io/prompts/quick_start#prompttemplate) is an example of a prompt template. This consists of a template string. This string is then formatted with user inputs to produce a final string.
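
As a rough illustration of the idea, the sketch below uses only Python's standard library rather than LangChain's `PromptTemplate` class. The helper `input_variables` is a hypothetical name; it shows how the variables of a template string can be discovered and then filled in with user inputs.

```python
from string import Formatter
from typing import List


def input_variables(template: str) -> List[str]:
    """Return the sorted names of the `{placeholders}` found in a template string."""
    return sorted(
        {name for _, name, _, _ in Formatter().parse(template) if name}
    )


template = "Tell me a {adjective} joke about {topic}."
formatted = template.format(adjective="short", topic="databases")

print(input_variables(template))  # ['adjective', 'topic']
print(formatted)  # Tell me a short joke about databases.
```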
### MessagePromptTemplate
This is an example of a prompt template. This consists of a template **message** - meaning a specific role and a PromptTemplate. This PromptTemplate is then formatted with user inputs to produce a final string that becomes the `content` of this message.
[This](/docs/modules/model_io/prompts/message_prompts) is an example of a prompt template. This consists of a template **message** - meaning a specific role and a PromptTemplate. This PromptTemplate is then formatted with user inputs to produce a final string that becomes the `content` of this message.
#### HumanMessagePromptTemplate
@@ -92,7 +92,7 @@ Oftentimes inputs to prompts can be a list of messages. This is when you would u
### ChatPromptTemplate
This is an example of a prompt template. This consists of a list of MessagePromptTemplates or MessagePlaceholders. These are then formatted with user inputs to produce a final list of messages.
[This](/docs/modules/model_io/prompts/quick_start#chatprompttemplate) is an example of a prompt template. This consists of a list of MessagePromptTemplates or MessagePlaceholders. These are then formatted with user inputs to produce a final list of messages.
## Output Parsers
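The three template descriptions in the hunks above (string template, message template, chat template) can be sketched in plain Python. This is only an illustration of the data flow: the helper names `format_prompt`, `format_message`, and `format_chat` are hypothetical stand-ins, not the LangChain classes, and message placeholders are not modeled.

```python
# Plain-Python stand-ins for the concepts described above.
# These helpers are hypothetical and are NOT the LangChain API.

def format_prompt(template: str, **inputs: str) -> str:
    """Template string + user inputs -> final string."""
    return template.format(**inputs)


def format_message(role: str, template: str, **inputs: str) -> dict:
    """Role + template -> one message whose content is the formatted string."""
    return {"role": role, "content": format_prompt(template, **inputs)}


def format_chat(message_templates: list, **inputs: str) -> list:
    """List of (role, template) pairs -> final list of messages."""
    return [
        format_message(role, template, **inputs)
        for role, template in message_templates
    ]


chat_template = [
    ("system", "You are a helpful assistant that speaks {language}."),
    ("human", "{question}"),
]
messages = format_chat(
    chat_template, language="French", question="What is a prompt?"
)
```

Each level wraps the one below it: the chat template formats every message template, and every message template formats its string template with the same user inputs.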

View File

@@ -253,7 +253,7 @@ In
Detailed documentation on how to use `DocumentLoaders`.
- [Integrations](../../../docs/integrations/document_loaders/): 160+
integrations to choose from.
- [Interface](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.base.BaseLoader.html):
- [Interface](https://api.python.langchain.com/en/latest/document_loaders/langchain_core.document_loaders.base.BaseLoader.html):
API reference for the base interface.
## 2. Indexing: Split {#indexing-split}
@@ -324,7 +324,7 @@ split in the original `Document`: - [Markdown
files](../../../docs/modules/data_connection/document_transformers/markdown_header_metadata)
- [Code (py or js)](../../../docs/integrations/document_loaders/source_code)
- [Scientific papers](../../../docs/integrations/document_loaders/grobid)
- [Interface](https://api.python.langchain.com/en/latest/text_splitter/langchain_text_splitters.TextSplitter.html): API reference for the base interface.
- [Interface](https://api.python.langchain.com/en/latest/base/langchain_text_splitters.base.TextSplitter.html): API reference for the base interface.
`DocumentTransformer`: Object that performs a transformation on a list
of `Document`s.

View File

@@ -82,8 +82,6 @@ const config = {
({
docs: {
sidebarPath: require.resolve("./sidebars.js"),
lastVersion: "current",
versions: {current: {label: "0.2.x", path: "0.2.x"}},
remarkPlugins: [
[require("@docusaurus/remark-plugin-npm2yarn"), { sync: true }],
],
@@ -219,12 +217,6 @@ const config = {
},
]
},
{
type: 'docsVersionDropdown',
position: 'left',
dropdownItemsAfter: [{to: '/versions', label: 'All versions'}],
dropdownActiveClassDisabled: true,
},
{
type: "dropdown",
label: "🦜️🔗",

View File

@@ -18,7 +18,8 @@
"format": "prettier --write \"**/*.{js,jsx,ts,tsx,md,mdx}\"",
"format:check": "prettier --check \"**/*.{js,jsx,ts,tsx,md,mdx}\"",
"gen": "yarn gen:supabase",
"gen:supabase": "npx supabase gen types typescript --project-id 'xsqpnijvmbodcxyapnyq' --schema public > ./src/supabase.d.ts"
"gen:supabase": "npx supabase gen types typescript --project-id 'xsqpnijvmbodcxyapnyq' --schema public > ./src/supabase.d.ts",
"check-broken-links": "bash vercel_build.sh && node ./scripts/check-broken-links.js"
},
"dependencies": {
"@docusaurus/core": "2.4.3",
@@ -38,6 +39,7 @@
},
"devDependencies": {
"@babel/eslint-parser": "^7.18.2",
"@langchain/scripts": "^0.0.10",
"docusaurus-plugin-typedoc": "next",
"dotenv": "^16.4.5",
"eslint": "^8.19.0",

View File

@@ -0,0 +1,7 @@
// Sorry py folks, gotta be js for this one
const { checkBrokenLinks } = require("@langchain/scripts/check_broken_links");
checkBrokenLinks("docs", {
timeout: 10000,
retryFailed: true,
});

View File

@@ -0,0 +1,103 @@
/* eslint-disable react/jsx-props-no-spreading */
import React from "react";
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import CodeBlock from "@theme-original/CodeBlock";
function Setup({ apiKeyName, packageName }) {
const apiKeyText = `import getpass
import os
os.environ["${apiKeyName}"] = getpass.getpass()`;
return (
<>
<h5>Install dependencies</h5>
<CodeBlock language="bash">{`pip install -qU ${packageName}`}</CodeBlock>
<h5>Set environment variables</h5>
<CodeBlock language="python">{apiKeyText}</CodeBlock>
</>
);
}
/**
* @param {{ openaiParams?: string, anthropicParams?: string, fireworksParams?: string, mistralParams?: string, googleParams?: string, hideOpenai?: boolean, hideAnthropic?: boolean, hideFireworks?: boolean, hideMistral?: boolean, hideGoogle?: boolean }} props
*/
export default function ChatModelTabs(props) {
const {
openaiParams,
anthropicParams,
fireworksParams,
mistralParams,
googleParams,
hideOpenai,
hideAnthropic,
hideFireworks,
hideMistral,
hideGoogle,
} = props;
const openAIParamsOrDefault = openaiParams ?? `model="gpt-3.5-turbo-0125"`
const anthropicParamsOrDefault = anthropicParams ?? `model="claude-3-sonnet-20240229"`
const fireworksParamsOrDefault = fireworksParams ?? `model="accounts/fireworks/models/mixtral-8x7b-instruct"`
const mistralParamsOrDefault = mistralParams ?? `model="mistral-large-latest"`
const googleParamsOrDefault = googleParams ?? `model="gemini-pro"`
const tabItems = [
{
value: "OpenAI",
label: "OpenAI",
text: `from langchain_openai import ChatOpenAI\n\nmodel = ChatOpenAI(${openAIParamsOrDefault})`,
apiKeyName: "OPENAI_API_KEY",
packageName: "langchain-openai",
default: true,
shouldHide: hideOpenai,
},
{
value: "Anthropic",
label: "Anthropic",
text: `from langchain_anthropic import ChatAnthropic\n\nmodel = ChatAnthropic(${anthropicParamsOrDefault})`,
apiKeyName: "ANTHROPIC_API_KEY",
packageName: "langchain-anthropic",
default: false,
shouldHide: hideAnthropic,
},
{
value: "FireworksAI",
label: "FireworksAI",
text: `from langchain_fireworks import ChatFireworks\n\nmodel = ChatFireworks(${fireworksParamsOrDefault})`,
apiKeyName: "FIREWORKS_API_KEY",
packageName: "langchain-fireworks",
default: false,
shouldHide: hideFireworks,
},
{
value: "MistralAI",
label: "MistralAI",
text: `from langchain_mistralai import ChatMistralAI\n\nmodel = ChatMistralAI(${mistralParamsOrDefault})`,
apiKeyName: "MISTRAL_API_KEY",
packageName: "langchain-mistralai",
default: false,
shouldHide: hideMistral,
},
{
value: "Google",
label: "Google",
text: `from langchain_google_genai import ChatGoogleGenerativeAI\n\nmodel = ChatGoogleGenerativeAI(${googleParamsOrDefault})`,
apiKeyName: "GOOGLE_API_KEY",
packageName: "langchain-google-genai",
default: false,
shouldHide: hideGoogle,
}
]
return (
<Tabs groupId="modelTabs">
{tabItems.filter((tabItem) => !tabItem.shouldHide).map((tabItem) => (
<TabItem key={tabItem.value} value={tabItem.value} label={tabItem.label} default={tabItem.default}>
<Setup apiKeyName={tabItem.apiKeyName} packageName={tabItem.packageName} />
<CodeBlock language="python">{tabItem.text}</CodeBlock>
</TabItem>
))}
</Tabs>
);
}
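The selection logic of the component above (fall back to a default parameter string when none is passed, drop hidden providers, then render an import line plus a constructor call) can be sketched in Python. This is a rough analogue for illustration only: `PROVIDERS` and `build_tabs` are hypothetical names, the `params` and `hidden` arguments stand in for the per-provider props, and the generated snippets are plain strings that are never executed.

```python
from typing import Optional

# (label, import line, class name, default params), copied from the component above.
PROVIDERS = [
    ("OpenAI", "from langchain_openai import ChatOpenAI",
     "ChatOpenAI", 'model="gpt-3.5-turbo-0125"'),
    ("Anthropic", "from langchain_anthropic import ChatAnthropic",
     "ChatAnthropic", 'model="claude-3-sonnet-20240229"'),
    ("FireworksAI", "from langchain_fireworks import ChatFireworks",
     "ChatFireworks", 'model="accounts/fireworks/models/mixtral-8x7b-instruct"'),
    ("MistralAI", "from langchain_mistralai import ChatMistralAI",
     "ChatMistralAI", 'model="mistral-large-latest"'),
    ("Google", "from langchain_google_genai import ChatGoogleGenerativeAI",
     "ChatGoogleGenerativeAI", 'model="gemini-pro"'),
]


def build_tabs(params: Optional[dict] = None, hidden: Optional[set] = None) -> list:
    """Return one {'label', 'text'} entry per visible provider."""
    params = params or {}
    hidden = hidden or set()
    tabs = []
    for label, import_line, cls, default_params in PROVIDERS:
        if label in hidden:
            continue  # mirrors the `shouldHide` filter
        chosen = params.get(label)
        if chosen is None:
            # Like `??` in the component: fall back only when nothing was
            # passed, so an explicit empty string is kept as-is.
            chosen = default_params
        tabs.append({
            "label": label,
            "text": f"{import_line}\n\nmodel = {cls}({chosen})",
        })
    return tabs


tabs = build_tabs(params={"OpenAI": "temperature=0"}, hidden={"Google"})
```

Here the OpenAI tab renders the overridden parameters, the Google tab is omitted, and the remaining tabs use their defaults.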

Binary files not shown (43 image files).

Some files were not shown because too many files have changed in this diff.