Thank you for contributing to LangChain!
bilibili-api-python uses the https://github.com/Nemo2011/bilibili-api
repo. Change the link to the correct address.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
**Description:** Update module imports for Fireworks documentation
**Issue:** Module imports were missing or in the wrong location
**Dependencies:** None
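For reference, a sketch of the partner-package import style the updated
docs point to (illustrative; the exact symbols vary by notebook):

```python
# All three classes are exported by the langchain_fireworks partner package.
from langchain_fireworks import ChatFireworks, Fireworks, FireworksEmbeddings
```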
**Description:** Update import paths and move to LCEL for the llama.cpp
examples
**Issue:** Update import paths to reflect package refactoring and move
chains to LCEL in examples
**Dependencies:** None
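As an illustration of the LCEL style the examples were moved to (a
sketch; the model path and prompt are placeholders, not taken from the
notebooks):

```python
from langchain_community.llms import LlamaCpp
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

llm = LlamaCpp(model_path="/path/to/model.gguf")  # placeholder path
prompt = PromptTemplate.from_template("Question: {question}\nAnswer:")

# LCEL composition replaces the legacy chain classes used before.
chain = prompt | llm | StrOutputParser()
chain.invoke({"question": "What is LangChain?"})
```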
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** Move the FireworksEmbeddings documentation from
langchain_fireworks/docs/ to docs/integration/text_embedding/
**Issue:** FireworksEmbeddings documentation was not in the correct
location
**Dependencies:** None
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
**Description:**
Currently, `CacheBackedEmbeddings` computes vectors for *all* uncached
documents before updating the store. This pull request updates the
embedding computation loop to compute embeddings in batches, updating
the store after each batch.
I noticed this when I tried `CacheBackedEmbeddings` on our 30k document
set and the cache directory hadn't appeared on disk after 30 minutes.
The motivation is to minimize compute/data loss when problems occur:
* If there is a transient embedding failure (e.g. a network outage at
the embedding endpoint triggers an exception), at least the completed
vectors are written to the store instead of being discarded.
* If there is an issue with the store (e.g. no write permissions), the
condition is detected early without computing (and discarding!) all the
vectors.
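A minimal sketch of how the batching surfaces to callers, assuming it is
exposed as a `batch_size` argument on `from_bytes_store` (the underlying
embedder and store below are just examples):

```python
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore
from langchain_openai import OpenAIEmbeddings

underlying = OpenAIEmbeddings()
store = LocalFileStore("./embedding_cache/")
embedder = CacheBackedEmbeddings.from_bytes_store(
    underlying,
    store,
    namespace=underlying.model,
    batch_size=100,  # assumed parameter: write vectors to the store every 100 docs
)

# If embedding fails partway through, batches already written stay cached.
vectors = embedder.embed_documents(["doc one", "doc two"])
```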
**Issue:**
Implements enhancement #18026.
**Testing:**
I was unable to run unit tests; details in [this
post](https://github.com/langchain-ai/langchain/discussions/15019#discussioncomment-8576684).
---------
Signed-off-by: chrispy <chrispy@synopsys.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
The `retryFailed` option retries all failed links, one at a time, with
the goal of not triggering bot protection.
`microsoft.com` is now hard-coded into the whitelist.
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- **PR message**:
- **Description:** Update the Slack toolkit doc to use an agent that
supports multiple inputs. Using a ReAct agent causes a ValidationError
when invoking the Slack tools: the agent returns a single string like
`'{"channel": "C05LDF54S21", "message": "Hello, world!"}'`, but the
ReAct agent does not support multi-input tools (see the sketch after
this list).
- **Issue:** This is related to this
[Discussion#18083](https://github.com/langchain-ai/langchain/discussions/18083)
- **Dependencies:** No dependencies required
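For context, a hedged sketch of an agent that handles multi-input tools
(the model name and hub prompt are illustrative, not taken from the
doc):

```python
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_community.agent_toolkits import SlackToolkit
from langchain_openai import ChatOpenAI

tools = SlackToolkit().get_tools()
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
prompt = hub.pull("hwchase17/openai-tools-agent")  # illustrative prompt

# A tool-calling agent passes structured arguments to each tool; a ReAct
# agent would emit one JSON string and fail the tools' input validation.
agent = create_openai_tools_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)
executor.invoke({"input": "Send 'Hello, world!' to the #general channel."})
```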
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
**Description**:
This PR enables VectorStore autoconfiguration for Infinispan: if the
metadata fields are only of basic types, the protobuf config will be
generated automatically for the user.
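A minimal sketch of what autoconfiguration means for users, assuming a
local Infinispan server with default settings (the embedding model is
illustrative):

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import InfinispanVS

# All metadata values are basic types (strings here), so the protobuf
# schema is generated automatically rather than supplied by the user.
vectorstore = InfinispanVS.from_texts(
    texts=["a cat chased a mouse", "a dog fetched a ball"],
    embedding=HuggingFaceEmbeddings(),
    metadatas=[{"animal": "cat"}, {"animal": "dog"}],
)
```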
**Description**
This PR adds some missing details to the "Split by tokens" page in the
documentation. Specifically:
- The `.from_tiktoken_encoder()` class methods for both the
`CharacterTextSplitter` and `RecursiveCharacterTextSplitter` default to
the old `gpt-2` encoding. I've added a comment to suggest specifying
`model_name` or `encoding`
- The docs didn't mention that the `from_tiktoken_encoder()` class
method passes additional kwargs down to the constructor of the splitter.
I only discovered this by reading the source code
- Added an example of using the `.from_tiktoken_encoder()` class method
with `RecursiveCharacterTextSplitter`, which is the recommended approach
over `CharacterTextSplitter` for most scenarios (see the sketch after
this list)
- Added a warning that `TokenTextSplitter` can split characters that
span multiple tokens (e.g. 猫 is 3 cl100k_base tokens) across chunks,
which creates malformed Unicode strings; it should not be used in these
situations.
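A short sketch of the recommended pattern (the model name is an example;
`chunk_size` and `chunk_overlap` are forwarded to the splitter's
constructor via kwargs):

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    model_name="gpt-4",  # avoids falling back to the legacy gpt-2 encoding
    chunk_size=100,      # forwarded to the constructor, not a tiktoken option
    chunk_overlap=0,
)
chunks = splitter.split_text("Some long document text ...")
```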
Side note: I think the default argument of `gpt2` for
`.from_tiktoken_encoder()` should be updated?
**Twitter handle** anthonypjshaw
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** Fixes the import paths for the `FlashrankRerank`
example notebook.
**Issue:** #19139
**Dependencies:** None
**Twitter handle:** n/a
---------
Co-authored-by: Simon Stone <simon.stone@dartmouth.edu>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
- [x] **PR title**: "Updating format of pip install in two files of
docs/cookbook"
- pip install commands are not rendering properly in some of the files
in cookbook
- Example:
[docs/expression_language/cookbook/sql_db](https://python.langchain.com/docs/expression_language/cookbook/sql_db)
- [x] **PR message**: Updating format of pip install in two files of
docs/cookbook
- **Issue:** #19197
- Note: let's do a squash merge for this PR
- [x] **PR title**: "community: deprecate DocugamiLoader"
- [x] **PR message**: Deprecate the langchain_community DocugamiLoader
and use the docugami_langchain DocugamiLoader instead
---------
Co-authored-by: Kenzie Mihardja <kenzie28@cs.washington.edu>
I think the cell type for the pip command should be 'code'.
Please check, thank you :)
The line `from langchain_openai import ChatOpenAI` appears twice in the
Get Started / Serving with LangServe section; the imports on lines 559
and 566 are identical.
Co-authored-by: Vitalii <vitalii@localhost>
**Description:** Update stale links in Together AI documentation
**Issue:** Some links pointed to legacy webpages on the Together AI
website
**Dependencies:** None
**Lint and test**: `make format`, `make lint` were run
- [ ] **PR title**: "docs: correction in
"https://github.com/langchain-ai/langchain/blob/master/docs/docs/get_started/quickstart.mdx",
line 289".
- [ ] **PR message**:
- Corrected the spelling mistake
- #18981
Fixed grammar in the Considerations section of the Model I/O Concepts
documentation page
- Update concepts.mdx
Page Link:
https://python.langchain.com/docs/modules/model_io/concepts#considerations
- **Description:** Fixed grammar in the Considerations section of the
Model I/O documentation page
- **Issue:** "to work well with the model are you using" → "to work well
with the model you are using"
- **Dependencies:** None
- **Twitter handle:** @Anubhav_Madhav
(https://twitter.com/Anubhav_Madhav)
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
## Description
This PR addresses a documentation issue in the
[Indexing](https://python.langchain.com/docs/modules/data_connection/indexing)
page. Specifically, it corrects the execution results of the Jupyter
notebook under the
[Source](https://python.langchain.com/docs/modules/data_connection/indexing#source)
section, which were broken as detailed below.
## Problem
The execution results following the statement, `This should delete the
old versions of documents associated with doggy.txt source and replace
them with the new versions.`, appear to be incorrect, as described
below.
### Current Behavior
- For some reason, the `index` function fails to add the new content of
`doggy.txt`. Although it deletes the document objects associated with
the `doggy.txt` source, it does not add the objects in
`changed_doggy_docs`. Consequently, the execution result displays
`num_added: 0`.
- This unexpected behavior also impacts the results of
`vectorstore.similarity_search("dog", k=30)`, showing only the contents
of `kitty.txt`. It appears as though the contents of `doggy.txt` have
been completely removed from the index:
```
Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),
Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),
Document(page_content='kitty kit', metadata={'source': 'kitty.txt'})]
```
### Expected Behavior
- The `index` function should successfully add the objects in
`changed_doggy_docs` after removing the old content of `doggy.txt`. The
anticipated execution result is `num_added: 2`.
- Subsequently, the modified content of `doggy.txt` should appear in the
results of `vectorstore.similarity_search("dog", k=30)` as follows:
```
[Document(page_content='woof woof', metadata={'source': 'doggy.txt'}),
Document(page_content='woof woof woof', metadata={'source': 'doggy.txt'}),
Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),
Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),
Document(page_content='kitty kit', metadata={'source': 'kitty.txt'})]
```
## Fix
I reran `docs/docs/modules/data_connection/indexing.ipynb` and have
included the diff in this PR.
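For reference, a sketch of the call whose notebook output was
regenerated (`changed_doggy_docs`, `record_manager`, and `vectorstore`
are the objects set up earlier in the guide; the cleanup mode shown is
an assumption):

```python
from langchain.indexes import index

result = index(
    changed_doggy_docs,
    record_manager,
    vectorstore,
    cleanup="full",          # assumed mode for the Source section example
    source_id_key="source",
)
print(result)  # expected to report num_added: 2 once the old doggy.txt chunks are gone
```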
Docs fix: replace the column name "search" with "source".
The Xata integration expects a metadata column named "source".
The docs suggest the name "search" which, if used, yields the following
error:
```
File "/usr/local/lib/python3.11/site-packages/langchain_community/vectorstores/xata.py", line 95, in _add_vectors
raise Exception(f"Error adding vectors to Xata: {r.status_code} {r}")
Exception: Error adding vectors to Xata: 400 {'errors': [{'status': 400, 'message': 'invalid record: column [source]: column not found'}]}
```
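Put differently, documents written through the Xata integration should
carry their provenance under the "source" metadata key (a minimal
illustration; values are placeholders):

```python
from langchain_core.documents import Document

# Xata looks for a metadata column named "source"; using "search"
# triggers the "column not found" error shown above.
doc = Document(
    page_content="some text",
    metadata={"source": "my_file.txt"},
)
```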
Poetry can't reliably resolve the number of optional "extended test"
dependencies we have. If we instead just rely on pip to install the
extended test deps in CI, this isn't an issue.