Looking at tokens / page of our docs, we see a few outliers:
<img width="761" alt="image"
src="https://github.com/langchain-ai/langchain/assets/122662504/677aa2d6-0a29-45e4-882a-db2bbf46d02b">
It is due to non-rendering images in one case, and output spamming.
Clean these, along with other cases of excessing output spamming in
docs.
All get sucked into chat-langchain for retrieval.
Thank you for contributing to LangChain!
bilibili-api-python use https://github.com/Nemo2011/bilibili-api repo.
Change to the correct address.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
Thank you for contributing to LangChain!
- [x] **PR title**: "community: deprecate DocugamiLoader"
- [x] **PR message**: Deprecate the langchain_community and use the
docugami_langchain DocugamiLoader
---------
Co-authored-by: Kenzie Mihardja <kenzie28@cs.washington.edu>
- **Description:** Adding an optional parameter `linearization_config`
to the `AmazonTextractPDFLoader` so the caller can define how the output
will be linearized, instead of forcing a predefined set of linearization
configs. It will still have a default configuration as this will be an
optional parameter.
- **Issue:** #17457
- **Dependencies:** The same ones that already exist for
`AmazonTextractPDFLoader`
- **Twitter handle:** [@lvieirajr19](https://twitter.com/lvieirajr19)
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
This pull request introduces initial support for the TiDB vector store.
The current version is basic, laying the foundation for the vector store
integration. While this implementation provides the essential features,
we plan to expand and improve the TiDB vector store support with
additional enhancements in future updates.
Upcoming Enhancements:
* Support for Vector Index Creation: To enhance the efficiency and
performance of the vector store.
* Support for max marginal relevance search.
* Customized Table Structure Support: Recognizing the need for
flexibility, we plan for more tailored and efficient data store
solutions.
Simple use case exmaple
```python
from typing import List, Tuple
from langchain.docstore.document import Document
from langchain_community.vectorstores import TiDBVectorStore
from langchain_openai import OpenAIEmbeddings
db = TiDBVectorStore.from_texts(
embedding=embeddings,
texts=['Andrew like eating oranges', 'Alexandra is from England', 'Ketanji Brown Jackson is a judge'],
table_name="tidb_vector_langchain",
connection_string=tidb_connection_url,
distance_strategy="cosine",
)
query = "Can you tell me about Alexandra?"
docs_with_score: List[Tuple[Document, float]] = db.similarity_search_with_score(query)
for doc, score in docs_with_score:
print("-" * 80)
print("Score: ", score)
print(doc.page_content)
print("-" * 80)
```
**Description:**
modified the user_name to username to conform with the expected inputs
to TelegramChatApiLoader
**Issue:**
Current code fails in langchain-community 0.0.24
<loader = TelegramChatApiLoader(
chat_entity="<CHAT_URL>", # recommended to use Entity here
api_hash="<API HASH >",
api_id="<API_ID>",
user_name="", # needed only for caching the session.
)>
In this commit we update the documentation for Google El Carro for Oracle Workloads. We amend the documentation in the Google Providers page to use the correct name which is El Carro for Oracle Workloads. We also add changes to the document_loaders and memory pages to reflect changes we made in our repo.
- **Description:** A generic document loader adapter for SQLAlchemy on
top of LangChain's `SQLDatabaseLoader`.
- **Needed by:** https://github.com/crate-workbench/langchain/pull/1
- **Depends on:** GH-16655
- **Addressed to:** @baskaryan, @cbornet, @eyurtsev
Hi from CrateDB again,
in the same spirit like GH-16243 and GH-16244, this patch breaks out
another commit from https://github.com/crate-workbench/langchain/pull/1,
in order to reduce the size of this patch before submitting it, and to
separate concerns.
To accompany the SQLAlchemy adapter implementation, the patch includes
integration tests for both SQLite and PostgreSQL. Let me know if
corresponding utility resources should be added at different spots.
With kind regards,
Andreas.
### Software Tests
```console
docker compose --file libs/community/tests/integration_tests/document_loaders/docker-compose/postgresql.yml up
```
```console
cd libs/community
pip install psycopg2-binary
pytest -vvv tests/integration_tests -k sqldatabase
```
```
14 passed
```

---------
Co-authored-by: Andreas Motl <andreas.motl@crate.io>
**Description:** This PR introduces a new "Astra DB" Partner Package.
So far only the vector store class is _duplicated_ there, all others
following once this is validated and established.
Along with the move to separate package, incidentally, the class name
will change `AstraDB` => `AstraDBVectorStore`.
The strategy has been to duplicate the module (with prospected removal
from community at LangChain 0.2). Until then, the code will be kept in
sync with minimal, known differences (there is a makefile target to
automate drift control. Out of convenience with this check, the
community package has a class `AstraDBVectorStore` aliased to `AstraDB`
at the end of the module).
With this PR several bugfixes and improvement come to the vector store,
as well as a reshuffling of the doc pages/notebooks (Astra and
Cassandra) to align with the move to a separate package.
**Dependencies:** A brand new pyproject.toml in the new package, no
changes otherwise.
**Twitter handle:** `@rsprrs`
---------
Co-authored-by: Christophe Bornet <cbornet@hotmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
## Description
I am submitting this for a school project as part of a team of 5. Other
team members are @LeilaChr, @maazh10, @Megabear137, @jelalalamy. This PR
also has contributions from community members @Harrolee and @Mario928.
Initial context is in the issue we opened (#11229).
This pull request adds:
- Generic framework for expanding the languages that `LanguageParser`
can handle, using the
[tree-sitter](https://github.com/tree-sitter/py-tree-sitter#py-tree-sitter)
parsing library and existing language-specific parsers written for it
- Support for the following additional languages in `LanguageParser`:
- C
- C++
- C#
- Go
- Java (contributed by @Mario928
https://github.com/ThatsJustCheesy/langchain/pull/2)
- Kotlin
- Lua
- Perl
- Ruby
- Rust
- Scala
- TypeScript (contributed by @Harrolee
https://github.com/ThatsJustCheesy/langchain/pull/1)
Here is the [design
document](https://docs.google.com/document/d/17dB14cKCWAaiTeSeBtxHpoVPGKrsPye8W0o_WClz2kk)
if curious, but no need to read it.
## Issues
- Closes#11229
- Closes#10996
- Closes#8405
## Dependencies
`tree_sitter` and `tree_sitter_languages` on PyPI. We have tried to add
these as optional dependencies.
## Documentation
We have updated the list of supported languages, and also added a
section to `source_code.ipynb` detailing how to add support for
additional languages using our framework.
## Maintainer
- @hwchase17 (previously reviewed
https://github.com/langchain-ai/langchain/pull/6486)
Thanks!!
## Git commits
We will gladly squash any/all of our commits (esp merge commits) if
necessary. Let us know if this is desirable, or if you will be
squash-merging anyway.
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
---------
Co-authored-by: Maaz Hashmi <mhashmi373@gmail.com>
Co-authored-by: LeilaChr <87657694+LeilaChr@users.noreply.github.com>
Co-authored-by: Jeremy La <jeremylai511@gmail.com>
Co-authored-by: Megabear137 <zubair.alnoor27@gmail.com>
Co-authored-by: Lee Harrold <lhharrold@sep.com>
Co-authored-by: Mario928 <88029051+Mario928@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:** Pebblo opensource project enables developers to
safely load data to their Gen AI apps. It identifies semantic topics and
entities found in the loaded data and summarizes them in a
developer-friendly report.
- **Dependencies:** none
- **Twitter handle:** srics
@hwchase17
- **Description:** Adds the document loader for [AWS
Athena](https://aws.amazon.com/athena/), a serverless and interactive
analytics service.
- **Dependencies:** Added boto3 as a dependency
### Description
support load any github file content based on file extension.
Why not use [git
loader](https://python.langchain.com/docs/integrations/document_loaders/git#load-existing-repository-from-disk)
?
git loader clones the whole repo even only interested part of files,
that's too heavy. This GithubFileLoader only downloads that you are
interested files.
### Twitter handle
my twitter: @shufanhaotop
---------
Co-authored-by: Hao Fan <h_fan@apple.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Updated `_get_elements()` function of
`UnstructuredFileLoader `class to check if the argument self.file_path
is a file or list of files. If it is a list of files then it iterates
over the list of file paths, calls the partition function for each one,
and appends the results to the elements list. If self.file_path is not a
list, it calls the partition function as before.
- **Issue:** Fixed#15607,
- **Dependencies:** NA
- **Twitter handle:** NA
Co-authored-by: H161961 <Raunak.Raunak@Honeywell.com>
**Description** : New documents loader for visio files (with extension
.vsdx)
A [visio file](https://fr.wikipedia.org/wiki/Microsoft_Visio) (with
extension .vsdx) is associated with Microsoft Visio, a diagram creation
software. It stores information about the structure, layout, and
graphical elements of a diagram. This format facilitates the creation
and sharing of visualizations in areas such as business, engineering,
and computer science.
A Visio file can contain multiple pages. Some of them may serve as the
background for others, and this can occur across multiple layers. This
loader extracts the textual content from each page and its associated
pages, enabling the extraction of all visible text from each page,
similar to what an OCR algorithm would do.
**Dependencies** : xmltodict package
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
See preview :
https://langchain-git-fork-cbornet-astra-loader-doc-langchain.vercel.app/docs/integrations/document_loaders/astradb
Updates docs and cookbooks to import ChatOpenAI, OpenAI, and OpenAI
Embeddings from `langchain_openai`
There are likely more
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
…tch]: import models from community
ran
```bash
git grep -l 'from langchain\.chat_models' | xargs -L 1 sed -i '' "s/from\ langchain\.chat_models/from\ langchain_community.chat_models/g"
git grep -l 'from langchain\.llms' | xargs -L 1 sed -i '' "s/from\ langchain\.llms/from\ langchain_community.llms/g"
git grep -l 'from langchain\.embeddings' | xargs -L 1 sed -i '' "s/from\ langchain\.embeddings/from\ langchain_community.embeddings/g"
git checkout master libs/langchain/tests/unit_tests/llms
git checkout master libs/langchain/tests/unit_tests/chat_models
git checkout master libs/langchain/tests/unit_tests/embeddings/test_imports.py
make format
cd libs/langchain; make format
cd ../experimental; make format
cd ../core; make format
```
`integrations/document_loaders/` `Excel` and `OneNote` pages in the
navbar were in the wrong sort order. It is because the file names are
not equal to the page titles.
- renamed `excel` and `onenote` file names
add_video_info should be false in the first example
<!-- Thank you for contributing to LangChain!
Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes if applicable,
- **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
- Description: Just a minor add to the documentation to clarify how to
load all files from a folder. I assumed and try to do it specifying it
in the bucket (BUCKET/FOLDER), instead of using the prefix.
- updated `Tencent` provider page: added a chat model and document
loader references; company description
- updated Chat model and Document loader pages with descriptions, links
- renamed files to consistent formats; redirected file names
Note:
I was getting this linting error on code that **was not changed in my
PR**!
> Error:
docs/docs/guides/safety/hugging_face_prompt_injection.ipynb:1:1: I001
Import block is un-sorted or un-formatted
> make: *** [Makefile:47: lint_package] Error 1
I've fixed this error in the notebook