Commit Graph

244 Commits

Author SHA1 Message Date
Leonid Ganeline
c75c0775e1
docs supabase update (#4935)
# docs: updated `Supabase` notebook

- the title of the notebook was inconsistent (included redundant
"Vectorstore"). Removed this "Vectorstore"
- added `Postgress` to the title. It is important. The `Postgres` name
is much more popular than `Supabase`.
- added description for the `Postrgress`
- added more info to the `Supabase` description
2023-05-18 10:42:08 -07:00
Yuekai Zhang
1ed4228822
Fix bilibili (#4860)
# Fix bilibili api import error

bilibili-api package is depracated and there is no sync module.

<!--
Thank you for contributing to LangChain! Your PR will appear in our next
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.
-->

<!-- Remove if not applicable -->

Fixes #2673 #2724 

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
@vowelparrot  @liaokongVFX 

<!-- For a quicker response, figure out the right person to tag with @

        @hwchase17 - project lead

        Tracing / Callbacks
        - @agola11

        Async
        - @agola11

        DataLoaders
        - @eyurtsev

        Models
        - @hwchase17
        - @agola11

        Agents / Tools / Toolkits
        - @vowelparrot
        
        VectorStores / Retrievers / Memory
        - @dev2049
        
 -->
2023-05-18 09:56:51 -04:00
Eugene Yurtsev
e46202829f
feat #4479: TextLoader auto detect encoding and improved exceptions (#4927)
# TextLoader auto detect encoding and enhanced exception handling

- Add an option to enable encoding detection on `TextLoader`. 
- The detection is done using `chardet`
- The loading is done by trying all detected encodings by order of
confidence or raise an exception otherwise.

### New Dependencies:
- `chardet`

Fixes #4479 

## Before submitting

<!-- If you're adding a new integration, include an integration test and
an example notebook showing its use! -->

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

- @eyurtsev

---------

Co-authored-by: blob42 <spike@w530>
2023-05-18 09:55:14 -04:00
Eugene Yurtsev
c06a47a691
Load specific file types from Google Drive (issue #4878) (#4926)
# Load specific file types from Google Drive (issue #4878)
Add the possibility to define what file types you want to load from
Google Drive.
 
```
 loader = GoogleDriveLoader(
    folder_id="1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5",
    file_types=["document", "pdf"]
    recursive=False
)
```

Fixes ##4878

## Who can review?
Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
DataLoaders
- @eyurtsev

Twitter: [@UmerHAdil](https://twitter.com/@UmerHAdil) | Discord:
RicChilligerDude#7589

---------

Co-authored-by: UmerHA <40663591+UmerHA@users.noreply.github.com>
2023-05-18 09:27:53 -04:00
Leonid Ganeline
c998569c8f
docs: text splitters improvements (#4490)
#docs: text splitters improvements

Changes are only in the Jupyter notebooks.
- added links to the source packages and a short description of these
packages
- removed " Text Splitters" suffixes from the TOC elements (they made
the list of the text splitters messy)
- moved text splitters, based on the length function into a separate
list. They can be mixed with any classes from the "Text Splitters", so
it is a different classification.

## Who can review?
        @hwchase17 - project lead
        @eyurtsev
        @vowelparrot

NOTE: please, check out the results of the `Python code` text splitter
example (text_splitters/examples/python.ipynb). It looks suboptimal.
2023-05-17 21:33:34 -07:00
Davis Chase
df0c33a005
Faiss no avx2 (#4895)
Co-authored-by: Ali Mirlou <alimirlou@gmail.com>
2023-05-17 19:18:57 -07:00
Leonid Ganeline
b96ab4b763
docs retriever improvements (#4430)
# Docs: improvements in the `retrievers/examples/` notebooks

Its primary purpose is to make the Jupyter notebook examples
**consistent** and more suitable for first-time viewers.
- add links to the integration source (if applicable) with a short
description of this source;
- removed `_retriever` suffix from the file names (where it existed) for
consistency;
- removed ` retriever` from the notebook title (where it existed) for
consistency;
- added code to install necessary Python package(s);
- added code to set up the necessary API Key.
- very small fixes in notebooks from other folders (for consistency):
  - docs/modules/indexes/vectorstores/examples/elasticsearch.ipynb
  - docs/modules/indexes/vectorstores/examples/pinecone.ipynb
  - docs/modules/models/llms/integrations/cohere.ipynb
- fixed misspelling in langchain/retrievers/time_weighted_retriever.py
comment (sorry, about this change in a .py file )

## Who can review
@dev2049
2023-05-17 15:29:22 -07:00
Justin Levi Winter
0147f845f1
Update getting_started.ipynb (#4850)
minor grammer issue
2023-05-17 13:19:14 -07:00
UmerHA
e257380deb
Typos (#4851)
# Fixed typos (issues #4818 & #4668 & more typos)
- At some places, it said `model = ChatOpenAI(model='gpt-3.5-turbo')`
but should be `model = ChatOpenAI(model_name='gpt-3.5-turbo')`
- Fixes some other typos

Fixes #4818, #4668

## Who can review?
Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
        Models
        - @hwchase17
        - @agola11
        Agents / Tools / Toolkits
        - @vowelparrot
2023-05-17 11:52:22 -04:00
Harrison Chase
720ac49f42
2markdown loader (#4796)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-05-16 23:42:53 -07:00
Brendan Mannix
4e56d3119c
update qdrant docs to reflect the proper way to initialize Qdrant() constructor (#4596)
# update qdrant docs to reflect the proper way to initialize Qdrant()
constructor

The [Qdrant
docs](https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/qdrant.html)
still contain an old reference for passing an `embedding_function` into
the constructor. This is no longer supported.

This PR updates the docs to reflect the proper way to initialize
`Qdrant()`

Old:
![Screenshot 2023-05-12 at 3 06 33
PM](https://github.com/hwchase17/langchain/assets/1552962/dd4063d2-2a07-4340-91bb-e305f7215ddd)

New:
![Screenshot 2023-05-12 at 3 21 09
PM](https://github.com/hwchase17/langchain/assets/1552962/aebc3f63-1a8b-4ca3-93c0-a2ce30dcd282)
2023-05-16 17:30:38 -07:00
Raduan Al-Shedivat
00c6ec8a2d
fix(document_loaders/telegram): fix pandas calls + add tests (#4806)
# Fix Telegram API loader + add tests.
I was testing this integration and it was broken with next error:
```python
message_threads = loader._get_message_threads(df)
KeyError: False
```
Also, this particular loader didn't have any tests / related group in
poetry, so I added those as well.

@hwchase17 / @eyurtsev please take a look on this fix PR.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-16 14:35:25 -07:00
了空
f7e3d97b19
Remove unnecessary spaces from document object’s page_content of BiliBiliLoader (#4619)
- Remove unnecessary spaces from document object’s page_content of
BiliBiliLoader
- Fix BiliBiliLoader document and test file
2023-05-16 13:13:57 -04:00
Eugene Yurtsev
f47ec5b4b6
Docugami docs: First cell should be a title cell (#4735)
# Make first cell a title in docugami docs

This makes the first cell a title cell in docugami notebook
2023-05-16 13:12:14 -04:00
shiyu22
21b9397342
Update the milvus example (#4706)
# Fix issue when running example

- add the query content
- update the `user` parameter with Zilliz

Signed-off-by: shiyu22 <shiyu.chen@zilliz.com>
2023-05-15 16:16:57 -07:00
Harrison Chase
dd95f0892d
Harrison/add top k (#4707)
Co-authored-by: blc16 <benlc@umich.edu>
2023-05-15 09:09:22 -07:00
Eugene Yurtsev
3c490b5ba3
Docugami DataLoader (#4727)
### Adds a document loader for Docugami

Specifically:

1. Adds a data loader that talks to the [Docugami](http://docugami.com)
API to download processed documents as semantic XML
2. Parses the semantic XML into chunks, with additional metadata
capturing chunk semantics
3. Adds a detailed notebook showing how you can use additional metadata
returned by Docugami for techniques like the [self-querying
retriever](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query_retriever.html)
4. Adds an integration test, and related documentation

Here is an example of a result that is not possible without the
capabilities added by Docugami (from the notebook):

<img width="1585" alt="image"
src="https://github.com/hwchase17/langchain/assets/749277/bb6c1ce3-13dc-4349-a53b-de16681fdd5b">

---------

Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
Co-authored-by: Taqi Jaffri <tjaffri@gmail.com>
2023-05-15 10:53:00 -04:00
Lester Yang
cd3f9865f3
Feature: pdfplumber PDF loader with BaseBlobParser (#4552)
# Feature: pdfplumber PDF loader with BaseBlobParser

* Adds pdfplumber as a PDF loader
* Adds pdfplumber as a blob parser.
2023-05-15 09:47:02 -04:00
Harrison Chase
b6e3ac17c4
Harrison/sitemap local (#4704)
Co-authored-by: Lukas Bauer <lukas.bauer@mayflower.de>
2023-05-14 22:04:38 -07:00
Harrison Chase
12b4ee1fc7
Harrison/telegram chat loader (#4698)
Co-authored-by: Akinwande Komolafe <47945512+Sensei-akin@users.noreply.github.com>
Co-authored-by: Akinwande Komolafe <akhinoz@gmail.com>
2023-05-14 22:04:27 -07:00
Harrison Chase
6f47ab17a4
Harrison/param notion db (#4689)
Co-authored-by: Edward Park <ed.sh.park@gmail.com>
2023-05-14 18:26:25 -07:00
Harrison Chase
243886be93
Harrison/virtual time (#4658)
Co-authored-by: ifsheldon <39153080+ifsheldon@users.noreply.github.com>
Co-authored-by: maple.liang <maple.liang@gempoll.com>
2023-05-14 10:29:17 -07:00
Harrison Chase
44ae673388
Harrison/multithreading directory loader (#4650)
Co-authored-by: PawelFaron <42373772+PawelFaron@users.noreply.github.com>
Co-authored-by: Pawel Faron <ext-pawel.faron@vaisala.com>
2023-05-13 21:46:02 -07:00
Leonid Ganeline
3ce78ef6c4
docs: document_loaders classification (#4069)
**Problem statement:** the
[document_loaders](https://python.langchain.com/en/latest/modules/indexes/document_loaders.html#)
section is too long and hard to comprehend.
**Proposal:** group document_loaders by 3 classes: (see `Files changed`
tab)

UPDATE: I've completely reworked the document_loader classification.
Now this PR changes only one file! 

FYI @eyurtsev @hwchase17
2023-05-13 19:17:32 -07:00
Tim Asp
ed0d557ede
docs: fix pdf docs hierarchy and formatting (#4593)
# Fix pdf loader docs page


![image](https://github.com/hwchase17/langchain/assets/707699/4a11f379-00ed-4f7a-9870-71f74e0cadc6)

Using h1's messes with hierarchy, this fixes that, and moves the
PyPDFium2 loader out of the middle of PDFMiner docs
2023-05-12 15:03:01 -04:00
Davis Chase
a4a9d1f403
Improve vespa interface (#4546)
![Screenshot 2023-05-11 at 7 50 31
PM](https://github.com/hwchase17/langchain/assets/130488702/bc8ab4bb-8006-44fc-ba07-df54e84ee2c1)
2023-05-12 10:11:26 -07:00
Neil Ruaro
3a2855945b
added documentation on retrieving a PG vectorstore (#4578)
This PR adds in documentation on querying an existing vectorstore in PG 

Fixes 3191 (issue)
2023-05-12 13:04:06 -04:00
Leonid Ganeline
e17d0319d5
Add arxiv retriever (#4538) 2023-05-11 22:48:38 -07:00
Harrison Chase
3ce29cb4a6
Harrison/new search (#4359)
Co-authored-by: Jiaping(JP) Zhang <vincentzhangv@gmail.com>
2023-05-10 17:09:16 -07:00
Davis Chase
9ec60ad832
Add azure cognitive search retriever (#4467)
All credit to @UmerHA, made a couple small changes

---------

Co-authored-by: UmerHA <40663591+UmerHA@users.noreply.github.com>
2023-05-10 15:27:27 -07:00
Davis Chase
46b100ea63
Add DocArray vector stores (#4483)
Thanks to @anna-charlotte and @jupyterjazz for the contribution! Made
few small changes to get it across the finish line

---------

Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Signed-off-by: jupyterjazz <saba.sturua@jina.ai>
Co-authored-by: anna-charlotte <charlotte.gerhaher@jina.ai>
Co-authored-by: jupyterjazz <saba.sturua@jina.ai>
Co-authored-by: Saba Sturua <45267439+jupyterjazz@users.noreply.github.com>
2023-05-10 15:22:16 -07:00
Matt Robinson
3637d6da6e
feat: add loader for open office odt files (#4405)
# ODF File Loader

Adds a data loader for handling Open Office ODT files. Requires
`unstructured>=0.6.3`.

### Testing

The following should work using the `fake.odt` example doc from the
[`unstructured` repo](https://github.com/Unstructured-IO/unstructured).

```python
from langchain.document_loaders import UnstructuredODTLoader

loader = UnstructuredODTLoader(file_path="fake.odt", mode="elements")
loader.load()

loader = UnstructuredODTLoader(file_path="fake.odt", mode="single")
loader.load()
```
2023-05-10 01:37:17 -07:00
Leonid Ganeline
ce15ffae6a
added Wikipedia retriever (#4302)
- added `Wikipedia` retriever. It is effectively a wrapper for
`WikipediaAPIWrapper`. It wrapps load() into get_relevant_documents()
- sorted `__all__` in the `retrievers/__init__`
- added integration tests for the WikipediaRetriever
- added an example (as Jupyter notebook) for the WikipediaRetriever
2023-05-09 10:08:39 -07:00
BioErrorLog
04f765b838
Fix grammar in Text Splitters docs (#4373)
# Fix grammar in Text Splitters docs

Just a small fix of grammar in the documentation:

"That means there two different axes" -> "That means there are two
different axes"
2023-05-08 22:38:40 -04:00
Leonid Ganeline
9544b30821
added Wikipedia document loader (#4141)
- Added the `Wikipedia` document loader. It is based on the existing
`unilities/WikipediaAPIWrapper`
- Added a respective ut-s and example notebook
- Sorted list of classes in __init__
2023-05-06 09:32:45 -07:00
Davis Chase
5ca13cc1f0
Dev2049/pypdfium2 (#4209)
thanks @jerrytigerxu for the addition!

---------

Co-authored-by: Jere Xu <jtxu2008@gmail.com>
Co-authored-by: jerrytigerxu <jere.tiger.xu@gmailc.om>
2023-05-05 17:55:31 -07:00
Leonid Ganeline
59204a5033
docs: document_loaders improvements (#4200)
- made notebooks consistent: titles, service/format descriptions.
- corrected short names to full names, for example, `Word` -> `Microsoft
Word`
- added missed descriptions
- renamed notebook files to make ToC correctly sorted
2023-05-05 17:44:54 -07:00
Aivin V. Solatorio
6567b73e1a
JSON loader (#4067)
This implements a loader of text passages in JSON format. The `jq`
syntax is used to define a schema for accessing the relevant contents
from the JSON file. This requires dependency on the `jq` package:
https://pypi.org/project/jq/.

---------

Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>
2023-05-05 14:48:13 -07:00
Harrison Chase
26534457f5
simplify csv args (#4182) 2023-05-05 09:22:08 -07:00
Davis Chase
d84bb02881
Add Chroma self query (#4149)
Add internal query language -> chroma metadata filter translator
2023-05-05 08:43:08 -07:00
Harrison Chase
a9c2450330
Harrison/toml loader (#4090)
Co-authored-by: Mika Ayenson <Mikaayenson@users.noreply.github.com>
2023-05-03 23:14:39 -07:00
Harrison Chase
fba6921b50
Harrison/one drive loader (#4081)
Co-authored-by: José Ferraz Neto <netoferraz@gmail.com>
2023-05-03 22:55:34 -07:00
Davis Chase
7f8727bbcd
Router chains (#4019)
Unpolished router examples to help flesh out abstractions and use cases 
![Screenshot 2023-05-02 at 7 02 58
PM](https://user-images.githubusercontent.com/130488702/235820394-389e5584-db0b-415e-a260-2824b5555167.png)

---------

Co-authored-by: Shreya Rajpal <shreya.rajpal@gmail.com>
2023-05-03 22:02:55 -07:00
Harrison Chase
5f30cc8713
Harrison/knn retriever (#4083)
Co-authored-by: Yuichi Tateno (secon) <hotchpotch@users.noreply.github.com>
2023-05-03 21:21:58 -07:00
Harrison Chase
5a269d3175
Harrison/media wiki xml (#4072)
Co-authored-by: Géraud de Drouas <gdedrouas@users.noreply.github.com>
2023-05-03 20:45:33 -07:00
Ivo Stranic
3b556eae44
Update deeplake example (#4055) 2023-05-03 18:03:51 -07:00
Steve Kim
9b830f437c
Deleted importing Document from document_loaders.base because Documen… (#4068)
Hi,

- Modification:
https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/arxiv.html
- Reason: In this example, the first line is unnecessary because the
Document class does not exist in the base.
- Resolves: Issue #4052

--------
P.S: This pull-request is my first time, so please let me know if I need
to correct or write more explanation.
2023-05-03 17:54:30 -07:00
Akash Sharma
525db1b6cb
Fixed typo leading to broken link (#4034) 2023-05-03 14:45:54 -07:00
Zander Chase
aa38355999
Vwp/docs improved document loaders (#4006)
Huge thanks to @leo-gan for improving the document loaders notebooks

---------

Co-authored-by: Leonid Ganeline <leo.gan.57@gmail.com>
2023-05-02 15:24:53 -07:00
Jinto Jose
013208cce6
Fix Documentation - Nomic - Atlas Jupyter Notebook (#3987)
Correction to Numic-Atlas Jupyter Notebook Docs
2023-05-02 14:20:01 -07:00
Harrison Chase
f04faf8496
Harrison/spreedly (#3937)
Co-authored-by: Esmit Pérez <esmitperez@users.noreply.github.com>
2023-05-01 20:56:56 -07:00
Davis Chase
e7e29f9937
Dev2049/add modern treasury (#3924)
Modified Modern Treasury and Strip slightly so credentials don't have to
be passed in explicitly. Thanks @mattgmarcus for adding Modern Treasury!

---------

Co-authored-by: Matt Marcus <matt.g.marcus@gmail.com>
2023-05-01 20:28:02 -07:00
Harrison Chase
e28c6403aa
Harrison/cohere reranker (#3904) 2023-05-01 15:40:16 -07:00
Nikolas Garske
c4d3d74148
Fix typos in arxiv.ipynb (#3887)
Several minor typos in the doc for the arxiv document loaders were
fixed.
2023-05-01 09:17:37 -07:00
Harrison Chase
20aad0bed1 stripe docs 2023-04-29 08:16:37 -07:00
Sheldon
399065e858
update zilliz example (#3578)
1. Now the Zilliz example can't connect to Zilliz Cloud, fixed

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-04-28 22:10:13 -07:00
Harrison Chase
c494ca3ad2
Harrison/doc2txt (#3772)
Co-authored-by: rishni ratnam <rishniratnam@gmail.com>
2023-04-28 21:54:16 -07:00
Harrison Chase
0c0f14407c
Harrison/tair (#3770)
Co-authored-by: Seth Huang <848849+seth-hg@users.noreply.github.com>
2023-04-28 21:25:33 -07:00
Harrison Chase
b7ae9f715d
Langchain with reddit (#3661) (#3768)
I have added a reddit document loader which fetches the text from the
Posts of Subreddits or Reddit users, using the `praw` Python package. I
have also added an example notebook reddit.ipynb in order to guide users
to use this dataloader.
This code was made in format similar to twiiter document loader. I have
run code formating, linting and also checked the code myself for
different scenarios.

This is my first contribution to an open source project and I am really
excited about this. If you want to suggest some improvements in my code,
I will be happy to do it. :)

Co-authored-by: Taaha Bajwa <taaha.s.bajwa@gmail.com>
2023-04-28 20:59:56 -07:00
Jon Saginaw
f8d69e4e52
Enhancement: Blockchain Document Loader with better Metadata support (#3710)
This PR includes some minor alignment updates, including:

- metadata object extended to support contractAddress, blockchainType,
and tokenId
- notebook doc better aligned to standard langchain format
- startToken changed from int to str to support multiple hex value types
on the Alchemy API

The updated metadata will look like the below. It's possible for a
single contractAddress to exist across multiple blockchains (e.g.
Ethereum, Polygon, etc.) so it's important to include the
blockchainType.

```
 metadata = {"source": self.contract_address, 
                      "blockchain": self.blockchainType,
                      "tokenId": tokenId}
```
2023-04-28 20:13:05 -07:00
Davis Chase
220a7076ac
Add Mathpix pdf loader (#3727)
Inspo
https://twitter.com/danielgross/status/1651695062307274754?s=46&t=1zHLap5WG4I_kQPPjfW9fA

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-04-28 20:11:22 -07:00
Harrison Chase
40f6e60e68
Harrison/stripe (#3762)
Co-authored-by: Ismail Pelaseyed <homanp@gmail.com>
2023-04-28 20:03:21 -07:00
Harrison Chase
7a129ac043
Harrison/pypdf loader (#3764)
Co-authored-by: Felipe Meres <felipe@felipemeres.com>
2023-04-28 19:56:21 -07:00
Harrison Chase
c55ba43093
Harrison/vespa (#3761)
Co-authored-by: Lester Solbakken <lesters@users.noreply.github.com>
2023-04-28 19:48:43 -07:00
leo-gan
e510732ad2
docs: improved vectorstore notebooks (#3724)
- Added links to the vectorstore providers
- Added installation code (it is not clear that we have to go to the
`LangChan Ecosystem` page to get installation instructions.)
2023-04-28 19:26:50 -07:00
Alan Cha
e3b7a20454
Fix typo (#3728) 2023-04-28 13:01:09 -07:00
Davis Chase
3b609642ae
Self-query with generic query constructor (#3607)
Alternate implementation of #3452 that relies on a generic query
constructor chain and language and then has vector store-specific
translation layer. Still refactoring and updating examples but general
structure is there and seems to work s well as #3452 on exampels

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-04-27 08:36:00 -07:00
Harrison Chase
a35bbbfa9e
Harrison/lancedb (#3634)
Co-authored-by: Minh Le <minhle@canva.com>
2023-04-27 08:14:36 -07:00
Shukri
fac4f36a87
Update models used for embeddings in the weaviate example (#3594)
Use text-embedding-ada-002 because it [outperforms all other
models](https://openai.com/blog/new-and-improved-embedding-model).
2023-04-26 21:48:08 -07:00
leo-gan
36c59e0c25
Arxiv document loader (#3627)
It makes sense to use `arxiv` as another source of the documents for
downloading.
- Added the `arxiv` document_loader, based on the
`utilities/arxiv.py:ArxivAPIWrapper`
- added tests
- added an example notebook
- sorted `__all__` in `__init__.py` (otherwise it is hard to find a
class in the very long list)
2023-04-26 21:04:56 -07:00
Eric Peter
603ea75bcd
Fix docs error for google drive loader (#3574) 2023-04-25 22:52:59 -07:00
apurvsibal
af7906f100
Update Alchemy Key URL (#3559)
Update Alchemy Key URL in Blockchain Document Loader. I want to say
thank you for the incredible work the LangChain library creators have
done.

I am amazed at how seamlessly the Loader integrates with Ethereum
Mainnet, Ethereum Testnet, Polygon Mainnet, and Polygon Testnet, and I
am excited to see how this technology can be extended in the future.

@hwchase17 - Please let me know if I can improve or if I have missed any
community guidelines in making the edit? Thank you again for your hard
work and dedication to the open source community.
2023-04-25 16:08:42 -07:00
Harrison Chase
0fc0aa62f2
Harrison/blockchain docloader (#3491)
Co-authored-by: Jon Saginaw <saginawj@users.noreply.github.com>
2023-04-25 08:07:06 -07:00
Maxwell Mullin
696f840426
GuessedAtParserWarning from RTD document loader documentation example (#3397)
Addresses #3396 by adding 

`features='html.parser'` in example
2023-04-24 21:54:39 -07:00
jrhe
980cc41709
Adds progress bar using tqdm to directory_loader (#3349)
Approach copied from `WebBaseLoader`. Assumes the user doesn't have
`tqdm` installed.
2023-04-24 21:42:42 -07:00
Davit Buniatyan
2c0023393b
Deep Lake mini upgrades (#3375)
Improvements
* set default num_workers for ingestion to 0
* upgraded notebooks for avoiding dataset creation ambiguity
* added `force_delete_dataset_by_path`
* bumped deeplake to 3.3.0
* creds arg passing to deeplake object that would allow custom S3

Notes
* please double check if poetry is not messed up (thanks!)

Asks
* Would be great to create a shared slack channel for quick questions

---------

Co-authored-by: Davit Buniatyan <d@activeloop.ai>
2023-04-23 21:23:54 -07:00
Haste171
93d53e417a
Update unstructured_file.ipynb (#3377)
Fix typo in docs
2023-04-23 21:22:38 -07:00
Harrison Chase
e5ffbee5eb
Harrison/hf document loader (#3394)
Co-authored-by: Azam Iftikhar <azamiftikhar1000@gmail.com>
2023-04-23 10:17:43 -07:00
Harrison Chase
a6664be79c
Harrison/myscale (#3352)
Co-authored-by: Fangrui Liu <fangruil@moqi.ai>
Co-authored-by: 刘 方瑞 <fangrui.liu@outlook.com>
Co-authored-by: Fangrui.Liu <fangrui.liu@ubc.ca>
2023-04-22 09:17:38 -07:00
Honkware
a5ad1c270f
Add ChatGPT Data Loader (#3336)
This pull request adds a ChatGPT document loader to the document loaders
module in `langchain/document_loaders/chatgpt.py`. Additionally, it
includes an example Jupyter notebook in
`docs/modules/indexes/document_loaders/examples/chatgpt_loader.ipynb`
which uses fake sample data based on the original structure of the
`conversations.json` file.

The following files were added/modified:
- `langchain/document_loaders/__init__.py`
- `langchain/document_loaders/chatgpt.py`
- `docs/modules/indexes/document_loaders/examples/chatgpt_loader.ipynb`
-
`docs/modules/indexes/document_loaders/examples/example_data/fake_conversations.json`

This pull request was made in response to the recent release of ChatGPT
data exports by email:
https://help.openai.com/en/articles/7260999-how-do-i-export-my-chatgpt-history
2023-04-22 09:06:24 -07:00
Richy Wang
88a8f59aa7
Add a full PostgresSQL syntax database 'AnalyticDB' as vector store. (#3135)
Hi there!
I'm excited to open this PR to add support for using a fully Postgres
syntax compatible database 'AnalyticDB' as a vector.
As AnalyticDB has been proved can be used with AutoGPT,
ChatGPT-Retrieve-Plugin, and LLama-Index, I think it is also good for
you.
AnalyticDB is a distributed Alibaba Cloud-Native vector database. It
works better when data comes to large scale. The PR includes:

- [x]  A new memory: AnalyticDBVector
- [x]  A suite of integration tests verifies the AnalyticDB integration

I have read your [contributing
guidelines](72b7d76d79/.github/CONTRIBUTING.md).
And I have passed the tests below
- [x]  make format
- [x]  make lint
- [x]  make coverage
- [x]  make test
2023-04-22 08:25:41 -07:00
Paul Garner
aa9d5707e0
Add PythonLoader which auto-detects encoding of Python files (#3311)
This PR contributes a `PythonLoader`, which inherits from
`TextLoader` but detects and sets the encoding automatically.
2023-04-21 10:47:57 -07:00
Daniel Chalef
1ecbeec24e
Fix example match_documents fn table name, grammar (#3294)
ref
https://github.com/hwchase17/langchain/pull/3100#issuecomment-1517086472

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
2023-04-21 10:21:23 -07:00
Davis Chase
46542dc774
Contextual compression retriever (#2915)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-04-20 17:01:14 -07:00
Harrison Chase
8f22949dc4 update nnotebook title 2023-04-20 11:53:23 -07:00
Harrison Chase
36c10f8a52
nits (#3203) 2023-04-19 21:14:46 -07:00
Daniel Chalef
27cdf8d675
supabase vectorstore - first cut (#3100)
First cut of a supabase vectorstore loosely patterned on the langchainjs
equivalent. Doesn't support async operations which is a limitation of
the supabase python client.

---------

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
2023-04-19 21:06:44 -07:00
Harrison Chase
96809b5794
Harrison/discord loader (#3200)
Co-authored-by: Rajtilak Bhattacharjee <rajtilak.blog@gmail.com>
2023-04-19 21:04:12 -07:00
Pranabendra Prasad Chandra
7b1f0656b8
Fix typo in ElasticSearch sample notebook (#3171)
Added missing parenthesis in example notebook
[elasticsearch.ipynb](https://github.com/hwchase17/langchain/blob/master/docs/modules/indexes/vectorstores/examples/elasticsearch.ipynb)
2023-04-19 16:06:31 -07:00
Harrison Chase
aad0a498ac
Harrison/output error (#3094)
Co-authored-by: yummydum <sumita@nowcast.co.jp>
2023-04-18 08:59:56 -07:00
Harrison Chase
1c1b77bbfe
Harrison/discord (#3092)
Co-authored-by: Rajtilak Bhattacharjee <rajtilak.blog@gmail.com>
2023-04-18 08:19:23 -07:00
James O'Dwyer
0257829776
Bump Metal to use index_id (#3089)
## Use `index_id` over `app_id`
We made a major update to index + retrieve based on Metal Indexes
(instead of apps). With this change, we accept an index instead of an
app in each of our respective core apis. [More details
here](https://docs.getmetal.io/api-reference/core/indexing).
2023-04-18 07:28:13 -07:00
Hamza Kyamanywa
064a1db2b2
[Documentation] Show how to initiate pinecone from an existing index (#3070)
## What is this PR for:
* This PR adds a commented line of code in the documentation that shows
how someone can use the Pinecone client with an already existing
Pinecone index
* The documentation currently only shows how to create a pinecone index
from langchain documents but not how to load one that already exists
2023-04-18 07:27:46 -07:00
Harrison Chase
1920536d99
Harrison/obsidian (#3060)
Co-authored-by: Ben Hofferber <hofferber.ben@gmail.com>
2023-04-17 21:57:32 -07:00
Zander Chase
93c0514105
Add Twitter Tweet Loader (#3050)
Reformatted version of #3022

---------

Co-authored-by: LiaoKong <568250549@qq.com>
2023-04-17 21:44:54 -07:00
Harrison Chase
5107fac656
Harrison/rec gd (#3054)
Co-authored-by: Benjamin Scholtz <BenSchZA@users.noreply.github.com>
2023-04-17 21:02:35 -07:00
Harrison Chase
db7106cb79
Harrison/image caption loader (#3051)
Co-authored-by: Sean Saito <saitosean@ymail.com>
2023-04-17 20:49:10 -07:00
Harrison Chase
afd3e70ae5
Harrison/confluent loader (#2994)
Co-authored-by: Justin Flick <Justinjayflick@gmail.com>
2023-04-17 20:23:45 -07:00
Harrison Chase
f1d15b4a75 update nb 2023-04-16 22:09:31 -07:00
vowelparrot
99c0382209
Generative Characters (#2859)
Add a time-weighted memory retriever and a notebook that approximates a
Generative Agent from https://arxiv.org/pdf/2304.03442.pdf


The "daily plan" components are removed for now since they are less
useful without a virtual world, but the memory is an interesting
component to build off.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-04-16 21:41:00 -07:00