Commit Graph

4751 Commits

Author SHA1 Message Date
Harrison Chase
d29f74114e copy paste loader (#1302) 2023-02-26 17:26:37 -08:00
Harrison Chase
ce441edd9c improve docs (#1309) 2023-02-26 11:25:16 -08:00
Harrison Chase
6f30d68581 add example of using agent with vectorstores (#1285) 2023-02-25 13:27:24 -08:00
Matt Robinson
2f15c11b87 feat: document loader for MS Word documents (#1282)
### Summary

Adds a document loader for MS Word Documents. Works with both `.docx`
and `.doc` files as longer as the user has installed
`unstructured>=0.4.11`.

### Testing

The follow workflow test the loader for both `.doc` and `.docx` files
using example docs from the `unstructured` repo.

#### `.docx`

```python
from langchain.document_loaders import UnstructuredWordDocumentLoader

filename = "../unstructured/example-docs/fake.docx"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```

#### `.doc`

```python
from langchain.document_loaders import UnstructuredWordDocumentLoader

filename = "../unstructured/example-docs/fake.doc"
loader = UnstructuredWordDocumentLoader(filename)
loader.load()
```
2023-02-24 08:26:19 -08:00
Harrison Chase
96db6ed073 cleanup (#1274) 2023-02-24 07:38:24 -08:00
Harrison Chase
42167a1e24 Harrison/fb loader (#1277)
Co-authored-by: Vairo Di Pasquale <vairo.dp@gmail.com>
2023-02-24 07:22:48 -08:00
Klein Tahiraj
8a0751dadd adding .ipynb loader and documentation Fixes #1248 (#1252)
`NotebookLoader.load()` loads the `.ipynb` notebook file into a
`Document` object.

**Parameters**:

* `include_outputs` (bool): whether to include cell outputs in the
resulting document (default is False).
* `max_output_length` (int): the maximum number of characters to include
from each cell output (default is 10).
* `remove_newline` (bool): whether to remove newline characters from the
cell sources and outputs (default is False).
* `traceback` (bool): whether to include full traceback (default is
False).
2023-02-24 07:10:35 -08:00
Enrico Shippole
9becdeaadf Add Writer, Banana, Modal, StochasticAI (#1270)
Add LLM wrappers and examples for Banana, Writer, Modal, Stochastic AI

Added rigid json format for Banana and Modal
2023-02-24 06:58:58 -08:00
Matt Robinson
10e73a3723 docs: remove nltk download steps (#1253)
### Summary

Updates the docs to remove the `nltk` download steps from
`unstructured`. As of `unstructured` `0.4.14`, this is handled
automatically in the relevant modules within `unstructured`.
2023-02-23 12:34:44 -08:00
Justin Torre
5bc6dc076e added caching and properties docs (#1255) 2023-02-23 11:03:04 -08:00
Iskren Ivov Chernev
8e3cd3e0dd Add DeepInfra LLM support (#1232)
DeepInfra is an Inference-as-a-Service provider. Add a simple wrapper
using HTTPS requests.
2023-02-23 07:37:15 -08:00
Dmitri Melikyan
b7765a95a0 docs: add Graphsignal ecosystem page (#1228)
Adds a Graphsignal ecosystem page
2023-02-23 07:33:00 -08:00
Harrison Chase
6085fe18d4 add ifttt tool (#1244) 2023-02-22 22:29:43 -08:00
Harrison Chase
71709ad5d5 Update key_concepts.md (#1209) (#1237)
Link for easier navigation (it's not immediately clear where to find
more info on SimpleSequentialChain (3 clicks away)

---------

Co-authored-by: Larry Fisherman <l4rryfisherman@protonmail.com>
2023-02-22 13:30:53 -08:00
Dennis Antela Martinez
53c67e04d4 add aleph alpha llm (#1207)
Integrate Aleph Alpha's client into Langchain to provide access to the
luminous models - more info on latest benchmarks here:
https://www.aleph-alpha.com/luminous-performance-benchmarks
2023-02-22 10:37:36 -08:00
Ikko Eltociear Ashimine
334b553260 Update petals.md (#1225)
Huggingface -> Hugging Face
2023-02-22 10:34:16 -08:00
Sason
cc7d2e5621 Correct typo in "Question Answering" How-To Guide (#1221) 2023-02-21 17:02:58 -08:00
Matt Robinson
3d5f56a8a1 docs: add quotes to unstructured[local-inference] install instructions (#1208)
### Summary

Corrects the install instruction for local inference to `pip install
"unstructured[local-inference]"`
2023-02-21 08:06:43 -08:00
Harrison Chase
047231840d add docs for chroma persistance (#1202) 2023-02-20 23:04:17 -08:00
Harrison Chase
5bdb8dd6fe Harrison/unstructured io (#1200) 2023-02-20 22:54:49 -08:00
Harrison Chase
d90a287d8f Harrison/updating docs (#1196) 2023-02-20 22:54:26 -08:00
Dennis Antela Martinez
23243ae69c add gitbook document loader (#1180)
Added a GitBook document loader. It lets you both, (1) fetch text from
any single GitBook page, or (2) fetch all relative paths and return
their respective content in Documents.

I've modified the `scrape` method in the `WebBaseLoader` to accept
custom web paths if given, but happy to remove it and move that logic
into the `GitbookLoader` itself.
2023-02-20 20:05:04 -08:00
Naveen Tatikonda
0118706fd6 Add Support for OpenSearch Vector database (#1191)
### Description
This PR adds a wrapper which adds support for the OpenSearch vector
database. Using opensearch-py client we are ingesting the embeddings of
given text into opensearch cluster using Bulk API. We can perform the
`similarity_search` on the index using the 3 popular searching methods
of OpenSearch k-NN plugin:

- `Approximate k-NN Search` use approximate nearest neighbor (ANN)
algorithms from the [nmslib](https://github.com/nmslib/nmslib),
[faiss](https://github.com/facebookresearch/faiss), and
[Lucene](https://lucene.apache.org/) libraries to power k-NN search.
- `Script Scoring` extends OpenSearch’s script scoring functionality to
execute a brute force, exact k-NN search.
- `Painless Scripting` adds the distance functions as painless
extensions that can be used in more complex combinations. Also, supports
brute force, exact k-NN search like Script Scoring.

### Issues Resolved 
https://github.com/hwchase17/langchain/issues/1054

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
2023-02-20 18:39:34 -08:00
Harrison Chase
926c121b98 Harrison/text splitter docs (#1188) 2023-02-20 15:14:03 -08:00
Harrison Chase
91446a5e9b clean up text splitting docs (#1184) 2023-02-20 11:24:31 -08:00
Harrison Chase
5a954efdd7 update gallery with slack bot (#1177) 2023-02-20 08:21:00 -08:00
blob42
9962bda70b searx_search: docs updates (#1175)
- fix notebook formatting, remove empty cells and add scrolling for long
text

---------

Co-authored-by: blob42 <spike@w530>
2023-02-20 06:46:44 -08:00
Harrison Chase
4f3fbd7267 improve docs for indexes (#1146) 2023-02-19 23:14:50 -08:00
Harrison Chase
28781a6213 Harrison/markdown splitter (#1169)
Co-authored-by: Michael Chen <flamingdescent@gmail.com>
Co-authored-by: Michael Chen <michaelchen@stripe.com>
2023-02-19 21:31:58 -08:00
Nan Wang
e8f224fd3a docs: add missing links to toc (#1163)
add missing links to toc

---------

Signed-off-by: Nan Wang <nan.wang@jina.ai>
2023-02-19 21:15:11 -08:00
Nick
afe884fb96 AI21 documentation incorrectly titled Cohere (#1167) 2023-02-19 21:14:59 -08:00
Harrison Chase
955c89fccb pass in prompts to vectordbqa (#1158) 2023-02-19 20:47:17 -08:00
Harrison Chase
65cc81c479 directory loader improvements (#1162) 2023-02-19 20:47:08 -08:00
Harrison Chase
9d6d8f85da Harrison/self hosted runhouse (#1154)
Co-authored-by: Donny Greenberg <dongreenberg2@gmail.com>
Co-authored-by: John Dagdelen <jdagdelen@users.noreply.github.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
Co-authored-by: Andrew White <white.d.andrew@gmail.com>
Co-authored-by: Peng Qu <82029664+pengqu123@users.noreply.github.com>
Co-authored-by: Matt Robinson <mthw.wm.robinson@gmail.com>
Co-authored-by: jeff <tangj1122@gmail.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MacBook-Pro.local>
Co-authored-by: zanderchase <zander@unfold.ag>
Co-authored-by: Charles Frye <cfrye59@gmail.com>
Co-authored-by: zanderchase <zanderchase@gmail.com>
Co-authored-by: Shahriar Tajbakhsh <sh.tajbakhsh@gmail.com>
Co-authored-by: Stefan Keselj <skeselj@princeton.edu>
Co-authored-by: Francisco Ingham <fpingham@gmail.com>
Co-authored-by: Dhruv Anand <105786647+dhruv-anand-aintech@users.noreply.github.com>
Co-authored-by: cragwolfe <cragcw@gmail.com>
Co-authored-by: Anton Troynikov <atroyn@users.noreply.github.com>
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
Co-authored-by: Oliver Klingefjord <oliver@klingefjord.com>
Co-authored-by: blob42 <contact@blob42.xyz>
Co-authored-by: blob42 <spike@w530>
Co-authored-by: Enrico Shippole <henryshippole@gmail.com>
Co-authored-by: Ibis Prevedello <ibiscp@gmail.com>
Co-authored-by: jped <jonathanped@gmail.com>
Co-authored-by: Justin Torre <justintorre75@gmail.com>
Co-authored-by: Ivan Vendrov <ivan@anthropic.com>
Co-authored-by: Sasmitha Manathunga <70096033+mmz-001@users.noreply.github.com>
Co-authored-by: Ankush Gola <9536492+agola11@users.noreply.github.com>
Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>
Co-authored-by: Jeff Huber <jeffchuber@gmail.com>
Co-authored-by: Akshay <64036106+akshayvkt@users.noreply.github.com>
Co-authored-by: Andrew Huang <jhuang16888@gmail.com>
Co-authored-by: rogerserper <124558887+rogerserper@users.noreply.github.com>
Co-authored-by: seanaedmiston <seane999@gmail.com>
Co-authored-by: Hasegawa Yuya <52068175+Hase-U@users.noreply.github.com>
Co-authored-by: Ivan Vendrov <ivendrov@gmail.com>
Co-authored-by: Chen Wu (吴尘) <henrychenwu@cmu.edu>
Co-authored-by: Dennis Antela Martinez <dennis.antela@gmail.com>
Co-authored-by: Maxime Vidal <max.vidal@hotmail.fr>
Co-authored-by: Rishabh Raizada <110235735+rishabh-ti@users.noreply.github.com>
2023-02-19 09:53:45 -08:00
CG80499
af8f5c1a49 Added constitutional chain. (#1147)
- Added self-critique constitutional chain based on this
[paper](https://www.anthropic.com/constitutional.pdf).
2023-02-18 19:31:51 -08:00
Harrison Chase
a83ba44efa Harrison/ver0089 (#1144) 2023-02-18 14:25:37 -08:00
Ankush Gola
7b5e160d28 Make Tools own model, add ToolKit Concept (#1095)
Follow-up of @hinthornw's PR:

- Migrate the Tool abstraction to a separate file (`BaseTool`).
- `Tool` implementation of `BaseTool` takes in function and coroutine to
more easily maintain backwards compatibility
- Add a Toolkit abstraction that can own the generation of tools around
a shared concept or state

---------

Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Francisco Ingham <fpingham@gmail.com>
Co-authored-by: Dhruv Anand <105786647+dhruv-anand-aintech@users.noreply.github.com>
Co-authored-by: cragwolfe <cragcw@gmail.com>
Co-authored-by: Anton Troynikov <atroyn@users.noreply.github.com>
Co-authored-by: Oliver Klingefjord <oliver@klingefjord.com>
Co-authored-by: William Fu-Hinthorn <whinthorn@Williams-MBP-3.attlocal.net>
Co-authored-by: Bruno Bornsztein <bruno.bornsztein@gmail.com>
2023-02-18 13:40:43 -08:00
Harrison Chase
45b5640fe5 fix sql (#1141) 2023-02-18 11:49:08 -08:00
Sam Hogan
85c1449a96 Fix typo in HyDE docs (#1142) 2023-02-18 11:48:46 -08:00
Harrison Chase
fb3c73d194 add srt loader (#1140) 2023-02-18 10:58:39 -08:00
Harrison Chase
483821ea3b fix docs (#1133) 2023-02-18 08:13:54 -08:00
Harrison Chase
d5f3dfa1e1 Harrison/hn loader (#1130)
Co-authored-by: William X <william.y.xuan@gmail.com>
2023-02-17 15:15:02 -08:00
Harrison Chase
511d41114f return source documents for chat vector db chain (#1128) 2023-02-17 13:40:52 -08:00
Matt Robinson
b956070f08 docs: add an unstructured section to the ecosystem page (#1125)
### Summary

Adds an Unstructured section to the ecosystem page.
2023-02-17 13:02:23 -08:00
Francisco Ingham
3462130e2d Modify number of types of chains (#1089)
Changed number of types of chains to make it consistent with the rest of
the docs
2023-02-16 07:06:30 -08:00
Harrison Chase
7745505482 chat qa with sources (#1084) 2023-02-16 00:29:47 -08:00
Harrison Chase
badeeb37b0 fix stuff count (#1083) 2023-02-15 23:57:13 -08:00
Harrison Chase
971458c5de docs for batch size (#1082) 2023-02-15 23:53:56 -08:00
Harrison Chase
5e10e19bfe Harrison/align table (#1081)
Co-authored-by: Francisco Ingham <fpingham@gmail.com>
2023-02-15 23:53:37 -08:00
Harrison Chase
c60954d0f8 Harrison/telegram loader (#1080)
Co-authored-by: Maxime Vidal <max.vidal@hotmail.fr>
2023-02-15 23:24:32 -08:00