Commit Graph

5082 Commits

Author SHA1 Message Date
Harrison Chase
bc2ed93b77 fix doc tags (#2019) 2023-03-26 21:43:51 -07:00
Ankush Gola
c71f2a7b26 small nit on index page (#2018) 2023-03-27 00:15:24 -04:00
Harrison Chase
51681f653f fix docs (#2017) 2023-03-26 20:50:36 -07:00
Harrison Chase
705431aecc big docs refactor (#1978)
Co-authored-by: Ankush Gola <ankush.gola@gmail.com>
2023-03-26 19:49:46 -07:00
Harrison Chase
b83e826510 plugin tool (#1974) 2023-03-24 12:30:08 -07:00
Harrison Chase
6ec5780547 add docs for openai retriever ingest (#1969) 2023-03-24 08:24:33 -07:00
Harrison Chase
47d37db2d2 WIP: Harrison/base retriever (#1765) 2023-03-24 07:46:49 -07:00
Enwei Jiao
4f364db9a9 Add milvus for ecosystem (#1951) 2023-03-23 22:01:28 -07:00
Tim Asp
030ce9f506 fix import error of bs4 (#1952)
Ran into a broken build if bs4 wasn't installed in the project.

Minor tweak to follow the other doc loaders optional package-loading
conventions.

Also updated html docs to include reference to this new html loader.

side note: Should there be 2 different html-to-text document loaders?
This new one only handles local files, while the existing unstructured
html loader handles HTML from local and remote. So it seems like the
improvement was adding the title to the metadata, which is useful but
could also be added to `html.py`
2023-03-23 21:56:13 -07:00
Harrison Chase
8990122d5d retrievers interface (#1948) 2023-03-23 19:00:38 -07:00
Harrison Chase
52d6bf04d0 tracing improvements to docs (#1947) 2023-03-23 19:00:18 -07:00
Harrison Chase
b5667bed9e human input default (#1911) 2023-03-22 20:30:45 -07:00
Eric Zhu
b3be83c750 Add human as a tool (#1879)
Human can help AI.  #1871
2023-03-22 20:14:52 -07:00
Harrison Chase
50626a10ee Hx23840 feat/add redisearch vectorstore (#1909)
Co-authored-by: Peter <peter.shi@alephf.com>
Co-authored-by: Peter Shi <42536066+hx23840@users.noreply.github.com>
2023-03-22 19:57:56 -07:00
Harrison Chase
6e1b5b8f7e Harrison/figma doc loader (#1908)
Co-authored-by: Ismail Pelaseyed <homanp@gmail.com>
2023-03-22 19:57:46 -07:00
Klein Tahiraj
d3d4503ce2 Remove redundant .docx loader (closes #1716) + update how_to_guides.rst (#1891)
In https://github.com/hwchase17/langchain/issues/1716 , it was
identified that there were two .py files performing similar tasks. As a
resolution, one of the files has been removed, as its purpose had
already been fulfilled by the other file. Additionally, the init has
been updated accordingly.

Furthermore, the how_to_guides.rst file has been updated to include
links to documentation that was previously missing. This was deemed
necessary as the existing list on
https://langchain.readthedocs.io/en/latest/modules/document_loaders/how_to_guides.html
was incomplete, causing confusion for users who rely on the full list of
documentation on the left sidebar of the website.
2023-03-22 15:19:42 -07:00
Harrison Chase
1f93c5cf69 extraction docs (#1898) 2023-03-22 15:00:44 -07:00
Sean Zheng
15b5a08f4b Update how_to_guides.rst (#1893)
Adding OpenSearch examples
2023-03-22 14:30:43 -07:00
Harrison Chase
ce5d97bcb3 Harrison/guarded output parser (#1804)
Co-authored-by: jerwelborn <jeremy.welborn@gmail.com>
2023-03-21 22:07:23 -07:00
DeadBranch
8fa1764c60 docs: update gpt index references to LlamaIndex (#1856)
The GPT Index project is transitioning to the new project name,
LlamaIndex.

I've updated a few files referencing the old project name and repository
URL to the current ones.

From the [LlamaIndex repo](https://github.com/jerryjliu/llama_index):
> NOTE: We are rebranding GPT Index as LlamaIndex! We will carry out
this transition gradually.
>
> 2/25/2023: By default, our docs/notebooks/instructions now reference
"LlamaIndex" instead of "GPT Index".
>
> 2/19/2023: By default, our docs/notebooks/instructions now use the
llama-index package. However the gpt-index package still exists as a
duplicate!
>
> 2/16/2023: We have a duplicate llama-index pip package. Simply replace
all imports of gpt_index with llama_index if you choose to pip install
llama-index.

I'm not associated with LlamaIndex in any way. I just noticed the
discrepancy when studying the lanchain documentation.
2023-03-21 22:01:05 -07:00
Harrison Chase
f299bd1416 clean up sagemaker nb (#1875) 2023-03-21 22:00:08 -07:00
Philipp Schmid
064be93edf [Embeddings] Add SageMaker Endpoint Embedding class (#1859)
# What does this PR do? 

This PR adds similar to `llms` a SageMaker-powered `embeddings` class.
This is helpful if you want to leverage Hugging Face models on SageMaker
for creating your indexes.

I added a example into the
[docs/modules/indexes/examples/embeddings.ipynb](https://github.com/hwchase17/langchain/compare/master...philschmid:add-sm-embeddings?expand=1#diff-e82629e2894974ec87856aedd769d4bdfe400314b03734f32bee5990bc7e8062)
document. The example currently includes some `_### TEMPORARY: Showing
how to deploy a SageMaker Endpoint from a Hugging Face model ###_ ` code
showing how you can deploy a sentence-transformers to SageMaker and then
run the methods of the embeddings class.

@hwchase17 please let me know if/when i should remove the `_###
TEMPORARY: Showing how to deploy a SageMaker Endpoint from a Hugging
Face model ###_` in the description i linked to a detail blog on how to
deploy a Sentence Transformers so i think we don't need to include those
steps here.

I also reused the `ContentHandlerBase` from
`langchain.llms.sagemaker_endpoint` and changed the output type to `any`
since it is depending on the implementation.
2023-03-21 21:51:48 -07:00
anupam-tiwari
86822d1cc2 Fixes the import typo in the vector db text generator notebook (#1874)
Fixes the import typo in the vector db text generator notebook for the
chroma library

Co-authored-by: Anupam <anupam@10-16-252-145.dynapool.wireless.nyu.edu>
2023-03-21 21:48:26 -07:00
Harrison Chase
a581bce379 remove key (#1863) 2023-03-21 12:43:41 -07:00
Harrison Chase
2ffc643086 add listen api docs (#1855) 2023-03-21 09:29:34 -07:00
Tomoko Uchida
b706966ebc Add setup instruction in Getting Started for Indexing (#1847)
`VectorstoreIndexCreator` [uses Chroma as the vectorstore by
default](1c22657256/langchain/indexes/vectorstore.py (L49)).
It may be helpful to add a short note for the setup.

You can see how the notebook looks here.

https://github.com/mocobeta/langchain/blob/feat/add-setup-instruction-to-index-getting-started/docs/modules/indexes/getting_started.ipynb
2023-03-21 09:06:35 -07:00
Harrison Chase
1c22657256 Harrison/faiss merge (#1843)
Co-authored-by: Ting Su <ting.su.1995@outlook.com>
2023-03-20 22:54:08 -07:00
Simon Zhou
3674074eb0 Add Qdrant to ecosystem page (#1830)
Add [Qdrant](https://qdrant.tech/) to [LangChain
ecosystem](https://langchain.readthedocs.io/en/latest/ecosystem.html)
page.
2023-03-20 22:06:40 -07:00
Wenbin Fang
a7e09d46c5 Add podcast api tool to use NLP to search all podcasts or episodes. (#1833)
Use the following code to test:

```python
import os
from langchain.llms import OpenAI
from langchain.chains.api import podcast_docs
from langchain.chains import APIChain

# Get api key here: https://openai.com/pricing
os.environ["OPENAI_API_KEY"] = "sk-xxxxx"

# Get api key here: https://www.listennotes.com/api/pricing/
listen_api_key = 'xxx'

llm = OpenAI(temperature=0)
headers = {"X-ListenAPI-Key": listen_api_key}
chain = APIChain.from_llm_and_api_docs(llm, podcast_docs.PODCAST_DOCS, headers=headers, verbose=True)
chain.run("Search for 'silicon valley bank' podcast episodes, audio length is more than 30 minutes, return only 1 results")
```

Known issues: the api response data might be too big, and we'll get such
error:
`openai.error.InvalidRequestError: This model's maximum context length
is 4097 tokens, however you requested 6733 tokens (6477 in your prompt;
256 for the completion). Please reduce your prompt; or completion
length.`
2023-03-20 22:04:17 -07:00
Ikko Eltociear Ashimine
9555bbd5bb Fix typo in sqlite.ipynb (#1828)
overriden -> overridden
2023-03-20 16:47:19 -07:00
Harrison Chase
d5b4393bb2 Harrison/llm math (#1808)
Co-authored-by: Vadym Barda <vadim.barda@gmail.com>
2023-03-20 07:53:26 -07:00
Bryan Helmig
7b6ff7fe00 Follow up to #1803 to remove dynamic docs route. (#1818)
The base docs are going to be more stable and familiar for folks.
Dynamic route is currently in flux.
2023-03-20 07:52:41 -07:00
Harrison Chase
76c7b1f677 Harrison/wandb (#1764)
Co-authored-by: Anish Shah <93145909+ash0ts@users.noreply.github.com>
2023-03-20 07:52:27 -07:00
Harrison Chase
d5d50c39e6 Harrison/azure embeddings (#1787)
Co-authored-by: Hemant <4627288+ghaccount@users.noreply.github.com>
2023-03-19 10:42:33 -07:00
Harrison Chase
1f18698b2a Harrison/token buffer memory (#1786)
Co-authored-by: Aratako <127325395+Aratako@users.noreply.github.com>
2023-03-19 10:42:24 -07:00
Harrison Chase
ef4945af6b Harrison/chat token usage (#1785) 2023-03-19 10:32:31 -07:00
Harrison Chase
7de2ada3ea Harrison/add source column (#1784)
Co-authored-by: Brian Graham <46691715+briangrahamww@users.noreply.github.com>
Co-authored-by: briangrahamww <brian.graham@ww.com>
2023-03-19 10:32:13 -07:00
hitoshi44
3cf493b089 Fix Document & Expose StringPromptTemplate as a custom-prompt-template. (#1753)
Regarding [this
issue](https://github.com/hwchase17/langchain/issues/1754), the code in
the document [Creating a custom prompt
template](https://langchain.readthedocs.io/en/latest/modules/prompts/examples/custom_prompt_template.html)
is no longer functional and outdated.

To address this, I have made the following changes:

1. Updated the guide in the document to use `StringPromptTemplate`
instead of `BasePromptTemplate`.
2. Exposed `StringPromptTemplate` in `prompts/__init__.py` for easier
importing.
2023-03-19 09:47:56 -07:00
hung_ng__
3d6fcb85dc Add load json prompt example (#1776)
Hi, I just want to add a PR on the prompt serialization examples of
loading from JSON so that it can contain the same as loading from YAML.
2023-03-19 09:28:56 -07:00
Piyush Jain
1a8790d808 Corrects copyright year (#1762)
Corrected copyright year.
2023-03-18 19:55:05 -07:00
Harrison Chase
8685d53adc querying tabular data (#1758) 2023-03-18 11:12:18 -07:00
Harrison Chase
dd90fd02d5 Harrison/move docs (#1741) 2023-03-17 08:49:10 -07:00
Harrison Chase
07766a69f3 move docs (#1740) 2023-03-17 08:42:28 -07:00
Harrison Chase
96ebe98dc2 Harrison/latex splitter (#1738)
Co-authored-by: Aidan Holland <thehappydinoa@gmail.com>
Co-authored-by: Jan de Boer <44832123+Janldeboer@users.noreply.github.com>
2023-03-17 08:10:27 -07:00
Harrison Chase
45f05fc939 Harrison/blackboard loader (#1737)
Co-authored-by: Aidan Holland <thehappydinoa@gmail.com>
2023-03-17 08:02:44 -07:00
Vincent Liao
cf9c3f54f7 docs: add docs link to agent toolkits (#1735)
New to Langchain, was a bit confused where I should find the toolkits
section when I'm at `agent/key_concepts` docs. I added a short link that
points to the how to section.
2023-03-17 07:59:49 -07:00
Piyush Jain
cdff6c8181 Sagemaker Endpoint LLM (#1686)
Updates #965

---------

Co-authored-by: Nimisha Mehta <116048415+nimimeht@users.noreply.github.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
2023-03-16 21:58:06 -07:00
libra
8a95fdaee1 Fix all the bug in init Tool in docs (#1725)
Fix all the example in the docs when init `Tool`

Test by render with jupyter
2023-03-16 21:55:44 -07:00
jerwelborn
55efbb8a7e pydantic/json parsing (#1722)
```
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

joke_query = "Tell me a joke."

# Or, an example with compound type fields.
#class FloatArray(BaseModel):
#    values: List[float] = Field(description="list of floats")
#
#float_array_query = "Write out a few terms of fiboacci."

model = OpenAI(model_name='text-davinci-003', temperature=0.0)
parser = PydanticOutputParser(pydantic_object=Joke)
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

_input = prompt.format_prompt(query=joke_query)
print("Prompt:\n", _input.to_string())
output = model(_input.to_string())
print("Completion:\n", output)
parsed_output = parser.parse(output)
print("Parsed completion:\n", parsed_output)
```

```
Prompt:
 Answer the user query.
The output should be formatted as a JSON instance that conforms to the JSON schema below.  For example, the object {"foo":  ["bar", "baz"]} conforms to the schema {"foo": {"description": "a list of strings field", "type": "string"}}.

Here is the output schema:
---
{"setup": {"description": "question to set up a joke", "type": "string"}, "punchline": {"description": "answer to resolve the joke", "type": "string"}}
---

Tell me a joke.

Completion:
 {"setup": "Why don't scientists trust atoms?", "punchline": "Because they make up everything!"}

Parsed completion:
 setup="Why don't scientists trust atoms?" punchline='Because they make up everything!'
```

Ofc, works only with LMs of sufficient capacity. DaVinci is reliable but
not always.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-03-16 21:43:11 -07:00
Jonathan Pedoeem
606605925d Adding ability to return_pl_id to all PromptLayer Models in LangChain (#1699)
PromptLayer now has support for [several different tracking
features.](https://magniv.notion.site/Track-4deee1b1f7a34c1680d085f82567dab9)
In order to use any of these features you need to have a request id
associated with the request.

In this PR we add a boolean argument called `return_pl_id` which will
add `pl_request_id` to the `generation_info` dictionary associated with
a generation.

We also updated the relevant documentation.
2023-03-16 17:05:23 -07:00