Fixes to the docs for timescale vector template (#12756)

Matvey Arye 2023-11-01 21:48:23 -04:00 committed by GitHub
parent 33810126bd
commit 9369d6aca0
2 changed files with 63 additions and 18 deletions


@@ -21,7 +21,7 @@ These templates cover advanced retrieval techniques.
- [Anthropic Iterative Search](../anthropic-iterative-search): This retrieval technique uses iterative prompting to determine what to retrieve and whether the retriever documents are good enough.
- [Neo4j Parent Document Retrieval](../neo4j-parent): This retrieval technique stores embeddings for smaller chunks, but then returns larger chunks to pass to the model for generation.
- [Semi-Structured RAG](../rag-semi-structured): The template shows how to do retrieval over semi-structured data (e.g. data that involves both text and tables).
-- [Temporal RAG](../rag-timescale-hybrid-search-time): The template shows how to do retrieval over data that has a strong time-based component.
+- [Temporal RAG](../rag-timescale-hybrid-search-time): The template shows how to do hybrid search over data with a time-based component using [Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral).
## 🔍Advanced Retrieval - Query Transformation


@@ -10,8 +10,7 @@ This is useful any time your data has a strong time-based component. Some exampl
Such items are often searched by both similarity and time. For example: Show me all news about Toyota trucks from 2022.
-[Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) provides superior performance when searching for embeddings within a particular
-timeframe by leveraging automatic table partitioning to isolate data for particular time-ranges.
+[Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) provides superior performance when searching for embeddings within a particular timeframe by leveraging automatic table partitioning to isolate data for particular time-ranges.
Langchain's self-query retriever allows deducing time-ranges (as well as other search criteria) from the text of user queries.
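To make that concrete, here is a minimal sketch of a time-filtered similarity search with LangChain's `TimescaleVector` vectorstore. The collection name and query are illustrative, and the `start_date`/`end_date` arguments follow the Timescale Vector integration docs rather than this template's code:

```python
import os
from datetime import datetime

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.timescalevector import TimescaleVector

# Hypothetical collection; this template creates and manages its own.
vectorstore = TimescaleVector(
    service_url=os.environ["TIMESCALES_SERVICE_URL"],
    embedding=OpenAIEmbeddings(),
    collection_name="news",
)

# Hybrid search: vector similarity restricted to a time range, which
# Timescale Vector answers by scanning only the matching time partitions.
docs = vectorstore.similarity_search(
    "news about Toyota trucks",
    start_date=datetime(2022, 1, 1),
    end_date=datetime(2022, 12, 31),
)
```

The self-query retriever automates exactly this step: it asks the LLM to extract the time range (and any other metadata filters) from the user's question before running the search.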
@@ -35,29 +34,75 @@ Timescale Vector is available on [Timescale](https://www.timescale.com/products?
- To get started, [sign up](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) for Timescale, create a new database, and follow this notebook!
- See the [installation instructions](https://github.com/timescale/python-vector) for more details on using Timescale Vector in Python.
+### Using Timescale Vector with this template
## Environment Setup
-This template uses TimescaleVector as a vectorstore and requires that `TIMESCALES_SERVICE_URL` is set.
+This template uses Timescale Vector as a vectorstore and requires that `TIMESCALES_SERVICE_URL` is set. Sign up for a 90-day trial [here](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) if you don't yet have an account.
-## LLM
+To load the sample dataset, set `LOAD_SAMPLE_DATA=1`. To load your own dataset, see the section below.
-Be sure that `OPENAI_API_KEY` is set in order to the OpenAI models.
+Set the `OPENAI_API_KEY` environment variable to access the OpenAI models.
-## Loading sample data
+## Usage
-We have provided a sample dataset you can use for demoing this template. It consists of the git history of the timescale project.
+To use this package, you should first have the LangChain CLI installed:
-To load this dataset, set the `LOAD_SAMPLE_DATA` environmental variable.
```shell
pip install -U "langchain-cli[serve]"
```
-## Loading your own dataset.
+To create a new LangChain project and install this as the only package, you can do:
```shell
langchain app new my-app --package rag-timescale-hybrid-search-time
```
+If you want to add this to an existing project, you can just run:
```shell
langchain app add rag-timescale-hybrid-search-time
```
+And add the following code to your `server.py` file:
```python
from rag_timescale_hybrid_search_time import chain as rag_timescale_hybrid_search_chain

add_routes(app, rag_timescale_hybrid_search_chain, path="/rag-timescale-hybrid-search-time")
```
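For context, a hedged sketch of what the surrounding `server.py` looks like in a freshly generated app; the imports and the `app` object come from the scaffold that `langchain app new` produces, so treat this as an outline rather than the template's exact file:

```python
from fastapi import FastAPI
from langserve import add_routes

from rag_timescale_hybrid_search_time import chain as rag_timescale_hybrid_search_chain

# The generated project creates the FastAPI app for you.
app = FastAPI()

# Mount the chain so it is served under the given path.
add_routes(app, rag_timescale_hybrid_search_chain, path="/rag-timescale-hybrid-search-time")

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
```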
+(Optional) Let's now configure LangSmith.
+LangSmith will help us trace, monitor and debug LangChain applications.
+LangSmith is currently in private beta; you can sign up [here](https://smith.langchain.com/).
+If you don't have access, you can skip this section.
```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=<your-api-key>
export LANGCHAIN_PROJECT=<your-project> # if not specified, defaults to "default"
```
+If you are inside this directory, then you can spin up a LangServe instance directly by running:
```shell
langchain serve
```
+This will start the FastAPI app with a server running locally at
+[http://localhost:8000](http://localhost:8000)
+We can see all templates at [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
+We can access the playground at [http://127.0.0.1:8000/rag-timescale-hybrid-search-time/playground](http://127.0.0.1:8000/rag-timescale-hybrid-search-time/playground)
+We can access the template from code with:
```python
from langserve.client import RemoteRunnable

runnable = RemoteRunnable("http://localhost:8000/rag-timescale-hybrid-search-time")
```
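From there the chain behaves like any other runnable. For example (assuming the template accepts a plain question string as input, which is not confirmed here):

```python
# Ask a question that mixes semantics with a time constraint; the
# self-query retriever turns the time phrase into a metadata filter.
answer = runnable.invoke("What changes were made to compression in 2023?")
print(answer)
```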
+## Loading your own dataset
To load your own dataset you will have to modify the code in the `DATASET SPECIFIC CODE` section of `chain.py`.
This code defines the name of the collection, how to load the data, and the human-language description of both the
contents of the collection and all of the metadata. The human-language descriptions are used by the self-query retriever
to help the LLM convert the question into filters on the metadata when searching the data in Timescale Vector.
-## Using in your own applications
-This is a standard LangServe template. Instructions on how to use it with your LangServe applications are [here](https://github.com/langchain-ai/langchain/blob/master/templates/README.md).
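As a rough guide to what those pieces amount to, here is a hedged sketch of a self-query retriever wired to a Timescale Vector collection. Every name below (collection, metadata fields, descriptions) is illustrative rather than taken from `chain.py`:

```python
import os

from langchain.chains.query_constructor.base import AttributeInfo
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.vectorstores.timescalevector import TimescaleVector

# Hypothetical collection holding your already-loaded documents.
vectorstore = TimescaleVector(
    service_url=os.environ["TIMESCALES_SERVICE_URL"],
    embedding=OpenAIEmbeddings(),
    collection_name="git_history",
)

# Human-language descriptions: these are what the LLM reads when it
# translates a user question into metadata filters.
document_content_description = "The git log of a software project"
metadata_field_info = [
    AttributeInfo(name="author", description="The commit author", type="string"),
    AttributeInfo(name="date", description="The commit timestamp", type="timestamp"),
]

retriever = SelfQueryRetriever.from_llm(
    ChatOpenAI(temperature=0),
    vectorstore,
    document_content_description,
    metadata_field_info,
)
```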