Pgvector template (#13267)

Including the pgvector template, adapting what is covered in the [cookbook](https://github.com/langchain-ai/langchain/blob/master/cookbook/retrieval_in_sql.ipynb).

Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Erick Friis <erick@langchain.dev>

This commit is contained in:
parent be854225c7
commit 58f5a4d30a
templates/sql-pgvector/.gitignore (vendored, new file, 1 line)
@@ -0,0 +1 @@
__pycache__
templates/sql-pgvector/LICENSE (new file, 21 lines)
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 LangChain, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
templates/sql-pgvector/README.md (new file, 105 lines)
@@ -0,0 +1,105 @@
# sql-pgvector

This template lets users combine PostgreSQL with semantic search / RAG using `pgvector`.

It uses the [PGVector](https://github.com/pgvector/pgvector) extension, as shown in the [RAG empowered SQL cookbook](cookbook/retrieval_in_sql.ipynb).

## Environment Setup

If you are using `ChatOpenAI` as your LLM, make sure the `OPENAI_API_KEY` is set in your environment. You can change both the LLM and the embeddings model inside `chain.py`.

You can also configure the following environment variables for use by the template (defaults are in parentheses):

- `POSTGRES_USER` (postgres)
- `POSTGRES_PASSWORD` (test)
- `POSTGRES_DB` (vectordb)
- `POSTGRES_HOST` (localhost)
- `POSTGRES_PORT` (5432)

If you don't have a Postgres instance, you can run one locally in Docker:

```bash
docker run \
    --name some-postgres \
    -e POSTGRES_PASSWORD=test \
    -e POSTGRES_USER=postgres \
    -e POSTGRES_DB=vectordb \
    -p 5432:5432 \
    postgres:16
```

Note that the stock `postgres:16` image does not ship with the `pgvector` extension; you can install it in the container, or use an image such as `pgvector/pgvector:pg16` that bundles it.

To start the container again later, use the `--name` defined above:

```bash
docker start some-postgres
```

### PostgreSQL Database setup

Apart from having the `pgvector` extension enabled, you will need to do some setup before you can run semantic search within your SQL queries.

In order to run RAG over your PostgreSQL database, you will need to generate embeddings for the specific columns you want to search over.

This process is covered in the [RAG empowered SQL cookbook](cookbook/retrieval_in_sql.ipynb), but the overall approach consists of:

1. Querying for the unique values in the column
2. Generating embeddings for those values
3. Storing the embeddings in a separate column or in an auxiliary table

A sketch of this ingest flow is shown below.
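As a rough illustration only, here is a minimal sketch of that flow. It assumes, as in the cookbook's Chinook example, a `Track` table with a `Name` column, the default connection settings listed above, and OpenAI's default 1536-dimensional embeddings; the table, column, and dimension are placeholders to adapt to your own schema.

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.sql_database import SQLDatabase

# Placeholder connection string built from the defaults listed above.
db = SQLDatabase.from_uri("postgresql+psycopg2://postgres:test@localhost:5432/vectordb")
embeddings_model = OpenAIEmbeddings()

# One-time schema prep (assumes pgvector is installed on the server).
# 1536 matches OpenAI's default embedding size; adjust if you use another model.
db.run("CREATE EXTENSION IF NOT EXISTS vector")
db.run('ALTER TABLE "Track" ADD COLUMN IF NOT EXISTS "embeddings" vector(1536)')

# 1. Query the unique values of the column you want to search semantically.
rows = db.run('SELECT DISTINCT "Name" FROM "Track"')  # stringified list of tuples
values = [row[0] for row in eval(rows)]

# 2. Generate embeddings for those values.
vectors = embeddings_model.embed_documents(values)

# 3. Store each embedding next to its value.
for value, vector in zip(values, vectors):
    escaped = value.replace("'", "''")
    db.run(
        f'UPDATE "Track" SET "embeddings" = ARRAY{vector} WHERE "Name" = '
        + f"'{escaped}'"
    )
```

The same steps appear, commented out, as the 'Ingest code' block in `chain.py`.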
## Usage

To use this package, you should first have the LangChain CLI installed:

```shell
pip install -U langchain-cli
```

To create a new LangChain project and install this as the only package, you can do:

```shell
langchain app new my-app --package sql-pgvector
```

If you want to add this to an existing project, you can just run:

```shell
langchain app add sql-pgvector
```

And add the following code to your `server.py` file:

```python
from sql_pgvector import chain as sql_pgvector_chain

add_routes(app, sql_pgvector_chain, path="/sql-pgvector")
```

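For orientation, a minimal sketch of what the resulting `server.py` might look like is shown below; the scaffold generated by `langchain app new` differs in details, and the host and port here are just illustrative defaults.

```python
from fastapi import FastAPI
from langserve import add_routes

from sql_pgvector import chain as sql_pgvector_chain

app = FastAPI()

# Expose the template's chain under /sql-pgvector (playground at /sql-pgvector/playground).
add_routes(app, sql_pgvector_chain, path="/sql-pgvector")

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
```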
(Optional) Let's now configure LangSmith. LangSmith will help us trace, monitor, and debug LangChain applications. LangSmith is currently in private beta; you can sign up [here](https://smith.langchain.com/). If you don't have access, you can skip this section.

```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=<your-api-key>
export LANGCHAIN_PROJECT=<your-project>  # if not specified, defaults to "default"
```

If you are inside this directory, you can spin up a LangServe instance directly with:

```shell
langchain serve
```

This will start the FastAPI app, with a server running locally at
[http://localhost:8000](http://localhost:8000)

We can see all templates at [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
We can access the playground at [http://127.0.0.1:8000/sql-pgvector/playground](http://127.0.0.1:8000/sql-pgvector/playground)

We can access the template from code with:

```python
from langserve.client import RemoteRunnable

runnable = RemoteRunnable("http://localhost:8000/sql-pgvector")
```
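The chain takes a single `question` string as input (see `InputType` in `chain.py`), so a call against the running server might look like the following; the question here is just an illustrative example:

```python
result = runnable.invoke(
    {"question": "Which songs are about the feeling of loneliness?"}
)
print(result)
```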
templates/sql-pgvector/poetry.lock (generated, new file, 1790 lines)
File diff suppressed because it is too large.
templates/sql-pgvector/pyproject.toml (new file, 26 lines)
@@ -0,0 +1,26 @@
[tool.poetry]
name = "sql-pgvector"
version = "0.0.1"
description = ""
authors = []
readme = "README.md"

[tool.poetry.dependencies]
python = ">=3.8.1,<4.0"
langchain = ">=0.0.313, <0.1"
openai = "^0.28.1"
psycopg2 = "^2.9.9"
tiktoken = "^0.5.1"

[tool.poetry.group.dev.dependencies]
langchain-cli = ">=0.0.15"
fastapi = "^0.104.0"
sse-starlette = "^1.6.5"

[tool.langserve]
export_module = "sql_pgvector"
export_attr = "chain"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
templates/sql-pgvector/sql_pgvector/__init__.py (new file, 3 lines)
@@ -0,0 +1,3 @@
from sql_pgvector.chain import chain

__all__ = ["chain"]
templates/sql-pgvector/sql_pgvector/chain.py (new file, 118 lines)
@@ -0,0 +1,118 @@
import os
import re

from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import ChatPromptTemplate
from langchain.pydantic_v1 import BaseModel
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnableLambda, RunnablePassthrough
from langchain.sql_database import SQLDatabase

from sql_pgvector.prompt_templates import final_template, postgresql_template

"""
IMPORTANT: For using this template, you will need to
follow the setup steps in the readme file
"""

if os.environ.get("OPENAI_API_KEY", None) is None:
    raise Exception("Missing `OPENAI_API_KEY` environment variable")

postgres_user = os.environ.get("POSTGRES_USER", "postgres")
postgres_password = os.environ.get("POSTGRES_PASSWORD", "test")
postgres_db = os.environ.get("POSTGRES_DB", "vectordb")
postgres_host = os.environ.get("POSTGRES_HOST", "localhost")
postgres_port = os.environ.get("POSTGRES_PORT", "5432")

# Connect to DB
# Replace with your own
CONNECTION_STRING = (
    f"postgresql+psycopg2://{postgres_user}:{postgres_password}"
    f"@{postgres_host}:{postgres_port}/{postgres_db}"
)
db = SQLDatabase.from_uri(CONNECTION_STRING)

# Choose LLM and embeddings model
llm = ChatOpenAI(temperature=0)
embeddings_model = OpenAIEmbeddings()


# # Ingest code - you will need to run this the first time
# # Insert your query e.g. "SELECT Name FROM Track"
# column_to_embed = db.run('replace-with-your-own-select-query')
# column_values = [s[0] for s in eval(column_to_embed)]
# embeddings = embeddings_model.embed_documents(column_values)

# for i in range(len(embeddings)):
#     value = column_values[i].replace("'", "''")
#     embedding = embeddings[i]

#     # Replace with your own SQL command for your column and table.
#     sql_command = (
#         f'UPDATE "Track" SET "embeddings" = ARRAY{embedding} WHERE "Name" ='
#         + f"'{value}'"
#     )
#     db.run(sql_command)


# -----------------
# Define functions
# -----------------
def get_schema(_):
    # Give the prompt the table info (DDL plus sample rows) for the connected DB.
    return db.get_table_info()


def run_query(query):
    return db.run(query)


def replace_brackets(match):
    # Embed the word(s) the LLM put inside a `[...]` placeholder and splice the
    # resulting vector literal(s) into the SQL string.
    words_inside_brackets = match.group(1).split(", ")
    embedded_words = [
        str(embeddings_model.embed_query(word)) for word in words_inside_brackets
    ]
    return "', '".join(embedded_words)


def get_query(query):
    # Replace every `[search_word]` placeholder in the generated SQL with the
    # embedding of that word, so the `<->` operator compares against a real vector.
    sql_query = re.sub(r"\[([\w\s,]+)\]", replace_brackets, query)
    return sql_query


# -----------------------
# Now we create the chain
# -----------------------

query_generation_prompt = ChatPromptTemplate.from_messages(
    [("system", postgresql_template), ("human", "{question}")]
)

sql_query_chain = (
    RunnablePassthrough.assign(schema=get_schema)
    | query_generation_prompt
    | llm.bind(stop=["\nSQLResult:"])
    | StrOutputParser()
)


final_prompt = ChatPromptTemplate.from_messages(
    [("system", final_template), ("human", "{question}")]
)

full_chain = (
    RunnablePassthrough.assign(query=sql_query_chain)
    | RunnablePassthrough.assign(
        schema=get_schema,
        response=RunnableLambda(lambda x: db.run(get_query(x["query"]))),
    )
    | final_prompt
    | llm
)


class InputType(BaseModel):
    question: str


chain = full_chain.with_types(input_type=InputType)
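As a rough local sanity check (not part of the template files), assuming the environment variables and the ingested embeddings described in the README are in place, the exported chain can also be invoked directly; the question below is only an example:

```python
from sql_pgvector import chain

print(chain.invoke({"question": "Which songs are about the feeling of loneliness?"}))
```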
templates/sql-pgvector/sql_pgvector/prompt_templates.py (new file, 50 lines)
@@ -0,0 +1,50 @@
postgresql_template = (
    "You are a Postgres expert. Given an input question, first create a "
    "syntactically correct Postgres query to run, then look at the results "
    "of the query and return the answer to the input question.\n"
    "Unless the user specifies in the question a specific number of "
    "examples to obtain, query for at most 5 results using the LIMIT clause "
    "as per Postgres. You can order the results to return the most "
    "informative data in the database.\n"
    "Never query for all columns from a table. You must query only the "
    "columns that are needed to answer the question. Wrap each column name "
    'in double quotes (") to denote them as delimited identifiers.\n'
    "Pay attention to use only the column names you can see in the tables "
    "below. Be careful to not query for columns that do not exist. Also, "
    "pay attention to which column is in which table.\n"
    "Pay attention to use date('now') function to get the current date, "
    'if the question involves "today".\n\n'
    "You can use an extra extension which allows you to run semantic "
    "similarity using <-> operator on tables containing columns named "
    '"embeddings".\n'
    "<-> operator can ONLY be used on embeddings vector columns.\n"
    "The embeddings value for a given row typically represents the semantic "
    "meaning of that row.\n"
    "The vector represents an embedding representation of the question, "
    "given below. \n"
    "Do NOT fill in the vector values directly, but rather specify a "
    "`[search_word]` placeholder, which should contain the word that would "
    "be embedded for filtering.\n"
    "For example, if the user asks for songs about 'the feeling of "
    "loneliness' the query could be:\n"
    '\'SELECT "[whatever_table_name]"."SongName" FROM '
    '"[whatever_table_name]" ORDER BY "embeddings" <-> \'[loneliness]\' '
    "LIMIT 5'\n\n"
    "Use the following format:\n\n"
    "Question: <Question here>\n"
    "SQLQuery: <SQL Query to run>\n"
    "SQLResult: <Result of the SQLQuery>\n"
    "Answer: <Final answer here>\n\n"
    "Only use the following tables:\n\n"
    "{schema}\n"
)


final_template = (
    "Based on the table schema below, question, sql query, and sql response, "
    "write a natural language response:\n"
    "{schema}\n\n"
    "Question: {question}\n"
    "SQL Query: {query}\n"
    "SQL Response: {response}"
)
templates/sql-pgvector/tests/__init__.py (new file, empty)