diff --git a/docs/extras/integrations/vectorstores/timescalevector.ipynb b/docs/extras/integrations/vectorstores/timescalevector.ipynb new file mode 100644 index 00000000000..02f318dd679 --- /dev/null +++ b/docs/extras/integrations/vectorstores/timescalevector.ipynb @@ -0,0 +1,1696 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Timescale Vector (Postgres)\n", + "\n", + "This notebook shows how to use the Postgres vector database `Timescale Vector`. You'll learn how to use TimescaleVector for (1) semantic search, (2) time-based vector search, (3) self-querying, and (4) creating indexes to speed up queries.\n", + "\n", + "## What is Timescale Vector?\n", + "**[Timescale Vector](https://www.timescale.com/ai) is PostgreSQL++ for AI applications.**\n", + "\n", + "Timescale Vector enables you to efficiently store and query millions of vector embeddings in `PostgreSQL`.\n", + "- Enhances `pgvector` with faster and more accurate similarity search on 100M+ vectors via a `DiskANN`-inspired indexing algorithm.\n", + "- Enables fast time-based vector search via automatic time-based partitioning and indexing.\n", + "- Provides a familiar SQL interface for querying vector embeddings and relational data.\n", + "\n", + "Timescale Vector is cloud PostgreSQL for AI that scales with you from POC to production:\n", + "- Simplifies operations by enabling you to store relational metadata, vector embeddings, and time-series data in a single database.\n", + "- Benefits from a rock-solid PostgreSQL foundation with enterprise-grade features like streaming backups and replication, high availability, and row-level security.\n", + "- Enables a worry-free experience with enterprise-grade security and compliance.\n", + "\n", + "## How to access Timescale Vector\n", + "Timescale Vector is available on [Timescale](https://www.timescale.com/ai), the cloud PostgreSQL platform. (There is no self-hosted version at this time.)\n", + "\n", + "LangChain users get a 90-day free trial for Timescale Vector.\n", + "- To get started, [sign up](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) for Timescale, create a new database, and follow this notebook!\n", + "- See the [Timescale Vector explainer blog](https://www.timescale.com/blog/how-we-made-postgresql-the-best-vector-database/?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) for more details and performance benchmarks.\n", + "- See the [installation instructions](https://github.com/timescale/python-vector) for more details on using Timescale Vector in Python." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "Follow these steps to get set up for this tutorial." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Pip install necessary packages\n", + "!pip install timescale-vector\n", + "!pip install openai\n", + "!pip install tiktoken" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example, we'll use `OpenAIEmbeddings`, so let's load your OpenAI API key."
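+ ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The cell below assumes you keep your key in a local `.env` file and reads it with `python-dotenv`, which isn't included in the installs above. If it isn't already in your environment, install it first:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Install python-dotenv, used below to load the OpenAI API key from a .env file\n", + "!pip install python-dotenv"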
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "# Run export OPENAI_API_KEY=sk-YOUR_OPENAI_API_KEY...\n", + "# Get the OpenAI API key by reading the local .env file\n", + "from dotenv import load_dotenv, find_dotenv\n", + "_ = load_dotenv(find_dotenv())\n", + "OPENAI_API_KEY = os.environ['OPENAI_API_KEY']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get the API key and save it as an environment variable\n", + "#import os\n", + "#import getpass\n", + "#os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from typing import List, Tuple" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we'll import the needed Python libraries and libraries from LangChain. Note that we import the `timescale-vector` library as well as the TimescaleVector LangChain vectorstore." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import timescale_vector\n", + "from datetime import datetime, timedelta\n", + "from langchain.embeddings.openai import OpenAIEmbeddings\n", + "from langchain.text_splitter import CharacterTextSplitter\n", + "from langchain.document_loaders import TextLoader\n", + "from langchain.document_loaders.json_loader import JSONLoader\n", + "from langchain.docstore.document import Document\n", + "from langchain.vectorstores.timescalevector import TimescaleVector" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Similarity Search with Euclidean Distance (Default)\n", + "\n", + "First, we'll look at an example of doing a similarity search query on the State of the Union speech to find the most similar sentences to a given query sentence. We'll use the [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance) as our similarity metric." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "# Load the text and split it into chunks\n", + "loader = TextLoader(\"../../../extras/modules/state_of_the_union.txt\")\n", + "documents = loader.load()\n", + "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", + "docs = text_splitter.split_documents(documents)\n", + "\n", + "embeddings = OpenAIEmbeddings()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we'll load the service URL for our Timescale database. \n", + "\n", + "If you haven't already, [sign up for Timescale](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral), and create a new database.\n", + "\n", + "Then, to connect to your PostgreSQL database, you'll need your service URI, which can be found in the cheatsheet or `.env` file you downloaded after creating a new database. \n", + "\n", + "The URI will look something like this: `postgres://tsdbadmin:@.tsdb.cloud.timescale.com:/tsdb?sslmode=require`. " + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "# Timescale Vector needs the service URL to your cloud database.
You can see this as soon as you create the \n", + "# service in the cloud UI or in your credentials.sql file\n", + "SERVICE_URL = os.environ['TIMESCALE_SERVICE_URL']\n", + "\n", + "# Specify directly if testing\n", + "#SERVICE_URL = \"postgres://tsdbadmin:@.tsdb.cloud.timescale.com:/tsdb?sslmode=require\"\n", + "\n", + "# # You can also get it from an environment variable. We suggest using a .env file.\n", + "# import os\n", + "# SERVICE_URL = os.environ.get(\"TIMESCALE_SERVICE_URL\", \"\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next we create a TimescaleVector vectorstore. We specify a collection name, which will be the name of the table our data is stored in. \n", + "\n", + "Note: When creating a new instance of TimescaleVector, the TimescaleVector module will try to create a table with the name of the collection. So, make sure that the collection name is unique (i.e. it doesn't already exist)." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "# The TimescaleVector module will create a table with the name of the collection.\n", + "COLLECTION_NAME = \"state_of_the_union_test\"\n", + "\n", + "# Create a Timescale Vector instance from the collection of documents\n", + "db = TimescaleVector.from_documents(\n", + " embedding=embeddings,\n", + " documents=docs,\n", + " collection_name=COLLECTION_NAME,\n", + " service_url=SERVICE_URL,\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we've loaded our data, we can perform a similarity search." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "query = \"What did the president say about Ketanji Brown Jackson\"\n", + "docs_with_score = db.similarity_search_with_score(query)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--------------------------------------------------------------------------------\n", + "Score: 0.18443380687035138\n", + "Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n", + "\n", + "Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n", + "\n", + "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n", + "\n", + "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n", + "--------------------------------------------------------------------------------\n", + "--------------------------------------------------------------------------------\n", + "Score: 0.18452197313308139\n", + "Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
\n", + "\n", + "Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n", + "\n", + "One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n", + "\n", + "And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n", + "--------------------------------------------------------------------------------\n", + "--------------------------------------------------------------------------------\n", + "Score: 0.21720781018594182\n", + "A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n", + "\n", + "And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n", + "\n", + "We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling. \n", + "\n", + "We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n", + "\n", + "We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n", + "\n", + "We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n", + "--------------------------------------------------------------------------------\n", + "--------------------------------------------------------------------------------\n", + "Score: 0.21724902288621384\n", + "A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n", + "\n", + "And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n", + "\n", + "We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling. \n", + "\n", + "We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n", + "\n", + "We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. 
\n", + "\n", + "We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n", + "--------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ + "for doc, score in docs_with_score:\n", + " print(\"-\" * 80)\n", + " print(\"Score: \", score)\n", + " print(doc.page_content)\n", + " print(\"-\" * 80)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Using a Timescale Vector as a Retriever\n", + "After initializing a TimescaleVector store, you can use it as a [retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/)." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "# Use TimescaleVector as a retriever\n", + "retriever = db.as_retriever()" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "tags=['TimescaleVector', 'OpenAIEmbeddings'] metadata=None vectorstore= search_type='similarity' search_kwargs={}\n" + ] + } + ], + "source": [ + "print(retriever)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's look at an example of using Timescale Vector as a retriever with the [RetrievalQA chain](https://python.langchain.com/docs/use_cases/question_answering/how_to/vector_db_qa) and the [stuff chain](https://python.langchain.com/docs/modules/chains/document/stuff).\n", + "\n", + "In this example, we'll ask the same query as above, but this time we'll pass the relevant documents returned from Timescale Vector to an LLM to use as context to answer our question.\n", + "\n", + "First we'll create our stuff chain:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "# Initialize GPT3.5 model\n", + "from langchain.chat_models import ChatOpenAI\n", + "llm = ChatOpenAI(temperature = 0.1, model = 'gpt-3.5-turbo-16k')\n", + "\n", + "# Initialize a RetrievalQA class from a stuff chain\n", + "from langchain.chains import RetrievalQA\n", + "qa_stuff = RetrievalQA.from_chain_type(\n", + " llm=llm, \n", + " chain_type=\"stuff\", \n", + " retriever=retriever,\n", + " verbose=True,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\n", + "\u001b[1m> Entering new RetrievalQA chain...\u001b[0m\n", + "\n", + "\u001b[1m> Finished chain.\u001b[0m\n" + ] + } + ], + "source": [ + "query = \"What did the president say about Ketanji Brown Jackson?\"\n", + "response = qa_stuff.run(query)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The President said that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson, who is one of our nation's top legal minds and will continue Justice Breyer's legacy of excellence. He also mentioned that since her nomination, she has received a broad range of support from various groups, including the Fraternal Order of Police and former judges appointed by Democrats and Republicans.\n" + ] + } + ], + "source": [ + "print(response)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. 
Similarity Search with Time-Based Filtering\n", + "\n", + "A key use case for Timescale Vector is efficient time-based vector search. Timescale Vector enables this by automatically partitioning vectors (and associated metadata) by time. This allows you to efficiently query vectors by both similarity to a query vector and time.\n", + "\n", + "Time-based vector search functionality is helpful for applications like:\n", + "- Storing and retrieving LLM response history (e.g. chatbots)\n", + "- Finding the most recent embeddings that are similar to a query vector (e.g. recent news)\n", + "- Constraining similarity search to a relevant time range (e.g. asking time-based questions about a knowledge base)\n", + "\n", + "To illustrate TimescaleVector's time-based vector search functionality, we'll ask questions about the git log history for TimescaleDB. We'll show how to add documents with a time-based uuid and how to run similarity searches with time range filters." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Extract content and metadata from git log JSON\n", + "First, let's load the git log data into a new collection in our PostgreSQL database named `timescale_commits`." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "import json" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We'll define a helper function to create a uuid for a document and associated vector embedding based on its timestamp. We'll use this function to create a uuid for each git log entry.\n", + "\n", + "Important note: If you are working with documents and want the current date and time associated with the vector for time-based search, you can skip this step. A uuid will be automatically generated when the documents are ingested by default." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "from timescale_vector import client\n", + "# Function to take in a date string in the past and return a uuid v1\n", + "def create_uuid(date_string: str):\n", + " if date_string is None:\n", + " return None\n", + " time_format = '%a %b %d %H:%M:%S %Y %z'\n", + " datetime_obj = datetime.strptime(date_string, time_format)\n", + " uuid = client.uuid_from_time(datetime_obj)\n", + " return str(uuid)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we'll define a metadata function to extract the relevant metadata from the JSON record. We'll pass this function to the JSONLoader. See the [JSON document loader docs](https://python.langchain.com/docs/modules/data_connection/document_loaders/json) for more details.\n",
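+ "\n", + "(First, though, a quick sanity check of the `create_uuid` helper above -- the date string here is just an illustrative value in the git log format it expects:)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Sanity check (illustrative date string): the uuid v1 returned by create_uuid\n", + "# encodes this timestamp, which Timescale Vector uses for time partitioning and filtering\n", + "create_uuid(\"Tue Sep 5 21:03:21 2023 +0530\")"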
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "# Helper function to split name and email given an author string like 'Name Lastname <email>'\n", + "def split_name(input_string: str) -> Tuple[str, str]:\n", + " if input_string is None:\n", + " return None, None\n", + " start = input_string.find(\"<\")\n", + " end = input_string.find(\">\")\n", + " name = input_string[:start].strip()\n", + " email = input_string[start+1:end].strip()\n", + " return name, email\n", + "\n", + "# Helper function to transform a date string into a timestamp_tz string\n", + "def create_date(input_string: str) -> str:\n", + " if input_string is None:\n", + " return None\n", + " # Define a dictionary to map month abbreviations to their numerical equivalents\n", + " month_dict = {\n", + " \"Jan\": \"01\",\n", + " \"Feb\": \"02\",\n", + " \"Mar\": \"03\",\n", + " \"Apr\": \"04\",\n", + " \"May\": \"05\",\n", + " \"Jun\": \"06\",\n", + " \"Jul\": \"07\",\n", + " \"Aug\": \"08\",\n", + " \"Sep\": \"09\",\n", + " \"Oct\": \"10\",\n", + " \"Nov\": \"11\",\n", + " \"Dec\": \"12\",\n", + " }\n", + "\n", + " # Split the input string into its components, e.g. 'Tue Sep 5 21:03:21 2023 +0530'\n", + " components = input_string.split()\n", + " # Extract relevant information\n", + " day = components[2]\n", + " month = month_dict[components[1]]\n", + " year = components[4]\n", + " time = components[3]\n", + " # Git already reports the UTC offset in +/-HHMM form, which PostgreSQL accepts directly\n", + " timezone_offset = components[5]\n", + " # Create a formatted string for the timestamptz in PostgreSQL format\n", + " timestamp_tz_str = f\"{year}-{month}-{day} {time}{timezone_offset}\"\n", + " return timestamp_tz_str\n", + "\n", + "# Metadata extraction function to extract metadata from a JSON record\n", + "def extract_metadata(record: dict, metadata: dict) -> dict:\n", + " record_name, record_email = split_name(record[\"author\"])\n", + " metadata[\"id\"] = create_uuid(record[\"date\"])\n", + " metadata[\"date\"] = create_date(record[\"date\"])\n", + " metadata[\"author_name\"] = record_name\n", + " metadata[\"author_email\"] = record_email\n", + " metadata[\"commit_hash\"] = record[\"commit\"]\n", + " return metadata" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, you'll need to [download the sample dataset](https://s3.amazonaws.com/assets.timescale.com/ai/ts_git_log.json) and place it in the same directory as this notebook.\n", + "\n", + "You can use the following command:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "shellscript" + } + }, + "outputs": [], + "source": [ + "# Download the file using curl and save it as ts_git_log.json\n", + "# Note: Execute this command in your terminal, in the same directory as the notebook\n", + "curl -O https://s3.amazonaws.com/assets.timescale.com/ai/ts_git_log.json" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, we can initialize the JSON loader to parse the JSON records. We also remove empty records for simplicity.\n",
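+ "\n", + "(If you'd rather fetch the file from within Python than via curl, a minimal sketch using only the standard library -- same URL as above, saved to the notebook's working directory -- is shown below.)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Alternative to the curl command above: download the git log JSON with urllib\n", + "import urllib.request\n", + "\n", + "url = \"https://s3.amazonaws.com/assets.timescale.com/ai/ts_git_log.json\"\n", + "urllib.request.urlretrieve(url, \"ts_git_log.json\")"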
+ ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "# Define path to the JSON file relative to this notebook\n", + "# Change this to the path to your JSON file\n", + "FILE_PATH = \"../../../../../ts_git_log.json\"\n", + "\n", + "# Load data from JSON file and extract metadata\n", + "loader = JSONLoader(\n", + " file_path=FILE_PATH,\n", + " jq_schema='.commit_history[]',\n", + " text_content=False,\n", + " metadata_func=extract_metadata\n", + ")\n", + "documents = loader.load()\n", + "\n", + "# Remove documents with None dates\n", + "documents = [doc for doc in documents if doc.metadata[\"date\"] is not None]" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "page_content='{\"commit\": \"44e41c12ab25e36c202f58e068ced262eadc8d16\", \"author\": \"Lakshmi Narayanan Sreethar\", \"date\": \"Tue Sep 5 21:03:21 2023 +0530\", \"change summary\": \"Fix segfault in set_integer_now_func\", \"change details\": \"When an invalid function oid is passed to set_integer_now_func, it finds out that the function oid is invalid but before throwing the error, it calls ReleaseSysCache on an invalid tuple causing a segfault. Fixed that by removing the invalid call to ReleaseSysCache. Fixes #6037 \"}' metadata={'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/ts_git_log.json', 'seq_num': 1, 'id': '8b407680-4c01-11ee-96a6-b82284ddccc6', 'date': '2023-09-5 21:03:21+0850', 'author_name': 'Lakshmi Narayanan Sreethar', 'author_email': 'lakshmi@timescale.com', 'commit_hash': '44e41c12ab25e36c202f58e068ced262eadc8d16'}\n" + ] + } + ], + "source": [ + "print(documents[0])" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load documents and metadata into TimescaleVector vectorstore\n", + "Now that we have prepared our documents, let's process them and load them, along with their vector embedding representations, into our TimescaleVector vectorstore.\n", + "\n", + "Since this is a demo, we will only load the first 500 records. In practice, you can load as many records as you want." + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "NUM_RECORDS = 500\n", + "documents = documents[:NUM_RECORDS]" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then we use the CharacterTextSplitter to split the documents into smaller chunks, if needed, for easier embedding. Note that this splitting process retains the metadata for each document." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "# Split the documents into chunks for embedding\n", + "text_splitter = CharacterTextSplitter(\n", + " chunk_size=1000,\n", + " chunk_overlap=200,\n", + ")\n", + "docs = text_splitter.split_documents(documents)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we'll create a Timescale Vector instance from the collection of documents that we finished pre-processing.\n", + "\n", + "First, we'll define a collection name, which will be the name of our table in the PostgreSQL database. \n", + "\n", + "We'll also define a time delta, which we pass to the `time_partition_interval` argument, which will be used as the interval for partitioning the data by time.
Each partition will consist of data for the specified length of time. We'll use 7 days for simplicity, but you can pick whatever value makes sense for your use case -- for example, if you query recent vectors frequently you might want to use a smaller time delta like 1 day, or if you query vectors over a decade-long time period then you might want to use a larger time delta like 6 months or 1 year.\n", + "\n", + "Finally, we'll create the TimescaleVector instance. We specify the `ids` argument to be the `uuid` field in our metadata that we created in the pre-processing step above. We do this because we want the time part of our uuids to reflect dates in the past (i.e. when the commit was made). However, if we wanted the current date and time to be associated with our document, we can remove the `ids` argument and uuids will be automatically created with the current date and time." + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "# Define collection name\n", + "COLLECTION_NAME = \"timescale_commits\"\n", + "embeddings = OpenAIEmbeddings()\n", + "\n", + "# Create a Timescale Vector instance from the collection of documents\n", + "db = TimescaleVector.from_documents(\n", + " embedding=embeddings,\n", + " ids=[doc.metadata[\"id\"] for doc in docs],\n", + " documents=docs,\n", + " collection_name=COLLECTION_NAME,\n", + " service_url=SERVICE_URL,\n", + " time_partition_interval=timedelta(days=7),\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Querying vectors by time and similarity\n", + "\n", + "Now that we have loaded our documents into TimescaleVector, we can query them by time and similarity.\n", + "\n", + "TimescaleVector provides multiple methods for querying vectors by doing similarity search with time-based filtering.\n", + "\n", + "Let's take a look at each method below:" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [], + "source": [ + "# Time filter variables\n", + "start_dt = datetime(2023, 8, 1, 22, 10, 35) # Start date = 1 August 2023, 22:10:35\n", + "end_dt = datetime(2023, 8, 30, 22, 10, 35) # End date = 30 August 2023, 22:10:35\n", + "td = timedelta(days=7) # Time delta = 7 days\n", + "\n", + "query = \"What's new with TimescaleDB functions?\"" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Method 1: Filter within a provided start date and end date.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--------------------------------------------------------------------------------\n", + "Score: 0.17488396167755127\n", + "Date: 2023-08-29 18:13:24+0320\n", + "{\"commit\": \" e4facda540286b0affba47ccc63959fefe2a7b26\", \"author\": \"Sven Klemm\", \"date\": \"Tue Aug 29 18:13:24 2023 +0200\", \"change summary\": \"Add compatibility layer for _timescaledb_internal functions\", \"change details\": \"With timescaledb 2.12 all the functions present in _timescaledb_internal were moved into the _timescaledb_functions schema to improve schema security. This patch adds a compatibility layer so external callers of these internal functions will not break and allow for more flexibility when migrating.
\"}\n", + "--------------------------------------------------------------------------------\n", + "--------------------------------------------------------------------------------\n", + "Score: 0.18102192878723145\n", + "Date: 2023-08-20 22:47:10+0320\n", + "{\"commit\": \" 0a66bdb8d36a1879246bd652e4c28500c4b951ab\", \"author\": \"Sven Klemm\", \"date\": \"Sun Aug 20 22:47:10 2023 +0200\", \"change summary\": \"Move functions to _timescaledb_functions schema\", \"change details\": \"To increase schema security we do not want to mix our own internal objects with user objects. Since chunks are created in the _timescaledb_internal schema our internal functions should live in a different dedicated schema. This patch make the necessary adjustments for the following functions: - to_unix_microseconds(timestamptz) - to_timestamp(bigint) - to_timestamp_without_timezone(bigint) - to_date(bigint) - to_interval(bigint) - interval_to_usec(interval) - time_to_internal(anyelement) - subtract_integer_from_now(regclass, bigint) \"}\n", + "--------------------------------------------------------------------------------\n", + "--------------------------------------------------------------------------------\n", + "Score: 0.18150119891755445\n", + "Date: 2023-08-22 12:01:19+0320\n", + "{\"commit\": \" cf04496e4b4237440274eb25e4e02472fc4e06fc\", \"author\": \"Sven Klemm\", \"date\": \"Tue Aug 22 12:01:19 2023 +0200\", \"change summary\": \"Move utility functions to _timescaledb_functions schema\", \"change details\": \"To increase schema security we do not want to mix our own internal objects with user objects. Since chunks are created in the _timescaledb_internal schema our internal functions should live in a different dedicated schema. This patch make the necessary adjustments for the following functions: - generate_uuid() - get_git_commit() - get_os_info() - tsl_loaded() \"}\n", + "--------------------------------------------------------------------------------\n", + "--------------------------------------------------------------------------------\n", + "Score: 0.18422493887617963\n", + "Date: 2023-08-9 15:26:03+0500\n", + "{\"commit\": \" 44eab9cf9bef34274c88efd37a750eaa74cd8044\", \"author\": \"Konstantina Skovola\", \"date\": \"Wed Aug 9 15:26:03 2023 +0300\", \"change summary\": \"Release 2.11.2\", \"change details\": \"This release contains bug fixes since the 2.11.1 release. We recommend that you upgrade at the next available opportunity. **Features** * #5923 Feature flags for TimescaleDB features **Bugfixes** * #5680 Fix DISTINCT query with JOIN on multiple segmentby columns * #5774 Fixed two bugs in decompression sorted merge code * #5786 Ensure pg_config --cppflags are passed * #5906 Fix quoting owners in sql scripts. 
* #5912 Fix crash in 1-step integer policy creation **Thanks** * @mrksngl for submitting a PR to fix extension upgrade scripts * @ericdevries for reporting an issue with DISTINCT queries using segmentby columns of compressed hypertable \"}\n", + "--------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ + "# Method 1: Query for vectors between start_date and end_date\n", + "docs_with_score = db.similarity_search_with_score(query, start_date=start_dt, end_date=end_dt)\n", + "\n", + "for doc, score in docs_with_score:\n", + " print(\"-\" * 80)\n", + " print(\"Score: \", score)\n", + " print(\"Date: \", doc.metadata[\"date\"])\n", + " print(doc.page_content)\n", + " print(\"-\" * 80)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note how the query only returns results within the specified date range." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Method 2: Filter within a provided start date, and a time delta later." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--------------------------------------------------------------------------------\n", + "Score: 0.18458807468414307\n", + "Date: 2023-08-3 14:30:23+0500\n", + "{\"commit\": \" 7aeed663b9c0f337b530fd6cad47704a51a9b2ec\", \"author\": \"Dmitry Simonenko\", \"date\": \"Thu Aug 3 14:30:23 2023 +0300\", \"change summary\": \"Feature flags for TimescaleDB features\", \"change details\": \"This PR adds several GUCs which allow to enable/disable major timescaledb features: - enable_hypertable_create - enable_hypertable_compression - enable_cagg_create - enable_policy_create \"}\n", + "--------------------------------------------------------------------------------\n", + "--------------------------------------------------------------------------------\n", + "Score: 0.20492422580718994\n", + "Date: 2023-08-7 18:31:40+0320\n", + "{\"commit\": \" 07762ea4cedefc88497f0d1f8712d1515cdc5b6e\", \"author\": \"Sven Klemm\", \"date\": \"Mon Aug 7 18:31:40 2023 +0200\", \"change summary\": \"Test timescaledb debian 12 packages in CI\", \"change details\": \"\"}\n", + "--------------------------------------------------------------------------------\n", + "--------------------------------------------------------------------------------\n", + "Score: 0.21106326580047607\n", + "Date: 2023-08-3 14:36:39+0500\n", + "{\"commit\": \" 2863daf3df83c63ee36c0cf7b66c522da5b4e127\", \"author\": \"Dmitry Simonenko\", \"date\": \"Thu Aug 3 14:36:39 2023 +0300\", \"change summary\": \"Support CREATE INDEX ONLY ON main table\", \"change details\": \"This PR adds support for CREATE INDEX ONLY ON clause which allows to create index only on the main table excluding chunks. 
Fix #5908 \"}\n", + "--------------------------------------------------------------------------------\n", + "--------------------------------------------------------------------------------\n", + "Score: 0.21698051691055298\n", + "Date: 2023-08-2 20:24:14+0140\n", + "{\"commit\": \" 3af0d282ea71d9a8f27159a6171e9516e62ec9cb\", \"author\": \"Lakshmi Narayanan Sreethar\", \"date\": \"Wed Aug 2 20:24:14 2023 +0100\", \"change summary\": \"PG16: ExecInsertIndexTuples requires additional parameter\", \"change details\": \"PG16 adds a new boolean parameter to the ExecInsertIndexTuples function to denote if the index is a BRIN index, which is then used to determine if the index update can be skipped. The fix also removes the INDEX_ATTR_BITMAP_ALL enum value. Adapt these changes by updating the compat function to accomodate the new parameter added to the ExecInsertIndexTuples function and using an alternative for the removed INDEX_ATTR_BITMAP_ALL enum value. postgres/postgres@19d8e23 \"}\n", + "--------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ + "# Method 2: Query for vectors between start_dt and a time delta td later\n", + "# Most relevant vectors between 1 August and 7 days later\n", + "docs_with_score = db.similarity_search_with_score(query, start_date=start_dt, time_delta=td)\n", + "\n", + "for doc, score in docs_with_score:\n", + " print(\"-\" * 80)\n", + " print(\"Score: \", score)\n", + " print(\"Date: \", doc.metadata[\"date\"])\n", + " print(doc.page_content)\n", + " print(\"-\" * 80)\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once again, notice how we get results within the specified time filter, different from the previous query." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Method 3: Filter within a provided end date and a time delta earlier." + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--------------------------------------------------------------------------------\n", + "Score: 0.17488396167755127\n", + "Date: 2023-08-29 18:13:24+0320\n", + "{\"commit\": \" e4facda540286b0affba47ccc63959fefe2a7b26\", \"author\": \"Sven Klemm\", \"date\": \"Tue Aug 29 18:13:24 2023 +0200\", \"change summary\": \"Add compatibility layer for _timescaledb_internal functions\", \"change details\": \"With timescaledb 2.12 all the functions present in _timescaledb_internal were moved into the _timescaledb_functions schema to improve schema security. This patch adds a compatibility layer so external callers of these internal functions will not break and allow for more flexibility when migrating. \"}\n", + "--------------------------------------------------------------------------------\n", + "--------------------------------------------------------------------------------\n", + "Score: 0.18496227264404297\n", + "Date: 2023-08-29 10:49:47+0320\n", + "{\"commit\": \" a9751ccd5eb030026d7b975d22753f5964972389\", \"author\": \"Sven Klemm\", \"date\": \"Tue Aug 29 10:49:47 2023 +0200\", \"change summary\": \"Move partitioning functions to _timescaledb_functions schema\", \"change details\": \"To increase schema security we do not want to mix our own internal objects with user objects. Since chunks are created in the _timescaledb_internal schema our internal functions should live in a different dedicated schema. 
This patch make the necessary adjustments for the following functions: - get_partition_for_key(val anyelement) - get_partition_hash(val anyelement) \"}\n", + "--------------------------------------------------------------------------------\n", + "--------------------------------------------------------------------------------\n", + "Score: 0.1871250867843628\n", + "Date: 2023-08-28 23:26:23+0320\n", + "{\"commit\": \" b2a91494a11d8b82849b6f11f9ea6dc26ef8a8cb\", \"author\": \"Sven Klemm\", \"date\": \"Mon Aug 28 23:26:23 2023 +0200\", \"change summary\": \"Move ddl_internal functions to _timescaledb_functions schema\", \"change details\": \"To increase schema security we do not want to mix our own internal objects with user objects. Since chunks are created in the _timescaledb_internal schema our internal functions should live in a different dedicated schema. This patch make the necessary adjustments for the following functions: - chunk_constraint_add_table_constraint(_timescaledb_catalog.chunk_constraint) - chunk_drop_replica(regclass,name) - chunk_index_clone(oid) - chunk_index_replace(oid,oid) - create_chunk_replica_table(regclass,name) - drop_stale_chunks(name,integer[]) - health() - hypertable_constraint_add_table_fk_constraint(name,name,name,integer) - process_ddl_event() - wait_subscription_sync(name,name,integer,numeric) \"}\n", + "--------------------------------------------------------------------------------\n", + "--------------------------------------------------------------------------------\n", + "Score: 0.18867712088363497\n", + "Date: 2023-08-27 13:20:04+0320\n", + "{\"commit\": \" e02b1f348eb4c48def00b7d5227238b4d9d41a4a\", \"author\": \"Sven Klemm\", \"date\": \"Sun Aug 27 13:20:04 2023 +0200\", \"change summary\": \"Simplify schema move update script\", \"change details\": \"Use dynamic sql to create the ALTER FUNCTION statements for those functions that may not exist in previous versions. \"}\n", + "--------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ + "# Method 3: Query for vectors between end_dt and a time delta td earlier\n", + "# Most relevant vectors between 30 August and 7 days earlier\n", + "docs_with_score = db.similarity_search_with_score(query, end_date=end_dt, time_delta=td)\n", + "\n", + "for doc, score in docs_with_score:\n", + " print(\"-\" * 80)\n", + " print(\"Score: \", score)\n", + " print(\"Date: \", doc.metadata[\"date\"])\n", + " print(doc.page_content)\n", + " print(\"-\" * 80)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Method 4: We can also filter for all vectors after a given date by only specifying a start date in our query.\n", + "\n", + "Method 5: Similarly, we can filter for all vectors before a given date by only specifying an end date in our query.\n",
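+ "\n", + "(The query cells below repeat the same result-printing loop. Purely as an optional convenience -- not something the integration requires -- you could define it once as a helper like the sketch below; the following cells keep the explicit loop so each stands alone.)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Optional helper: print documents and scores returned by similarity_search_with_score\n", + "def print_results(docs_with_score):\n", + " for doc, score in docs_with_score:\n", + "  print(\"-\" * 80)\n", + "  print(\"Score: \", score)\n", + "  print(\"Date: \", doc.metadata[\"date\"])\n", + "  print(doc.page_content)\n", + "  print(\"-\" * 80)"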
+ ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--------------------------------------------------------------------------------\n", + "Score: 0.17488396167755127\n", + "Date: 2023-08-29 18:13:24+0320\n", + "{\"commit\": \" e4facda540286b0affba47ccc63959fefe2a7b26\", \"author\": \"Sven Klemm\", \"date\": \"Tue Aug 29 18:13:24 2023 +0200\", \"change summary\": \"Add compatibility layer for _timescaledb_internal functions\", \"change details\": \"With timescaledb 2.12 all the functions present in _timescaledb_internal were moved into the _timescaledb_functions schema to improve schema security. This patch adds a compatibility layer so external callers of these internal functions will not break and allow for more flexibility when migrating. \"}\n", + "--------------------------------------------------------------------------------\n", + "--------------------------------------------------------------------------------\n", + "Score: 0.18102192878723145\n", + "Date: 2023-08-20 22:47:10+0320\n", + "{\"commit\": \" 0a66bdb8d36a1879246bd652e4c28500c4b951ab\", \"author\": \"Sven Klemm\", \"date\": \"Sun Aug 20 22:47:10 2023 +0200\", \"change summary\": \"Move functions to _timescaledb_functions schema\", \"change details\": \"To increase schema security we do not want to mix our own internal objects with user objects. Since chunks are created in the _timescaledb_internal schema our internal functions should live in a different dedicated schema. This patch make the necessary adjustments for the following functions: - to_unix_microseconds(timestamptz) - to_timestamp(bigint) - to_timestamp_without_timezone(bigint) - to_date(bigint) - to_interval(bigint) - interval_to_usec(interval) - time_to_internal(anyelement) - subtract_integer_from_now(regclass, bigint) \"}\n", + "--------------------------------------------------------------------------------\n", + "--------------------------------------------------------------------------------\n", + "Score: 0.18150119891755445\n", + "Date: 2023-08-22 12:01:19+0320\n", + "{\"commit\": \" cf04496e4b4237440274eb25e4e02472fc4e06fc\", \"author\": \"Sven Klemm\", \"date\": \"Tue Aug 22 12:01:19 2023 +0200\", \"change summary\": \"Move utility functions to _timescaledb_functions schema\", \"change details\": \"To increase schema security we do not want to mix our own internal objects with user objects. Since chunks are created in the _timescaledb_internal schema our internal functions should live in a different dedicated schema. This patch make the necessary adjustments for the following functions: - generate_uuid() - get_git_commit() - get_os_info() - tsl_loaded() \"}\n", + "--------------------------------------------------------------------------------\n", + "--------------------------------------------------------------------------------\n", + "Score: 0.18422493887617963\n", + "Date: 2023-08-9 15:26:03+0500\n", + "{\"commit\": \" 44eab9cf9bef34274c88efd37a750eaa74cd8044\", \"author\": \"Konstantina Skovola\", \"date\": \"Wed Aug 9 15:26:03 2023 +0300\", \"change summary\": \"Release 2.11.2\", \"change details\": \"This release contains bug fixes since the 2.11.1 release. We recommend that you upgrade at the next available opportunity. 
**Features** * #5923 Feature flags for TimescaleDB features **Bugfixes** * #5680 Fix DISTINCT query with JOIN on multiple segmentby columns * #5774 Fixed two bugs in decompression sorted merge code * #5786 Ensure pg_config --cppflags are passed * #5906 Fix quoting owners in sql scripts. * #5912 Fix crash in 1-step integer policy creation **Thanks** * @mrksngl for submitting a PR to fix extension upgrade scripts * @ericdevries for reporting an issue with DISTINCT queries using segmentby columns of compressed hypertable \"}\n", + "--------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ + "# Method 4: Query all vectors after start_date\n", + "docs_with_score = db.similarity_search_with_score(query,start_date=start_dt)\n", + "\n", + "for doc, score in docs_with_score:\n", + " print(\"-\" * 80)\n", + " print(\"Score: \", score)\n", + " print(\"Date: \", doc.metadata[\"date\"])\n", + " print(doc.page_content)\n", + " print(\"-\" * 80)" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--------------------------------------------------------------------------------\n", + "Score: 0.16723191738128662\n", + "Date: 2023-04-11 22:01:14+0320\n", + "{\"commit\": \" 0595ff0888f2ffb8d313acb0bda9642578a9ade3\", \"author\": \"Sven Klemm\", \"date\": \"Tue Apr 11 22:01:14 2023 +0200\", \"change summary\": \"Move type support functions into _timescaledb_functions schema\", \"change details\": \"\"}\n", + "--------------------------------------------------------------------------------\n", + "--------------------------------------------------------------------------------\n", + "Score: 0.1706540584564209\n", + "Date: 2023-04-6 13:00:00+0320\n", + "{\"commit\": \" 04f43335dea11e9c467ee558ad8edfc00c1a45ed\", \"author\": \"Sven Klemm\", \"date\": \"Thu Apr 6 13:00:00 2023 +0200\", \"change summary\": \"Move aggregate support function into _timescaledb_functions\", \"change details\": \"This patch moves the support functions for histogram, first and last into the _timescaledb_functions schema. Since we alter the schema of the existing functions in upgrade scripts and do not change the aggregates this should work completely transparently for any user objects using those aggregates. \"}\n", + "--------------------------------------------------------------------------------\n", + "--------------------------------------------------------------------------------\n", + "Score: 0.17462033033370972\n", + "Date: 2023-03-31 08:22:57+0320\n", + "{\"commit\": \" feef9206facc5c5f506661de4a81d96ef059b095\", \"author\": \"Sven Klemm\", \"date\": \"Fri Mar 31 08:22:57 2023 +0200\", \"change summary\": \"Add _timescaledb_functions schema\", \"change details\": \"Currently internal user objects like chunks and our functions live in the same schema making locking down that schema hard. This patch adds a new schema _timescaledb_functions that is meant to be the schema used for timescaledb internal functions to allow separation of code and chunks or other user objects. 
\"}\n", + "--------------------------------------------------------------------------------\n", + "--------------------------------------------------------------------------------\n", + "Score: 0.17488396167755127\n", + "Date: 2023-08-29 18:13:24+0320\n", + "{\"commit\": \" e4facda540286b0affba47ccc63959fefe2a7b26\", \"author\": \"Sven Klemm\", \"date\": \"Tue Aug 29 18:13:24 2023 +0200\", \"change summary\": \"Add compatibility layer for _timescaledb_internal functions\", \"change details\": \"With timescaledb 2.12 all the functions present in _timescaledb_internal were moved into the _timescaledb_functions schema to improve schema security. This patch adds a compatibility layer so external callers of these internal functions will not break and allow for more flexibility when migrating. \"}\n", + "--------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ + "# Method 5: Query all vectors before end_date\n", + "docs_with_score = db.similarity_search_with_score(query, end_date=end_dt)\n", + "\n", + "for doc, score in docs_with_score:\n", + " print(\"-\" * 80)\n", + " print(\"Score: \", score)\n", + " print(\"Date: \", doc.metadata[\"date\"])\n", + " print(doc.page_content)\n", + " print(\"-\" * 80)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The main takeaway is that in each result above, only vectors within the specified time range are returned. These queries are very efficient as they only need to search the relevant partitions." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also use this functionality for question answering, where we want to find the most relevant vectors within a specified time range to use as context for answering a question. Let's take a look at an example below, using Timescale Vector as a retriever:" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [], + "source": [ + "# Set timescale vector as a retriever and specify start and end dates via kwargs\n", + "retriever = db.as_retriever(search_kwargs={\"start_date\": start_dt, \"end_date\": end_dt})" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\n", + "\u001b[1m> Entering new RetrievalQA chain...\u001b[0m\n", + "\n", + "\u001b[1m> Finished chain.\u001b[0m\n", + "The following changes were made to the timescaledb functions:\n", + "\n", + "1. \"Add compatibility layer for _timescaledb_internal functions\" - This change was made on Tue Aug 29 18:13:24 2023 +0200.\n", + "2. \"Move functions to _timescaledb_functions schema\" - This change was made on Sun Aug 20 22:47:10 2023 +0200.\n", + "3. \"Move utility functions to _timescaledb_functions schema\" - This change was made on Tue Aug 22 12:01:19 2023 +0200.\n", + "4. \"Move partitioning functions to _timescaledb_functions schema\" - This change was made on Tue Aug 29 10:49:47 2023 +0200.\n" + ] + } + ], + "source": [ + "from langchain.chat_models import ChatOpenAI\n", + "llm = ChatOpenAI(temperature = 0.1, model = 'gpt-3.5-turbo-16k')\n", + "\n", + "from langchain.chains import RetrievalQA\n", + "qa_stuff = RetrievalQA.from_chain_type(\n", + " llm=llm, \n", + " chain_type=\"stuff\", \n", + " retriever=retriever,\n", + " verbose=True,\n", + ")\n", + "\n", + "query = \"What's new with the timescaledb functions? 
Tell me when these changes were made.\"\n", + "response = qa_stuff.run(query)\n", + "print(response)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that the context the LLM uses to compose an answer comes from retrieved documents only within the specified date range. \n", + "\n", + "This shows how you can use Timescale Vector to enhance retrieval-augmented generation by retrieving documents within time ranges relevant to your query." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Using ANN Search Indexes to Speed Up Queries\n", + "\n", + "You can speed up similarity queries by creating an index on the embedding column. You should only do this once you have ingested a large part of your data.\n", + "\n", + "Timescale Vector supports the following indexes:\n", + "- timescale_vector index (tsv): a DiskANN-inspired graph index for fast similarity search (default).\n", + "- pgvector's HNSW index: a hierarchical navigable small world graph index for fast similarity search.\n", + "- pgvector's IVFFLAT index: an inverted file index for fast similarity search.\n", + "\n", + "Important note: the vectorstore manages one index on the embedding column at a time. So if you'd like to test the performance of different index types, you can do so either by (1) creating multiple tables with different indexes, (2) creating multiple vector columns in the same table and creating different indexes on each column, or (3) dropping and recreating the index on the same column and comparing results." + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [], + "source": [ + "# Initialize an existing TimescaleVector store\n", + "COLLECTION_NAME = \"timescale_commits\"\n", + "embeddings = OpenAIEmbeddings()\n", + "db = TimescaleVector(\n", + " collection_name=COLLECTION_NAME,\n", + " service_url=SERVICE_URL,\n", + " embedding_function=embeddings,\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Using the `create_index()` function without additional arguments will create a timescale_vector index by default, using the default parameters." + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [], + "source": [ + "# Create an index\n", + "# By default this will create a Timescale Vector (DiskANN) index\n", + "db.create_index()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also specify the parameters for the index. See the Timescale Vector documentation for a full discussion of the different parameters and their effects on performance.\n", + "\n", + "Note: You don't need to specify parameters as we set smart defaults. But you can always specify your own parameters if you want to experiment and eke out more performance for your specific dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [], + "source": [ + "# Drop the old index\n", + "db.drop_index()\n", + "\n", + "# Create an index\n", + "# Note: You don't need to specify max_alpha and num_neighbors parameters as we set smart defaults. \n", + "db.create_index(index_type=\"tsv\", max_alpha=1.0, num_neighbors=50)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "Timescale Vector also supports the HNSW ANN indexing algorithm, as well as the ivfflat ANN indexing algorithm.
Simply specify in the `index_type` argument which index you'd like to create, and optionally specify the parameters for the index." + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [], + "source": [ + "# Drop the old index\n", + "db.drop_index()\n", + "\n", + "# Create an HNSW index\n", + "# Note: You don't need to specify m and ef_construction parameters as we set smart defaults. \n", + "db.create_index(index_type=\"hnsw\", m=16, ef_construction=64)" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [], + "source": [ + "# Drop the old index\n", + "db.drop_index()\n", + "\n", + "# Create an IVFFLAT index\n", + "# Note: You don't need to specify num_lists and num_records parameters as we set smart defaults.\n", + "db.create_index(index_type=\"ivfflat\", num_lists=20, num_records=1000)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In general, we recommend using the default timescale vector index or the HNSW index." + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [], + "source": [ + "# Drop the old index\n", + "db.drop_index()\n", + "# Create a new timescale vector index\n", + "db.create_index()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Self-Querying Retriever with Timescale Vector\n", + "\n", + "Timescale Vector also supports the self-querying retriever functionality, which gives it the ability to query itself. Given a natural language query with a query statement and filters (single or composite), the retriever uses a query-constructing LLM chain to write a SQL query and then applies it to the underlying PostgreSQL database in the Timescale Vector vectorstore.\n", + "\n", + "For more on self-querying, [see the docs](https://python.langchain.com/docs/modules/data_connection/retrievers/self_query/)." + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To illustrate self-querying with Timescale Vector, we'll use the same git log dataset we loaded in Part 2." + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [], + "source": [ + "COLLECTION_NAME = \"timescale_commits\"\n", + "vectorstore = TimescaleVector(\n", + " embedding_function=OpenAIEmbeddings(),\n", + " collection_name=COLLECTION_NAME,\n", + " service_url=SERVICE_URL,\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we'll create our self-querying retriever. To do this, we'll need to provide some information upfront about the metadata fields that our documents support and a short description of the document contents."
+ ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": {}, + "outputs": [], + "source": [ + "from langchain.llms import OpenAI\n", + "from langchain.retrievers.self_query.base import SelfQueryRetriever\n", + "from langchain.chains.query_constructor.base import AttributeInfo\n", + "\n", + "# Give LLM info about the metadata fields\n", + "metadata_field_info = [\n", + " AttributeInfo(\n", + " name=\"id\",\n", + " description=\"A UUID v1 generated from the date of the commit\",\n", + " type=\"uuid\",\n", + " ),\n", + " AttributeInfo(\n", + " name=\"date\",\n", + " description=\"The date of the commit in timestamptz format\",\n", + " type=\"timestamptz\",\n", + " ),\n", + " AttributeInfo(\n", + " name=\"author_name\",\n", + " description=\"The name of the author of the commit\",\n", + " type=\"string\",\n", + " ),\n", + " AttributeInfo(\n", + " name=\"author_email\",\n", + " description=\"The email address of the author of the commit\",\n", + " type=\"string\",\n", + " )\n", + "]\n", + "document_content_description = \"The git log commit summary containing the commit hash, author, date of commit, change summary and change details\"\n", + "\n", + "# Instantiate the self-query retriever from an LLM\n", + "llm = OpenAI(temperature=0)\n", + "retriever = SelfQueryRetriever.from_llm(\n", + " llm, vectorstore, document_content_description, metadata_field_info, enable_limit=True, verbose=True\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's test out the self-querying retriever on our gitlog dataset. \n", + "\n", + "Run the queries below and note how you can specify a query, query with a filter, and query with a composite filter (filters with AND, OR) in natural language and the self-query retriever will translate that query into SQL and perform the search on the Timescale Vector PostgreSQL vectorstore.\n", + "\n", + "This illustrates the power of the self-query retriever. You can use it to perform complex searches over your vectorstore without you or your users having to write any SQL directly!" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/libs/langchain/langchain/chains/llm.py:275: UserWarning: The predict_and_parse method is deprecated, instead pass an output parser directly to LLMChain.\n", + " warnings.warn(\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "query='improvements to continuous aggregates' filter=None limit=None\n" + ] + }, + { + "data": { + "text/plain": [ + "[Document(page_content='{\"commit\": \" 35c91204987ccb0161d745af1a39b7eb91bc65a5\", \"author\": \"Fabr\\\\u00edzio de Royes Mello\", \"date\": \"Thu Nov 24 13:19:36 2022 -0300\", \"change summary\": \"Add Hierarchical Continuous Aggregates validations\", \"change details\": \"Commit 3749953e introduce Hierarchical Continuous Aggregates (aka Continuous Aggregate on top of another Continuous Aggregate) but it lacks of some basic validations. Validations added during the creation of a Hierarchical Continuous Aggregate: * Forbid create a continuous aggregate with fixed-width bucket on top of a continuous aggregate with variable-width bucket. 
* Forbid incompatible bucket widths: - should not be equal; - bucket width of the new continuous aggregate should be greater than the source continuous aggregate; - bucket width of the new continuous aggregate should be multiple of the source continuous aggregate. \"}', metadata={'id': 'c98d1c00-6c13-11ed-9bbe-23925ce74d13', 'date': '2022-11-24 13:19:36+-500', 'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/docs/extras/modules/ts_git_log.json', 'seq_num': 446, 'author_name': 'Fabrízio de Royes Mello', 'commit_hash': ' 35c91204987ccb0161d745af1a39b7eb91bc65a5', 'author_email': 'fabriziomello@gmail.com'}),\n", + " Document(page_content='{\"commit\": \" 3749953e9704e45df8f621607989ada0714ce28d\", \"author\": \"Fabr\\\\u00edzio de Royes Mello\", \"date\": \"Wed Oct 5 18:45:40 2022 -0300\", \"change summary\": \"Hierarchical Continuous Aggregates\", \"change details\": \"Enable users create Hierarchical Continuous Aggregates (aka Continuous Aggregates on top of another Continuous Aggregates). With this PR users can create levels of aggregation granularity in Continuous Aggregates making the refresh process even faster. A problem with this feature can be in upper levels we can end up with the \\\\\"average of averages\\\\\". But to get the \\\\\"real average\\\\\" we can rely on \\\\\"stats_aggs\\\\\" TimescaleDB Toolkit function that calculate and store the partials that can be finalized with other toolkit functions like \\\\\"average\\\\\" and \\\\\"sum\\\\\". Closes #1400 \"}', metadata={'id': '0df31a00-44f7-11ed-9794-ebcc1227340f', 'date': '2022-10-5 18:45:40+-500', 'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/docs/extras/modules/ts_git_log.json', 'seq_num': 470, 'author_name': 'Fabrízio de Royes Mello', 'commit_hash': ' 3749953e9704e45df8f621607989ada0714ce28d', 'author_email': 'fabriziomello@gmail.com'}),\n", + " Document(page_content='{\"commit\": \" a6ff7ba6cc15b280a275e5acd315741ec9c86acc\", \"author\": \"Mats Kindahl\", \"date\": \"Tue Feb 28 12:04:17 2023 +0100\", \"change summary\": \"Rename columns in old-style continuous aggregates\", \"change details\": \"For continuous aggregates with the old-style partial aggregates renaming columns that are not in the group-by clause will generate an error when upgrading to a later version. The reason is that it is implicitly assumed that the name of the column is the same as for the direct view. This holds true for new-style continous aggregates, but is not always true for old-style continuous aggregates. In particular, columns that are not part of the `GROUP BY` clause can have an internally generated name. This commit fixes that by extracting the name of the column from the partial view and use that when renaming the partial view column and the materialized table column. 
\"}', metadata={'id': 'a49ace80-b757-11ed-8138-2390fd44ffd9', 'date': '2023-02-28 12:04:17+0140', 'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/docs/extras/modules/ts_git_log.json', 'seq_num': 294, 'author_name': 'Mats Kindahl', 'commit_hash': ' a6ff7ba6cc15b280a275e5acd315741ec9c86acc', 'author_email': 'mats@timescale.com'}),\n", + " Document(page_content='{\"commit\": \" 5bba74a2ec083728f8e93e09d03d102568fd72b5\", \"author\": \"Fabr\\\\u00edzio de Royes Mello\", \"date\": \"Mon Aug 7 19:49:47 2023 -0300\", \"change summary\": \"Relax strong table lock when refreshing a CAGG\", \"change details\": \"When refreshing a Continuous Aggregate we take a table lock on _timescaledb_catalog.continuous_aggs_invalidation_threshold when processing the invalidation logs (the first transaction of the refresh Continuous Aggregate procedure). It means that even two different Continuous Aggregates over two different hypertables will wait each other in the first phase of the refreshing procedure. Also it lead to problems when a pg_dump is running because it take an AccessShareLock on tables so Continuous Aggregate refresh execution will wait until the pg_dump finish. Improved it by relaxing the strong table-level lock to a row-level lock so now the Continuous Aggregate refresh procedure can be executed in multiple sessions with less locks. Fix #3554 \"}', metadata={'id': 'b5583780-3574-11ee-a5ba-2e305874a58f', 'date': '2023-08-7 19:49:47+-500', 'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/docs/extras/modules/ts_git_log.json', 'seq_num': 27, 'author_name': 'Fabrízio de Royes Mello', 'commit_hash': ' 5bba74a2ec083728f8e93e09d03d102568fd72b5', 'author_email': 'fabriziomello@gmail.com'})]" + ] + }, + "execution_count": 51, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# This example specifies a relevant query\n", + "retriever.get_relevant_documents(\"What are improvements made to continuous aggregates?\")" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "query=' ' filter=Comparison(comparator=, attribute='author_name', value='Sven Klemm') limit=None\n" + ] + }, + { + "data": { + "text/plain": [ + "[Document(page_content='{\"commit\": \" e2e7ae304521b74ac6b3f157a207da047d44ab06\", \"author\": \"Sven Klemm\", \"date\": \"Fri Mar 3 11:22:06 2023 +0100\", \"change summary\": \"Don\\'t run sanitizer test on individual PRs\", \"change details\": \"Sanitizer tests take a long time to run so we don\\'t want to run them on individual PRs but instead run them nightly and on commits to master. \"}', metadata={'id': '3f401b00-b9ad-11ed-b5ea-a3fd40b9ac16', 'date': '2023-03-3 11:22:06+0140', 'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/docs/extras/modules/ts_git_log.json', 'seq_num': 295, 'author_name': 'Sven Klemm', 'commit_hash': ' e2e7ae304521b74ac6b3f157a207da047d44ab06', 'author_email': 'sven@timescale.com'}),\n", + " Document(page_content='{\"commit\": \" d8f19e57a04d17593df5f2c694eae8775faddbc7\", \"author\": \"Sven Klemm\", \"date\": \"Wed Feb 1 08:34:20 2023 +0100\", \"change summary\": \"Bump version of setup-wsl github action\", \"change details\": \"The currently used version pulls in Node.js 12 which is deprecated on github. 
https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/ \"}', metadata={'id': 'd70de600-a202-11ed-85d6-30b6df240f49', 'date': '2023-02-1 08:34:20+0140', 'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/docs/extras/modules/ts_git_log.json', 'seq_num': 350, 'author_name': 'Sven Klemm', 'commit_hash': ' d8f19e57a04d17593df5f2c694eae8775faddbc7', 'author_email': 'sven@timescale.com'}),\n", + " Document(page_content='{\"commit\": \" 83b13cf6f73a74656dde9cc6ec6cf76740cddd3c\", \"author\": \"Sven Klemm\", \"date\": \"Fri Nov 25 08:27:45 2022 +0100\", \"change summary\": \"Use packaged postgres for sqlsmith and coverity CI\", \"change details\": \"The sqlsmith and coverity workflows used the cache postgres build but could not produce a build by themselves and therefore relied on other workflows to produce the cached binaries. This patch changes those workflows to use normal postgres packages instead of custom built postgres to remove that dependency. \"}', metadata={'id': 'a786ae80-6c92-11ed-bd6c-a57bd3348b97', 'date': '2022-11-25 08:27:45+0140', 'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/docs/extras/modules/ts_git_log.json', 'seq_num': 447, 'author_name': 'Sven Klemm', 'commit_hash': ' 83b13cf6f73a74656dde9cc6ec6cf76740cddd3c', 'author_email': 'sven@timescale.com'}),\n", + " Document(page_content='{\"commit\": \" b1314e63f2ff6151ab5becfb105afa3682286a4d\", \"author\": \"Sven Klemm\", \"date\": \"Thu Dec 22 12:03:35 2022 +0100\", \"change summary\": \"Fix RPM package test for PG15 on centos 7\", \"change details\": \"Installing PG15 on Centos 7 requires the EPEL repository to satisfy the dependencies. \"}', metadata={'id': '477b1d80-81e8-11ed-9c8c-9b5abbd67c98', 'date': '2022-12-22 12:03:35+0140', 'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/docs/extras/modules/ts_git_log.json', 'seq_num': 408, 'author_name': 'Sven Klemm', 'commit_hash': ' b1314e63f2ff6151ab5becfb105afa3682286a4d', 'author_email': 'sven@timescale.com'})]" + ] + }, + "execution_count": 52, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# This example specifies a filter\n", + "retriever.get_relevant_documents(\"What commits did Sven Klemm add?\")" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "query='timescaledb_functions' filter=Comparison(comparator=, attribute='author_name', value='Sven Klemm') limit=None\n" + ] + }, + { + "data": { + "text/plain": [ + "[Document(page_content='{\"commit\": \" 04f43335dea11e9c467ee558ad8edfc00c1a45ed\", \"author\": \"Sven Klemm\", \"date\": \"Thu Apr 6 13:00:00 2023 +0200\", \"change summary\": \"Move aggregate support function into _timescaledb_functions\", \"change details\": \"This patch moves the support functions for histogram, first and last into the _timescaledb_functions schema. Since we alter the schema of the existing functions in upgrade scripts and do not change the aggregates this should work completely transparently for any user objects using those aggregates. 
\"}', metadata={'id': '2cb47800-d46a-11ed-8f0e-2b624245c561', 'date': '2023-04-6 13:00:00+0320', 'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/docs/extras/modules/ts_git_log.json', 'seq_num': 233, 'author_name': 'Sven Klemm', 'commit_hash': ' 04f43335dea11e9c467ee558ad8edfc00c1a45ed', 'author_email': 'sven@timescale.com'}),\n", + " Document(page_content='{\"commit\": \" feef9206facc5c5f506661de4a81d96ef059b095\", \"author\": \"Sven Klemm\", \"date\": \"Fri Mar 31 08:22:57 2023 +0200\", \"change summary\": \"Add _timescaledb_functions schema\", \"change details\": \"Currently internal user objects like chunks and our functions live in the same schema making locking down that schema hard. This patch adds a new schema _timescaledb_functions that is meant to be the schema used for timescaledb internal functions to allow separation of code and chunks or other user objects. \"}', metadata={'id': '7a257680-cf8c-11ed-848c-a515e8687479', 'date': '2023-03-31 08:22:57+0320', 'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/docs/extras/modules/ts_git_log.json', 'seq_num': 239, 'author_name': 'Sven Klemm', 'commit_hash': ' feef9206facc5c5f506661de4a81d96ef059b095', 'author_email': 'sven@timescale.com'}),\n", + " Document(page_content='{\"commit\": \" 0a66bdb8d36a1879246bd652e4c28500c4b951ab\", \"author\": \"Sven Klemm\", \"date\": \"Sun Aug 20 22:47:10 2023 +0200\", \"change summary\": \"Move functions to _timescaledb_functions schema\", \"change details\": \"To increase schema security we do not want to mix our own internal objects with user objects. Since chunks are created in the _timescaledb_internal schema our internal functions should live in a different dedicated schema. This patch make the necessary adjustments for the following functions: - to_unix_microseconds(timestamptz) - to_timestamp(bigint) - to_timestamp_without_timezone(bigint) - to_date(bigint) - to_interval(bigint) - interval_to_usec(interval) - time_to_internal(anyelement) - subtract_integer_from_now(regclass, bigint) \"}', metadata={'id': 'bb99db00-3f9a-11ee-a8dc-0b9c1a5a37c4', 'date': '2023-08-20 22:47:10+0320', 'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/docs/extras/modules/ts_git_log.json', 'seq_num': 41, 'author_name': 'Sven Klemm', 'commit_hash': ' 0a66bdb8d36a1879246bd652e4c28500c4b951ab', 'author_email': 'sven@timescale.com'}),\n", + " Document(page_content='{\"commit\": \" 56ea8b4de93cefc38e002202d8ac96947dcbaa77\", \"author\": \"Sven Klemm\", \"date\": \"Thu Apr 13 13:16:14 2023 +0200\", \"change summary\": \"Move trigger functions to _timescaledb_functions schema\", \"change details\": \"To increase schema security we do not want to mix our own internal objects with user objects. Since chunks are created in the _timescaledb_internal schema our internal functions should live in a different dedicated schema. This patch make the necessary adjustments for our trigger functions. 
\"}', metadata={'id': '9a255300-d9ec-11ed-988f-7086c8ca463a', 'date': '2023-04-13 13:16:14+0320', 'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/docs/extras/modules/ts_git_log.json', 'seq_num': 44, 'author_name': 'Sven Klemm', 'commit_hash': ' 56ea8b4de93cefc38e002202d8ac96947dcbaa77', 'author_email': 'sven@timescale.com'})]" + ] + }, + "execution_count": 53, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# This example specifies a query and filter\n", + "retriever.get_relevant_documents(\"What commits about timescaledb_functions did Sven Klemm add?\")" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "query=' ' filter=Operation(operator=, arguments=[Comparison(comparator=, attribute='date', value='2023-07-01T00:00:00Z'), Comparison(comparator=, attribute='date', value='2023-07-31T23:59:59Z')]) limit=None\n" + ] + }, + { + "data": { + "text/plain": [ + "[Document(page_content='{\"commit\": \" 5cf354e2469ee7e43248bed382a4b49fc7ccfecd\", \"author\": \"Markus Engel\", \"date\": \"Mon Jul 31 11:28:25 2023 +0200\", \"change summary\": \"Fix quoting owners in sql scripts.\", \"change details\": \"When referring to a role from a string type, it must be properly quoted using pg_catalog.quote_ident before it can be casted to regrole. Fixed this, especially in update scripts. \"}', metadata={'id': '99590280-2f84-11ee-915b-5715b2447de4', 'date': '2023-07-31 11:28:25+0320', 'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/docs/extras/modules/ts_git_log.json', 'seq_num': 76, 'author_name': 'Markus Engel', 'commit_hash': ' 5cf354e2469ee7e43248bed382a4b49fc7ccfecd', 'author_email': 'engel@sero-systems.de'}),\n", + " Document(page_content='{\"commit\": \" 88aaf23ae37fe7f47252b87325eb570aa417c607\", \"author\": \"noctarius aka Christoph Engelbert\", \"date\": \"Wed Jul 12 14:53:40 2023 +0200\", \"change summary\": \"Allow Replica Identity (Alter Table) on CAGGs (#5868)\", \"change details\": \"This commit is a follow up of #5515, which added support for ALTER TABLE\\\\r ... REPLICA IDENTITY (FULL | INDEX) on hypertables.\\\\r \\\\r This commit allows the execution against materialized hypertables to\\\\r enable update / delete operations on continuous aggregates when logical\\\\r replication in enabled for them.\"}', metadata={'id': '1fcfa200-20b3-11ee-9a18-370561c7cb1a', 'date': '2023-07-12 14:53:40+0320', 'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/docs/extras/modules/ts_git_log.json', 'seq_num': 96, 'author_name': 'noctarius aka Christoph Engelbert', 'commit_hash': ' 88aaf23ae37fe7f47252b87325eb570aa417c607', 'author_email': 'me@noctarius.com'}),\n", + " Document(page_content='{\"commit\": \" d5268c36fbd23fa2a93c0371998286e8688247bb\", \"author\": \"Alexander Kuzmenkov<36882414+akuzm@users.noreply.github.com>\", \"date\": \"Fri Jul 28 13:35:05 2023 +0200\", \"change summary\": \"Fix SQLSmith workflow\", \"change details\": \"The build was failing because it was picking up the wrong version of Postgres. Remove it. 
\"}', metadata={'id': 'cc0fba80-2d3a-11ee-ae7d-36dc25cad3b8', 'date': '2023-07-28 13:35:05+0320', 'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/docs/extras/modules/ts_git_log.json', 'seq_num': 82, 'author_name': 'Alexander Kuzmenkov', 'commit_hash': ' d5268c36fbd23fa2a93c0371998286e8688247bb', 'author_email': '36882414+akuzm@users.noreply.github.com'}),\n", + " Document(page_content='{\"commit\": \" 61c288ec5eb966a9b4d8ed90cd026ffc5e3543c9\", \"author\": \"Lakshmi Narayanan Sreethar\", \"date\": \"Tue Jul 25 16:11:35 2023 +0530\", \"change summary\": \"Fix broken CI after PG12 removal\", \"change details\": \"The commit cdea343cc updated the gh_matrix_builder.py script but failed to import PG_LATEST variable into the script thus breaking the CI. Import that variable to fix the CI tests. \"}', metadata={'id': 'd3835980-2ad7-11ee-b98d-c4e3092e076e', 'date': '2023-07-25 16:11:35+0850', 'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/docs/extras/modules/ts_git_log.json', 'seq_num': 84, 'author_name': 'Lakshmi Narayanan Sreethar', 'commit_hash': ' 61c288ec5eb966a9b4d8ed90cd026ffc5e3543c9', 'author_email': 'lakshmi@timescale.com'})]" + ] + }, + "execution_count": 54, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# This example specifies a time-based filter\n", + "retriever.get_relevant_documents(\"What commits were added in July 2023?\")" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "query='hierarchical continuous aggregates' filter=None limit=2\n" + ] + }, + { + "data": { + "text/plain": [ + "[Document(page_content='{\"commit\": \" 35c91204987ccb0161d745af1a39b7eb91bc65a5\", \"author\": \"Fabr\\\\u00edzio de Royes Mello\", \"date\": \"Thu Nov 24 13:19:36 2022 -0300\", \"change summary\": \"Add Hierarchical Continuous Aggregates validations\", \"change details\": \"Commit 3749953e introduce Hierarchical Continuous Aggregates (aka Continuous Aggregate on top of another Continuous Aggregate) but it lacks of some basic validations. Validations added during the creation of a Hierarchical Continuous Aggregate: * Forbid create a continuous aggregate with fixed-width bucket on top of a continuous aggregate with variable-width bucket. * Forbid incompatible bucket widths: - should not be equal; - bucket width of the new continuous aggregate should be greater than the source continuous aggregate; - bucket width of the new continuous aggregate should be multiple of the source continuous aggregate. \"}', metadata={'id': 'c98d1c00-6c13-11ed-9bbe-23925ce74d13', 'date': '2022-11-24 13:19:36+-500', 'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/docs/extras/modules/ts_git_log.json', 'seq_num': 446, 'author_name': 'Fabrízio de Royes Mello', 'commit_hash': ' 35c91204987ccb0161d745af1a39b7eb91bc65a5', 'author_email': 'fabriziomello@gmail.com'}),\n", + " Document(page_content='{\"commit\": \" 3749953e9704e45df8f621607989ada0714ce28d\", \"author\": \"Fabr\\\\u00edzio de Royes Mello\", \"date\": \"Wed Oct 5 18:45:40 2022 -0300\", \"change summary\": \"Hierarchical Continuous Aggregates\", \"change details\": \"Enable users create Hierarchical Continuous Aggregates (aka Continuous Aggregates on top of another Continuous Aggregates). 
With this PR users can create levels of aggregation granularity in Continuous Aggregates making the refresh process even faster. A problem with this feature can be in upper levels we can end up with the \\\\\"average of averages\\\\\". But to get the \\\\\"real average\\\\\" we can rely on \\\\\"stats_aggs\\\\\" TimescaleDB Toolkit function that calculate and store the partials that can be finalized with other toolkit functions like \\\\\"average\\\\\" and \\\\\"sum\\\\\". Closes #1400 \"}', metadata={'id': '0df31a00-44f7-11ed-9794-ebcc1227340f', 'date': '2022-10-5 18:45:40+-500', 'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/docs/extras/modules/ts_git_log.json', 'seq_num': 470, 'author_name': 'Fabrízio de Royes Mello', 'commit_hash': ' 3749953e9704e45df8f621607989ada0714ce28d', 'author_email': 'fabriziomello@gmail.com'})]" + ] + }, + "execution_count": 55, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# This example specifies a query and a LIMIT value\n", + "retriever.get_relevant_documents(\"What are two commits about hierarchical continuous aggregates?\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Working with an existing TimescaleVector vectorstore\n", + "\n", + "In the examples above, we created a vectorstore from a collection of documents. However, often we want to insert data into and query data from an existing vectorstore. Let's see how to initialize, add documents to, and query an existing collection of documents in a TimescaleVector vector store.\n", + "\n", + "To work with an existing Timescale Vector store, we need to know the name of the table we want to query (`COLLECTION_NAME`) and the URL of the cloud PostgreSQL database (`SERVICE_URL`)." + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "metadata": {}, + "outputs": [], + "source": [ + "# Initialize the existing collection\n", + "COLLECTION_NAME = \"timescale_commits\"\n", + "embeddings = OpenAIEmbeddings()\n", + "vectorstore = TimescaleVector(\n", + " collection_name=COLLECTION_NAME,\n", + " service_url=SERVICE_URL,\n", + " embedding_function=embeddings,\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To load new data into the table, we use the `add_documents()` function. This function takes a list of documents and, optionally, a list of ids, where each id must be unique per document. \n", + "\n", + "If you want your documents to be associated with the current date and time, you do not need to create a list of ids. A uuid will be automatically generated for each document.\n", + "\n", + "If you want your documents to be associated with a past date and time, you can create a list of ids using the `uuid_from_time` function in the `timescale-vector` python library, as shown in Section 2 above. This function takes a datetime object and returns a uuid with the date and time encoded in the uuid.",
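+ "\n", + "\n", + "For example, here is a minimal sketch (with an arbitrary example date) of adding a document whose id encodes a past datetime:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Minimal sketch: associate a document with a past datetime.\n", + "# The date below is an arbitrary example.\n", + "from datetime import datetime\n", + "\n", + "from timescale_vector import client\n", + "\n", + "past_time = datetime(2023, 1, 1, 12, 0, 0)\n", + "ids = [str(client.uuid_from_time(past_time))]\n", + "vectorstore.add_documents([Document(page_content=\"foo from the past\")], ids=ids)"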
+ ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['a34f2b8a-53d7-11ee-8cc3-de1e4b2a0118']" + ] + }, + "execution_count": 58, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Add documents to a collection in TimescaleVector\n", + "ids = vectorstore.add_documents([Document(page_content=\"foo\")])\n", + "ids" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [], + "source": [ + "# Query the vectorstore for similar documents\n", + "docs_with_score = vectorstore.similarity_search_with_score(\"foo\")" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(Document(page_content='foo', metadata={}), 5.006789860928507e-06)" + ] + }, + "execution_count": 60, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "docs_with_score[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(Document(page_content='{\"commit\": \" 00b566dfe478c11134bcf1e7bcf38943e7fafe8f\", \"author\": \"Fabr\\\\u00edzio de Royes Mello\", \"date\": \"Mon Mar 6 15:51:03 2023 -0300\", \"change summary\": \"Remove unused functions\", \"change details\": \"We don\\'t use `ts_catalog_delete[_only]` functions anywhere and instead we rely on `ts_catalog_delete_tid[_only]` functions so removing it from our code base. \"}', metadata={'id': 'd7f5c580-bc4f-11ed-9712-ffa0126a201a', 'date': '2023-03-6 15:51:03+-500', 'source': '/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/docs/extras/modules/ts_git_log.json', 'seq_num': 285, 'author_name': 'Fabrízio de Royes Mello', 'commit_hash': ' 00b566dfe478c11134bcf1e7bcf38943e7fafe8f', 'author_email': 'fabriziomello@gmail.com'}),\n", + " 0.23607668446580354)" + ] + }, + "execution_count": 61, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "docs_with_score[1]" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Deleting Data \n", + "\n", + "You can delete data by uuid or by a filter on the metadata." + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 64, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ids = vectorstore.add_documents([Document(page_content=\"Bar\")])\n", + "\n", + "vectorstore.delete(ids)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Deleting using metadata is especially useful if you want to periodically update information scraped from a particular source, or particular date or some other metadata attribute." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 65, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['c6367004-53d7-11ee-8cc3-de1e4b2a0118']" + ] + }, + "execution_count": 65, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "vectorstore.add_documents([Document(page_content=\"Hello World\", metadata={\"source\": \"www.example.com/hello\"})])\n", + "vectorstore.add_documents([Document(page_content=\"Adios\", metadata={\"source\": \"www.example.com/adios\"})])\n", + "\n", + "vectorstore.delete_by_metadata({\"source\": \"www.example.com/adios\"})\n", + "\n", + "vectorstore.add_documents([Document(page_content=\"Adios, but newer!\", metadata={\"source\": \"www.example.com/adios\"})])" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Overriding a vectorstore\n", + "\n", + "If you have an existing collection, you override it by doing `from_documents` and setting `pre_delete_collection` = True" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "db = TimescaleVector.from_documents(\n", + " documents=docs,\n", + " embedding=embeddings,\n", + " collection_name=COLLECTION_NAME,\n", + " service_url=SERVICE_URL,\n", + " pre_delete_collection=True,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "docs_with_score = db.similarity_search_with_score(\"foo\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "docs_with_score[0]" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.16" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/docs/extras/modules/data_connection/retrievers/self_query/timescalevector_self_query.ipynb b/docs/extras/modules/data_connection/retrievers/self_query/timescalevector_self_query.ipynb new file mode 100644 index 00000000000..dcf4e01b8e5 --- /dev/null +++ b/docs/extras/modules/data_connection/retrievers/self_query/timescalevector_self_query.ipynb @@ -0,0 +1,534 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "id": "13afcae7", + "metadata": {}, + "source": [ + "# Timescale Vector (Postgres) self-querying \n", + "\n", + "[Timescale Vector](https://www.timescale.com/ai) is PostgreSQL++ for AI applications. It enables you to efficiently store and query billions of vector embeddings in `PostgreSQL`.\n", + "\n", + "This notebook shows how to use the Postgres vector database (`TimescaleVector`) to perform self-querying. In the notebook we'll demo the `SelfQueryRetriever` wrapped around a TimescaleVector vector store. 
\n", + "\n", + "## What is Timescale Vector?\n", + "**[Timescale Vector](https://www.timescale.com/ai) is PostgreSQL++ for AI applications.**\n", + "\n", + "Timescale Vector enables you to efficiently store and query millions of vector embeddings in `PostgreSQL`.\n", + "- Enhances `pgvector` with faster and more accurate similarity search on 1B+ vectors via DiskANN inspired indexing algorithm.\n", + "- Enables fast time-based vector search via automatic time-based partitioning and indexing.\n", + "- Provides a familiar SQL interface for querying vector embeddings and relational data.\n", + "\n", + "Timescale Vector is cloud PostgreSQL for AI that scales with you from POC to production:\n", + "- Simplifies operations by enabling you to store relational metadata, vector embeddings, and time-series data in a single database.\n", + "- Benefits from rock-solid PostgreSQL foundation with enterprise-grade feature liked streaming backups and replication, high-availability and row-level security.\n", + "- Enables a worry-free experience with enterprise-grade security and compliance.\n", + "\n", + "## How to access Timescale Vector\n", + "Timescale Vector is available on [Timescale](https://www.timescale.com/ai), the cloud PostgreSQL platform. (There is no self-hosted version at this time.)\n", + "\n", + "LangChain users get a 90-day free trial for Timescale Vector.\n", + "- To get started, [signup](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) to Timescale, create a new database and follow this notebook!\n", + "- See the [Timescale Vector explainer blog](https://www.timescale.com/blog/how-we-made-postgresql-the-best-vector-database/?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) for more details and performance benchmarks.\n", + "- See the [installation instructions](https://github.com/timescale/python-vector) for more details on using Timescale Vector in python.\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "68e75fb9", + "metadata": {}, + "source": [ + "## Creating a TimescaleVector vectorstore\n", + "First we'll want to create a Timescale Vector vectorstore and seed it with some data. We've created a small demo set of documents that contain summaries of movies.\n", + "\n", + "NOTE: The self-query retriever requires you to have `lark` installed (`pip install lark`). We also need the `timescale-vector` package." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "63a8af5b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "#!pip install lark" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "22431060-52c4-48a7-a97b-9f542b8b0928", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "#!pip install timescale-vector " + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "83811610-7df3-4ede-b268-68a6a83ba9e2", + "metadata": {}, + "source": [ + "In this example, we'll use `OpenAIEmbeddings`, so let's load your OpenAI API key." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "dd01b61b-7d32-4a55-85d6-b2d2d4f18840", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Get the OpenAI API key by reading the local .env file\n", + "# The .env file should contain a line starting with `OPENAI_API_KEY=sk-`\n", + "import os\n", + "from dotenv import load_dotenv, find_dotenv\n", + "_ = load_dotenv(find_dotenv())\n", + "\n", + "OPENAI_API_KEY = os.environ['OPENAI_API_KEY']\n", + "# Alternatively, use getpass to enter the key in a prompt\n", + "#import os\n", + "#import getpass\n", + "#os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "766e9c4b", + "metadata": {}, + "source": [ + "To connect to your PostgreSQL database, you'll need your service URI, which can be found in the cheatsheet or `.env` file you downloaded after creating a new database. \n", + "\n", + "If you haven't already, [sign up for Timescale](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) and create a new database.\n", + "\n", + "The URI will look something like this: `postgres://tsdbadmin:<password>@<id>.tsdb.cloud.timescale.com:<port>/tsdb?sslmode=require`" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "6bd6877e", + "metadata": {}, + "outputs": [], + "source": [ + "# Get the service URL by reading the local .env file\n", + "# The .env file should contain a line starting with `TIMESCALE_SERVICE_URL=postgresql://`\n", + "_ = load_dotenv(find_dotenv())\n", + "TIMESCALE_SERVICE_URL = os.environ[\"TIMESCALE_SERVICE_URL\"]\n", + "\n", + "# Alternatively, use getpass to enter the service URL in a prompt\n", + "#import os\n", + "#import getpass\n", + "#TIMESCALE_SERVICE_URL = getpass.getpass(\"Timescale Service URL:\")" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "cb4a5787", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain.schema import Document\n", + "from langchain.embeddings.openai import OpenAIEmbeddings\n", + "from langchain.vectorstores.timescalevector import TimescaleVector\n", + "\n", + "embeddings = OpenAIEmbeddings()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "a4f863f5", + "metadata": {}, + "source": [ + "Here are the sample documents we'll use for this demo. The data is about movies, and has both content and metadata fields with information about a particular movie." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "bcbe04d9", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "docs = [\n", + " Document(\n", + " page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\",\n", + " metadata={\"year\": 1993, \"rating\": 7.7, \"genre\": \"science fiction\"},\n", + " ),\n", + " Document(\n", + " page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\",\n", + " metadata={\"year\": 2010, \"director\": \"Christopher Nolan\", \"rating\": 8.2},\n", + " ),\n", + " Document(\n", + " page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\",\n", + " metadata={\"year\": 2006, \"director\": \"Satoshi Kon\", \"rating\": 8.6},\n", + " ),\n", + " Document(\n", + " page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\",\n", + " metadata={\"year\": 2019, \"director\": \"Greta Gerwig\", \"rating\": 8.3},\n", + " ),\n", + " Document(\n", + " page_content=\"Toys come alive and have a blast doing so\",\n", + " metadata={\"year\": 1995, \"genre\": \"animated\"},\n", + " ),\n", + " Document(\n", + " page_content=\"Three men walk into the Zone, three men walk out of the Zone\",\n", + " metadata={\n", + " \"year\": 1979,\n", + " \"rating\": 9.9,\n", + " \"director\": \"Andrei Tarkovsky\",\n", + " \"genre\": \"science fiction\",\n", + " },\n", + " ),\n", + "]" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "7d0d771e", + "metadata": {}, + "source": [ + "Finally, we'll create our Timescale Vector vectorstore. Note that the collection name will be the name of the PostgreSQL table in which the documents are stored." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "2428d1ba", + "metadata": {}, + "outputs": [], + "source": [ + "COLLECTION_NAME = \"langchain_self_query_demo\"\n", + "vectorstore = TimescaleVector.from_documents(\n", + " embedding=embeddings,\n", + " documents=docs,\n", + " collection_name=COLLECTION_NAME,\n", + " service_url=TIMESCALE_SERVICE_URL,\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "5ecaab6d", + "metadata": {}, + "source": [ + "## Creating our self-querying retriever\n", + "Now we can instantiate our retriever. To do this we'll need to provide some information upfront about the metadata fields that our documents support and a short description of the document contents." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "86e34dbf", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain.llms import OpenAI\n", + "from langchain.retrievers.self_query.base import SelfQueryRetriever\n", + "from langchain.chains.query_constructor.base import AttributeInfo\n", + "\n", + "# Give LLM info about the metadata fields\n", + "metadata_field_info = [\n", + " AttributeInfo(\n", + " name=\"genre\",\n", + " description=\"The genre of the movie\",\n", + " type=\"string or list[string]\",\n", + " ),\n", + " AttributeInfo(\n", + " name=\"year\",\n", + " description=\"The year the movie was released\",\n", + " type=\"integer\",\n", + " ),\n", + " AttributeInfo(\n", + " name=\"director\",\n", + " description=\"The name of the movie director\",\n", + " type=\"string\",\n", + " ),\n", + " AttributeInfo(\n", + " name=\"rating\", description=\"A 1-10 rating for the movie\", type=\"float\"\n", + " ),\n", + "]\n", + "document_content_description = \"Brief summary of a movie\"\n", + "\n", + "# Instantiate the self-query retriever from an LLM\n", + "llm = OpenAI(temperature=0)\n", + "retriever = SelfQueryRetriever.from_llm(\n", + " llm, vectorstore, document_content_description, metadata_field_info, verbose=True\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "ea9df8d4", + "metadata": {}, + "source": [ + "## Self Querying Retrieval with Timescale Vector\n", + "And now we can try actually using our retriever!\n", + "\n", + "Run the queries below and note how you can specify a query, a filter, or a composite filter (filters with AND, OR) in natural language, and the self-query retriever will translate that query into SQL and perform the search on the Timescale Vector (Postgres) vectorstore.\n", + "\n", + "This illustrates the power of the self-query retriever. You can use it to perform complex searches over your vectorstore without you or your users having to write any SQL directly!",
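+ "\n", + "\n", + "Under the hood, the retriever uses the `TimescaleVectorTranslator` to convert the structured query produced by the LLM into `timescale-vector` predicates. The cell below is a minimal sketch of that translation step on its own (the attribute and value are hypothetical):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1a2b3c4d", + "metadata": {}, + "outputs": [], + "source": [ + "# Minimal sketch of the translation the retriever performs internally.\n", + "# The Comparison below mimics the filter for a query like\n", + "# 'I want to watch a movie rated higher than 8.5'.\n", + "from langchain.chains.query_constructor.ir import Comparator, Comparison, StructuredQuery\n", + "from langchain.retrievers.self_query.timescalevector import TimescaleVectorTranslator\n", + "\n", + "structured_query = StructuredQuery(\n", + " query=\" \",\n", + " filter=Comparison(comparator=Comparator.GT, attribute=\"rating\", value=8.5),\n", + ")\n", + "\n", + "# Returns the query text plus kwargs carrying timescale-vector Predicates\n", + "query, kwargs = TimescaleVectorTranslator().visit_structured_query(structured_query)\n", + "print(query, kwargs)"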
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "38a126e9", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/avtharsewrathan/sideprojects2023/timescaleai/tsv-langchain/langchain/libs/langchain/langchain/chains/llm.py:275: UserWarning: The predict_and_parse method is deprecated, instead pass an output parser directly to LLMChain.\n", + " warnings.warn(\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "query='dinosaur' filter=None limit=None\n" + ] + }, + { + "data": { + "text/plain": [ + "[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'year': 1993, 'genre': 'science fiction', 'rating': 7.7}),\n", + " Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'year': 1993, 'genre': 'science fiction', 'rating': 7.7}),\n", + " Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'}),\n", + " Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'})]" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# This example only specifies a relevant query\n", + "retriever.get_relevant_documents(\"What are some movies about dinosaurs\")" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "fc3f1e6e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "query=' ' filter=Comparison(comparator=<Comparator.GT: 'gt'>, attribute='rating', value=8.5) limit=None\n" + ] + }, + { + "data": { + "text/plain": [ + "[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'genre': 'science fiction', 'rating': 9.9, 'director': 'Andrei Tarkovsky'}),\n", + " Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'genre': 'science fiction', 'rating': 9.9, 'director': 'Andrei Tarkovsky'}),\n", + " Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'rating': 8.6, 'director': 'Satoshi Kon'}),\n", + " Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'rating': 8.6, 'director': 'Satoshi Kon'})]" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# This example only specifies a filter\n", + "retriever.get_relevant_documents(\"I want to watch a movie rated higher than 8.5\")" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "b19d4da0", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "query='women' filter=Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='director', value='Greta Gerwig') limit=None\n" + ] + }, + { + "data": { + "text/plain": [ + "[Document(page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them', metadata={'year': 2019, 'rating': 8.3, 'director': 'Greta Gerwig'}),\n", + " Document(page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them', metadata={'year': 2019, 'rating': 8.3, 'director': 'Greta Gerwig'})]" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": 
"execute_result" + } + ], + "source": [ + "# This example specifies a query and a filter\n", + "retriever.get_relevant_documents(\"Has Greta Gerwig directed any movies about women\")" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "f900e40e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "query=' ' filter=Operation(operator=, arguments=[Comparison(comparator=, attribute='rating', value=8.5), Comparison(comparator=, attribute='genre', value='science fiction')]) limit=None\n" + ] + }, + { + "data": { + "text/plain": [ + "[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'genre': 'science fiction', 'rating': 9.9, 'director': 'Andrei Tarkovsky'}),\n", + " Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'genre': 'science fiction', 'rating': 9.9, 'director': 'Andrei Tarkovsky'})]" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# This example specifies a composite filter\n", + "retriever.get_relevant_documents(\n", + " \"What's a highly rated (above 8.5) science fiction film?\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "12a51522", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "query='toys' filter=Operation(operator=, arguments=[Comparison(comparator=, attribute='year', value=1990), Comparison(comparator=, attribute='year', value=2005), Comparison(comparator=, attribute='genre', value='animated')]) limit=None\n" + ] + }, + { + "data": { + "text/plain": [ + "[Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'})]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# This example specifies a query and composite filter\n", + "retriever.get_relevant_documents(\n", + " \"What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated\"\n", + ")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "39bd1de1-b9fe-4a98-89da-58d8a7a6ae51", + "metadata": {}, + "source": [ + "### Filter k\n", + "\n", + "We can also use the self query retriever to specify `k`: the number of documents to fetch.\n", + "\n", + "We can do this by passing `enable_limit=True` to the constructor." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "bff36b88-b506-4877-9c63-e5a1a8d78e64", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "retriever = SelfQueryRetriever.from_llm(\n", + " llm,\n", + " vectorstore,\n", + " document_content_description,\n", + " metadata_field_info,\n", + " enable_limit=True,\n", + " verbose=True,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "2758d229-4f97-499c-819f-888acaf8ee10", + "metadata": { + "tags": [] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "query='dinosaur' filter=None limit=2\n" + ] + }, + { + "data": { + "text/plain": [ + "[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'year': 1993, 'genre': 'science fiction', 'rating': 7.7}),\n", + " Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'year': 1993, 'genre': 'science fiction', 'rating': 7.7})]" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# This example specifies a query with a LIMIT value\n", + "retriever.get_relevant_documents(\"what are two movies about dinosaurs\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/libs/langchain/langchain/retrievers/self_query/base.py b/libs/langchain/langchain/retrievers/self_query/base.py index 2a7d53277bc..1ebf72a4da0 100644 --- a/libs/langchain/langchain/retrievers/self_query/base.py +++ b/libs/langchain/langchain/retrievers/self_query/base.py @@ -18,6 +18,7 @@ from langchain.retrievers.self_query.pinecone import PineconeTranslator from langchain.retrievers.self_query.qdrant import QdrantTranslator from langchain.retrievers.self_query.redis import RedisTranslator from langchain.retrievers.self_query.supabase import SupabaseVectorTranslator +from langchain.retrievers.self_query.timescalevector import TimescaleVectorTranslator from langchain.retrievers.self_query.vectara import VectaraTranslator from langchain.retrievers.self_query.weaviate import WeaviateTranslator from langchain.schema import BaseRetriever, Document @@ -33,6 +34,7 @@ from langchain.vectorstores import ( Qdrant, Redis, SupabaseVectorStore, + TimescaleVector, Vectara, VectorStore, Weaviate, @@ -53,6 +55,7 @@ def _get_builtin_translator(vectorstore: VectorStore) -> Visitor: ElasticsearchStore: ElasticsearchTranslator, Milvus: MilvusTranslator, SupabaseVectorStore: SupabaseVectorTranslator, + TimescaleVector: TimescaleVectorTranslator, } if isinstance(vectorstore, Qdrant): return QdrantTranslator(metadata_key=vectorstore.metadata_payload_key) diff --git a/libs/langchain/langchain/retrievers/self_query/timescalevector.py b/libs/langchain/langchain/retrievers/self_query/timescalevector.py new file mode 100644 index 00000000000..3d417578fe5 --- /dev/null +++ b/libs/langchain/langchain/retrievers/self_query/timescalevector.py @@ -0,0 +1,84 @@ +from __future__ import annotations + +from typing import TYPE_CHECKING, Tuple, Union + +from langchain.chains.query_constructor.ir import ( + Comparator, + Comparison, + Operation, + Operator, + StructuredQuery, + Visitor, +) + +if TYPE_CHECKING: + from timescale_vector import client + + +class TimescaleVectorTranslator(Visitor): + """Translate the internal query language elements to valid filters.""" + + allowed_operators = 
[Operator.AND, Operator.OR, Operator.NOT] + """Subset of allowed logical operators.""" + + allowed_comparators = [ + Comparator.EQ, + Comparator.GT, + Comparator.GTE, + Comparator.LT, + Comparator.LTE, + ] + + COMPARATOR_MAP = { + Comparator.EQ: "==", + Comparator.GT: ">", + Comparator.GTE: ">=", + Comparator.LT: "<", + Comparator.LTE: "<=", + } + + OPERATOR_MAP = {Operator.AND: "AND", Operator.OR: "OR", Operator.NOT: "NOT"} + + def _format_func(self, func: Union[Operator, Comparator]) -> str: + self._validate_func(func) + if isinstance(func, Operator): + value = self.OPERATOR_MAP[func.value] # type: ignore + elif isinstance(func, Comparator): + value = self.COMPARATOR_MAP[func.value] # type: ignore + return f"{value}" + + def visit_operation(self, operation: Operation) -> client.Predicates: + try: + from timescale_vector import client + except ImportError as e: + raise ImportError( + "Cannot import timescale-vector. Please install with `pip install " + "timescale-vector`." + ) from e + args = [arg.accept(self) for arg in operation.arguments] + return client.Predicates(*args, operator=self._format_func(operation.operator)) + + def visit_comparison(self, comparison: Comparison) -> client.Predicates: + try: + from timescale_vector import client + except ImportError as e: + raise ImportError( + "Cannot import timescale-vector. Please install with `pip install " + "timescale-vector`." + ) from e + return client.Predicates( + ( + comparison.attribute, + self._format_func(comparison.comparator), + comparison.value, + ) + ) + + def visit_structured_query( + self, structured_query: StructuredQuery + ) -> Tuple[str, dict]: + if structured_query.filter is None: + kwargs = {} + else: + kwargs = {"predicates": structured_query.filter.accept(self)} + return structured_query.query, kwargs diff --git a/libs/langchain/langchain/vectorstores/__init__.py b/libs/langchain/langchain/vectorstores/__init__.py index a3981665807..18a24b20b07 100644 --- a/libs/langchain/langchain/vectorstores/__init__.py +++ b/libs/langchain/langchain/vectorstores/__init__.py @@ -70,6 +70,7 @@ from langchain.vectorstores.supabase import SupabaseVectorStore from langchain.vectorstores.tair import Tair from langchain.vectorstores.tencentvectordb import TencentVectorDB from langchain.vectorstores.tigris import Tigris +from langchain.vectorstores.timescalevector import TimescaleVector from langchain.vectorstores.typesense import Typesense from langchain.vectorstores.usearch import USearch from langchain.vectorstores.vald import Vald @@ -135,6 +136,7 @@ __all__ = [ "SupabaseVectorStore", "Tair", "Tigris", + "TimescaleVector", "Typesense", "USearch", "Vald", diff --git a/libs/langchain/langchain/vectorstores/timescalevector.py b/libs/langchain/langchain/vectorstores/timescalevector.py new file mode 100644 index 00000000000..a25cb97c5a4 --- /dev/null +++ b/libs/langchain/langchain/vectorstores/timescalevector.py @@ -0,0 +1,871 @@ +"""VectorStore wrapper around a Postgres-TimescaleVector database.""" +from __future__ import annotations + +import enum +import logging +import uuid +from datetime import timedelta +from typing import ( + TYPE_CHECKING, + Any, + Callable, + Dict, + Iterable, + List, + Optional, + Tuple, + Type, + Union, +) + +from langchain.docstore.document import Document +from langchain.embeddings.base import Embeddings +from langchain.utils import get_from_dict_or_env +from langchain.vectorstores.base import VectorStore +from langchain.vectorstores.utils import DistanceStrategy + +if TYPE_CHECKING: + from timescale_vector 
import Predicates + + +DEFAULT_DISTANCE_STRATEGY = DistanceStrategy.COSINE + +ADA_TOKEN_COUNT = 1536 + +_LANGCHAIN_DEFAULT_COLLECTION_NAME = "langchain_store" + + +class TimescaleVector(VectorStore): + """VectorStore implementation using the timescale vector client to store vectors + in Postgres. + + To use, you should have the ``timescale_vector`` python package installed. + + Args: + service_url: Service URL on Timescale cloud. + embedding: Any embedding function implementing + `langchain.embeddings.base.Embeddings` interface. + collection_name: The name of the collection to use. (default: langchain_store) + This will become the table name used for the collection. + distance_strategy: The distance strategy to use. (default: COSINE) + pre_delete_collection: If True, will delete the collection if it exists. + (default: False). Useful for testing. + + Example: + .. code-block:: python + + from langchain.vectorstores import TimescaleVector + from langchain.embeddings.openai import OpenAIEmbeddings + + SERVICE_URL = "postgres://tsdbadmin:<password>@<id>.tsdb.cloud.timescale.com:<port>/tsdb?sslmode=require" + COLLECTION_NAME = "state_of_the_union_test" + embeddings = OpenAIEmbeddings() + vectorstore = TimescaleVector.from_documents( + embedding=embeddings, + documents=docs, + collection_name=COLLECTION_NAME, + service_url=SERVICE_URL, + ) + """ # noqa: E501 + + def __init__( + self, + service_url: str, + embedding: Embeddings, + collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME, + num_dimensions: int = ADA_TOKEN_COUNT, + distance_strategy: DistanceStrategy = DEFAULT_DISTANCE_STRATEGY, + pre_delete_collection: bool = False, + logger: Optional[logging.Logger] = None, + relevance_score_fn: Optional[Callable[[float], float]] = None, + time_partition_interval: Optional[timedelta] = None, + ) -> None: + try: + from timescale_vector import client + except ImportError: + raise ImportError( + "Could not import timescale_vector python package. " + "Please install it with `pip install timescale-vector`." + ) + + self.service_url = service_url + self.embedding = embedding + self.collection_name = collection_name + self.num_dimensions = num_dimensions + self._distance_strategy = distance_strategy + self.pre_delete_collection = pre_delete_collection + self.logger = logger or logging.getLogger(__name__) + self.override_relevance_score_fn = relevance_score_fn + self._time_partition_interval = time_partition_interval + self.sync_client = client.Sync( + self.service_url, + self.collection_name, + self.num_dimensions, + self._distance_strategy.value.lower(), + time_partition_interval=self._time_partition_interval, + ) + self.async_client = client.Async( + self.service_url, + self.collection_name, + self.num_dimensions, + self._distance_strategy.value.lower(), + time_partition_interval=self._time_partition_interval, + ) + self.__post_init__() + + def __post_init__( + self, + ) -> None: + """ + Initialize the store. 
+ """ + self.sync_client.create_tables() + if self.pre_delete_collection: + self.sync_client.delete_all() + + @property + def embeddings(self) -> Embeddings: + return self.embedding + + def drop_tables(self) -> None: + self.sync_client.drop_table() + + @classmethod + def __from( + cls, + texts: List[str], + embeddings: List[List[float]], + embedding: Embeddings, + metadatas: Optional[List[dict]] = None, + ids: Optional[List[str]] = None, + collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME, + distance_strategy: DistanceStrategy = DEFAULT_DISTANCE_STRATEGY, + service_url: Optional[str] = None, + pre_delete_collection: bool = False, + **kwargs: Any, + ) -> TimescaleVector: + num_dimensions = len(embeddings[0]) + + if ids is None: + ids = [str(uuid.uuid1()) for _ in texts] + + if not metadatas: + metadatas = [{} for _ in texts] + + if service_url is None: + service_url = cls.get_service_url(kwargs) + + store = cls( + service_url=service_url, + num_dimensions=num_dimensions, + collection_name=collection_name, + embedding=embedding, + distance_strategy=distance_strategy, + pre_delete_collection=pre_delete_collection, + **kwargs, + ) + + store.add_embeddings( + texts=texts, embeddings=embeddings, metadatas=metadatas, ids=ids, **kwargs + ) + + return store + + @classmethod + async def __afrom( + cls, + texts: List[str], + embeddings: List[List[float]], + embedding: Embeddings, + metadatas: Optional[List[dict]] = None, + ids: Optional[List[str]] = None, + collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME, + distance_strategy: DistanceStrategy = DEFAULT_DISTANCE_STRATEGY, + service_url: Optional[str] = None, + pre_delete_collection: bool = False, + **kwargs: Any, + ) -> TimescaleVector: + num_dimensions = len(embeddings[0]) + + if ids is None: + ids = [str(uuid.uuid1()) for _ in texts] + + if not metadatas: + metadatas = [{} for _ in texts] + + if service_url is None: + service_url = cls.get_service_url(kwargs) + + store = cls( + service_url=service_url, + num_dimensions=num_dimensions, + collection_name=collection_name, + embedding=embedding, + distance_strategy=distance_strategy, + pre_delete_collection=pre_delete_collection, + **kwargs, + ) + + await store.aadd_embeddings( + texts=texts, embeddings=embeddings, metadatas=metadatas, ids=ids, **kwargs + ) + + return store + + def add_embeddings( + self, + texts: Iterable[str], + embeddings: List[List[float]], + metadatas: Optional[List[dict]] = None, + ids: Optional[List[str]] = None, + **kwargs: Any, + ) -> List[str]: + """Add embeddings to the vectorstore. + + Args: + texts: Iterable of strings to add to the vectorstore. + embeddings: List of list of embedding vectors. + metadatas: List of metadatas associated with the texts. + kwargs: vectorstore specific parameters + """ + if ids is None: + ids = [str(uuid.uuid1()) for _ in texts] + + if not metadatas: + metadatas = [{} for _ in texts] + + records = list(zip(ids, metadatas, texts, embeddings)) + self.sync_client.upsert(records) + + return ids + + async def aadd_embeddings( + self, + texts: Iterable[str], + embeddings: List[List[float]], + metadatas: Optional[List[dict]] = None, + ids: Optional[List[str]] = None, + **kwargs: Any, + ) -> List[str]: + """Add embeddings to the vectorstore. + + Args: + texts: Iterable of strings to add to the vectorstore. + embeddings: List of list of embedding vectors. + metadatas: List of metadatas associated with the texts. 
+            kwargs: vectorstore specific parameters
+        """
+        if ids is None:
+            ids = [str(uuid.uuid1()) for _ in texts]
+
+        if not metadatas:
+            metadatas = [{} for _ in texts]
+
+        records = list(zip(ids, metadatas, texts, embeddings))
+        await self.async_client.upsert(records)
+
+        return ids
+
+    def add_texts(
+        self,
+        texts: Iterable[str],
+        metadatas: Optional[List[dict]] = None,
+        ids: Optional[List[str]] = None,
+        **kwargs: Any,
+    ) -> List[str]:
+        """Run more texts through the embeddings and add to the vectorstore.
+
+        Args:
+            texts: Iterable of strings to add to the vectorstore.
+            metadatas: Optional list of metadatas associated with the texts.
+            kwargs: vectorstore specific parameters
+
+        Returns:
+            List of ids from adding the texts into the vectorstore.
+        """
+        embeddings = self.embedding.embed_documents(list(texts))
+        return self.add_embeddings(
+            texts=texts, embeddings=embeddings, metadatas=metadatas, ids=ids, **kwargs
+        )
+
+    async def aadd_texts(
+        self,
+        texts: Iterable[str],
+        metadatas: Optional[List[dict]] = None,
+        ids: Optional[List[str]] = None,
+        **kwargs: Any,
+    ) -> List[str]:
+        """Run more texts through the embeddings and add to the vectorstore.
+
+        Args:
+            texts: Iterable of strings to add to the vectorstore.
+            metadatas: Optional list of metadatas associated with the texts.
+            kwargs: vectorstore specific parameters
+
+        Returns:
+            List of ids from adding the texts into the vectorstore.
+        """
+        embeddings = self.embedding.embed_documents(list(texts))
+        return await self.aadd_embeddings(
+            texts=texts, embeddings=embeddings, metadatas=metadatas, ids=ids, **kwargs
+        )
+
+    def similarity_search(
+        self,
+        query: str,
+        k: int = 4,
+        filter: Optional[Union[dict, list]] = None,
+        predicates: Optional[Predicates] = None,
+        **kwargs: Any,
+    ) -> List[Document]:
+        """Run a similarity search with TimescaleVector using the configured
+        distance metric.
+
+        Args:
+            query (str): Query text to search for.
+            k (int): Number of results to return. Defaults to 4.
+            filter (Optional[Dict[str, str]]): Filter by metadata. Defaults to None.
+            predicates (Optional[Predicates]): Advanced filter predicates.
+                Defaults to None.
+
+        Returns:
+            List of Documents most similar to the query.
+        """
+        embedding = self.embedding.embed_query(text=query)
+        return self.similarity_search_by_vector(
+            embedding=embedding,
+            k=k,
+            filter=filter,
+            predicates=predicates,
+            **kwargs,
+        )
+
+    async def asimilarity_search(
+        self,
+        query: str,
+        k: int = 4,
+        filter: Optional[Union[dict, list]] = None,
+        predicates: Optional[Predicates] = None,
+        **kwargs: Any,
+    ) -> List[Document]:
+        """Run a similarity search with TimescaleVector using the configured
+        distance metric.
+
+        Args:
+            query (str): Query text to search for.
+            k (int): Number of results to return. Defaults to 4.
+            filter (Optional[Dict[str, str]]): Filter by metadata. Defaults to None.
+            predicates (Optional[Predicates]): Advanced filter predicates.
+                Defaults to None.
+
+        Returns:
+            List of Documents most similar to the query.
+        """
+        embedding = self.embedding.embed_query(text=query)
+        return await self.asimilarity_search_by_vector(
+            embedding=embedding,
+            k=k,
+            filter=filter,
+            predicates=predicates,
+            **kwargs,
+        )
+
+    def similarity_search_with_score(
+        self,
+        query: str,
+        k: int = 4,
+        filter: Optional[Union[dict, list]] = None,
+        predicates: Optional[Predicates] = None,
+        **kwargs: Any,
+    ) -> List[Tuple[Document, float]]:
+        """Return docs most similar to query.
+
+        Args:
+            query: Text to look up documents similar to.
+            k: Number of Documents to return. Defaults to 4.
+            filter (Optional[Dict[str, str]]): Filter by metadata. Defaults to None.
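+            predicates (Optional[Predicates]): Advanced filter predicates.
+                Defaults to None.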
+
+        Returns:
+            List of Documents most similar to the query and score for each
+        """
+        embedding = self.embedding.embed_query(query)
+        docs = self.similarity_search_with_score_by_vector(
+            embedding=embedding,
+            k=k,
+            filter=filter,
+            predicates=predicates,
+            **kwargs,
+        )
+        return docs
+
+    async def asimilarity_search_with_score(
+        self,
+        query: str,
+        k: int = 4,
+        filter: Optional[Union[dict, list]] = None,
+        predicates: Optional[Predicates] = None,
+        **kwargs: Any,
+    ) -> List[Tuple[Document, float]]:
+        """Return docs most similar to query.
+
+        Args:
+            query: Text to look up documents similar to.
+            k: Number of Documents to return. Defaults to 4.
+            filter (Optional[Dict[str, str]]): Filter by metadata. Defaults to None.
+            predicates (Optional[Predicates]): Advanced filter predicates.
+                Defaults to None.
+
+        Returns:
+            List of Documents most similar to the query and score for each
+        """
+        embedding = self.embedding.embed_query(query)
+        return await self.asimilarity_search_with_score_by_vector(
+            embedding=embedding,
+            k=k,
+            filter=filter,
+            predicates=predicates,
+            **kwargs,
+        )
+
+    def date_to_range_filter(self, **kwargs: Any) -> Any:
+        """Build a UUIDTimeRange from any time-related kwargs, or None if absent."""
+        constructor_args = {
+            key: kwargs[key]
+            for key in [
+                "start_date",
+                "end_date",
+                "time_delta",
+                "start_inclusive",
+                "end_inclusive",
+            ]
+            if key in kwargs
+        }
+        if not constructor_args:
+            return None
+
+        try:
+            from timescale_vector import client
+        except ImportError:
+            raise ImportError(
+                "Could not import timescale_vector python package. "
+                "Please install it with `pip install timescale-vector`."
+            )
+        return client.UUIDTimeRange(**constructor_args)
+
+    def similarity_search_with_score_by_vector(
+        self,
+        embedding: List[float],
+        k: int = 4,
+        filter: Optional[Union[dict, list]] = None,
+        predicates: Optional[Predicates] = None,
+        **kwargs: Any,
+    ) -> List[Tuple[Document, float]]:
+        try:
+            from timescale_vector import client
+        except ImportError:
+            raise ImportError(
+                "Could not import timescale_vector python package. "
+                "Please install it with `pip install timescale-vector`."
+            )
+
+        results = self.sync_client.search(
+            embedding,
+            limit=k,
+            filter=filter,
+            predicates=predicates,
+            uuid_time_filter=self.date_to_range_filter(**kwargs),
+        )
+
+        docs = [
+            (
+                Document(
+                    page_content=result[client.SEARCH_RESULT_CONTENTS_IDX],
+                    metadata=result[client.SEARCH_RESULT_METADATA_IDX],
+                ),
+                result[client.SEARCH_RESULT_DISTANCE_IDX],
+            )
+            for result in results
+        ]
+        return docs
+
+    async def asimilarity_search_with_score_by_vector(
+        self,
+        embedding: List[float],
+        k: int = 4,
+        filter: Optional[Union[dict, list]] = None,
+        predicates: Optional[Predicates] = None,
+        **kwargs: Any,
+    ) -> List[Tuple[Document, float]]:
+        try:
+            from timescale_vector import client
+        except ImportError:
+            raise ImportError(
+                "Could not import timescale_vector python package. "
+                "Please install it with `pip install timescale-vector`."
+            )
+
+        results = await self.async_client.search(
+            embedding,
+            limit=k,
+            filter=filter,
+            predicates=predicates,
+            uuid_time_filter=self.date_to_range_filter(**kwargs),
+        )
+
+        docs = [
+            (
+                Document(
+                    page_content=result[client.SEARCH_RESULT_CONTENTS_IDX],
+                    metadata=result[client.SEARCH_RESULT_METADATA_IDX],
+                ),
+                result[client.SEARCH_RESULT_DISTANCE_IDX],
+            )
+            for result in results
+        ]
+        return docs
+
+    def similarity_search_by_vector(
+        self,
+        embedding: List[float],
+        k: int = 4,
+        filter: Optional[Union[dict, list]] = None,
+        predicates: Optional[Predicates] = None,
+        **kwargs: Any,
+    ) -> List[Document]:
+        """Return docs most similar to embedding vector.
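+        Results can be narrowed with a metadata filter, a Predicates object,
+        or time-range kwargs.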
+
+        Args:
+            embedding: Embedding to look up documents similar to.
+            k: Number of Documents to return. Defaults to 4.
+            filter (Optional[Dict[str, str]]): Filter by metadata. Defaults to None.
+            predicates (Optional[Predicates]): Advanced filter predicates.
+                Defaults to None.
+
+        Returns:
+            List of Documents most similar to the query vector.
+        """
+        docs_and_scores = self.similarity_search_with_score_by_vector(
+            embedding=embedding, k=k, filter=filter, predicates=predicates, **kwargs
+        )
+        return [doc for doc, _ in docs_and_scores]
+
+    async def asimilarity_search_by_vector(
+        self,
+        embedding: List[float],
+        k: int = 4,
+        filter: Optional[Union[dict, list]] = None,
+        predicates: Optional[Predicates] = None,
+        **kwargs: Any,
+    ) -> List[Document]:
+        """Return docs most similar to embedding vector.
+
+        Args:
+            embedding: Embedding to look up documents similar to.
+            k: Number of Documents to return. Defaults to 4.
+            filter (Optional[Dict[str, str]]): Filter by metadata. Defaults to None.
+            predicates (Optional[Predicates]): Advanced filter predicates.
+                Defaults to None.
+
+        Returns:
+            List of Documents most similar to the query vector.
+        """
+        docs_and_scores = await self.asimilarity_search_with_score_by_vector(
+            embedding=embedding, k=k, filter=filter, predicates=predicates, **kwargs
+        )
+        return [doc for doc, _ in docs_and_scores]
+
+    @classmethod
+    def from_texts(
+        cls: Type[TimescaleVector],
+        texts: List[str],
+        embedding: Embeddings,
+        metadatas: Optional[List[dict]] = None,
+        collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
+        distance_strategy: DistanceStrategy = DEFAULT_DISTANCE_STRATEGY,
+        ids: Optional[List[str]] = None,
+        pre_delete_collection: bool = False,
+        **kwargs: Any,
+    ) -> TimescaleVector:
+        """
+        Return VectorStore initialized from texts and embeddings.
+        A Postgres connection string is required. Either pass it as a parameter
+        or set the TIMESCALE_SERVICE_URL environment variable.
+        """
+        embeddings = embedding.embed_documents(list(texts))
+
+        return cls.__from(
+            texts,
+            embeddings,
+            embedding,
+            metadatas=metadatas,
+            ids=ids,
+            collection_name=collection_name,
+            distance_strategy=distance_strategy,
+            pre_delete_collection=pre_delete_collection,
+            **kwargs,
+        )
+
+    @classmethod
+    async def afrom_texts(
+        cls: Type[TimescaleVector],
+        texts: List[str],
+        embedding: Embeddings,
+        metadatas: Optional[List[dict]] = None,
+        collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
+        distance_strategy: DistanceStrategy = DEFAULT_DISTANCE_STRATEGY,
+        ids: Optional[List[str]] = None,
+        pre_delete_collection: bool = False,
+        **kwargs: Any,
+    ) -> TimescaleVector:
+        """
+        Return VectorStore initialized from texts and embeddings.
+        A Postgres connection string is required. Either pass it as a parameter
+        or set the TIMESCALE_SERVICE_URL environment variable.
+        """
+        embeddings = embedding.embed_documents(list(texts))
+
+        return await cls.__afrom(
+            texts,
+            embeddings,
+            embedding,
+            metadatas=metadatas,
+            ids=ids,
+            collection_name=collection_name,
+            distance_strategy=distance_strategy,
+            pre_delete_collection=pre_delete_collection,
+            **kwargs,
+        )
+
+    @classmethod
+    def from_embeddings(
+        cls,
+        text_embeddings: List[Tuple[str, List[float]]],
+        embedding: Embeddings,
+        metadatas: Optional[List[dict]] = None,
+        collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
+        distance_strategy: DistanceStrategy = DEFAULT_DISTANCE_STRATEGY,
+        ids: Optional[List[str]] = None,
+        pre_delete_collection: bool = False,
+        **kwargs: Any,
+    ) -> TimescaleVector:
+        """Construct TimescaleVector wrapper from raw documents and pre-
+        generated embeddings.
+
+        Return VectorStore initialized from documents and embeddings.
+        A Postgres connection string is required. Either pass it as a parameter
+        or set the TIMESCALE_SERVICE_URL environment variable.
+
+        Example:
+            .. code-block:: python
+
+                from langchain.vectorstores import TimescaleVector
+                from langchain.embeddings import OpenAIEmbeddings
+                embeddings = OpenAIEmbeddings()
+                text_embeddings = embeddings.embed_documents(texts)
+                text_embedding_pairs = list(zip(texts, text_embeddings))
+                tvs = TimescaleVector.from_embeddings(text_embedding_pairs, embeddings)
+        """
+        texts = [t[0] for t in text_embeddings]
+        embeddings = [t[1] for t in text_embeddings]
+
+        return cls.__from(
+            texts,
+            embeddings,
+            embedding,
+            metadatas=metadatas,
+            ids=ids,
+            collection_name=collection_name,
+            distance_strategy=distance_strategy,
+            pre_delete_collection=pre_delete_collection,
+            **kwargs,
+        )
+
+    @classmethod
+    async def afrom_embeddings(
+        cls,
+        text_embeddings: List[Tuple[str, List[float]]],
+        embedding: Embeddings,
+        metadatas: Optional[List[dict]] = None,
+        collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
+        distance_strategy: DistanceStrategy = DEFAULT_DISTANCE_STRATEGY,
+        ids: Optional[List[str]] = None,
+        pre_delete_collection: bool = False,
+        **kwargs: Any,
+    ) -> TimescaleVector:
+        """Construct TimescaleVector wrapper from raw documents and pre-
+        generated embeddings.
+
+        Return VectorStore initialized from documents and embeddings.
+        A Postgres connection string is required. Either pass it as a parameter
+        or set the TIMESCALE_SERVICE_URL environment variable.
+
+        Example:
+            .. code-block:: python
+
+                from langchain.vectorstores import TimescaleVector
+                from langchain.embeddings import OpenAIEmbeddings
+                embeddings = OpenAIEmbeddings()
+                text_embeddings = embeddings.embed_documents(texts)
+                text_embedding_pairs = list(zip(texts, text_embeddings))
+                tvs = await TimescaleVector.afrom_embeddings(
+                    text_embedding_pairs, embeddings
+                )
+        """
+        texts = [t[0] for t in text_embeddings]
+        embeddings = [t[1] for t in text_embeddings]
+
+        return await cls.__afrom(
+            texts,
+            embeddings,
+            embedding,
+            metadatas=metadatas,
+            ids=ids,
+            collection_name=collection_name,
+            distance_strategy=distance_strategy,
+            pre_delete_collection=pre_delete_collection,
+            **kwargs,
+        )
+
+    @classmethod
+    def from_existing_index(
+        cls: Type[TimescaleVector],
+        embedding: Embeddings,
+        collection_name: str = _LANGCHAIN_DEFAULT_COLLECTION_NAME,
+        distance_strategy: DistanceStrategy = DEFAULT_DISTANCE_STRATEGY,
+        pre_delete_collection: bool = False,
+        **kwargs: Any,
+    ) -> TimescaleVector:
+        """
+        Get an instance of an existing TimescaleVector store. This method
+        returns the store without inserting any new embeddings.
+        """
+
+        service_url = cls.get_service_url(kwargs)
+
+        store = cls(
+            service_url=service_url,
+            collection_name=collection_name,
+            embedding=embedding,
+            distance_strategy=distance_strategy,
+            pre_delete_collection=pre_delete_collection,
+        )
+
+        return store
+
+    @classmethod
+    def get_service_url(cls, kwargs: Dict[str, Any]) -> str:
+        service_url: str = get_from_dict_or_env(
+            data=kwargs,
+            key="service_url",
+            env_key="TIMESCALE_SERVICE_URL",
+        )
+
+        if not service_url:
+            raise ValueError(
+                "Postgres connection string is required. "
+                "Either pass it as a parameter "
+                "or set the TIMESCALE_SERVICE_URL environment variable."
+            )
+
+        return service_url
+
+    @classmethod
+    def service_url_from_db_params(
+        cls,
+        host: str,
+        port: int,
+        database: str,
+        user: str,
+        password: str,
+    ) -> str:
+        """Return connection string from database parameters."""
+        return f"postgresql://{user}:{password}@{host}:{port}/{database}"
+
+    def _select_relevance_score_fn(self) -> Callable[[float], float]:
+        """
+        The 'correct' relevance function
+        may differ depending on a few things, including:
+        - the distance / similarity metric used by the VectorStore
+        - the scale of your embeddings (OpenAI's are unit normed. Many others are not!)
+        - embedding dimensionality
+        - etc.
+        """
+        if self.override_relevance_score_fn is not None:
+            return self.override_relevance_score_fn
+
+        # Default strategy is to rely on distance strategy provided
+        # in vectorstore constructor
+        if self._distance_strategy == DistanceStrategy.COSINE:
+            return self._cosine_relevance_score_fn
+        elif self._distance_strategy == DistanceStrategy.EUCLIDEAN_DISTANCE:
+            return self._euclidean_relevance_score_fn
+        elif self._distance_strategy == DistanceStrategy.MAX_INNER_PRODUCT:
+            return self._max_inner_product_relevance_score_fn
+        else:
+            raise ValueError(
+                "No supported normalization function"
+                f" for distance_strategy of {self._distance_strategy}."
+                " Consider providing relevance_score_fn to TimescaleVector constructor."
+            )
+
+    def delete(self, ids: Optional[List[str]] = None, **kwargs: Any) -> Optional[bool]:
+        """Delete by vector ID or other criteria.
+
+        Args:
+            ids: List of ids to delete.
+            **kwargs: Other keyword arguments that subclasses might use.
+
+        Returns:
+            Optional[bool]: True if deletion is successful,
+            False otherwise, None if not implemented.
+        """
+        if ids is None:
+            raise ValueError("No ids provided to delete.")
+
+        self.sync_client.delete_by_ids(ids)
+        return True
+
+    # TODO: should this be part of delete()?
+    def delete_by_metadata(
+        self, filter: Union[Dict[str, str], List[Dict[str, str]]], **kwargs: Any
+    ) -> Optional[bool]:
+        """Delete records matching a metadata filter.
+
+        Args:
+            filter: Metadata filter (a dict or a list of dicts) selecting
+                the records to delete.
+            **kwargs: Other keyword arguments that subclasses might use.
+
+        Returns:
+            Optional[bool]: True if deletion is successful,
+            False otherwise, None if not implemented.
+        """
+
+        self.sync_client.delete_by_metadata(filter)
+        return True
+
+    class IndexType(str, enum.Enum):
+        """Enumerator for the supported index types."""
+
+        TIMESCALE_VECTOR = "tsv"
+        PGVECTOR_IVFFLAT = "ivfflat"
+        PGVECTOR_HNSW = "hnsw"
+
+    DEFAULT_INDEX_TYPE = IndexType.TIMESCALE_VECTOR
+
+    def create_index(
+        self, index_type: Union[IndexType, str] = DEFAULT_INDEX_TYPE, **kwargs: Any
+    ) -> None:
+        try:
+            from timescale_vector import client
+        except ImportError:
+            raise ImportError(
+                "Could not import timescale_vector python package. "
+                "Please install it with `pip install timescale-vector`."
+ ) + + index_type = ( + index_type.value if isinstance(index_type, self.IndexType) else index_type + ) + if index_type == self.IndexType.PGVECTOR_IVFFLAT.value: + self.sync_client.create_embedding_index(client.IvfflatIndex(**kwargs)) + + if index_type == self.IndexType.PGVECTOR_HNSW.value: + self.sync_client.create_embedding_index(client.HNSWIndex(**kwargs)) + + if index_type == self.IndexType.TIMESCALE_VECTOR.value: + self.sync_client.create_embedding_index( + client.TimescaleVectorIndex(**kwargs) + ) + + def drop_index(self) -> None: + self.sync_client.drop_embedding_index() diff --git a/libs/langchain/poetry.lock b/libs/langchain/poetry.lock index 84bf4526ebb..07fdf0b0667 100644 --- a/libs/langchain/poetry.lock +++ b/libs/langchain/poetry.lock @@ -1,10 +1,9 @@ -# This file is automatically @generated by Poetry and should not be changed by hand. +# This file is automatically @generated by Poetry 1.5.1 and should not be changed by hand. [[package]] name = "absl-py" version = "1.4.0" description = "Abseil Python Common Libraries, see https://github.com/abseil/abseil-py." -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -16,7 +15,6 @@ files = [ name = "aioboto3" version = "11.3.0" description = "Async boto3 wrapper" -category = "main" optional = true python-versions = ">=3.7,<4.0" files = [ @@ -35,7 +33,6 @@ s3cse = ["cryptography (>=2.3.1)"] name = "aiobotocore" version = "2.6.0" description = "Async client for aws services using botocore and aiohttp" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -58,7 +55,6 @@ boto3 = ["boto3 (>=1.28.17,<1.28.18)"] name = "aiodns" version = "3.0.0" description = "Simple DNS resolver for asyncio" -category = "main" optional = true python-versions = "*" files = [ @@ -73,7 +69,6 @@ pycares = ">=4.0.0" name = "aiofiles" version = "23.2.1" description = "File support for asyncio." -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -85,7 +80,6 @@ files = [ name = "aiohttp" version = "3.8.5" description = "Async http client/server framework (asyncio)" -category = "main" optional = false python-versions = ">=3.6" files = [ @@ -194,7 +188,6 @@ speedups = ["Brotli", "aiodns", "cchardet"] name = "aiohttp-retry" version = "2.8.3" description = "Simple retry client for aiohttp" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -209,7 +202,6 @@ aiohttp = "*" name = "aioitertools" version = "0.11.0" description = "itertools and builtins for AsyncIO and mixed iterables" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -224,7 +216,6 @@ typing_extensions = {version = ">=4.0", markers = "python_version < \"3.10\""} name = "aiosignal" version = "1.3.1" description = "aiosignal: a list of registered asynchronous callbacks" -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -239,7 +230,6 @@ frozenlist = ">=1.1.0" name = "aleph-alpha-client" version = "2.17.0" description = "python client to interact with Aleph Alpha api endpoints" -category = "main" optional = true python-versions = "*" files = [ @@ -267,7 +257,6 @@ types = ["mypy", "types-Pillow", "types-requests"] name = "altair" version = "4.2.2" description = "Altair: A declarative statistical visualization library for Python." 
-category = "main" optional = true python-versions = ">=3.7" files = [ @@ -290,7 +279,6 @@ dev = ["black", "docutils", "flake8", "ipython", "m2r", "mistune (<2.0.0)", "pyt name = "amadeus" version = "8.1.0" description = "Python module for the Amadeus travel APIs" -category = "main" optional = true python-versions = ">=3.4.8" files = [ @@ -301,7 +289,6 @@ files = [ name = "amazon-textract-caller" version = "0.0.29" description = "Amazon Textract Caller tools" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -321,7 +308,6 @@ testing = ["amazon-textract-response-parser", "pytest"] name = "amazon-textract-response-parser" version = "1.0.0" description = "Easily parse JSON returned by Amazon Textract." -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -337,7 +323,6 @@ marshmallow = ">=3.14,<4" name = "anyio" version = "3.7.1" description = "High level compatibility layer for multiple asynchronous event loop implementations" -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -359,7 +344,6 @@ trio = ["trio (<0.22)"] name = "appnope" version = "0.1.3" description = "Disable App Nap on macOS >= 10.9" -category = "dev" optional = false python-versions = "*" files = [ @@ -371,7 +355,6 @@ files = [ name = "argon2-cffi" version = "23.1.0" description = "Argon2 for Python" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -392,7 +375,6 @@ typing = ["mypy"] name = "argon2-cffi-bindings" version = "21.2.0" description = "Low-level CFFI bindings for Argon2" -category = "dev" optional = false python-versions = ">=3.6" files = [ @@ -430,7 +412,6 @@ tests = ["pytest"] name = "arrow" version = "1.2.3" description = "Better dates & times for Python" -category = "dev" optional = false python-versions = ">=3.6" files = [ @@ -445,7 +426,6 @@ python-dateutil = ">=2.7.0" name = "arxiv" version = "1.4.8" description = "Python wrapper for the arXiv API: http://arxiv.org/help/api/" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -460,7 +440,6 @@ feedparser = "*" name = "assemblyai" version = "0.17.0" description = "AssemblyAI Python SDK" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -481,7 +460,6 @@ extras = ["pyaudio (>=0.2.13)"] name = "asttokens" version = "2.2.1" description = "Annotate AST trees with source code positions" -category = "dev" optional = false python-versions = "*" files = [ @@ -499,7 +477,6 @@ test = ["astroid", "pytest"] name = "astunparse" version = "1.6.3" description = "An AST unparser for Python" -category = "main" optional = true python-versions = "*" files = [ @@ -515,7 +492,6 @@ wheel = ">=0.23.0,<1.0" name = "async-lru" version = "2.0.4" description = "Simple LRU cache for asyncio" -category = "dev" optional = false python-versions = ">=3.8" files = [ @@ -530,7 +506,6 @@ typing-extensions = {version = ">=4.0.0", markers = "python_version < \"3.11\""} name = "async-timeout" version = "4.0.3" description = "Timeout context manager for asyncio programs" -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -538,11 +513,63 @@ files = [ {file = "async_timeout-4.0.3-py3-none-any.whl", hash = "sha256:7405140ff1230c310e51dc27b3145b9092d659ce68ff733fb0cefe3ee42be028"}, ] +[[package]] +name = "asyncpg" +version = "0.28.0" +description = "An asyncio PostgreSQL driver" +optional = true +python-versions = ">=3.7.0" +files = [ + {file = "asyncpg-0.28.0-cp310-cp310-macosx_10_9_x86_64.whl", hash = 
"sha256:0a6d1b954d2b296292ddff4e0060f494bb4270d87fb3655dd23c5c6096d16d83"}, + {file = "asyncpg-0.28.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:0740f836985fd2bd73dca42c50c6074d1d61376e134d7ad3ad7566c4f79f8184"}, + {file = "asyncpg-0.28.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:e907cf620a819fab1737f2dd90c0f185e2a796f139ac7de6aa3212a8af96c050"}, + {file = "asyncpg-0.28.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:86b339984d55e8202e0c4b252e9573e26e5afa05617ed02252544f7b3e6de3e9"}, + {file = "asyncpg-0.28.0-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:0c402745185414e4c204a02daca3d22d732b37359db4d2e705172324e2d94e85"}, + {file = "asyncpg-0.28.0-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:c88eef5e096296626e9688f00ab627231f709d0e7e3fb84bb4413dff81d996d7"}, + {file = "asyncpg-0.28.0-cp310-cp310-win32.whl", hash = "sha256:90a7bae882a9e65a9e448fdad3e090c2609bb4637d2a9c90bfdcebbfc334bf89"}, + {file = "asyncpg-0.28.0-cp310-cp310-win_amd64.whl", hash = "sha256:76aacdcd5e2e9999e83c8fbcb748208b60925cc714a578925adcb446d709016c"}, + {file = "asyncpg-0.28.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:a0e08fe2c9b3618459caaef35979d45f4e4f8d4f79490c9fa3367251366af207"}, + {file = "asyncpg-0.28.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:b24e521f6060ff5d35f761a623b0042c84b9c9b9fb82786aadca95a9cb4a893b"}, + {file = "asyncpg-0.28.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:99417210461a41891c4ff301490a8713d1ca99b694fef05dabd7139f9d64bd6c"}, + {file = "asyncpg-0.28.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f029c5adf08c47b10bcdc857001bbef551ae51c57b3110964844a9d79ca0f267"}, + {file = "asyncpg-0.28.0-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:ad1d6abf6c2f5152f46fff06b0e74f25800ce8ec6c80967f0bc789974de3c652"}, + {file = "asyncpg-0.28.0-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:d7fa81ada2807bc50fea1dc741b26a4e99258825ba55913b0ddbf199a10d69d8"}, + {file = "asyncpg-0.28.0-cp311-cp311-win32.whl", hash = "sha256:f33c5685e97821533df3ada9384e7784bd1e7865d2b22f153f2e4bd4a083e102"}, + {file = "asyncpg-0.28.0-cp311-cp311-win_amd64.whl", hash = "sha256:5e7337c98fb493079d686a4a6965e8bcb059b8e1b8ec42106322fc6c1c889bb0"}, + {file = "asyncpg-0.28.0-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:1c56092465e718a9fdcc726cc3d9dcf3a692e4834031c9a9f871d92a75d20d48"}, + {file = "asyncpg-0.28.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:4acd6830a7da0eb4426249d71353e8895b350daae2380cb26d11e0d4a01c5472"}, + {file = "asyncpg-0.28.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:63861bb4a540fa033a56db3bb58b0c128c56fad5d24e6d0a8c37cb29b17c1c7d"}, + {file = "asyncpg-0.28.0-cp37-cp37m-musllinux_1_1_aarch64.whl", hash = "sha256:a93a94ae777c70772073d0512f21c74ac82a8a49be3a1d982e3f259ab5f27307"}, + {file = "asyncpg-0.28.0-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = "sha256:d14681110e51a9bc9c065c4e7944e8139076a778e56d6f6a306a26e740ed86d2"}, + {file = "asyncpg-0.28.0-cp37-cp37m-win32.whl", hash = "sha256:8aec08e7310f9ab322925ae5c768532e1d78cfb6440f63c078b8392a38aa636a"}, + {file = "asyncpg-0.28.0-cp37-cp37m-win_amd64.whl", hash = "sha256:319f5fa1ab0432bc91fb39b3960b0d591e6b5c7844dafc92c79e3f1bff96abef"}, + {file = "asyncpg-0.28.0-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:b337ededaabc91c26bf577bfcd19b5508d879c0ad009722be5bb0a9dd30b85a0"}, + {file = 
"asyncpg-0.28.0-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:4d32b680a9b16d2957a0a3cc6b7fa39068baba8e6b728f2e0a148a67644578f4"}, + {file = "asyncpg-0.28.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f4f62f04cdf38441a70f279505ef3b4eadf64479b17e707c950515846a2df197"}, + {file = "asyncpg-0.28.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:4f20cac332c2576c79c2e8e6464791c1f1628416d1115935a34ddd7121bfc6a4"}, + {file = "asyncpg-0.28.0-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:59f9712ce01e146ff71d95d561fb68bd2d588a35a187116ef05028675462d5ed"}, + {file = "asyncpg-0.28.0-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:fc9e9f9ff1aa0eddcc3247a180ac9e9b51a62311e988809ac6152e8fb8097756"}, + {file = "asyncpg-0.28.0-cp38-cp38-win32.whl", hash = "sha256:9e721dccd3838fcff66da98709ed884df1e30a95f6ba19f595a3706b4bc757e3"}, + {file = "asyncpg-0.28.0-cp38-cp38-win_amd64.whl", hash = "sha256:8ba7d06a0bea539e0487234511d4adf81dc8762249858ed2a580534e1720db00"}, + {file = "asyncpg-0.28.0-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:d009b08602b8b18edef3a731f2ce6d3f57d8dac2a0a4140367e194eabd3de457"}, + {file = "asyncpg-0.28.0-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:ec46a58d81446d580fb21b376ec6baecab7288ce5a578943e2fc7ab73bf7eb39"}, + {file = "asyncpg-0.28.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:7b48ceed606cce9e64fd5480a9b0b9a95cea2b798bb95129687abd8599c8b019"}, + {file = "asyncpg-0.28.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:8858f713810f4fe67876728680f42e93b7e7d5c7b61cf2118ef9153ec16b9423"}, + {file = "asyncpg-0.28.0-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:5e18438a0730d1c0c1715016eacda6e9a505fc5aa931b37c97d928d44941b4bf"}, + {file = "asyncpg-0.28.0-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:e9c433f6fcdd61c21a715ee9128a3ca48be8ac16fa07be69262f016bb0f4dbd2"}, + {file = "asyncpg-0.28.0-cp39-cp39-win32.whl", hash = "sha256:41e97248d9076bc8e4849da9e33e051be7ba37cd507cbd51dfe4b2d99c70e3dc"}, + {file = "asyncpg-0.28.0-cp39-cp39-win_amd64.whl", hash = "sha256:3ed77f00c6aacfe9d79e9eff9e21729ce92a4b38e80ea99a58ed382f42ebd55b"}, + {file = "asyncpg-0.28.0.tar.gz", hash = "sha256:7252cdc3acb2f52feaa3664280d3bcd78a46bd6c10bfd681acfffefa1120e278"}, +] + +[package.extras] +docs = ["Sphinx (>=5.3.0,<5.4.0)", "sphinx-rtd-theme (>=1.2.2)", "sphinxcontrib-asyncio (>=0.3.0,<0.4.0)"] +test = ["flake8 (>=5.0,<6.0)", "uvloop (>=0.15.3)"] + [[package]] name = "atlassian-python-api" version = "3.41.0" description = "Python Atlassian REST API Wrapper" -category = "main" optional = true python-versions = "*" files = [ @@ -564,7 +591,6 @@ kerberos = ["requests-kerberos"] name = "attr" version = "0.3.2" description = "Simple decorator to set attributes of target function or class in a DRY way." 
-category = "main" optional = true python-versions = "*" files = [ @@ -576,7 +602,6 @@ files = [ name = "attrs" version = "23.1.0" description = "Classes Without Boilerplate" -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -595,7 +620,6 @@ tests-no-zope = ["cloudpickle", "hypothesis", "mypy (>=1.1.1)", "pympler", "pyte name = "audioread" version = "3.0.0" description = "multi-library, cross-platform audio decoding" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -606,7 +630,6 @@ files = [ name = "authlib" version = "1.2.1" description = "The ultimate Python library in building OAuth and OpenID Connect servers and clients." -category = "main" optional = true python-versions = "*" files = [ @@ -621,7 +644,6 @@ cryptography = ">=3.2" name = "awadb" version = "0.3.10" description = "AI Native database for embedding vectors" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -648,7 +670,6 @@ test = ["pytest (>=6.0)"] name = "azure-ai-formrecognizer" version = "3.3.0" description = "Microsoft Azure Form Recognizer Client Library for Python" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -666,7 +687,6 @@ typing-extensions = ">=4.0.1" name = "azure-ai-vision" version = "0.11.1b1" description = "Microsoft Azure AI Vision SDK for Python" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -678,7 +698,6 @@ files = [ name = "azure-cognitiveservices-speech" version = "1.31.0" description = "Microsoft Cognitive Services Speech SDK for Python" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -694,7 +713,6 @@ files = [ name = "azure-common" version = "1.1.28" description = "Microsoft Azure Client Library for Python (Common)" -category = "main" optional = true python-versions = "*" files = [ @@ -706,7 +724,6 @@ files = [ name = "azure-core" version = "1.29.1" description = "Microsoft Azure Core Library for Python" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -726,7 +743,6 @@ aio = ["aiohttp (>=3.0)"] name = "azure-cosmos" version = "4.5.0" description = "Microsoft Azure Cosmos Client Library for Python" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -741,7 +757,6 @@ azure-core = ">=1.23.0,<2.0.0" name = "azure-identity" version = "1.14.0" description = "Microsoft Azure Identity Library for Python" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -759,7 +774,6 @@ msal-extensions = ">=0.3.0,<2.0.0" name = "azure-search-documents" version = "11.4.0b8" description = "Microsoft Azure Cognitive Search Client Library for Python" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -776,7 +790,6 @@ isodate = ">=0.6.0" name = "babel" version = "2.12.1" description = "Internationalization utilities" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -791,7 +804,6 @@ pytz = {version = ">=2015.7", markers = "python_version < \"3.9\""} name = "backcall" version = "0.2.0" description = "Specifications for callback functions passed in to an API" -category = "dev" optional = false python-versions = "*" files = [ @@ -803,7 +815,6 @@ files = [ name = "backoff" version = "2.2.1" description = "Function decoration for backoff and retry" -category = "main" optional = true python-versions = ">=3.7,<4.0" files = [ @@ -815,7 +826,6 @@ files = [ name = "backports-zoneinfo" version = "0.2.1" description = "Backport of the standard library zoneinfo module" 
-category = "main" optional = true python-versions = ">=3.6" files = [ @@ -844,7 +854,6 @@ tzdata = ["tzdata"] name = "beautifulsoup4" version = "4.12.2" description = "Screen-scraping library" -category = "main" optional = false python-versions = ">=3.6.0" files = [ @@ -863,7 +872,6 @@ lxml = ["lxml"] name = "bibtexparser" version = "1.4.0" description = "Bibtex parser for python 3" -category = "main" optional = true python-versions = "*" files = [ @@ -877,7 +885,6 @@ pyparsing = ">=2.0.3" name = "black" version = "23.7.0" description = "The uncompromising code formatter." -category = "dev" optional = false python-versions = ">=3.8" files = [ @@ -924,7 +931,6 @@ uvloop = ["uvloop (>=0.15.2)"] name = "bleach" version = "6.0.0" description = "An easy safelist-based HTML-sanitizing tool." -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -943,7 +949,6 @@ css = ["tinycss2 (>=1.1.0,<1.2)"] name = "blinker" version = "1.6.2" description = "Fast, simple object-to-object and broadcast signaling" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -955,7 +960,6 @@ files = [ name = "boto3" version = "1.28.17" description = "The AWS SDK for Python" -category = "main" optional = true python-versions = ">= 3.7" files = [ @@ -975,7 +979,6 @@ crt = ["botocore[crt] (>=1.21.0,<2.0a0)"] name = "botocore" version = "1.31.17" description = "Low-level, data-driven core of boto 3." -category = "main" optional = true python-versions = ">= 3.7" files = [ @@ -995,7 +998,6 @@ crt = ["awscrt (==0.16.26)"] name = "brotli" version = "1.0.9" description = "Python bindings for the Brotli compression library" -category = "main" optional = true python-versions = "*" files = [ @@ -1087,7 +1089,6 @@ files = [ name = "brotlicffi" version = "1.0.9.2" description = "Python CFFI bindings to the Brotli library" -category = "main" optional = true python-versions = "*" files = [ @@ -1130,7 +1131,6 @@ cffi = ">=1.0.0" name = "build" version = "0.10.0" description = "A simple, correct Python build frontend" -category = "main" optional = true python-versions = ">= 3.7" files = [ @@ -1154,7 +1154,6 @@ virtualenv = ["virtualenv (>=20.0.35)"] name = "cachetools" version = "5.3.1" description = "Extensible memoizing collections and decorators" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -1166,7 +1165,6 @@ files = [ name = "cassandra-driver" version = "3.28.0" description = "DataStax Driver for Apache Cassandra" -category = "main" optional = false python-versions = "*" files = [ @@ -1218,7 +1216,6 @@ graph = ["gremlinpython (==3.4.6)"] name = "cassio" version = "0.1.0" description = "A framework-agnostic Python library to seamlessly integrate Apache Cassandra(R) with ML/LLM/genAI workloads." -category = "main" optional = false python-versions = ">=3.8" files = [ @@ -1234,7 +1231,6 @@ numpy = ">=1.0" name = "certifi" version = "2023.7.22" description = "Python package for providing Mozilla's CA Bundle." -category = "main" optional = false python-versions = ">=3.6" files = [ @@ -1246,7 +1242,6 @@ files = [ name = "cffi" version = "1.15.1" description = "Foreign Function Interface for Python calling C code." 
-category = "main" optional = false python-versions = "*" files = [ @@ -1323,7 +1318,6 @@ pycparser = "*" name = "chardet" version = "5.2.0" description = "Universal encoding detector for Python 3" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -1335,7 +1329,6 @@ files = [ name = "charset-normalizer" version = "3.2.0" description = "The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet." -category = "main" optional = false python-versions = ">=3.7.0" files = [ @@ -1420,7 +1413,6 @@ files = [ name = "clarifai" version = "9.7.1" description = "Clarifai Python Utilities" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -1437,7 +1429,6 @@ tritonclient = "2.34.0" name = "clarifai-grpc" version = "9.7.3" description = "Clarifai gRPC API Client" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -1455,7 +1446,6 @@ requests = ">=2.25.1" name = "click" version = "8.1.7" description = "Composable command line interface toolkit" -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -1470,7 +1460,6 @@ colorama = {version = "*", markers = "platform_system == \"Windows\""} name = "click-plugins" version = "1.1.1" description = "An extension module for click to enable registering CLI commands via setuptools entry-points." -category = "main" optional = true python-versions = "*" files = [ @@ -1488,7 +1477,6 @@ dev = ["coveralls", "pytest (>=3.6)", "pytest-cov", "wheel"] name = "clickhouse-connect" version = "0.5.25" description = "ClickHouse core driver, SqlAlchemy, and Superset libraries" -category = "main" optional = true python-versions = "~=3.7" files = [ @@ -1578,7 +1566,6 @@ superset = ["apache-superset (>=1.4.1)"] name = "cligj" version = "0.7.2" description = "Click params for commmand line interfaces to GeoJSON" -category = "main" optional = true python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, <4" files = [ @@ -1596,7 +1583,6 @@ test = ["pytest-cov"] name = "codespell" version = "2.2.5" description = "Codespell" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -1614,7 +1600,6 @@ types = ["chardet (>=5.1.0)", "mypy", "pytest", "pytest-cov", "pytest-dependency name = "cohere" version = "4.21" description = "" -category = "main" optional = true python-versions = ">=3.7,<4.0" files = [ @@ -1634,7 +1619,6 @@ urllib3 = ">=1.26,<3" name = "colorama" version = "0.4.6" description = "Cross-platform colored terminal text." -category = "main" optional = false python-versions = "!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,!=3.6.*,>=2.7" files = [ @@ -1646,7 +1630,6 @@ files = [ name = "colored" version = "1.4.4" description = "Simple library for color and formatting to terminal" -category = "dev" optional = false python-versions = "*" files = [ @@ -1657,7 +1640,6 @@ files = [ name = "comm" version = "0.1.4" description = "Jupyter Python Comm implementation, for usage in ipykernel, xeus-python etc." -category = "dev" optional = false python-versions = ">=3.6" files = [ @@ -1677,7 +1659,6 @@ typing = ["mypy (>=0.990)"] name = "coverage" version = "7.3.0" description = "Code coverage measurement for Python" -category = "dev" optional = false python-versions = ">=3.8" files = [ @@ -1745,7 +1726,6 @@ toml = ["tomli"] name = "cryptography" version = "41.0.3" description = "cryptography is a package which provides cryptographic recipes and primitives to Python developers." 
-category = "main" optional = false python-versions = ">=3.7" files = [ @@ -1791,7 +1771,6 @@ test-randomorder = ["pytest-randomly"] name = "cssselect" version = "1.2.0" description = "cssselect parses CSS3 Selectors and translates them to XPath 1.0" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -1803,7 +1782,6 @@ files = [ name = "dashvector" version = "1.0.1" description = "DashVector Client Python Sdk Library" -category = "main" optional = true python-versions = ">=3.7.0" files = [ @@ -1823,7 +1801,6 @@ protobuf = ">=3.8.0,<4.0.0" name = "dataclasses-json" version = "0.5.9" description = "Easily serialize dataclasses to and from JSON" -category = "main" optional = false python-versions = ">=3.6" files = [ @@ -1843,7 +1820,6 @@ dev = ["flake8", "hypothesis", "ipython", "mypy (>=0.710)", "portray", "pytest ( name = "debugpy" version = "1.6.7.post1" description = "An implementation of the Debug Adapter Protocol for Python" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -1871,7 +1847,6 @@ files = [ name = "decorator" version = "5.1.1" description = "Decorators for Humans" -category = "main" optional = false python-versions = ">=3.5" files = [ @@ -1883,7 +1858,6 @@ files = [ name = "deeplake" version = "3.6.19" description = "Activeloop Deep Lake" -category = "main" optional = true python-versions = "*" files = [ @@ -1921,7 +1895,6 @@ visualizer = ["IPython", "flask"] name = "defusedxml" version = "0.7.1" description = "XML bomb protection for Python stdlib modules" -category = "dev" optional = false python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" files = [ @@ -1933,7 +1906,6 @@ files = [ name = "deprecated" version = "1.2.14" description = "Python @deprecated decorator to deprecate old python classes, functions or methods." -category = "main" optional = true python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" files = [ @@ -1951,7 +1923,6 @@ dev = ["PyTest", "PyTest-Cov", "bump2version (<1)", "sphinx (<2)", "tox"] name = "deprecation" version = "2.1.0" description = "A library to handle automated deprecations" -category = "main" optional = true python-versions = "*" files = [ @@ -1966,7 +1937,6 @@ packaging = "*" name = "dill" version = "0.3.7" description = "serialize all of Python" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -1981,7 +1951,6 @@ graph = ["objgraph (>=1.7.2)"] name = "dnspython" version = "2.4.2" description = "DNS toolkit" -category = "main" optional = true python-versions = ">=3.8,<4.0" files = [ @@ -2001,7 +1970,6 @@ wmi = ["wmi (>=1.5.1,<2.0.0)"] name = "docarray" version = "0.32.1" description = "The data structure for multimodal data" -category = "main" optional = true python-versions = ">=3.7,<4.0" files = [ @@ -2040,7 +2008,6 @@ web = ["fastapi (>=0.87.0)"] name = "docker" version = "6.1.3" description = "A Python library for the Docker Engine API." 
-category = "main" optional = true python-versions = ">=3.7" files = [ @@ -2062,7 +2029,6 @@ ssh = ["paramiko (>=2.4.3)"] name = "docopt" version = "0.6.2" description = "Pythonic argument parser, that will make you smile" -category = "main" optional = true python-versions = "*" files = [ @@ -2073,7 +2039,6 @@ files = [ name = "duckdb" version = "0.8.1" description = "DuckDB embedded database" -category = "dev" optional = false python-versions = "*" files = [ @@ -2135,7 +2100,6 @@ files = [ name = "duckdb-engine" version = "0.7.3" description = "SQLAlchemy driver for duckdb" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -2152,7 +2116,6 @@ sqlalchemy = ">=1.3.22" name = "duckduckgo-search" version = "3.8.5" description = "Search for words, documents, images, news, maps and text translation using the DuckDuckGo.com search engine." -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -2170,7 +2133,6 @@ lxml = ">=4.9.2" name = "elastic-transport" version = "8.4.0" description = "Transport classes and utilities shared among Python Elastic client libraries" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -2189,7 +2151,6 @@ develop = ["aiohttp", "mock", "pytest", "pytest-asyncio", "pytest-cov", "pytest- name = "elasticsearch" version = "8.9.0" description = "Python client for Elasticsearch" -category = "main" optional = true python-versions = ">=3.6, <4" files = [ @@ -2208,7 +2169,6 @@ requests = ["requests (>=2.4.0,<3.0.0)"] name = "entrypoints" version = "0.4" description = "Discover and load entry points from installed packages." -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -2220,7 +2180,6 @@ files = [ name = "esprima" version = "4.0.1" description = "ECMAScript parsing infrastructure for multipurpose analysis in Python" -category = "main" optional = true python-versions = "*" files = [ @@ -2231,7 +2190,6 @@ files = [ name = "exceptiongroup" version = "1.1.3" description = "Backport of PEP 654 (exception groups)" -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -2246,7 +2204,6 @@ test = ["pytest (>=6)"] name = "executing" version = "1.2.0" description = "Get the currently executing AST node of a frame, and other information" -category = "dev" optional = false python-versions = "*" files = [ @@ -2261,7 +2218,6 @@ tests = ["asttokens", "littleutils", "pytest", "rich"] name = "faiss-cpu" version = "1.7.4" description = "A library for efficient similarity search and clustering of dense vectors." -category = "main" optional = true python-versions = "*" files = [ @@ -2296,7 +2252,6 @@ files = [ name = "fastavro" version = "1.8.2" description = "Fast read/write of AVRO files" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -2337,7 +2292,6 @@ zstandard = ["zstandard"] name = "fastjsonschema" version = "2.18.0" description = "Fastest Python implementation of JSON schema" -category = "dev" optional = false python-versions = "*" files = [ @@ -2352,7 +2306,6 @@ devel = ["colorama", "json-spec", "jsonschema", "pylint", "pytest", "pytest-benc name = "feedfinder2" version = "0.0.4" description = "Find the feed URLs for a website." 
-category = "main" optional = true python-versions = "*" files = [ @@ -2368,7 +2321,6 @@ six = "*" name = "feedparser" version = "6.0.10" description = "Universal feed parser, handles RSS 0.9x, RSS 1.0, RSS 2.0, CDF, Atom 0.3, and Atom 1.0 feeds" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -2383,7 +2335,6 @@ sgmllib3k = "*" name = "filelock" version = "3.12.2" description = "A platform independent file lock." -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -2399,7 +2350,6 @@ testing = ["covdefaults (>=2.3)", "coverage (>=7.2.7)", "diff-cover (>=7.5)", "p name = "fiona" version = "1.9.4.post1" description = "Fiona reads and writes spatial data files" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -2444,7 +2394,6 @@ test = ["Fiona[s3]", "pytest (>=7)", "pytest-cov", "pytz"] name = "flatbuffers" version = "23.5.26" description = "The FlatBuffers serialization format for Python" -category = "main" optional = true python-versions = "*" files = [ @@ -2456,7 +2405,6 @@ files = [ name = "fqdn" version = "1.5.1" description = "Validates fully-qualified domain names against RFC 1123, so that they are acceptable to modern bowsers" -category = "dev" optional = false python-versions = ">=2.7, !=3.0, !=3.1, !=3.2, !=3.3, !=3.4, <4" files = [ @@ -2468,7 +2416,6 @@ files = [ name = "freezegun" version = "1.2.2" description = "Let your Python tests travel through time" -category = "dev" optional = false python-versions = ">=3.6" files = [ @@ -2483,7 +2430,6 @@ python-dateutil = ">=2.7" name = "frozenlist" version = "1.4.0" description = "A list-like structure which implements collections.abc.MutableSequence" -category = "main" optional = false python-versions = ">=3.8" files = [ @@ -2554,7 +2500,6 @@ files = [ name = "fsspec" version = "2023.6.0" description = "File-system specification" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -2590,7 +2535,6 @@ tqdm = ["tqdm"] name = "future" version = "0.18.3" description = "Clean single-source support for Python 3 and 2" -category = "main" optional = true python-versions = ">=2.6, !=3.0.*, !=3.1.*, !=3.2.*" files = [ @@ -2601,7 +2545,6 @@ files = [ name = "gast" version = "0.4.0" description = "Python AST that abstracts the underlying Python version" -category = "main" optional = true python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" files = [ @@ -2613,7 +2556,6 @@ files = [ name = "geojson" version = "2.5.0" description = "Python bindings and utilities for GeoJSON" -category = "main" optional = true python-versions = "*" files = [ @@ -2625,7 +2567,6 @@ files = [ name = "geomet" version = "0.2.1.post1" description = "GeoJSON <-> WKT/WKB conversion utilities" -category = "main" optional = false python-versions = ">2.6, !=3.3.*, <4" files = [ @@ -2641,7 +2582,6 @@ six = "*" name = "geopandas" version = "0.13.2" description = "Geographic pandas extensions" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -2660,7 +2600,6 @@ shapely = ">=1.7.1" name = "gitdb" version = "4.0.10" description = "Git Object Database" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -2675,7 +2614,6 @@ smmap = ">=3.0.1,<6" name = "gitpython" version = "3.1.32" description = "GitPython is a Python library used to interact with Git repositories" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -2690,7 +2628,6 @@ gitdb = ">=4.0.1,<5" name = "google-api-core" version = "2.11.1" description = "Google 
API client core library" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -2713,7 +2650,6 @@ grpcio-gcp = ["grpcio-gcp (>=0.2.2,<1.0.dev0)"] name = "google-api-python-client" version = "2.70.0" description = "Google API Client Library for Python" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -2722,7 +2658,7 @@ files = [ ] [package.dependencies] -google-api-core = ">=1.31.5,<2.0.0 || >2.3.0,<3.0.0dev" +google-api-core = ">=1.31.5,<2.0.dev0 || >2.3.0,<3.0.0dev" google-auth = ">=1.19.0,<3.0.0dev" google-auth-httplib2 = ">=0.1.0" httplib2 = ">=0.15.0,<1dev" @@ -2732,7 +2668,6 @@ uritemplate = ">=3.0.1,<5" name = "google-auth" version = "2.22.0" description = "Google Authentication Library" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -2758,7 +2693,6 @@ requests = ["requests (>=2.20.0,<3.0.0.dev0)"] name = "google-auth-httplib2" version = "0.1.0" description = "Google Authentication Library: httplib2 transport" -category = "main" optional = true python-versions = "*" files = [ @@ -2775,7 +2709,6 @@ six = "*" name = "google-auth-oauthlib" version = "1.0.0" description = "Google Authentication Library" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -2794,7 +2727,6 @@ tool = ["click (>=6.0.0)"] name = "google-pasta" version = "0.2.0" description = "pasta is an AST-based Python refactoring library" -category = "main" optional = true python-versions = "*" files = [ @@ -2810,7 +2742,6 @@ six = "*" name = "google-search-results" version = "2.4.2" description = "Scrape and search localized results from Google, Bing, Baidu, Yahoo, Yandex, Ebay, Homedepot, youtube at scale using SerpApi.com" -category = "main" optional = true python-versions = ">=3.5" files = [ @@ -2824,7 +2755,6 @@ requests = "*" name = "googleapis-common-protos" version = "1.60.0" description = "Common protobufs used in Google APIs" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -2842,7 +2772,6 @@ grpc = ["grpcio (>=1.44.0,<2.0.0.dev0)"] name = "gptcache" version = "0.1.39.1" description = "GPTCache, a powerful caching library that can be used to speed up and lower the cost of chat applications that rely on the LLM service. GPTCache works as a memcache for AIGC applications, similar to how Redis works for traditional applications." -category = "main" optional = true python-versions = ">=3.8.1" files = [ @@ -2859,7 +2788,6 @@ requests = "*" name = "gql" version = "3.4.1" description = "GraphQL client for Python" -category = "main" optional = true python-versions = "*" files = [ @@ -2886,7 +2814,6 @@ websockets = ["websockets (>=10,<11)", "websockets (>=9,<10)"] name = "graphql-core" version = "3.2.3" description = "GraphQL implementation for Python, a port of GraphQL.js, the JavaScript reference implementation for GraphQL." 
-category = "main" optional = true python-versions = ">=3.6,<4" files = [ @@ -2898,7 +2825,6 @@ files = [ name = "greenlet" version = "2.0.2" description = "Lightweight in-process concurrent programming" -category = "main" optional = false python-versions = ">=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*" files = [ @@ -2907,6 +2833,7 @@ files = [ {file = "greenlet-2.0.2-cp27-cp27m-win32.whl", hash = "sha256:6c3acb79b0bfd4fe733dff8bc62695283b57949ebcca05ae5c129eb606ff2d74"}, {file = "greenlet-2.0.2-cp27-cp27m-win_amd64.whl", hash = "sha256:283737e0da3f08bd637b5ad058507e578dd462db259f7f6e4c5c365ba4ee9343"}, {file = "greenlet-2.0.2-cp27-cp27mu-manylinux2010_x86_64.whl", hash = "sha256:d27ec7509b9c18b6d73f2f5ede2622441de812e7b1a80bbd446cb0633bd3d5ae"}, + {file = "greenlet-2.0.2-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:d967650d3f56af314b72df7089d96cda1083a7fc2da05b375d2bc48c82ab3f3c"}, {file = "greenlet-2.0.2-cp310-cp310-macosx_11_0_x86_64.whl", hash = "sha256:30bcf80dda7f15ac77ba5af2b961bdd9dbc77fd4ac6105cee85b0d0a5fcf74df"}, {file = "greenlet-2.0.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:26fbfce90728d82bc9e6c38ea4d038cba20b7faf8a0ca53a9c07b67318d46088"}, {file = "greenlet-2.0.2-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:9190f09060ea4debddd24665d6804b995a9c122ef5917ab26e1566dcc712ceeb"}, @@ -2915,6 +2842,7 @@ files = [ {file = "greenlet-2.0.2-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:76ae285c8104046b3a7f06b42f29c7b73f77683df18c49ab5af7983994c2dd91"}, {file = "greenlet-2.0.2-cp310-cp310-win_amd64.whl", hash = "sha256:2d4686f195e32d36b4d7cf2d166857dbd0ee9f3d20ae349b6bf8afc8485b3645"}, {file = "greenlet-2.0.2-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:c4302695ad8027363e96311df24ee28978162cdcdd2006476c43970b384a244c"}, + {file = "greenlet-2.0.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:d4606a527e30548153be1a9f155f4e283d109ffba663a15856089fb55f933e47"}, {file = "greenlet-2.0.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c48f54ef8e05f04d6eff74b8233f6063cb1ed960243eacc474ee73a2ea8573ca"}, {file = "greenlet-2.0.2-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:a1846f1b999e78e13837c93c778dcfc3365902cfb8d1bdb7dd73ead37059f0d0"}, {file = "greenlet-2.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:3a06ad5312349fec0ab944664b01d26f8d1f05009566339ac6f63f56589bc1a2"}, @@ -2944,6 +2872,7 @@ files = [ {file = "greenlet-2.0.2-cp37-cp37m-win32.whl", hash = "sha256:3f6ea9bd35eb450837a3d80e77b517ea5bc56b4647f5502cd28de13675ee12f7"}, {file = "greenlet-2.0.2-cp37-cp37m-win_amd64.whl", hash = "sha256:7492e2b7bd7c9b9916388d9df23fa49d9b88ac0640db0a5b4ecc2b653bf451e3"}, {file = "greenlet-2.0.2-cp38-cp38-macosx_10_15_x86_64.whl", hash = "sha256:b864ba53912b6c3ab6bcb2beb19f19edd01a6bfcbdfe1f37ddd1778abfe75a30"}, + {file = "greenlet-2.0.2-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:1087300cf9700bbf455b1b97e24db18f2f77b55302a68272c56209d5587c12d1"}, {file = "greenlet-2.0.2-cp38-cp38-manylinux2010_x86_64.whl", hash = "sha256:ba2956617f1c42598a308a84c6cf021a90ff3862eddafd20c3333d50f0edb45b"}, {file = "greenlet-2.0.2-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:fc3a569657468b6f3fb60587e48356fe512c1754ca05a564f11366ac9e306526"}, {file = "greenlet-2.0.2-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = 
"sha256:8eab883b3b2a38cc1e050819ef06a7e6344d4a990d24d45bc6f2cf959045a45b"}, @@ -2952,6 +2881,7 @@ files = [ {file = "greenlet-2.0.2-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:b0ef99cdbe2b682b9ccbb964743a6aca37905fda5e0452e5ee239b1654d37f2a"}, {file = "greenlet-2.0.2-cp38-cp38-win32.whl", hash = "sha256:b80f600eddddce72320dbbc8e3784d16bd3fb7b517e82476d8da921f27d4b249"}, {file = "greenlet-2.0.2-cp38-cp38-win_amd64.whl", hash = "sha256:4d2e11331fc0c02b6e84b0d28ece3a36e0548ee1a1ce9ddde03752d9b79bba40"}, + {file = "greenlet-2.0.2-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:8512a0c38cfd4e66a858ddd1b17705587900dd760c6003998e9472b77b56d417"}, {file = "greenlet-2.0.2-cp39-cp39-macosx_11_0_x86_64.whl", hash = "sha256:88d9ab96491d38a5ab7c56dd7a3cc37d83336ecc564e4e8816dbed12e5aaefc8"}, {file = "greenlet-2.0.2-cp39-cp39-manylinux2010_x86_64.whl", hash = "sha256:561091a7be172ab497a3527602d467e2b3fbe75f9e783d8b8ce403fa414f71a6"}, {file = "greenlet-2.0.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:971ce5e14dc5e73715755d0ca2975ac88cfdaefcaab078a284fea6cfabf866df"}, @@ -2972,7 +2902,6 @@ test = ["objgraph", "psutil"] name = "grpcio" version = "1.57.0" description = "HTTP/2-based RPC framework" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -3030,7 +2959,6 @@ protobuf = ["grpcio-tools (>=1.57.0)"] name = "grpcio-tools" version = "1.48.2" description = "Protobuf code generator for gRPC" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -3091,7 +3019,6 @@ setuptools = "*" name = "h11" version = "0.14.0" description = "A pure-Python, bring-your-own-I/O implementation of HTTP/1.1" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -3103,7 +3030,6 @@ files = [ name = "h2" version = "4.1.0" description = "HTTP/2 State-Machine based protocol implementation" -category = "main" optional = true python-versions = ">=3.6.1" files = [ @@ -3119,7 +3045,6 @@ hyperframe = ">=6.0,<7" name = "h5py" version = "3.9.0" description = "Read and write HDF5 files from Python" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -3153,7 +3078,6 @@ numpy = ">=1.17.3" name = "hnswlib" version = "0.7.0" description = "hnswlib" -category = "main" optional = true python-versions = "*" files = [ @@ -3167,7 +3091,6 @@ numpy = "*" name = "hpack" version = "4.0.0" description = "Pure-Python HPACK header compression" -category = "main" optional = true python-versions = ">=3.6.1" files = [ @@ -3179,7 +3102,6 @@ files = [ name = "html2text" version = "2020.1.16" description = "Turn HTML into equivalent Markdown-structured text." -category = "main" optional = true python-versions = ">=3.5" files = [ @@ -3191,7 +3113,6 @@ files = [ name = "httpcore" version = "0.17.3" description = "A minimal low-level HTTP client." -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -3203,17 +3124,16 @@ files = [ anyio = ">=3.0,<5.0" certifi = "*" h11 = ">=0.13,<0.15" -sniffio = ">=1.0.0,<2.0.0" +sniffio = "==1.*" [package.extras] http2 = ["h2 (>=3,<5)"] -socks = ["socksio (>=1.0.0,<2.0.0)"] +socks = ["socksio (==1.*)"] [[package]] name = "httplib2" version = "0.22.0" description = "A comprehensive HTTP client library." 
-category = "main" optional = true python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" files = [ @@ -3228,7 +3148,6 @@ pyparsing = {version = ">=2.4.2,<3.0.0 || >3.0.0,<3.0.1 || >3.0.1,<3.0.2 || >3.0 name = "httpx" version = "0.24.1" description = "The next generation HTTP client." -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -3244,19 +3163,18 @@ h2 = {version = ">=3,<5", optional = true, markers = "extra == \"http2\""} httpcore = ">=0.15.0,<0.18.0" idna = "*" sniffio = "*" -socksio = {version = ">=1.0.0,<2.0.0", optional = true, markers = "extra == \"socks\""} +socksio = {version = "==1.*", optional = true, markers = "extra == \"socks\""} [package.extras] brotli = ["brotli", "brotlicffi"] -cli = ["click (>=8.0.0,<9.0.0)", "pygments (>=2.0.0,<3.0.0)", "rich (>=10,<14)"] +cli = ["click (==8.*)", "pygments (==2.*)", "rich (>=10,<14)"] http2 = ["h2 (>=3,<5)"] -socks = ["socksio (>=1.0.0,<2.0.0)"] +socks = ["socksio (==1.*)"] [[package]] name = "huggingface-hub" version = "0.16.4" description = "Client library to download and publish models, datasets and other repos on the huggingface.co hub" -category = "main" optional = true python-versions = ">=3.7.0" files = [ @@ -3289,7 +3207,6 @@ typing = ["pydantic", "types-PyYAML", "types-requests", "types-simplejson", "typ name = "humbug" version = "0.3.2" description = "Humbug: Do you build developer tools? Humbug helps you know your users." -category = "main" optional = true python-versions = "*" files = [ @@ -3309,7 +3226,6 @@ profile = ["GPUtil", "psutil", "types-psutil"] name = "hyperframe" version = "6.0.1" description = "HTTP/2 framing layer for Python" -category = "main" optional = true python-versions = ">=3.6.1" files = [ @@ -3321,7 +3237,6 @@ files = [ name = "idna" version = "3.4" description = "Internationalized Domain Names in Applications (IDNA)" -category = "main" optional = false python-versions = ">=3.5" files = [ @@ -3333,7 +3248,6 @@ files = [ name = "importlib-metadata" version = "6.8.0" description = "Read metadata from Python packages" -category = "main" optional = false python-versions = ">=3.8" files = [ @@ -3353,7 +3267,6 @@ testing = ["flufl.flake8", "importlib-resources (>=1.3)", "packaging", "pyfakefs name = "importlib-resources" version = "6.0.1" description = "Read resources from Python packages" -category = "main" optional = false python-versions = ">=3.8" files = [ @@ -3372,7 +3285,6 @@ testing = ["pytest (>=6)", "pytest-black (>=0.3.7)", "pytest-checkdocs (>=2.4)", name = "iniconfig" version = "2.0.0" description = "brain-dead simple config-ini parsing" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -3384,7 +3296,6 @@ files = [ name = "ipykernel" version = "6.25.1" description = "IPython Kernel for Jupyter" -category = "dev" optional = false python-versions = ">=3.8" files = [ @@ -3398,7 +3309,7 @@ comm = ">=0.1.1" debugpy = ">=1.6.5" ipython = ">=7.23.1" jupyter-client = ">=6.1.12" -jupyter-core = ">=4.12,<5.0.0 || >=5.1.0" +jupyter-core = ">=4.12,<5.0.dev0 || >=5.1.dev0" matplotlib-inline = ">=0.1" nest-asyncio = "*" packaging = "*" @@ -3418,7 +3329,6 @@ test = ["flaky", "ipyparallel", "pre-commit", "pytest (>=7.0)", "pytest-asyncio" name = "ipython" version = "8.12.2" description = "IPython: Productive Interactive Computing" -category = "dev" optional = false python-versions = ">=3.8" files = [ @@ -3458,7 +3368,6 @@ test-extra = ["curio", "matplotlib (!=3.2.0)", "nbformat", "numpy (>=1.21)", "pa name = "ipython-genutils" version = "0.2.0" description 
= "Vestigial utilities from IPython" -category = "dev" optional = false python-versions = "*" files = [ @@ -3470,7 +3379,6 @@ files = [ name = "ipywidgets" version = "8.1.0" description = "Jupyter interactive widgets" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -3492,7 +3400,6 @@ test = ["ipykernel", "jsonschema", "pytest (>=3.6.0)", "pytest-cov", "pytz"] name = "isodate" version = "0.6.1" description = "An ISO 8601 date/time/duration parser and formatter" -category = "main" optional = true python-versions = "*" files = [ @@ -3507,7 +3414,6 @@ six = "*" name = "isoduration" version = "20.11.0" description = "Operations with ISO 8601 durations" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -3522,7 +3428,6 @@ arrow = ">=0.15.0" name = "jaraco-context" version = "4.3.0" description = "Context managers by jaraco" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -3538,7 +3443,6 @@ testing = ["flake8 (<5)", "pytest (>=6)", "pytest-black (>=0.3.7)", "pytest-chec name = "jedi" version = "0.19.0" description = "An autocompletion tool for Python that can be used for text editors." -category = "dev" optional = false python-versions = ">=3.6" files = [ @@ -3558,7 +3462,6 @@ testing = ["Django (<3.1)", "attrs", "colorama", "docopt", "pytest (<7.0.0)"] name = "jieba3k" version = "0.35.1" description = "Chinese Words Segementation Utilities" -category = "main" optional = true python-versions = "*" files = [ @@ -3569,7 +3472,6 @@ files = [ name = "jinja2" version = "3.1.2" description = "A very fast and expressive template engine." -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -3587,7 +3489,6 @@ i18n = ["Babel (>=2.7)"] name = "jmespath" version = "1.0.1" description = "JSON Matching Expressions" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -3599,7 +3500,6 @@ files = [ name = "joblib" version = "1.3.2" description = "Lightweight pipelining with Python functions" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -3611,7 +3511,6 @@ files = [ name = "jq" version = "1.4.1" description = "jq is a lightweight and flexible JSON processor." -category = "main" optional = true python-versions = ">=3.5" files = [ @@ -3676,7 +3575,6 @@ files = [ name = "json5" version = "0.9.14" description = "A Python implementation of the JSON5 data format." -category = "dev" optional = false python-versions = "*" files = [ @@ -3691,7 +3589,6 @@ dev = ["hypothesis"] name = "jsonable" version = "0.3.1" description = "An abstract class that supports jsonserialization/deserialization." 
-category = "main" optional = true python-versions = "*" files = [ @@ -3703,7 +3600,6 @@ files = [ name = "jsonlines" version = "3.1.0" description = "Library with helpers for the jsonlines file format" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -3718,18 +3614,17 @@ attrs = ">=19.2.0" name = "jsonpointer" version = "2.4" description = "Identify specific nodes in a JSON document (RFC 6901)" -category = "dev" optional = false python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*, !=3.6.*" files = [ {file = "jsonpointer-2.4-py2.py3-none-any.whl", hash = "sha256:15d51bba20eea3165644553647711d150376234112651b4f1811022aecad7d7a"}, + {file = "jsonpointer-2.4.tar.gz", hash = "sha256:585cee82b70211fa9e6043b7bb89db6e1aa49524340dde8ad6b63206ea689d88"}, ] [[package]] name = "jsonschema" version = "4.19.0" description = "An implementation of JSON Schema validation for Python" -category = "main" optional = false python-versions = ">=3.8" files = [ @@ -3761,7 +3656,6 @@ format-nongpl = ["fqdn", "idna", "isoduration", "jsonpointer (>1.13)", "rfc3339- name = "jsonschema-specifications" version = "2023.7.1" description = "The JSON Schema meta-schemas and vocabularies, exposed as a Registry" -category = "main" optional = false python-versions = ">=3.8" files = [ @@ -3777,7 +3671,6 @@ referencing = ">=0.28.0" name = "jupyter" version = "1.0.0" description = "Jupyter metapackage. Install all the Jupyter components in one go." -category = "dev" optional = false python-versions = "*" files = [ @@ -3798,7 +3691,6 @@ qtconsole = "*" name = "jupyter-client" version = "8.3.0" description = "Jupyter protocol implementation and client libraries" -category = "dev" optional = false python-versions = ">=3.8" files = [ @@ -3808,7 +3700,7 @@ files = [ [package.dependencies] importlib-metadata = {version = ">=4.8.3", markers = "python_version < \"3.10\""} -jupyter-core = ">=4.12,<5.0.0 || >=5.1.0" +jupyter-core = ">=4.12,<5.0.dev0 || >=5.1.dev0" python-dateutil = ">=2.8.2" pyzmq = ">=23.0" tornado = ">=6.2" @@ -3822,7 +3714,6 @@ test = ["coverage", "ipykernel (>=6.14)", "mypy", "paramiko", "pre-commit", "pyt name = "jupyter-console" version = "6.6.3" description = "Jupyter terminal console" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -3834,7 +3725,7 @@ files = [ ipykernel = ">=6.14" ipython = "*" jupyter-client = ">=7.0.0" -jupyter-core = ">=4.12,<5.0.0 || >=5.1.0" +jupyter-core = ">=4.12,<5.0.dev0 || >=5.1.dev0" prompt-toolkit = ">=3.0.30" pygments = "*" pyzmq = ">=17" @@ -3847,7 +3738,6 @@ test = ["flaky", "pexpect", "pytest"] name = "jupyter-core" version = "5.3.1" description = "Jupyter core package. A base package on which Jupyter projects rely." -category = "dev" optional = false python-versions = ">=3.8" files = [ @@ -3868,7 +3758,6 @@ test = ["ipykernel", "pre-commit", "pytest", "pytest-cov", "pytest-timeout"] name = "jupyter-events" version = "0.7.0" description = "Jupyter Event System library" -category = "dev" optional = false python-versions = ">=3.8" files = [ @@ -3894,7 +3783,6 @@ test = ["click", "pre-commit", "pytest (>=7.0)", "pytest-asyncio (>=0.19.0)", "p name = "jupyter-lsp" version = "2.2.0" description = "Multi-Language Server WebSocket proxy for Jupyter Notebook/Lab server" -category = "dev" optional = false python-versions = ">=3.8" files = [ @@ -3910,7 +3798,6 @@ jupyter-server = ">=1.1.2" name = "jupyter-server" version = "2.7.2" description = "The backend—i.e. 
core services, APIs, and REST endpoints—to Jupyter web applications." -category = "dev" optional = false python-versions = ">=3.8" files = [ @@ -3923,7 +3810,7 @@ anyio = ">=3.1.0" argon2-cffi = "*" jinja2 = "*" jupyter-client = ">=7.4.4" -jupyter-core = ">=4.12,<5.0.0 || >=5.1.0" +jupyter-core = ">=4.12,<5.0.dev0 || >=5.1.dev0" jupyter-events = ">=0.6.0" jupyter-server-terminals = "*" nbconvert = ">=6.4.4" @@ -3947,7 +3834,6 @@ test = ["flaky", "ipykernel", "pre-commit", "pytest (>=7.0)", "pytest-console-sc name = "jupyter-server-terminals" version = "0.4.4" description = "A Jupyter Server Extension Providing Terminals." -category = "dev" optional = false python-versions = ">=3.8" files = [ @@ -3967,7 +3853,6 @@ test = ["coverage", "jupyter-server (>=2.0.0)", "pytest (>=7.0)", "pytest-cov", name = "jupyterlab" version = "4.0.5" description = "JupyterLab computational environment" -category = "dev" optional = false python-versions = ">=3.8" files = [ @@ -4001,7 +3886,6 @@ test = ["coverage", "pytest (>=7.0)", "pytest-check-links (>=0.7)", "pytest-cons name = "jupyterlab-pygments" version = "0.2.2" description = "Pygments theme using JupyterLab CSS variables" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -4013,7 +3897,6 @@ files = [ name = "jupyterlab-server" version = "2.24.0" description = "A set of server components for JupyterLab and JupyterLab like applications." -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -4040,7 +3923,6 @@ test = ["hatch", "ipykernel", "jupyterlab-server[openapi]", "openapi-spec-valida name = "jupyterlab-widgets" version = "3.0.8" description = "Jupyter interactive widgets for JupyterLab" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -4052,7 +3934,6 @@ files = [ name = "keras" version = "2.13.1" description = "Deep learning for humans." -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -4064,7 +3945,6 @@ files = [ name = "lancedb" version = "0.1.16" description = "lancedb" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -4091,7 +3971,6 @@ tests = ["pandas (>=1.4)", "pytest", "pytest-asyncio", "pytest-mock"] name = "langkit" version = "0.0.15" description = "A collection of text metric udfs for whylogs profiling and monitoring in WhyLabs" -category = "main" optional = true python-versions = ">=3.8,<4.0" files = [ @@ -4111,7 +3990,6 @@ all = ["datasets (>=2.12.0,<3.0.0)", "evaluate (>=0.4.0,<0.5.0)", "nltk (>=3.8.1 name = "langsmith" version = "0.0.38" description = "Client library to connect to the LangSmith LLM Tracing and Evaluation Platform." -category = "main" optional = false python-versions = ">=3.8.1,<4.0" files = [ @@ -4127,7 +4005,6 @@ requests = ">=2,<3" name = "lark" version = "1.1.7" description = "a modern parsing library" -category = "main" optional = false python-versions = ">=3.6" files = [ @@ -4145,7 +4022,6 @@ regex = ["regex"] name = "lazy-loader" version = "0.3" description = "lazy_loader" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -4161,10 +4037,11 @@ test = ["pytest (>=7.4)", "pytest-cov (>=4.1)"] name = "libclang" version = "16.0.6" description = "Clang Python Bindings, mirrored from the official LLVM repo: https://github.com/llvm/llvm-project/tree/main/clang/bindings/python, to make the installation process easier." 
-category = "main" optional = true python-versions = "*" files = [ + {file = "libclang-16.0.6-1-py2.py3-none-manylinux2014_aarch64.whl", hash = "sha256:88bc7e7b393c32e41e03ba77ef02fdd647da1f764c2cd028e69e0837080b79f6"}, + {file = "libclang-16.0.6-1-py2.py3-none-manylinux2014_armv7l.whl", hash = "sha256:d80ed5827736ed5ec2bcedf536720476fd9d4fa4c79ef0cb24aea4c59332f361"}, {file = "libclang-16.0.6-py2.py3-none-macosx_10_9_x86_64.whl", hash = "sha256:da9e47ebc3f0a6d90fb169ef25f9fbcd29b4a4ef97a8b0e3e3a17800af1423f4"}, {file = "libclang-16.0.6-py2.py3-none-macosx_11_0_arm64.whl", hash = "sha256:e1a5ad1e895e5443e205568c85c04b4608e4e973dae42f4dfd9cb46c81d1486b"}, {file = "libclang-16.0.6-py2.py3-none-manylinux2010_x86_64.whl", hash = "sha256:9dcdc730939788b8b69ffd6d5d75fe5366e3ee007f1e36a99799ec0b0c001492"}, @@ -4180,7 +4057,6 @@ files = [ name = "libdeeplake" version = "0.0.60" description = "C++ backend for Deep Lake" -category = "main" optional = true python-versions = "*" files = [ @@ -4213,7 +4089,6 @@ numpy = "*" name = "librosa" version = "0.10.1" description = "Python module for audio and music processing" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -4245,7 +4120,6 @@ tests = ["matplotlib (>=3.3.0)", "packaging (>=20.0)", "pytest", "pytest-cov", " name = "llvmlite" version = "0.40.1" description = "lightweight wrapper around basic LLVM functionality" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -4279,7 +4153,6 @@ files = [ name = "loguru" version = "0.7.0" description = "Python logging made (stupidly) simple" -category = "main" optional = true python-versions = ">=3.5" files = [ @@ -4298,7 +4171,6 @@ dev = ["Sphinx (==5.3.0)", "colorama (==0.4.5)", "colorama (==0.4.6)", "freezegu name = "lxml" version = "4.9.3" description = "Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API." -category = "main" optional = true python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, != 3.4.*" files = [ @@ -4406,7 +4278,6 @@ source = ["Cython (>=0.29.35)"] name = "lz4" version = "4.3.2" description = "LZ4 Bindings for Python" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -4456,7 +4327,6 @@ tests = ["psutil", "pytest (!=3.3.0)", "pytest-cov"] name = "manifest-ml" version = "0.0.1" description = "Manifest for Prompt Programming Foundation Models." -category = "main" optional = true python-versions = ">=3.8.0" files = [ @@ -4480,7 +4350,6 @@ dev = ["autopep8 (>=1.6.0)", "black (>=22.3.0)", "docformatter (>=1.4)", "flake8 name = "markdown" version = "3.4.4" description = "Python implementation of John Gruber's Markdown." -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -4496,7 +4365,6 @@ testing = ["coverage", "pyyaml"] name = "markdown-it-py" version = "3.0.0" description = "Python port of markdown-it. Markdown parsing, done right!" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -4521,7 +4389,6 @@ testing = ["coverage", "pytest", "pytest-cov", "pytest-regressions"] name = "markdownify" version = "0.11.6" description = "Convert HTML to markdown." -category = "main" optional = true python-versions = "*" files = [ @@ -4537,7 +4404,6 @@ six = ">=1.15,<2" name = "markupsafe" version = "2.1.3" description = "Safely add untrusted strings to HTML/XML markup." 
-category = "main" optional = false python-versions = ">=3.7" files = [ @@ -4561,6 +4427,16 @@ files = [ {file = "MarkupSafe-2.1.3-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:5bbe06f8eeafd38e5d0a4894ffec89378b6c6a625ff57e3028921f8ff59318ac"}, {file = "MarkupSafe-2.1.3-cp311-cp311-win32.whl", hash = "sha256:dd15ff04ffd7e05ffcb7fe79f1b98041b8ea30ae9234aed2a9168b5797c3effb"}, {file = "MarkupSafe-2.1.3-cp311-cp311-win_amd64.whl", hash = "sha256:134da1eca9ec0ae528110ccc9e48041e0828d79f24121a1a146161103c76e686"}, + {file = "MarkupSafe-2.1.3-cp312-cp312-macosx_10_9_universal2.whl", hash = "sha256:f698de3fd0c4e6972b92290a45bd9b1536bffe8c6759c62471efaa8acb4c37bc"}, + {file = "MarkupSafe-2.1.3-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:aa57bd9cf8ae831a362185ee444e15a93ecb2e344c8e52e4d721ea3ab6ef1823"}, + {file = "MarkupSafe-2.1.3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:ffcc3f7c66b5f5b7931a5aa68fc9cecc51e685ef90282f4a82f0f5e9b704ad11"}, + {file = "MarkupSafe-2.1.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:47d4f1c5f80fc62fdd7777d0d40a2e9dda0a05883ab11374334f6c4de38adffd"}, + {file = "MarkupSafe-2.1.3-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:1f67c7038d560d92149c060157d623c542173016c4babc0c1913cca0564b9939"}, + {file = "MarkupSafe-2.1.3-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:9aad3c1755095ce347e26488214ef77e0485a3c34a50c5a5e2471dff60b9dd9c"}, + {file = "MarkupSafe-2.1.3-cp312-cp312-musllinux_1_1_i686.whl", hash = "sha256:14ff806850827afd6b07a5f32bd917fb7f45b046ba40c57abdb636674a8b559c"}, + {file = "MarkupSafe-2.1.3-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:8f9293864fe09b8149f0cc42ce56e3f0e54de883a9de90cd427f191c346eb2e1"}, + {file = "MarkupSafe-2.1.3-cp312-cp312-win32.whl", hash = "sha256:715d3562f79d540f251b99ebd6d8baa547118974341db04f5ad06d5ea3eb8007"}, + {file = "MarkupSafe-2.1.3-cp312-cp312-win_amd64.whl", hash = "sha256:1b8dd8c3fd14349433c79fa8abeb573a55fc0fdd769133baac1f5e07abf54aeb"}, {file = "MarkupSafe-2.1.3-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:8e254ae696c88d98da6555f5ace2279cf7cd5b3f52be2b5cf97feafe883b58d2"}, {file = "MarkupSafe-2.1.3-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:cb0932dc158471523c9637e807d9bfb93e06a95cbf010f1a38b98623b929ef2b"}, {file = "MarkupSafe-2.1.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9402b03f1a1b4dc4c19845e5c749e3ab82d5078d16a2a4c2cd2df62d57bb0707"}, @@ -4597,7 +4473,6 @@ files = [ name = "marqo" version = "1.2.4" description = "Tensor search for humans" -category = "main" optional = true python-versions = ">=3" files = [ @@ -4616,7 +4491,6 @@ urllib3 = "*" name = "marshmallow" version = "3.20.1" description = "A lightweight library for converting complex datatypes to and from native Python datatypes." 
-category = "main" optional = false python-versions = ">=3.8" files = [ @@ -4637,7 +4511,6 @@ tests = ["pytest", "pytz", "simplejson"] name = "marshmallow-enum" version = "1.5.1" description = "Enum field for Marshmallow" -category = "main" optional = false python-versions = "*" files = [ @@ -4652,7 +4525,6 @@ marshmallow = ">=2.0.0" name = "matplotlib-inline" version = "0.1.6" description = "Inline Matplotlib backend for Jupyter" -category = "dev" optional = false python-versions = ">=3.5" files = [ @@ -4667,7 +4539,6 @@ traitlets = "*" name = "mdurl" version = "0.1.2" description = "Markdown URL utilities" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -4679,7 +4550,6 @@ files = [ name = "mistune" version = "3.0.1" description = "A sane and fast Markdown parser with useful plugins and renderers" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -4691,7 +4561,6 @@ files = [ name = "mmh3" version = "3.1.0" description = "Python wrapper for MurmurHash (MurmurHash3), a set of fast and robust hash functions." -category = "main" optional = true python-versions = "*" files = [ @@ -4736,7 +4605,6 @@ files = [ name = "momento" version = "1.7.1" description = "SDK for Momento" -category = "main" optional = true python-versions = ">=3.7,<4.0" files = [ @@ -4753,7 +4621,6 @@ pyjwt = ">=2.4.0,<3.0.0" name = "momento-wire-types" version = "0.67.0" description = "Momento Client Proto Generated Files" -category = "main" optional = true python-versions = ">=3.7,<4.0" files = [ @@ -4769,7 +4636,6 @@ protobuf = ">=3,<5" name = "more-itertools" version = "10.1.0" description = "More routines for operating on iterables, beyond itertools" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -4781,7 +4647,6 @@ files = [ name = "mpmath" version = "1.3.0" description = "Python library for arbitrary-precision floating-point arithmetic" -category = "main" optional = true python-versions = "*" files = [ @@ -4799,7 +4664,6 @@ tests = ["pytest (>=4.6)"] name = "msal" version = "1.23.0" description = "The Microsoft Authentication Library (MSAL) for Python library enables your app to access the Microsoft Cloud by supporting authentication of users with Microsoft Azure Active Directory accounts (AAD) and Microsoft Accounts (MSA) using industry standard OAuth2 and OpenID Connect." -category = "main" optional = true python-versions = "*" files = [ @@ -4819,7 +4683,6 @@ broker = ["pymsalruntime (>=0.13.2,<0.14)"] name = "msal-extensions" version = "1.0.0" description = "Microsoft Authentication Library extensions (MSAL EX) provides a persistence API that can save your data on disk, encrypted on Windows, macOS and Linux. Concurrent data access will be coordinated by a file lock mechanism." -category = "main" optional = true python-versions = "*" files = [ @@ -4838,7 +4701,6 @@ portalocker = [ name = "msgpack" version = "1.0.5" description = "MessagePack serializer" -category = "main" optional = true python-versions = "*" files = [ @@ -4911,7 +4773,6 @@ files = [ name = "msrest" version = "0.7.1" description = "AutoRest swagger generator Python client runtime." 
-category = "main" optional = true python-versions = ">=3.6" files = [ @@ -4933,7 +4794,6 @@ async = ["aiodns", "aiohttp (>=3.0)"] name = "multidict" version = "6.0.4" description = "multidict implementation" -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -5017,7 +4877,6 @@ files = [ name = "multiprocess" version = "0.70.15" description = "better multiprocessing and multithreading in Python" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -5046,7 +4905,6 @@ dill = ">=0.3.7" name = "mwcli" version = "0.0.3" description = "Utilities for processing MediaWiki on the command line." -category = "main" optional = true python-versions = "*" files = [ @@ -5063,7 +4921,6 @@ para = "*" name = "mwparserfromhell" version = "0.6.4" description = "MWParserFromHell is a parser for MediaWiki wikicode." -category = "main" optional = true python-versions = ">= 3.6" files = [ @@ -5101,7 +4958,6 @@ files = [ name = "mwtypes" version = "0.3.2" description = "A set of types for processing MediaWiki data." -category = "main" optional = true python-versions = "*" files = [ @@ -5116,7 +4972,6 @@ jsonable = ">=0.3.0" name = "mwxml" version = "0.3.3" description = "A set of utilities for processing MediaWiki XML dump data." -category = "main" optional = true python-versions = "*" files = [ @@ -5134,7 +4989,6 @@ para = ">=0.0.1" name = "mypy" version = "0.991" description = "Optional static typing for Python" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -5185,7 +5039,6 @@ reports = ["lxml"] name = "mypy-extensions" version = "1.0.0" description = "Type system extensions for programs checked with the mypy type checker." -category = "main" optional = false python-versions = ">=3.5" files = [ @@ -5197,7 +5050,6 @@ files = [ name = "mypy-protobuf" version = "3.3.0" description = "Generate mypy stub files from protobuf specs" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -5213,7 +5065,6 @@ types-protobuf = ">=3.19.12" name = "nbclient" version = "0.8.0" description = "A client library for executing notebooks. Formerly nbconvert's ExecutePreprocessor." 
-category = "dev" optional = false python-versions = ">=3.8.0" files = [ @@ -5223,7 +5074,7 @@ files = [ [package.dependencies] jupyter-client = ">=6.1.12" -jupyter-core = ">=4.12,<5.0.0 || >=5.1.0" +jupyter-core = ">=4.12,<5.0.dev0 || >=5.1.dev0" nbformat = ">=5.1" traitlets = ">=5.4" @@ -5236,7 +5087,6 @@ test = ["flaky", "ipykernel (>=6.19.3)", "ipython", "ipywidgets", "nbconvert (>= name = "nbconvert" version = "7.7.4" description = "Converting Jupyter Notebooks" -category = "dev" optional = false python-versions = ">=3.8" files = [ @@ -5275,7 +5125,6 @@ webpdf = ["playwright"] name = "nbformat" version = "5.9.2" description = "The Jupyter Notebook format" -category = "dev" optional = false python-versions = ">=3.8" files = [ @@ -5297,7 +5146,6 @@ test = ["pep440", "pre-commit", "pytest", "testpath"] name = "nebula3-python" version = "3.4.0" description = "Python client for NebulaGraph V3.4" -category = "main" optional = true python-versions = "*" files = [ @@ -5315,7 +5163,6 @@ six = ">=1.16.0" name = "neo4j" version = "5.11.0" description = "Neo4j Bolt driver for Python" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -5333,7 +5180,6 @@ pandas = ["numpy (>=1.7.0,<2.0.0)", "pandas (>=1.1.0,<3.0.0)"] name = "nest-asyncio" version = "1.5.7" description = "Patch asyncio to allow nested event loops" -category = "main" optional = false python-versions = ">=3.5" files = [ @@ -5345,7 +5191,6 @@ files = [ name = "networkx" version = "2.8.8" description = "Python package for creating and manipulating graphs and networks" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -5364,7 +5209,6 @@ test = ["codecov (>=2.1)", "pytest (>=7.2)", "pytest-cov (>=4.0)"] name = "newspaper3k" version = "0.2.8" description = "Simplified python article discovery & extraction." -category = "main" optional = true python-versions = "*" files = [ @@ -5391,7 +5235,6 @@ tldextract = ">=2.0.1" name = "nlpcloud" version = "1.1.44" description = "Python client for the NLP Cloud API" -category = "main" optional = true python-versions = "*" files = [ @@ -5406,7 +5249,6 @@ requests = "*" name = "nltk" version = "3.8.1" description = "Natural Language Toolkit" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -5432,7 +5274,6 @@ twitter = ["twython"] name = "nomic" version = "1.1.14" description = "The offical Nomic python client." 
-category = "main" optional = true python-versions = "*" files = [ @@ -5460,7 +5301,6 @@ gpt4all = ["peft (==0.3.0.dev0)", "sentencepiece", "torch", "transformers (==4.2 name = "notebook" version = "7.0.2" description = "Jupyter Notebook - A web-based notebook environment for interactive computing" -category = "dev" optional = false python-versions = ">=3.8" files = [ @@ -5485,7 +5325,6 @@ test = ["ipykernel", "jupyter-server[test] (>=2.4.0,<3)", "jupyterlab-server[tes name = "notebook-shim" version = "0.2.3" description = "A shim layer for notebook traits and config" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -5503,7 +5342,6 @@ test = ["pytest", "pytest-console-scripts", "pytest-jupyter", "pytest-tornasync" name = "numba" version = "0.57.1" description = "compiling Python code using LLVM" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -5535,14 +5373,13 @@ files = [ [package.dependencies] importlib-metadata = {version = "*", markers = "python_version < \"3.9\""} -llvmlite = ">=0.40.0dev0,<0.41" +llvmlite = "==0.40.*" numpy = ">=1.21,<1.25" [[package]] name = "numcodecs" version = "0.11.0" description = "A Python package providing buffer compression and transformation codecs for use" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -5575,7 +5412,6 @@ zfpy = ["zfpy (>=1.0.0)"] name = "numexpr" version = "2.8.5" description = "Fast numerical expression evaluator for NumPy" -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -5618,7 +5454,6 @@ numpy = ">=1.13.3" name = "numpy" version = "1.24.3" description = "Fundamental package for array computing in Python" -category = "main" optional = false python-versions = ">=3.8" files = [ @@ -5656,7 +5491,6 @@ files = [ name = "nvidia-cublas-cu11" version = "11.10.3.66" description = "CUBLAS native runtime libraries" -category = "main" optional = true python-versions = ">=3" files = [ @@ -5672,7 +5506,6 @@ wheel = "*" name = "nvidia-cuda-nvrtc-cu11" version = "11.7.99" description = "NVRTC native runtime libraries" -category = "main" optional = true python-versions = ">=3" files = [ @@ -5689,7 +5522,6 @@ wheel = "*" name = "nvidia-cuda-runtime-cu11" version = "11.7.99" description = "CUDA Runtime native Libraries" -category = "main" optional = true python-versions = ">=3" files = [ @@ -5705,7 +5537,6 @@ wheel = "*" name = "nvidia-cudnn-cu11" version = "8.5.0.96" description = "cuDNN runtime libraries" -category = "main" optional = true python-versions = ">=3" files = [ @@ -5721,7 +5552,6 @@ wheel = "*" name = "o365" version = "2.0.27" description = "Microsoft Graph and Office 365 API made easy" -category = "main" optional = true python-versions = ">=3.4" files = [ @@ -5742,7 +5572,6 @@ tzlocal = ">=4.0,<5.0" name = "oauthlib" version = "3.2.2" description = "A generic, spec-compliant, thorough implementation of the OAuth request-signing logic" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -5759,7 +5588,6 @@ signedtoken = ["cryptography (>=3.0.0)", "pyjwt (>=2.0.0,<3)"] name = "openai" version = "0.27.8" description = "Python client library for the OpenAI API" -category = "main" optional = false python-versions = ">=3.7.1" files = [ @@ -5774,7 +5602,7 @@ tqdm = "*" [package.extras] datalib = ["numpy", "openpyxl (>=3.0.7)", "pandas (>=1.2.3)", "pandas-stubs (>=1.1.0.11)"] -dev = ["black (>=21.6b0,<22.0)", "pytest (>=6.0.0,<7.0.0)", "pytest-asyncio", "pytest-mock"] +dev = ["black (>=21.6b0,<22.0)", "pytest (==6.*)", 
"pytest-asyncio", "pytest-mock"] embeddings = ["matplotlib", "numpy", "openpyxl (>=3.0.7)", "pandas (>=1.2.3)", "pandas-stubs (>=1.1.0.11)", "plotly", "scikit-learn (>=1.0.2)", "scipy", "tenacity (>=8.0.1)"] wandb = ["numpy", "openpyxl (>=3.0.7)", "pandas (>=1.2.3)", "pandas-stubs (>=1.1.0.11)", "wandb"] @@ -5782,7 +5610,6 @@ wandb = ["numpy", "openpyxl (>=3.0.7)", "pandas (>=1.2.3)", "pandas-stubs (>=1.1 name = "openapi-schema-pydantic" version = "1.2.4" description = "OpenAPI (v3) specification schema as pydantic class" -category = "main" optional = true python-versions = ">=3.6.1" files = [ @@ -5797,7 +5624,6 @@ pydantic = ">=1.8.2" name = "openlm" version = "0.0.5" description = "Drop-in OpenAI-compatible that can call LLMs from other providers" -category = "main" optional = true python-versions = ">=3.8.1,<4.0" files = [ @@ -5812,7 +5638,6 @@ requests = ">=2,<3" name = "opensearch-py" version = "2.3.1" description = "Python client for OpenSearch" -category = "main" optional = true python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, <4" files = [ @@ -5837,7 +5662,6 @@ kerberos = ["requests-kerberos"] name = "opt-einsum" version = "3.3.0" description = "Optimizing numpys einsum function" -category = "main" optional = true python-versions = ">=3.5" files = [ @@ -5856,7 +5680,6 @@ tests = ["pytest", "pytest-cov", "pytest-pep8"] name = "orjson" version = "3.9.5" description = "Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -5926,7 +5749,6 @@ files = [ name = "overrides" version = "7.4.0" description = "A decorator to automatically detect mismatch when overriding a method." -category = "dev" optional = false python-versions = ">=3.6" files = [ @@ -5938,7 +5760,6 @@ files = [ name = "packaging" version = "23.1" description = "Core utilities for Python packages" -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -5950,7 +5771,6 @@ files = [ name = "pandas" version = "2.0.3" description = "Powerful data structures for data analysis, time series, and statistics" -category = "main" optional = false python-versions = ">=3.8" files = [ @@ -6018,7 +5838,6 @@ xml = ["lxml (>=4.6.3)"] name = "pandocfilters" version = "1.5.0" description = "Utilities for writing pandoc filters in python" -category = "dev" optional = false python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" files = [ @@ -6030,7 +5849,6 @@ files = [ name = "para" version = "0.0.8" description = "a set utilities that ake advantage of python's 'multiprocessing' module to distribute CPU-intensive tasks" -category = "main" optional = true python-versions = "*" files = [ @@ -6042,7 +5860,6 @@ files = [ name = "parso" version = "0.8.3" description = "A Python Parser" -category = "dev" optional = false python-versions = ">=3.6" files = [ @@ -6058,7 +5875,6 @@ testing = ["docopt", "pytest (<6.0.0)"] name = "pathos" version = "0.3.1" description = "parallel graph management and execution in heterogeneous computing" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -6076,7 +5892,6 @@ ppft = ">=1.7.6.7" name = "pathspec" version = "0.11.2" description = "Utility library for gitignore style pattern matching of file paths." 
-category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -6088,7 +5903,6 @@ files = [ name = "pdfminer-six" version = "20221105" description = "PDF parser and analyzer" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -6109,7 +5923,6 @@ image = ["Pillow"] name = "pexpect" version = "4.8.0" description = "Pexpect allows easy control of interactive console applications." -category = "main" optional = false python-versions = "*" files = [ @@ -6124,7 +5937,6 @@ ptyprocess = ">=0.5" name = "pgvector" version = "0.1.8" description = "pgvector support for Python" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -6138,7 +5950,6 @@ numpy = "*" name = "pickleshare" version = "0.7.5" description = "Tiny 'shelve'-like database with concurrency support" -category = "dev" optional = false python-versions = "*" files = [ @@ -6150,7 +5961,6 @@ files = [ name = "pillow" version = "9.5.0" description = "Python Imaging Library (Fork)" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -6230,7 +6040,6 @@ tests = ["check-manifest", "coverage", "defusedxml", "markdown2", "olefile", "pa name = "pinecone-client" version = "2.2.2" description = "Pinecone client and SDK" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -6256,7 +6065,6 @@ grpc = ["googleapis-common-protos (>=1.53.0)", "grpc-gateway-protoc-gen-openapiv name = "pinecone-text" version = "0.4.2" description = "Text utilities library by Pinecone.io" -category = "main" optional = true python-versions = ">=3.8,<4.0" files = [ @@ -6276,7 +6084,6 @@ wget = ">=3.2,<4.0" name = "pkgutil-resolve-name" version = "1.3.10" description = "Resolve a name to an object." -category = "main" optional = false python-versions = ">=3.6" files = [ @@ -6288,7 +6095,6 @@ files = [ name = "platformdirs" version = "3.10.0" description = "A small Python package for determining appropriate platform-specific dirs, e.g. a \"user data dir\"." 
-category = "main" optional = false python-versions = ">=3.7" files = [ @@ -6304,7 +6110,6 @@ test = ["appdirs (==1.4.4)", "covdefaults (>=2.3)", "pytest (>=7.4)", "pytest-co name = "playwright" version = "1.37.0" description = "A high-level API to automate web browsers" -category = "dev" optional = false python-versions = ">=3.8" files = [ @@ -6326,7 +6131,6 @@ typing-extensions = {version = "*", markers = "python_version <= \"3.8\""} name = "pluggy" version = "1.2.0" description = "plugin and hook calling mechanisms for python" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -6342,7 +6146,6 @@ testing = ["pytest", "pytest-benchmark"] name = "pooch" version = "1.7.0" description = "\"Pooch manages your Python library's sample data files: it automatically downloads and stores them in a local directory, with support for versioning and corruption checks.\"" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -6364,7 +6167,6 @@ xxhash = ["xxhash (>=1.4.3)"] name = "portalocker" version = "2.7.0" description = "Wraps the portalocker recipe for easy usage" -category = "main" optional = true python-versions = ">=3.5" files = [ @@ -6384,7 +6186,6 @@ tests = ["pytest (>=5.4.1)", "pytest-cov (>=2.8.1)", "pytest-mypy (>=0.8.0)", "p name = "pox" version = "0.3.3" description = "utilities for filesystem exploration and automated builds" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -6396,7 +6197,6 @@ files = [ name = "ppft" version = "1.7.6.7" description = "distributed and parallel Python" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -6411,7 +6211,6 @@ dill = ["dill (>=0.3.7)"] name = "prometheus-client" version = "0.17.1" description = "Python client for the Prometheus monitoring system." -category = "dev" optional = false python-versions = ">=3.6" files = [ @@ -6426,7 +6225,6 @@ twisted = ["twisted"] name = "prompt-toolkit" version = "3.0.39" description = "Library for building powerful interactive command lines in Python" -category = "dev" optional = false python-versions = ">=3.7.0" files = [ @@ -6441,7 +6239,6 @@ wcwidth = "*" name = "protobuf" version = "3.20.3" description = "Protocol Buffers" -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -6473,7 +6270,6 @@ files = [ name = "psutil" version = "5.9.5" description = "Cross-platform lib for process and system monitoring in Python." -category = "dev" optional = false python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" files = [ @@ -6500,7 +6296,6 @@ test = ["enum34", "ipaddress", "mock", "pywin32", "wmi"] name = "psychicapi" version = "0.8.4" description = "Psychic.dev is an open-source data integration platform for LLMs. 
This is the Python client for Psychic" -category = "main" optional = true python-versions = "*" files = [ @@ -6511,11 +6306,30 @@ files = [ [package.dependencies] requests = "*" +[[package]] +name = "psycopg2" +version = "2.9.7" +description = "psycopg2 - Python-PostgreSQL Database Adapter" +optional = true +python-versions = ">=3.6" +files = [ + {file = "psycopg2-2.9.7-cp310-cp310-win32.whl", hash = "sha256:1a6a2d609bce44f78af4556bea0c62a5e7f05c23e5ea9c599e07678995609084"}, + {file = "psycopg2-2.9.7-cp310-cp310-win_amd64.whl", hash = "sha256:b22ed9c66da2589a664e0f1ca2465c29b75aaab36fa209d4fb916025fb9119e5"}, + {file = "psycopg2-2.9.7-cp311-cp311-win32.whl", hash = "sha256:44d93a0109dfdf22fe399b419bcd7fa589d86895d3931b01fb321d74dadc68f1"}, + {file = "psycopg2-2.9.7-cp311-cp311-win_amd64.whl", hash = "sha256:91e81a8333a0037babfc9fe6d11e997a9d4dac0f38c43074886b0d9dead94fe9"}, + {file = "psycopg2-2.9.7-cp37-cp37m-win32.whl", hash = "sha256:d1210fcf99aae6f728812d1d2240afc1dc44b9e6cba526a06fb8134f969957c2"}, + {file = "psycopg2-2.9.7-cp37-cp37m-win_amd64.whl", hash = "sha256:e9b04cbef584310a1ac0f0d55bb623ca3244c87c51187645432e342de9ae81a8"}, + {file = "psycopg2-2.9.7-cp38-cp38-win32.whl", hash = "sha256:d5c5297e2fbc8068d4255f1e606bfc9291f06f91ec31b2a0d4c536210ac5c0a2"}, + {file = "psycopg2-2.9.7-cp38-cp38-win_amd64.whl", hash = "sha256:8275abf628c6dc7ec834ea63f6f3846bf33518907a2b9b693d41fd063767a866"}, + {file = "psycopg2-2.9.7-cp39-cp39-win32.whl", hash = "sha256:c7949770cafbd2f12cecc97dea410c514368908a103acf519f2a346134caa4d5"}, + {file = "psycopg2-2.9.7-cp39-cp39-win_amd64.whl", hash = "sha256:b6bd7d9d3a7a63faae6edf365f0ed0e9b0a1aaf1da3ca146e6b043fb3eb5d723"}, + {file = "psycopg2-2.9.7.tar.gz", hash = "sha256:f00cc35bd7119f1fed17b85bd1007855194dde2cbd8de01ab8ebb17487440ad8"}, +] + [[package]] name = "psycopg2-binary" version = "2.9.7" description = "psycopg2 - Python-PostgreSQL Database Adapter" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -6585,7 +6399,6 @@ files = [ name = "ptyprocess" version = "0.7.0" description = "Run a subprocess in a pseudo terminal" -category = "main" optional = false python-versions = "*" files = [ @@ -6597,7 +6410,6 @@ files = [ name = "pure-eval" version = "0.2.2" description = "Safely evaluate AST nodes without side effects" -category = "dev" optional = false python-versions = "*" files = [ @@ -6612,7 +6424,6 @@ tests = ["pytest"] name = "py" version = "1.11.0" description = "library with cross-python path, ini-parsing, io, code, log facilities" -category = "main" optional = true python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" files = [ @@ -6624,7 +6435,6 @@ files = [ name = "py-trello" version = "0.19.0" description = "Python wrapper around the Trello API" -category = "main" optional = true python-versions = "*" files = [ @@ -6641,7 +6451,6 @@ requests-oauthlib = ">=0.4.1" name = "py4j" version = "0.10.9.7" description = "Enables Python programs to dynamically access arbitrary Java objects" -category = "main" optional = true python-versions = "*" files = [ @@ -6653,7 +6462,6 @@ files = [ name = "pyaes" version = "1.6.1" description = "Pure-Python Implementation of the AES block-cipher and common modes of operation" -category = "main" optional = true python-versions = "*" files = [ @@ -6664,7 +6472,6 @@ files = [ name = "pyarrow" version = "12.0.1" description = "Python library for Apache Arrow" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -6702,7 +6509,6 @@ numpy = ">=1.16.6" name 
= "pyasn1" version = "0.5.0" description = "Pure-Python implementation of ASN.1 types and DER/BER/CER codecs (X.208)" -category = "main" optional = true python-versions = "!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,>=2.7" files = [ @@ -6714,7 +6520,6 @@ files = [ name = "pyasn1-modules" version = "0.3.0" description = "A collection of ASN.1-based protocols modules" -category = "main" optional = true python-versions = "!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,>=2.7" files = [ @@ -6729,7 +6534,6 @@ pyasn1 = ">=0.4.6,<0.6.0" name = "pycares" version = "4.3.0" description = "Python interface for c-ares" -category = "main" optional = true python-versions = "*" files = [ @@ -6797,7 +6601,6 @@ idna = ["idna (>=2.1)"] name = "pycparser" version = "2.21" description = "C parser in Python" -category = "main" optional = false python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" files = [ @@ -6809,7 +6612,6 @@ files = [ name = "pydantic" version = "1.10.12" description = "Data validation and settings management using python type hints" -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -6862,7 +6664,6 @@ email = ["email-validator (>=1.0.3)"] name = "pydeck" version = "0.8.0" description = "Widget for deck.gl maps" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -6882,7 +6683,6 @@ jupyter = ["ipykernel (>=5.1.2)", "ipython (>=5.8.0)", "ipywidgets (>=7,<8)", "t name = "pyee" version = "9.0.4" description = "A port of node.js's EventEmitter to python." -category = "dev" optional = false python-versions = "*" files = [ @@ -6897,7 +6697,6 @@ typing-extensions = "*" name = "pygments" version = "2.16.1" description = "Pygments is a syntax highlighting package written in Python." -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -6912,7 +6711,6 @@ plugins = ["importlib-metadata"] name = "pyjwt" version = "2.8.0" description = "JSON Web Token implementation in Python" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -6933,7 +6731,6 @@ tests = ["coverage[toml] (==5.0.4)", "pytest (>=6.0.0,<7.0.0)"] name = "pylance" version = "0.5.10" description = "python wrapper for lance-rs" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -6955,7 +6752,6 @@ tests = ["duckdb", "ml_dtypes", "pandas (>=1.4)", "polars[pandas,pyarrow]", "pyt name = "pymongo" version = "4.5.0" description = "Python driver for MongoDB " -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -7057,7 +6853,6 @@ zstd = ["zstandard"] name = "pympler" version = "1.0.1" description = "A development tool to measure, monitor and analyze the memory behavior of Python objects." 
-category = "main" optional = true python-versions = ">=3.6" files = [ @@ -7069,7 +6864,6 @@ files = [ name = "pymupdf" version = "1.22.5" description = "Python bindings for the PDF toolkit and renderer MuPDF" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -7109,7 +6903,6 @@ files = [ name = "pyowm" version = "3.3.0" description = "A Python wrapper around OpenWeatherMap web APIs" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -7129,7 +6922,6 @@ requests = [ name = "pyparsing" version = "3.1.1" description = "pyparsing module - Classes and methods to define and execute parsing grammars" -category = "main" optional = true python-versions = ">=3.6.8" files = [ @@ -7144,7 +6936,6 @@ diagrams = ["jinja2", "railroad-diagrams"] name = "pypdf" version = "3.15.2" description = "A pure-python PDF library capable of splitting, merging, cropping, and transforming PDF files" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -7166,7 +6957,6 @@ image = ["Pillow (>=8.0.0)"] name = "pypdfium2" version = "4.18.0" description = "Python bindings to PDFium" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -7188,7 +6978,6 @@ files = [ name = "pyphen" version = "0.14.0" description = "Pure Python module to hyphenate text" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -7204,7 +6993,6 @@ test = ["flake8", "isort", "pytest"] name = "pyproj" version = "3.5.0" description = "Python interface to PROJ (cartographic projections and coordinate transformations library)" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -7252,7 +7040,6 @@ certifi = "*" name = "pyproject-hooks" version = "1.0.0" description = "Wrappers to call pyproject.toml-based build backend hooks." -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -7267,7 +7054,6 @@ tomli = {version = ">=1.1.0", markers = "python_version < \"3.11\""} name = "pysocks" version = "1.7.1" description = "A Python SOCKS client module. See https://github.com/Anorov/PySocks for more information." -category = "main" optional = true python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" files = [ @@ -7280,7 +7066,6 @@ files = [ name = "pyspark" version = "3.4.1" description = "Apache Spark Python API" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -7301,7 +7086,6 @@ sql = ["numpy (>=1.15)", "pandas (>=1.0.5)", "pyarrow (>=1.0.0)"] name = "pytesseract" version = "0.3.10" description = "Python-tesseract is a python wrapper for Google's Tesseract-OCR" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -7317,7 +7101,6 @@ Pillow = ">=8.0.0" name = "pytest" version = "7.4.0" description = "pytest: simple powerful testing with Python" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -7340,7 +7123,6 @@ testing = ["argcomplete", "attrs (>=19.2.0)", "hypothesis (>=3.56)", "mock", "no name = "pytest-asyncio" version = "0.20.3" description = "Pytest support for asyncio" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -7359,7 +7141,6 @@ testing = ["coverage (>=6.2)", "flaky (>=3.5.0)", "hypothesis (>=5.7.1)", "mypy name = "pytest-cov" version = "4.1.0" description = "Pytest plugin for measuring coverage." 
-category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -7378,7 +7159,6 @@ testing = ["fields", "hunter", "process-tests", "pytest-xdist", "six", "virtuale name = "pytest-dotenv" version = "0.5.2" description = "A py.test plugin that parses environment files before running tests" -category = "dev" optional = false python-versions = "*" files = [ @@ -7394,7 +7174,6 @@ python-dotenv = ">=0.9.1" name = "pytest-mock" version = "3.11.1" description = "Thin-wrapper around the mock package for easier use with pytest" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -7412,7 +7191,6 @@ dev = ["pre-commit", "pytest-asyncio", "tox"] name = "pytest-socket" version = "0.6.0" description = "Pytest Plugin to disable socket calls during tests" -category = "dev" optional = false python-versions = ">=3.7,<4.0" files = [ @@ -7427,7 +7205,6 @@ pytest = ">=3.6.3" name = "pytest-vcr" version = "1.0.2" description = "Plugin for managing VCR.py cassettes" -category = "dev" optional = false python-versions = "*" files = [ @@ -7443,7 +7220,6 @@ vcrpy = "*" name = "pytest-watcher" version = "0.2.6" description = "Continiously runs pytest on changes in *.py files" -category = "dev" optional = false python-versions = ">=3.7.0,<4.0.0" files = [ @@ -7458,7 +7234,6 @@ watchdog = ">=2.0.0" name = "python-arango" version = "7.6.0" description = "Python Driver for ArangoDB" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -7482,7 +7257,6 @@ dev = ["black (>=22.3.0)", "flake8 (>=4.0.1)", "isort (>=5.10.1)", "mock", "mypy name = "python-dateutil" version = "2.8.2" description = "Extensions to the standard Python datetime module" -category = "main" optional = false python-versions = "!=3.0.*,!=3.1.*,!=3.2.*,>=2.7" files = [ @@ -7497,7 +7271,6 @@ six = ">=1.5" name = "python-dotenv" version = "1.0.0" description = "Read key-value pairs from a .env file and set them as environment variables" -category = "main" optional = false python-versions = ">=3.8" files = [ @@ -7512,7 +7285,6 @@ cli = ["click (>=5.0)"] name = "python-json-logger" version = "2.0.7" description = "A python library adding a json log formatter" -category = "dev" optional = false python-versions = ">=3.6" files = [ @@ -7524,7 +7296,6 @@ files = [ name = "python-rapidjson" version = "1.10" description = "Python wrapper around rapidjson" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -7590,7 +7361,6 @@ files = [ name = "pytz" version = "2023.3" description = "World timezone definitions, modern and historical" -category = "main" optional = false python-versions = "*" files = [ @@ -7602,7 +7372,6 @@ files = [ name = "pytz-deprecation-shim" version = "0.1.0.post0" description = "Shims to make deprecation of pytz easier" -category = "main" optional = true python-versions = "!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,>=2.7" files = [ @@ -7618,7 +7387,6 @@ tzdata = {version = "*", markers = "python_version >= \"3.6\""} name = "pyvespa" version = "0.33.0" description = "Python API for vespa.ai" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -7643,7 +7411,6 @@ ml = ["keras-tuner", "tensorflow", "tensorflow-ranking", "torch (<1.13)", "trans name = "pywin32" version = "306" description = "Python for Window Extensions" -category = "main" optional = false python-versions = "*" files = [ @@ -7667,7 +7434,6 @@ files = [ name = "pywinpty" version = "2.0.11" description = "Pseudo terminal support for Windows from Python." 
-category = "dev" optional = false python-versions = ">=3.8" files = [ @@ -7682,7 +7448,6 @@ files = [ name = "pyyaml" version = "6.0.1" description = "YAML parser and emitter for Python" -category = "main" optional = false python-versions = ">=3.6" files = [ @@ -7691,6 +7456,7 @@ files = [ {file = "PyYAML-6.0.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:69b023b2b4daa7548bcfbd4aa3da05b3a74b772db9e23b982788168117739938"}, {file = "PyYAML-6.0.1-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:81e0b275a9ecc9c0c0c07b4b90ba548307583c125f54d5b6946cfee6360c733d"}, {file = "PyYAML-6.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ba336e390cd8e4d1739f42dfe9bb83a3cc2e80f567d8805e11b46f4a943f5515"}, + {file = "PyYAML-6.0.1-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:326c013efe8048858a6d312ddd31d56e468118ad4cdeda36c719bf5bb6192290"}, {file = "PyYAML-6.0.1-cp310-cp310-win32.whl", hash = "sha256:bd4af7373a854424dabd882decdc5579653d7868b8fb26dc7d0e99f823aa5924"}, {file = "PyYAML-6.0.1-cp310-cp310-win_amd64.whl", hash = "sha256:fd1592b3fdf65fff2ad0004b5e363300ef59ced41c2e6b3a99d4089fa8c5435d"}, {file = "PyYAML-6.0.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:6965a7bc3cf88e5a1c3bd2e0b5c22f8d677dc88a455344035f03399034eb3007"}, @@ -7698,8 +7464,15 @@ files = [ {file = "PyYAML-6.0.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:42f8152b8dbc4fe7d96729ec2b99c7097d656dc1213a3229ca5383f973a5ed6d"}, {file = "PyYAML-6.0.1-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:062582fca9fabdd2c8b54a3ef1c978d786e0f6b3a1510e0ac93ef59e0ddae2bc"}, {file = "PyYAML-6.0.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d2b04aac4d386b172d5b9692e2d2da8de7bfb6c387fa4f801fbf6fb2e6ba4673"}, + {file = "PyYAML-6.0.1-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:e7d73685e87afe9f3b36c799222440d6cf362062f78be1013661b00c5c6f678b"}, {file = "PyYAML-6.0.1-cp311-cp311-win32.whl", hash = "sha256:1635fd110e8d85d55237ab316b5b011de701ea0f29d07611174a1b42f1444741"}, {file = "PyYAML-6.0.1-cp311-cp311-win_amd64.whl", hash = "sha256:bf07ee2fef7014951eeb99f56f39c9bb4af143d8aa3c21b1677805985307da34"}, + {file = "PyYAML-6.0.1-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:855fb52b0dc35af121542a76b9a84f8d1cd886ea97c84703eaa6d88e37a2ad28"}, + {file = "PyYAML-6.0.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:40df9b996c2b73138957fe23a16a4f0ba614f4c0efce1e9406a184b6d07fa3a9"}, + {file = "PyYAML-6.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:6c22bec3fbe2524cde73d7ada88f6566758a8f7227bfbf93a408a9d86bcc12a0"}, + {file = "PyYAML-6.0.1-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:8d4e9c88387b0f5c7d5f281e55304de64cf7f9c0021a3525bd3b1c542da3b0e4"}, + {file = "PyYAML-6.0.1-cp312-cp312-win32.whl", hash = "sha256:d483d2cdf104e7c9fa60c544d92981f12ad66a457afae824d146093b8c294c54"}, + {file = "PyYAML-6.0.1-cp312-cp312-win_amd64.whl", hash = "sha256:0d3304d8c0adc42be59c5f8a4d9e3d7379e6955ad754aa9d6ab7a398b59dd1df"}, {file = "PyYAML-6.0.1-cp36-cp36m-macosx_10_9_x86_64.whl", hash = "sha256:50550eb667afee136e9a77d6dc71ae76a44df8b3e51e41b77f6de2932bfe0f47"}, {file = "PyYAML-6.0.1-cp36-cp36m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:1fe35611261b29bd1de0070f0b2f47cb6ff71fa6595c077e42bd0c419fa27b98"}, {file = "PyYAML-6.0.1-cp36-cp36m-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = 
"sha256:704219a11b772aea0d8ecd7058d0082713c3562b4e271b849ad7dc4a5c90c13c"}, @@ -7716,6 +7489,7 @@ files = [ {file = "PyYAML-6.0.1-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:a0cd17c15d3bb3fa06978b4e8958dcdc6e0174ccea823003a106c7d4d7899ac5"}, {file = "PyYAML-6.0.1-cp38-cp38-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:28c119d996beec18c05208a8bd78cbe4007878c6dd15091efb73a30e90539696"}, {file = "PyYAML-6.0.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:7e07cbde391ba96ab58e532ff4803f79c4129397514e1413a7dc761ccd755735"}, + {file = "PyYAML-6.0.1-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:49a183be227561de579b4a36efbb21b3eab9651dd81b1858589f796549873dd6"}, {file = "PyYAML-6.0.1-cp38-cp38-win32.whl", hash = "sha256:184c5108a2aca3c5b3d3bf9395d50893a7ab82a38004c8f61c258d4428e80206"}, {file = "PyYAML-6.0.1-cp38-cp38-win_amd64.whl", hash = "sha256:1e2722cc9fbb45d9b87631ac70924c11d3a401b2d7f410cc0e3bbf249f2dca62"}, {file = "PyYAML-6.0.1-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:9eb6caa9a297fc2c2fb8862bc5370d0303ddba53ba97e71f08023b6cd73d16a8"}, @@ -7723,6 +7497,7 @@ files = [ {file = "PyYAML-6.0.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5773183b6446b2c99bb77e77595dd486303b4faab2b086e7b17bc6bef28865f6"}, {file = "PyYAML-6.0.1-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:b786eecbdf8499b9ca1d697215862083bd6d2a99965554781d0d8d1ad31e13a0"}, {file = "PyYAML-6.0.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bc1bf2925a1ecd43da378f4db9e4f799775d6367bdb94671027b73b393a7c42c"}, + {file = "PyYAML-6.0.1-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:04ac92ad1925b2cff1db0cfebffb6ffc43457495c9b3c39d3fcae417d7125dc5"}, {file = "PyYAML-6.0.1-cp39-cp39-win32.whl", hash = "sha256:faca3bdcf85b2fc05d06ff3fbc1f83e1391b3e724afa3feba7d13eeab355484c"}, {file = "PyYAML-6.0.1-cp39-cp39-win_amd64.whl", hash = "sha256:510c9deebc5c0225e8c96813043e62b680ba2f9c50a08d3724c7f28a747d1486"}, {file = "PyYAML-6.0.1.tar.gz", hash = "sha256:bfdf460b1736c775f2ba9f6a92bca30bc2095067b8a9d77876d1fad6cc3b4a43"}, @@ -7732,7 +7507,6 @@ files = [ name = "pyzmq" version = "25.1.1" description = "Python bindings for 0MQ" -category = "dev" optional = false python-versions = ">=3.6" files = [ @@ -7838,7 +7612,6 @@ cffi = {version = "*", markers = "implementation_name == \"pypy\""} name = "qdrant-client" version = "1.4.0" description = "Client library for the Qdrant vector search engine" -category = "main" optional = true python-versions = ">=3.7,<3.12" files = [ @@ -7859,7 +7632,6 @@ urllib3 = ">=1.26.14,<2.0.0" name = "qtconsole" version = "5.4.3" description = "Jupyter Qt console" -category = "dev" optional = false python-versions = ">= 3.7" files = [ @@ -7886,7 +7658,6 @@ test = ["flaky", "pytest", "pytest-qt"] name = "qtpy" version = "2.3.1" description = "Provides an abstraction layer on top of the various Qt bindings (PyQt5/6 and PySide2/6)." 
-category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -7904,7 +7675,6 @@ test = ["pytest (>=6,!=7.0.0,!=7.0.1)", "pytest-cov (>=3.0.0)", "pytest-qt"] name = "rank-bm25" version = "0.2.2" description = "Various BM25 algorithms for document ranking" -category = "main" optional = true python-versions = "*" files = [ @@ -7922,7 +7692,6 @@ dev = ["pytest"] name = "rapidfuzz" version = "3.2.0" description = "rapid fuzzy string matching" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -8027,7 +7796,6 @@ full = ["numpy"] name = "ratelimiter" version = "1.2.0.post0" description = "Simple python rate limiting object" -category = "main" optional = true python-versions = "*" files = [ @@ -8042,7 +7810,6 @@ test = ["pytest (>=3.0)", "pytest-asyncio"] name = "rdflib" version = "6.3.2" description = "RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information." -category = "main" optional = true python-versions = ">=3.7,<4.0" files = [ @@ -8064,7 +7831,6 @@ networkx = ["networkx (>=2.0.0,<3.0.0)"] name = "redis" version = "4.6.0" description = "Python client for Redis database and key-value store" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -8083,7 +7849,6 @@ ocsp = ["cryptography (>=36.0.1)", "pyopenssl (==20.0.1)", "requests (>=2.26.0)" name = "referencing" version = "0.30.2" description = "JSON Referencing + Python" -category = "main" optional = false python-versions = ">=3.8" files = [ @@ -8099,7 +7864,6 @@ rpds-py = ">=0.7.0" name = "regex" version = "2023.8.8" description = "Alternative regular expression module, to replace re." -category = "main" optional = false python-versions = ">=3.6" files = [ @@ -8197,7 +7961,6 @@ files = [ name = "requests" version = "2.31.0" description = "Python HTTP for Humans." -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -8220,7 +7983,6 @@ use-chardet-on-py3 = ["chardet (>=3.0.2,<6)"] name = "requests-file" version = "1.5.1" description = "File transport adapter for Requests" -category = "main" optional = true python-versions = "*" files = [ @@ -8236,7 +7998,6 @@ six = "*" name = "requests-oauthlib" version = "1.3.1" description = "OAuthlib authentication support for Requests." -category = "main" optional = true python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" files = [ @@ -8255,7 +8016,6 @@ rsa = ["oauthlib[signedtoken] (>=3.0.0)"] name = "requests-toolbelt" version = "1.0.0" description = "A utility belt for advanced users of python-requests" -category = "main" optional = true python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*" files = [ @@ -8270,7 +8030,6 @@ requests = ">=2.0.1,<3.0.0" name = "responses" version = "0.22.0" description = "A utility library for mocking out the `requests` Python library." -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -8291,7 +8050,6 @@ tests = ["coverage (>=6.0.0)", "flake8", "mypy", "pytest (>=7.0.0)", "pytest-asy name = "retry" version = "0.9.2" description = "Easy to use retry decorator." 
-category = "main" optional = true python-versions = "*" files = [ @@ -8307,7 +8065,6 @@ py = ">=1.4.26,<2.0.0" name = "rfc3339-validator" version = "0.1.4" description = "A pure python RFC3339 validator" -category = "dev" optional = false python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" files = [ @@ -8322,7 +8079,6 @@ six = "*" name = "rfc3986-validator" version = "0.1.1" description = "Pure python rfc3986 validator" -category = "dev" optional = false python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*" files = [ @@ -8334,7 +8090,6 @@ files = [ name = "rich" version = "13.5.2" description = "Render rich text, tables, progress bars, syntax highlighting, markdown and more to the terminal" -category = "main" optional = true python-versions = ">=3.7.0" files = [ @@ -8354,7 +8109,6 @@ jupyter = ["ipywidgets (>=7.5.1,<9)"] name = "rpds-py" version = "0.9.2" description = "Python bindings to Rust's persistent data structures (rpds)" -category = "main" optional = false python-versions = ">=3.8" files = [ @@ -8461,7 +8215,6 @@ files = [ name = "rsa" version = "4.9" description = "Pure-Python RSA implementation" -category = "main" optional = true python-versions = ">=3.6,<4" files = [ @@ -8476,7 +8229,6 @@ pyasn1 = ">=0.1.3" name = "ruff" version = "0.0.249" description = "An extremely fast Python linter, written in Rust." -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -8503,7 +8255,6 @@ files = [ name = "s3transfer" version = "0.6.2" description = "An Amazon S3 Transfer Manager" -category = "main" optional = true python-versions = ">= 3.7" files = [ @@ -8521,7 +8272,6 @@ crt = ["botocore[crt] (>=1.20.29,<2.0a.0)"] name = "safetensors" version = "0.3.2" description = "Fast and Safe Tensor serialization" -category = "main" optional = true python-versions = "*" files = [ @@ -8595,7 +8345,6 @@ torch = ["torch (>=1.10)"] name = "scikit-learn" version = "1.3.0" description = "A set of python modules for machine learning and data mining" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -8638,7 +8387,6 @@ tests = ["black (>=23.3.0)", "matplotlib (>=3.1.3)", "mypy (>=1.3)", "numpydoc ( name = "scipy" version = "1.9.3" description = "Fundamental algorithms for scientific computing in Python" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -8677,7 +8425,6 @@ test = ["asv", "gmpy2", "mpmath", "pytest", "pytest-cov", "pytest-xdist", "sciki name = "semver" version = "3.0.1" description = "Python helper for Semantic Versioning (https://semver.org)" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -8689,7 +8436,6 @@ files = [ name = "send2trash" version = "1.8.2" description = "Send file to trash natively under Mac OS X, Windows and Linux" -category = "dev" optional = false python-versions = "!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,>=2.7" files = [ @@ -8706,7 +8452,6 @@ win32 = ["pywin32"] name = "sentence-transformers" version = "2.2.2" description = "Multilingual text embeddings" -category = "main" optional = true python-versions = ">=3.6.0" files = [ @@ -8729,7 +8474,6 @@ transformers = ">=4.6.0,<5.0.0" name = "sentencepiece" version = "0.1.99" description = "SentencePiece python wrapper" -category = "main" optional = true python-versions = "*" files = [ @@ -8784,7 +8528,6 @@ files = [ name = "setuptools" version = "67.8.0" description = "Easily download, build, install, upgrade, and uninstall Python packages" -category = "main" optional = false python-versions = 
">=3.7" files = [ @@ -8801,7 +8544,6 @@ testing-integration = ["build[virtualenv]", "filelock (>=3.4.0)", "jaraco.envs ( name = "sgmllib3k" version = "1.0.0" description = "Py3k port of sgmllib." -category = "main" optional = true python-versions = "*" files = [ @@ -8812,7 +8554,6 @@ files = [ name = "shapely" version = "2.0.1" description = "Manipulation and analysis of geometric objects" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -8860,14 +8601,13 @@ files = [ numpy = ">=1.14" [package.extras] -docs = ["matplotlib", "numpydoc (>=1.1.0,<1.2.0)", "sphinx", "sphinx-book-theme", "sphinx-remove-toctrees"] +docs = ["matplotlib", "numpydoc (==1.1.*)", "sphinx", "sphinx-book-theme", "sphinx-remove-toctrees"] test = ["pytest", "pytest-cov"] [[package]] name = "singlestoredb" version = "0.7.1" description = "Interface to the SingleStore database and cluster management APIs" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -8900,7 +8640,6 @@ sqlalchemy = ["sqlalchemy-singlestoredb"] name = "six" version = "1.16.0" description = "Python 2 and 3 compatibility utilities" -category = "main" optional = false python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*" files = [ @@ -8912,7 +8651,6 @@ files = [ name = "smmap" version = "5.0.0" description = "A pure Python implementation of a sliding window memory map manager" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -8924,7 +8662,6 @@ files = [ name = "sniffio" version = "1.3.0" description = "Sniff out which async library your code is running under" -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -8936,7 +8673,6 @@ files = [ name = "socksio" version = "1.0.0" description = "Sans-I/O implementation of SOCKS4, SOCKS4A, and SOCKS5." -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -8948,7 +8684,6 @@ files = [ name = "soundfile" version = "0.12.1" description = "An audio library based on libsndfile, CFFI and NumPy" -category = "main" optional = true python-versions = "*" files = [ @@ -8972,7 +8707,6 @@ numpy = ["numpy"] name = "soupsieve" version = "2.4.1" description = "A modern CSS selector implementation for Beautiful Soup." 
-category = "main" optional = false python-versions = ">=3.7" files = [ @@ -8984,7 +8718,6 @@ files = [ name = "soxr" version = "0.3.6" description = "High quality, one-dimensional sample-rate conversion library" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -9026,7 +8759,6 @@ test = ["pytest"] name = "sqlalchemy" version = "2.0.20" description = "Database Abstraction Library" -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -9074,7 +8806,7 @@ files = [ ] [package.dependencies] -greenlet = {version = "!=0.4.17", markers = "platform_machine == \"aarch64\" or platform_machine == \"ppc64le\" or platform_machine == \"x86_64\" or platform_machine == \"amd64\" or platform_machine == \"AMD64\" or platform_machine == \"win32\" or platform_machine == \"WIN32\""} +greenlet = {version = "!=0.4.17", markers = "platform_machine == \"win32\" or platform_machine == \"WIN32\" or platform_machine == \"AMD64\" or platform_machine == \"amd64\" or platform_machine == \"x86_64\" or platform_machine == \"ppc64le\" or platform_machine == \"aarch64\""} typing-extensions = ">=4.2.0" [package.extras] @@ -9105,7 +8837,6 @@ sqlcipher = ["sqlcipher3-binary"] name = "sqlite-vss" version = "0.1.2" description = "" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -9121,7 +8852,6 @@ test = ["pytest"] name = "sqlitedict" version = "2.1.0" description = "Persistent dict in Python, backed up by sqlite3 and pickle, multithread-safe." -category = "main" optional = true python-versions = "*" files = [ @@ -9132,7 +8862,6 @@ files = [ name = "sqlparams" version = "5.1.0" description = "Convert between various DB API 2.0 parameter styles." -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -9144,7 +8873,6 @@ files = [ name = "stack-data" version = "0.6.2" description = "Extract data from python stack frames and tracebacks for informative displays" -category = "dev" optional = false python-versions = "*" files = [ @@ -9164,7 +8892,6 @@ tests = ["cython", "littleutils", "pygments", "pytest", "typeguard"] name = "streamlit" version = "1.26.0" description = "A faster way to build and share data apps" -category = "main" optional = true python-versions = ">=3.8, !=3.9.7" files = [ @@ -9205,7 +8932,6 @@ snowflake = ["snowflake-snowpark-python"] name = "stringcase" version = "1.2.0" description = "String case converter." 
-category = "main" optional = true python-versions = "*" files = [ @@ -9216,7 +8942,6 @@ files = [ name = "sympy" version = "1.12" description = "Computer algebra system (CAS) in Python" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -9231,7 +8956,6 @@ mpmath = ">=0.19" name = "syrupy" version = "4.2.1" description = "Pytest Snapshot Test Utility" -category = "dev" optional = false python-versions = ">=3.8.1,<4" files = [ @@ -9247,7 +8971,6 @@ pytest = ">=7.0.0,<8.0.0" name = "telethon" version = "1.29.3" description = "Full-featured Telegram client library for Python 3" -category = "main" optional = true python-versions = ">=3.5" files = [ @@ -9265,7 +8988,6 @@ cryptg = ["cryptg"] name = "tenacity" version = "8.2.3" description = "Retry code until it succeeds" -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -9280,7 +9002,6 @@ doc = ["reno", "sphinx", "tornado (>=4.5)"] name = "tensorboard" version = "2.13.0" description = "TensorBoard lets you watch Tensors Flow" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -9305,7 +9026,6 @@ wheel = ">=0.26" name = "tensorboard-data-server" version = "0.7.1" description = "Fast data loading for TensorBoard" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -9318,7 +9038,6 @@ files = [ name = "tensorflow" version = "2.13.0" description = "TensorFlow is an open source machine learning framework for everyone." -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -9371,7 +9090,6 @@ wrapt = ">=1.11.0" name = "tensorflow-estimator" version = "2.13.0" description = "TensorFlow Estimator." -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -9382,7 +9100,6 @@ files = [ name = "tensorflow-hub" version = "0.14.0" description = "TensorFlow Hub is a library to foster the publication, discovery, and consumption of reusable parts of machine learning models." -category = "main" optional = true python-versions = "*" files = [ @@ -9397,7 +9114,6 @@ protobuf = ">=3.19.6" name = "tensorflow-io-gcs-filesystem" version = "0.33.0" description = "TensorFlow IO" -category = "main" optional = true python-versions = ">=3.7, <3.12" files = [ @@ -9428,7 +9144,6 @@ tensorflow-rocm = ["tensorflow-rocm (>=2.13.0,<2.14.0)"] name = "tensorflow-macos" version = "2.13.0" description = "TensorFlow is an open source machine learning framework for everyone." -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -9464,7 +9179,6 @@ wrapt = ">=1.11.0" name = "tensorflow-text" version = "2.13.0" description = "TF.Text is a TensorFlow library of text related ops, modules, and subgraphs." -category = "main" optional = true python-versions = "*" files = [ @@ -9489,7 +9203,6 @@ tests = ["absl-py", "pytest", "tensorflow-datasets (>=3.2.0)"] name = "termcolor" version = "2.3.0" description = "ANSI color formatting for output in terminal" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -9504,7 +9217,6 @@ tests = ["pytest", "pytest-cov"] name = "terminado" version = "0.17.1" description = "Tornado websocket backend for the Xterm.js Javascript terminal emulator library." 
-category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -9525,7 +9237,6 @@ test = ["pre-commit", "pytest (>=7.0)", "pytest-timeout"] name = "textstat" version = "0.7.3" description = "Calculate statistical features from text" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -9540,7 +9251,6 @@ pyphen = "*" name = "threadpoolctl" version = "3.2.0" description = "threadpoolctl" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -9552,7 +9262,6 @@ files = [ name = "tigrisdb" version = "1.0.0b6" description = "Python SDK for Tigris " -category = "main" optional = true python-versions = ">=3.8,<4.0" files = [ @@ -9568,7 +9277,6 @@ protobuf = ">=3.19.6" name = "tiktoken" version = "0.3.3" description = "tiktoken is a fast BPE tokeniser for use with OpenAI's models" -category = "main" optional = false python-versions = ">=3.8" files = [ @@ -9610,11 +9318,29 @@ requests = ">=2.26.0" [package.extras] blobfile = ["blobfile (>=2)"] +[[package]] +name = "timescale-vector" +version = "0.0.1" +description = "Python library for storing vector data in Postgres" +optional = true +python-versions = ">=3.7" +files = [ + {file = "timescale-vector-0.0.1.tar.gz", hash = "sha256:420d088b1d45e98f5b9770c76ddf826521aa6e813cb4997d24355eaeda1a7775"}, + {file = "timescale_vector-0.0.1-py3-none-any.whl", hash = "sha256:81283e8f359387bacd2bd092431a288f34c211968c53b3fed7f3fed1979f39eb"}, +] + +[package.dependencies] +asyncpg = "*" +pgvector = "*" +psycopg2 = "*" + +[package.extras] +dev = ["python-dotenv"] + [[package]] name = "tinycss2" version = "1.2.1" description = "A tiny CSS parser" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -9633,7 +9359,6 @@ test = ["flake8", "isort", "pytest"] name = "tinysegmenter" version = "0.3" description = "Very compact Japanese tokenizer" -category = "main" optional = true python-versions = "*" files = [ @@ -9644,7 +9369,6 @@ files = [ name = "tldextract" version = "3.4.4" description = "Accurately separates a URL's subdomain, domain, and public suffix, using the Public Suffix List (PSL). By default, this includes the public ICANN TLDs and their exceptions. You can optionally support the Public Suffix List's private domains as well." 
-category = "main" optional = true python-versions = ">=3.7" files = [ @@ -9662,7 +9386,6 @@ requests-file = ">=1.4" name = "tokenizers" version = "0.13.3" description = "Fast and Customizable Tokenizers" -category = "main" optional = true python-versions = "*" files = [ @@ -9717,7 +9440,6 @@ testing = ["black (==22.3)", "datasets", "numpy", "pytest", "requests"] name = "toml" version = "0.10.2" description = "Python Library for Tom's Obvious, Minimal Language" -category = "main" optional = false python-versions = ">=2.6, !=3.0.*, !=3.1.*, !=3.2.*" files = [ @@ -9729,7 +9451,6 @@ files = [ name = "tomli" version = "2.0.1" description = "A lil' TOML parser" -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -9741,7 +9462,6 @@ files = [ name = "toolz" version = "0.12.0" description = "List processing tools and functional utilities" -category = "main" optional = true python-versions = ">=3.5" files = [ @@ -9753,7 +9473,6 @@ files = [ name = "torch" version = "1.13.1" description = "Tensors and Dynamic neural networks in Python with strong GPU acceleration" -category = "main" optional = true python-versions = ">=3.7.0" files = [ @@ -9794,7 +9513,6 @@ opt-einsum = ["opt-einsum (>=3.3)"] name = "torchvision" version = "0.14.1" description = "image and video datasets and models for torch deep learning" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -9821,7 +9539,7 @@ files = [ [package.dependencies] numpy = "*" -pillow = ">=5.3.0,<8.3.0 || >=8.4.0" +pillow = ">=5.3.0,<8.3.dev0 || >=8.4.dev0" requests = "*" torch = "1.13.1" typing-extensions = "*" @@ -9833,7 +9551,6 @@ scipy = ["scipy"] name = "tornado" version = "6.3.3" description = "Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed." 
-category = "main" optional = false python-versions = ">= 3.8" files = [ @@ -9854,7 +9571,6 @@ files = [ name = "tqdm" version = "4.66.1" description = "Fast, Extensible Progress Meter" -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -9875,7 +9591,6 @@ telegram = ["requests"] name = "traitlets" version = "5.9.0" description = "Traitlets Python configuration system" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -9891,7 +9606,6 @@ test = ["argcomplete (>=2.0)", "pre-commit", "pytest", "pytest-mock"] name = "transformers" version = "4.32.0" description = "State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow" -category = "main" optional = true python-versions = ">=3.8.0" files = [ @@ -9961,7 +9675,6 @@ vision = ["Pillow (<10.0.0)"] name = "tritonclient" version = "2.34.0" description = "Python client library and utilities for communicating with Triton Inference Server" -category = "main" optional = true python-versions = "*" files = [ @@ -9983,7 +9696,6 @@ http = ["aiohttp (>=3.8.1,<4.0.0)", "geventhttpclient (>=1.4.4,<=2.0.2)", "numpy name = "types-chardet" version = "5.0.4.6" description = "Typing stubs for chardet" -category = "dev" optional = false python-versions = "*" files = [ @@ -9995,7 +9707,6 @@ files = [ name = "types-protobuf" version = "4.24.0.1" description = "Typing stubs for protobuf" -category = "dev" optional = false python-versions = "*" files = [ @@ -10007,7 +9718,6 @@ files = [ name = "types-pyopenssl" version = "23.2.0.2" description = "Typing stubs for pyOpenSSL" -category = "dev" optional = false python-versions = "*" files = [ @@ -10022,7 +9732,6 @@ cryptography = ">=35.0.0" name = "types-pytz" version = "2023.3.0.1" description = "Typing stubs for pytz" -category = "dev" optional = false python-versions = "*" files = [ @@ -10034,7 +9743,6 @@ files = [ name = "types-pyyaml" version = "6.0.12.11" description = "Typing stubs for PyYAML" -category = "dev" optional = false python-versions = "*" files = [ @@ -10046,7 +9754,6 @@ files = [ name = "types-redis" version = "4.6.0.5" description = "Typing stubs for redis" -category = "dev" optional = false python-versions = "*" files = [ @@ -10062,7 +9769,6 @@ types-pyOpenSSL = "*" name = "types-requests" version = "2.31.0.2" description = "Typing stubs for requests" -category = "main" optional = false python-versions = "*" files = [ @@ -10077,7 +9783,6 @@ types-urllib3 = "*" name = "types-toml" version = "0.10.8.7" description = "Typing stubs for toml" -category = "dev" optional = false python-versions = "*" files = [ @@ -10089,7 +9794,6 @@ files = [ name = "types-urllib3" version = "1.26.25.14" description = "Typing stubs for urllib3" -category = "main" optional = false python-versions = "*" files = [ @@ -10101,7 +9805,6 @@ files = [ name = "typing-extensions" version = "4.5.0" description = "Backported and Experimental Type Hints for Python 3.7+" -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -10113,7 +9816,6 @@ files = [ name = "typing-inspect" version = "0.9.0" description = "Runtime inspection utilities for typing module." 
-category = "main" optional = false python-versions = "*" files = [ @@ -10129,7 +9831,6 @@ typing-extensions = ">=3.7.4" name = "tzdata" version = "2023.3" description = "Provider of IANA time zone data" -category = "main" optional = false python-versions = ">=2" files = [ @@ -10141,7 +9842,6 @@ files = [ name = "tzlocal" version = "4.3.1" description = "tzinfo object for the local timezone" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -10161,7 +9861,6 @@ devenv = ["black", "check-manifest", "flake8", "pyroma", "pytest (>=4.3)", "pyte name = "uri-template" version = "1.3.0" description = "RFC 6570 URI Template Processor" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -10176,7 +9875,6 @@ dev = ["flake8", "flake8-annotations", "flake8-bandit", "flake8-bugbear", "flake name = "uritemplate" version = "4.1.1" description = "Implementation of RFC 6570 URI Templates" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -10188,7 +9886,6 @@ files = [ name = "urllib3" version = "1.26.16" description = "HTTP library with thread-safe connection pooling, file post, and more." -category = "main" optional = false python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*" files = [ @@ -10205,7 +9902,6 @@ socks = ["PySocks (>=1.5.6,!=1.5.7,<2.0)"] name = "validators" version = "0.21.0" description = "Python Data Validation for Humans™" -category = "main" optional = true python-versions = ">=3.8,<4.0" files = [ @@ -10217,7 +9913,6 @@ files = [ name = "vcrpy" version = "5.1.0" description = "Automatically mock your HTTP interactions to simplify and speed up testing" -category = "dev" optional = false python-versions = ">=3.8" files = [ @@ -10235,7 +9930,6 @@ yarl = "*" name = "watchdog" version = "3.0.0" description = "Filesystem events monitoring" -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -10275,7 +9969,6 @@ watchmedo = ["PyYAML (>=3.10)"] name = "wcwidth" version = "0.2.6" description = "Measures the displayed width of unicode strings in a terminal" -category = "dev" optional = false python-versions = "*" files = [ @@ -10287,7 +9980,6 @@ files = [ name = "weaviate-client" version = "3.23.0" description = "A python native Weaviate client" -category = "main" optional = true python-versions = ">=3.8" files = [ @@ -10308,7 +10000,6 @@ grpc = ["grpcio", "grpcio-tools"] name = "webcolors" version = "1.13" description = "A library for working with the color formats defined by HTML and CSS." -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -10324,7 +10015,6 @@ tests = ["pytest", "pytest-cov"] name = "webencodings" version = "0.5.1" description = "Character encoding aliases for legacy web content" -category = "dev" optional = false python-versions = "*" files = [ @@ -10336,7 +10026,6 @@ files = [ name = "websocket-client" version = "1.6.2" description = "WebSocket client for Python with low level API options" -category = "main" optional = false python-versions = ">=3.8" files = [ @@ -10353,7 +10042,6 @@ test = ["websockets"] name = "websockets" version = "11.0.3" description = "An implementation of the WebSocket Protocol (RFC 6455 & 7692)" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -10433,7 +10121,6 @@ files = [ name = "werkzeug" version = "2.3.7" description = "The comprehensive WSGI web application library." 
-category = "main" optional = true python-versions = ">=3.8" files = [ @@ -10451,7 +10138,6 @@ watchdog = ["watchdog (>=2.3)"] name = "wget" version = "3.2" description = "pure python download utility" -category = "main" optional = true python-versions = "*" files = [ @@ -10462,7 +10148,6 @@ files = [ name = "wheel" version = "0.41.2" description = "A built-package format for Python" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -10477,7 +10162,6 @@ test = ["pytest (>=6.0.0)", "setuptools (>=65)"] name = "whylabs-client" version = "0.5.4" description = "WhyLabs API client" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -10493,7 +10177,6 @@ urllib3 = ">=1.25.3" name = "whylogs" version = "1.2.6" description = "Profile and monitor your ML data pipeline end-to-end" -category = "main" optional = true python-versions = ">=3.7.1,<4" files = [ @@ -10527,7 +10210,6 @@ viz = ["Pillow (>=9.2.0,<10.0.0)", "ipython", "numpy", "numpy (>=1.23.2)", "pyba name = "whylogs-sketching" version = "3.4.1.dev3" description = "sketching library of whylogs" -category = "main" optional = true python-versions = "*" files = [ @@ -10568,7 +10250,6 @@ files = [ name = "widgetsnbextension" version = "4.0.8" description = "Jupyter interactive widgets for Jupyter Notebook" -category = "dev" optional = false python-versions = ">=3.7" files = [ @@ -10580,7 +10261,6 @@ files = [ name = "wikipedia" version = "1.4.0" description = "Wikipedia API for Python" -category = "main" optional = true python-versions = "*" files = [ @@ -10595,7 +10275,6 @@ requests = ">=2.0.0,<3.0.0" name = "win32-setctime" version = "1.1.0" description = "A small Python utility to set file creation time on Windows" -category = "main" optional = true python-versions = ">=3.5" files = [ @@ -10610,7 +10289,6 @@ dev = ["black (>=19.3b0)", "pytest (>=4.6.2)"] name = "wolframalpha" version = "5.0.0" description = "Wolfram|Alpha 2.0 API client" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -10631,7 +10309,6 @@ testing = ["keyring", "pmxbot", "pytest (>=3.5,!=3.7.3)", "pytest-black (>=0.3.7 name = "wonderwords" version = "2.2.0" description = "A python package for random words and sentences in the english language" -category = "main" optional = true python-versions = ">=3.6" files = [ @@ -10646,7 +10323,6 @@ cli = ["rich (==9.10.0)"] name = "wrapt" version = "1.15.0" description = "Module for decorators, wrappers and monkey patching." 
-category = "main" optional = false python-versions = "!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,>=2.7" files = [ @@ -10731,7 +10407,6 @@ files = [ name = "xata" version = "1.0.0b0" description = "Python client for Xata.io" -category = "main" optional = true python-versions = ">=3.8,<4.0" files = [ @@ -10749,7 +10424,6 @@ requests = ">=2.28.1,<3.0.0" name = "xmltodict" version = "0.13.0" description = "Makes working with XML feel like you are working with JSON" -category = "main" optional = true python-versions = ">=3.4" files = [ @@ -10761,7 +10435,6 @@ files = [ name = "yarl" version = "1.9.2" description = "Yet another URL library" -category = "main" optional = false python-versions = ">=3.7" files = [ @@ -10849,7 +10522,6 @@ multidict = ">=4.0" name = "zipp" version = "3.16.2" description = "Backport of pathlib-compatible object wrapper for zip files" -category = "main" optional = false python-versions = ">=3.8" files = [ @@ -10865,7 +10537,6 @@ testing = ["big-O", "jaraco.functools", "jaraco.itertools", "more-itertools", "p name = "zstandard" version = "0.21.0" description = "Zstandard bindings for Python" -category = "main" optional = true python-versions = ">=3.7" files = [ @@ -10921,15 +10592,15 @@ cffi = {version = ">=1.11", markers = "platform_python_implementation == \"PyPy\ cffi = ["cffi (>=1.11)"] [extras] -all = ["clarifai", "cohere", "openai", "nlpcloud", "huggingface_hub", "manifest-ml", "elasticsearch", "opensearch-py", "google-search-results", "faiss-cpu", "sentence-transformers", "transformers", "nltk", "wikipedia", "beautifulsoup4", "tiktoken", "torch", "jinja2", "pinecone-client", "pinecone-text", "marqo", "pymongo", "weaviate-client", "redis", "google-api-python-client", "google-auth", "wolframalpha", "qdrant-client", "tensorflow-text", "pypdf", "networkx", "nomic", "aleph-alpha-client", "deeplake", "libdeeplake", "pgvector", "psycopg2-binary", "pyowm", "pytesseract", "html2text", "atlassian-python-api", "gptcache", "duckduckgo-search", "arxiv", "azure-identity", "clickhouse-connect", "azure-cosmos", "lancedb", "langkit", "lark", "pexpect", "pyvespa", "O365", "jq", "docarray", "pdfminer-six", "lxml", "requests-toolbelt", "neo4j", "openlm", "azure-ai-formrecognizer", "azure-ai-vision", "azure-cognitiveservices-speech", "momento", "singlestoredb", "tigrisdb", "nebula3-python", "awadb", "esprima", "rdflib", "amadeus", "librosa", "python-arango"] -azure = ["azure-identity", "azure-cosmos", "openai", "azure-core", "azure-ai-formrecognizer", "azure-ai-vision", "azure-cognitiveservices-speech", "azure-search-documents"] +all = ["O365", "aleph-alpha-client", "amadeus", "arxiv", "atlassian-python-api", "awadb", "azure-ai-formrecognizer", "azure-ai-vision", "azure-cognitiveservices-speech", "azure-cosmos", "azure-identity", "beautifulsoup4", "clarifai", "clickhouse-connect", "cohere", "deeplake", "docarray", "duckduckgo-search", "elasticsearch", "esprima", "faiss-cpu", "google-api-python-client", "google-auth", "google-search-results", "gptcache", "html2text", "huggingface_hub", "jinja2", "jq", "lancedb", "langkit", "lark", "libdeeplake", "librosa", "lxml", "manifest-ml", "marqo", "momento", "nebula3-python", "neo4j", "networkx", "nlpcloud", "nltk", "nomic", "openai", "openlm", "opensearch-py", "pdfminer-six", "pexpect", "pgvector", "pinecone-client", "pinecone-text", "psycopg2-binary", "pymongo", "pyowm", "pypdf", "pytesseract", "python-arango", "pyvespa", "qdrant-client", "rdflib", "redis", "requests-toolbelt", "sentence-transformers", "singlestoredb", "tensorflow-text", 
"tigrisdb", "tiktoken", "torch", "transformers", "weaviate-client", "wikipedia", "wolframalpha"] +azure = ["azure-ai-formrecognizer", "azure-ai-vision", "azure-cognitiveservices-speech", "azure-core", "azure-cosmos", "azure-identity", "azure-search-documents", "openai"] clarifai = ["clarifai"] cohere = ["cohere"] docarray = ["docarray"] embeddings = ["sentence-transformers"] -extended-testing = ["amazon-textract-caller", "assemblyai", "beautifulsoup4", "bibtexparser", "cassio", "chardet", "esprima", "jq", "pdfminer-six", "pgvector", "pypdf", "pymupdf", "pypdfium2", "tqdm", "lxml", "atlassian-python-api", "mwparserfromhell", "mwxml", "pandas", "telethon", "psychicapi", "gql", "requests-toolbelt", "html2text", "py-trello", "scikit-learn", "streamlit", "pyspark", "openai", "sympy", "rapidfuzz", "openai", "rank-bm25", "geopandas", "jinja2", "gitpython", "newspaper3k", "feedparser", "xata", "xmltodict", "faiss-cpu", "openapi-schema-pydantic", "markdownify", "dashvector", "sqlite-vss"] +extended-testing = ["amazon-textract-caller", "assemblyai", "atlassian-python-api", "beautifulsoup4", "bibtexparser", "cassio", "chardet", "dashvector", "esprima", "faiss-cpu", "feedparser", "geopandas", "gitpython", "gql", "html2text", "jinja2", "jq", "lxml", "markdownify", "mwparserfromhell", "mwxml", "newspaper3k", "openai", "openai", "openapi-schema-pydantic", "pandas", "pdfminer-six", "pgvector", "psychicapi", "py-trello", "pymupdf", "pypdf", "pypdfium2", "pyspark", "rank-bm25", "rapidfuzz", "requests-toolbelt", "scikit-learn", "sqlite-vss", "streamlit", "sympy", "telethon", "timescale-vector", "tqdm", "xata", "xmltodict"] javascript = ["esprima"] -llms = ["clarifai", "cohere", "openai", "openlm", "nlpcloud", "huggingface_hub", "manifest-ml", "torch", "transformers"] +llms = ["clarifai", "cohere", "huggingface_hub", "manifest-ml", "nlpcloud", "openai", "openlm", "torch", "transformers"] openai = ["openai", "tiktoken"] qdrant = ["qdrant-client"] text-helpers = ["chardet"] @@ -10937,4 +10608,4 @@ text-helpers = ["chardet"] [metadata] lock-version = "2.0" python-versions = ">=3.8.1,<4.0" -content-hash = "2d99b9aed3cafd34e50e2e706dc3eb74822202cff3cdc6270250ffddc18189a3" +content-hash = "11ce1c967a78f79a922b9bbbc1c00541703185e28c63b7a0a02aa5c562c36ee3" diff --git a/libs/langchain/pyproject.toml b/libs/langchain/pyproject.toml index 5dd82bca677..562f60f4b8c 100644 --- a/libs/langchain/pyproject.toml +++ b/libs/langchain/pyproject.toml @@ -129,6 +129,7 @@ markdownify = {version = "^0.11.6", optional = true} assemblyai = {version = "^0.17.0", optional = true} dashvector = {version = "^1.0.1", optional = true} sqlite-vss = {version = "^0.1.2", optional = true} +timescale-vector = {version = "^0.0.1", optional = true} [tool.poetry.group.test.dependencies] @@ -345,6 +346,7 @@ extended_testing = [ "markdownify", "dashvector", "sqlite-vss", + "timescale-vector", ] [tool.ruff] diff --git a/libs/langchain/tests/integration_tests/vectorstores/test_timescalevector.py b/libs/langchain/tests/integration_tests/vectorstores/test_timescalevector.py new file mode 100644 index 00000000000..639adc1375a --- /dev/null +++ b/libs/langchain/tests/integration_tests/vectorstores/test_timescalevector.py @@ -0,0 +1,433 @@ +"""Test TimescaleVector functionality.""" +import os +from datetime import datetime, timedelta +from typing import List + +import pytest + +from langchain.docstore.document import Document +from langchain.vectorstores.timescalevector import TimescaleVector +from tests.integration_tests.vectorstores.fake_embeddings import 
FakeEmbeddings + +SERVICE_URL = TimescaleVector.service_url_from_db_params( + host=os.environ.get("TEST_TIMESCALE_HOST", "localhost"), + port=int(os.environ.get("TEST_TIMESCALE_PORT", "5432")), + database=os.environ.get("TEST_TIMESCALE_DATABASE", "postgres"), + user=os.environ.get("TEST_TIMESCALE_USER", "postgres"), + password=os.environ.get("TEST_TIMESCALE_PASSWORD", "postgres"), +) + + +ADA_TOKEN_COUNT = 1536 + + +class FakeEmbeddingsWithAdaDimension(FakeEmbeddings): + """Fake embeddings functionality for testing.""" + + def embed_documents(self, texts: List[str]) -> List[List[float]]: + """Return simple embeddings.""" + return [ + [float(1.0)] * (ADA_TOKEN_COUNT - 1) + [float(i)] for i in range(len(texts)) + ] + + def embed_query(self, text: str) -> List[float]: + """Return simple embeddings.""" + return [float(1.0)] * (ADA_TOKEN_COUNT - 1) + [float(0.0)] + + +def test_timescalevector() -> None: + """Test end to end construction and search.""" + texts = ["foo", "bar", "baz"] + docsearch = TimescaleVector.from_texts( + texts=texts, + collection_name="test_collection", + embedding=FakeEmbeddingsWithAdaDimension(), + service_url=SERVICE_URL, + pre_delete_collection=True, + ) + output = docsearch.similarity_search("foo", k=1) + assert output == [Document(page_content="foo")] + + +def test_timescalevector_from_documents() -> None: + """Test end to end construction and search.""" + texts = ["foo", "bar", "baz"] + docs = [Document(page_content=t, metadata={"a": "b"}) for t in texts] + docsearch = TimescaleVector.from_documents( + documents=docs, + collection_name="test_collection", + embedding=FakeEmbeddingsWithAdaDimension(), + service_url=SERVICE_URL, + pre_delete_collection=True, + ) + output = docsearch.similarity_search("foo", k=1) + assert output == [Document(page_content="foo", metadata={"a": "b"})] + + +@pytest.mark.asyncio +async def test_timescalevector_afrom_documents() -> None: + """Test end to end construction and search.""" + texts = ["foo", "bar", "baz"] + docs = [Document(page_content=t, metadata={"a": "b"}) for t in texts] + docsearch = await TimescaleVector.afrom_documents( + documents=docs, + collection_name="test_collection", + embedding=FakeEmbeddingsWithAdaDimension(), + service_url=SERVICE_URL, + pre_delete_collection=True, + ) + output = await docsearch.asimilarity_search("foo", k=1) + assert output == [Document(page_content="foo", metadata={"a": "b"})] + + +def test_timescalevector_embeddings() -> None: + """Test end to end construction with embeddings and search.""" + texts = ["foo", "bar", "baz"] + text_embeddings = FakeEmbeddingsWithAdaDimension().embed_documents(texts) + text_embedding_pairs = list(zip(texts, text_embeddings)) + docsearch = TimescaleVector.from_embeddings( + text_embeddings=text_embedding_pairs, + collection_name="test_collection", + embedding=FakeEmbeddingsWithAdaDimension(), + service_url=SERVICE_URL, + pre_delete_collection=True, + ) + output = docsearch.similarity_search("foo", k=1) + assert output == [Document(page_content="foo")] + + +@pytest.mark.asyncio +async def test_timescalevector_aembeddings() -> None: + """Test end to end construction with embeddings and search.""" + texts = ["foo", "bar", "baz"] + text_embeddings = FakeEmbeddingsWithAdaDimension().embed_documents(texts) + text_embedding_pairs = list(zip(texts, text_embeddings)) + docsearch = await TimescaleVector.afrom_embeddings( + text_embeddings=text_embedding_pairs, + collection_name="test_collection", + embedding=FakeEmbeddingsWithAdaDimension(), + service_url=SERVICE_URL, + 
pre_delete_collection=True, + ) + output = await docsearch.asimilarity_search("foo", k=1) + assert output == [Document(page_content="foo")] + + +def test_timescalevector_with_metadatas() -> None: + """Test end to end construction and search.""" + texts = ["foo", "bar", "baz"] + metadatas = [{"page": str(i)} for i in range(len(texts))] + docsearch = TimescaleVector.from_texts( + texts=texts, + collection_name="test_collection", + embedding=FakeEmbeddingsWithAdaDimension(), + metadatas=metadatas, + service_url=SERVICE_URL, + pre_delete_collection=True, + ) + output = docsearch.similarity_search("foo", k=1) + assert output == [Document(page_content="foo", metadata={"page": "0"})] + + +def test_timescalevector_with_metadatas_with_scores() -> None: + """Test end to end construction and search.""" + texts = ["foo", "bar", "baz"] + metadatas = [{"page": str(i)} for i in range(len(texts))] + docsearch = TimescaleVector.from_texts( + texts=texts, + collection_name="test_collection", + embedding=FakeEmbeddingsWithAdaDimension(), + metadatas=metadatas, + service_url=SERVICE_URL, + pre_delete_collection=True, + ) + output = docsearch.similarity_search_with_score("foo", k=1) + assert output == [(Document(page_content="foo", metadata={"page": "0"}), 0.0)] + + +@pytest.mark.asyncio +async def test_timescalevector_awith_metadatas_with_scores() -> None: + """Test end to end construction and search.""" + texts = ["foo", "bar", "baz"] + metadatas = [{"page": str(i)} for i in range(len(texts))] + docsearch = await TimescaleVector.afrom_texts( + texts=texts, + collection_name="test_collection", + embedding=FakeEmbeddingsWithAdaDimension(), + metadatas=metadatas, + service_url=SERVICE_URL, + pre_delete_collection=True, + ) + output = await docsearch.asimilarity_search_with_score("foo", k=1) + assert output == [(Document(page_content="foo", metadata={"page": "0"}), 0.0)] + + +def test_timescalevector_with_filter_match() -> None: + """Test end to end construction and search.""" + texts = ["foo", "bar", "baz"] + metadatas = [{"page": str(i)} for i in range(len(texts))] + docsearch = TimescaleVector.from_texts( + texts=texts, + collection_name="test_collection_filter", + embedding=FakeEmbeddingsWithAdaDimension(), + metadatas=metadatas, + service_url=SERVICE_URL, + pre_delete_collection=True, + ) + output = docsearch.similarity_search_with_score("foo", k=1, filter={"page": "0"}) + assert output == [(Document(page_content="foo", metadata={"page": "0"}), 0.0)] + + +def test_timescalevector_with_filter_distant_match() -> None: + """Test end to end construction and search.""" + texts = ["foo", "bar", "baz"] + metadatas = [{"page": str(i)} for i in range(len(texts))] + docsearch = TimescaleVector.from_texts( + texts=texts, + collection_name="test_collection_filter", + embedding=FakeEmbeddingsWithAdaDimension(), + metadatas=metadatas, + service_url=SERVICE_URL, + pre_delete_collection=True, + ) + output = docsearch.similarity_search_with_score("foo", k=1, filter={"page": "2"}) + assert output == [ + (Document(page_content="baz", metadata={"page": "2"}), 0.0013003906671379406) + ] + + +def test_timescalevector_with_filter_no_match() -> None: + """Test end to end construction and search.""" + texts = ["foo", "bar", "baz"] + metadatas = [{"page": str(i)} for i in range(len(texts))] + docsearch = TimescaleVector.from_texts( + texts=texts, + collection_name="test_collection_filter", + embedding=FakeEmbeddingsWithAdaDimension(), + metadatas=metadatas, + service_url=SERVICE_URL, + pre_delete_collection=True, + ) + output = 
docsearch.similarity_search_with_score("foo", k=1, filter={"page": "5"}) + assert output == [] + + +def test_timescalevector_with_filter_in_set() -> None: + """Test end to end construction and search.""" + texts = ["foo", "bar", "baz"] + metadatas = [{"page": str(i)} for i in range(len(texts))] + docsearch = TimescaleVector.from_texts( + texts=texts, + collection_name="test_collection_filter", + embedding=FakeEmbeddingsWithAdaDimension(), + metadatas=metadatas, + service_url=SERVICE_URL, + pre_delete_collection=True, + ) + output = docsearch.similarity_search_with_score( + "foo", k=2, filter=[{"page": "0"}, {"page": "2"}] + ) + assert output == [ + (Document(page_content="foo", metadata={"page": "0"}), 0.0), + (Document(page_content="baz", metadata={"page": "2"}), 0.0013003906671379406), + ] + + +def test_timescalevector_relevance_score() -> None: + """Test to make sure the relevance score is scaled to 0-1.""" + texts = ["foo", "bar", "baz"] + metadatas = [{"page": str(i)} for i in range(len(texts))] + docsearch = TimescaleVector.from_texts( + texts=texts, + collection_name="test_collection", + embedding=FakeEmbeddingsWithAdaDimension(), + metadatas=metadatas, + service_url=SERVICE_URL, + pre_delete_collection=True, + ) + + output = docsearch.similarity_search_with_relevance_scores("foo", k=3) + assert output == [ + (Document(page_content="foo", metadata={"page": "0"}), 1.0), + (Document(page_content="bar", metadata={"page": "1"}), 0.9996744261675065), + (Document(page_content="baz", metadata={"page": "2"}), 0.9986996093328621), + ] + + +@pytest.mark.asyncio +async def test_timescalevector_relevance_score_async() -> None: + """Test to make sure the relevance score is scaled to 0-1.""" + texts = ["foo", "bar", "baz"] + metadatas = [{"page": str(i)} for i in range(len(texts))] + docsearch = await TimescaleVector.afrom_texts( + texts=texts, + collection_name="test_collection", + embedding=FakeEmbeddingsWithAdaDimension(), + metadatas=metadatas, + service_url=SERVICE_URL, + pre_delete_collection=True, + ) + + output = await docsearch.asimilarity_search_with_relevance_scores("foo", k=3) + assert output == [ + (Document(page_content="foo", metadata={"page": "0"}), 1.0), + (Document(page_content="bar", metadata={"page": "1"}), 0.9996744261675065), + (Document(page_content="baz", metadata={"page": "2"}), 0.9986996093328621), + ] + + +def test_timescalevector_retriever_search_threshold() -> None: + """Test using retriever for searching with threshold.""" + texts = ["foo", "bar", "baz"] + metadatas = [{"page": str(i)} for i in range(len(texts))] + docsearch = TimescaleVector.from_texts( + texts=texts, + collection_name="test_collection", + embedding=FakeEmbeddingsWithAdaDimension(), + metadatas=metadatas, + service_url=SERVICE_URL, + pre_delete_collection=True, + ) + + retriever = docsearch.as_retriever( + search_type="similarity_score_threshold", + search_kwargs={"k": 3, "score_threshold": 0.999}, + ) + output = retriever.get_relevant_documents("summer") + assert output == [ + Document(page_content="foo", metadata={"page": "0"}), + Document(page_content="bar", metadata={"page": "1"}), + ] + + +def test_timescalevector_retriever_search_threshold_custom_normalization_fn() -> None: + """Test searching with threshold and custom normalization function""" + texts = ["foo", "bar", "baz"] + metadatas = [{"page": str(i)} for i in range(len(texts))] + docsearch = TimescaleVector.from_texts( + texts=texts, + collection_name="test_collection", + embedding=FakeEmbeddingsWithAdaDimension(), + 
metadatas=metadatas, + service_url=SERVICE_URL, + pre_delete_collection=True, + relevance_score_fn=lambda d: d * 0, + ) + + retriever = docsearch.as_retriever( + search_type="similarity_score_threshold", + search_kwargs={"k": 3, "score_threshold": 0.5}, + ) + output = retriever.get_relevant_documents("foo") + assert output == [] + + +def test_timescalevector_delete() -> None: + """Test deleting functionality.""" + texts = ["bar", "baz"] + docs = [Document(page_content=t, metadata={"a": "b"}) for t in texts] + docsearch = TimescaleVector.from_documents( + documents=docs, + collection_name="test_collection", + embedding=FakeEmbeddingsWithAdaDimension(), + service_url=SERVICE_URL, + pre_delete_collection=True, + ) + texts = ["foo"] + meta = [{"b": "c"}] + ids = docsearch.add_texts(texts, meta) + + output = docsearch.similarity_search("bar", k=10) + assert len(output) == 3 + docsearch.delete(ids) + + output = docsearch.similarity_search("bar", k=10) + assert len(output) == 2 + + docsearch.delete_by_metadata({"a": "b"}) + output = docsearch.similarity_search("bar", k=10) + assert len(output) == 0 + + +def test_timescalevector_with_index() -> None: + """Test index creation and drop functionality.""" + texts = ["bar", "baz"] + docs = [Document(page_content=t, metadata={"a": "b"}) for t in texts] + docsearch = TimescaleVector.from_documents( + documents=docs, + collection_name="test_collection", + embedding=FakeEmbeddingsWithAdaDimension(), + service_url=SERVICE_URL, + pre_delete_collection=True, + ) + texts = ["foo"] + meta = [{"b": "c"}] + docsearch.add_texts(texts, meta) + + docsearch.create_index() + + output = docsearch.similarity_search("bar", k=10) + assert len(output) == 3 + + docsearch.drop_index() + docsearch.create_index( + index_type=TimescaleVector.IndexType.TIMESCALE_VECTOR, + max_alpha=1.0, + num_neighbors=50, + ) + + docsearch.drop_index() + docsearch.create_index("tsv", max_alpha=1.0, num_neighbors=50) + + docsearch.drop_index() + docsearch.create_index("ivfflat", num_lists=20, num_records=1000) + + docsearch.drop_index() + docsearch.create_index("hnsw", m=16, ef_construction=64) + + +def test_timescalevector_time_partitioning() -> None: + """Test time-partitioning functionality and time-filtered search.""" + from timescale_vector import client + + texts = ["bar", "baz"] + docs = [Document(page_content=t, metadata={"a": "b"}) for t in texts] + docsearch = TimescaleVector.from_documents( + documents=docs, + collection_name="test_collection_time_partitioning", + embedding=FakeEmbeddingsWithAdaDimension(), + service_url=SERVICE_URL, + pre_delete_collection=True, + time_partition_interval=timedelta(hours=1), + ) + texts = ["foo"] + meta = [{"b": "c"}] + + ids = [client.uuid_from_time(datetime.now() - timedelta(hours=3))] + docsearch.add_texts(texts, meta, ids) + + output = docsearch.similarity_search("bar", k=10) + assert len(output) == 3 + + output = docsearch.similarity_search( + "bar", k=10, start_date=datetime.now() - timedelta(hours=1) + ) + assert len(output) == 2 + + output = docsearch.similarity_search( + "bar", k=10, end_date=datetime.now() - timedelta(hours=1) + ) + assert len(output) == 1 + + output = docsearch.similarity_search( + "bar", k=10, start_date=datetime.now() - timedelta(minutes=200) + ) + assert len(output) == 3 + + output = docsearch.similarity_search( + "bar", + k=10, + start_date=datetime.now() - timedelta(minutes=200), + time_delta=timedelta(hours=1), + ) + assert len(output) == 1 diff --git a/libs/langchain/tests/unit_tests/retrievers/self_query/test_timescalevector.py 
b/libs/langchain/tests/unit_tests/retrievers/self_query/test_timescalevector.py new file mode 100644 index 00000000000..35dd328e06a --- /dev/null +++ b/libs/langchain/tests/unit_tests/retrievers/self_query/test_timescalevector.py @@ -0,0 +1,97 @@ +from typing import Dict, Tuple + +import pytest + +from langchain.chains.query_constructor.ir import ( + Comparator, + Comparison, + Operation, + Operator, + StructuredQuery, +) +from langchain.retrievers.self_query.timescalevector import TimescaleVectorTranslator + +DEFAULT_TRANSLATOR = TimescaleVectorTranslator() + + +@pytest.mark.requires("timescale_vector") +def test_visit_comparison() -> None: + from timescale_vector import client + + comp = Comparison(comparator=Comparator.LT, attribute="foo", value=1) + expected = client.Predicates(("foo", "<", 1)) + actual = DEFAULT_TRANSLATOR.visit_comparison(comp) + assert expected == actual + + +@pytest.mark.requires("timescale_vector") +def test_visit_operation() -> None: + from timescale_vector import client + + op = Operation( + operator=Operator.AND, + arguments=[ + Comparison(comparator=Comparator.LT, attribute="foo", value=2), + Comparison(comparator=Comparator.EQ, attribute="bar", value="baz"), + Comparison(comparator=Comparator.GT, attribute="abc", value=2.0), + ], + ) + expected = client.Predicates( + client.Predicates(("foo", "<", 2)), + client.Predicates(("bar", "==", "baz")), + client.Predicates(("abc", ">", 2.0)), + ) + + actual = DEFAULT_TRANSLATOR.visit_operation(op) + assert expected == actual + + +@pytest.mark.requires("timescale_vector") +def test_visit_structured_query() -> None: + from timescale_vector import client + + query = "What is the capital of France?" + structured_query = StructuredQuery( + query=query, + filter=None, + ) + expected: Tuple[str, Dict] = (query, {}) + actual = DEFAULT_TRANSLATOR.visit_structured_query(structured_query) + assert expected == actual + + comp = Comparison(comparator=Comparator.LT, attribute="foo", value=1) + expected = ( + query, + {"predicates": client.Predicates(("foo", "<", 1))}, + ) + structured_query = StructuredQuery( + query=query, + filter=comp, + ) + actual = DEFAULT_TRANSLATOR.visit_structured_query(structured_query) + assert expected == actual + + op = Operation( + operator=Operator.AND, + arguments=[ + Comparison(comparator=Comparator.LT, attribute="foo", value=2), + Comparison(comparator=Comparator.EQ, attribute="bar", value="baz"), + Comparison(comparator=Comparator.GT, attribute="abc", value=2.0), + ], + ) + structured_query = StructuredQuery( + query=query, + filter=op, + ) + expected = ( + query, + { + "predicates": client.Predicates( + client.Predicates(("foo", "<", 2)), + client.Predicates(("bar", "==", "baz")), + client.Predicates(("abc", ">", 2.0)), + ) + }, + ) + actual = DEFAULT_TRANSLATOR.visit_structured_query(structured_query) + assert expected == actual
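For context, the `TimescaleVectorTranslator` exercised by these unit tests is the piece that turns a LangChain `StructuredQuery` filter into `timescale_vector` `client.Predicates`. Below is a minimal end-to-end sketch of how that plays out through LangChain's self-query retriever, assuming this PR also registers the translator in `SelfQueryRetriever`'s built-in translator lookup; the `TIMESCALE_SERVICE_URL` variable, the example documents, and the metadata fields are illustrative, not taken from the diff.

import os

from langchain.chains.query_constructor.base import AttributeInfo
from langchain.docstore.document import Document
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.vectorstores.timescalevector import TimescaleVector

# Assumes OPENAI_API_KEY is set; embedding and LLM calls hit the OpenAI API.
# Illustrative corpus with filterable metadata.
docs = [
    Document(page_content="Add a new vectorstore", metadata={"author": "jane", "loc": 200}),
    Document(page_content="Fix a flaky test", metadata={"author": "joe", "loc": 12}),
]

# Same construction pattern as the integration tests above.
vectorstore = TimescaleVector.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    collection_name="commit_history_selfquery",
    service_url=os.environ["TIMESCALE_SERVICE_URL"],
)

# Describe the metadata fields so the LLM can build a StructuredQuery filter.
metadata_field_info = [
    AttributeInfo(name="author", description="Commit author", type="string"),
    AttributeInfo(name="loc", description="Lines of code changed", type="integer"),
]
retriever = SelfQueryRetriever.from_llm(
    OpenAI(temperature=0),
    vectorstore,
    "Git commit messages",
    metadata_field_info,
)

# The LLM-generated filter is handed to TimescaleVectorTranslator, which emits
# client.Predicates of the shape asserted in the tests above.
retriever.get_relevant_documents("commits by jane that changed more than 100 lines")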