diff --git a/docs/docs/integrations/providers/ydb.mdx b/docs/docs/integrations/providers/ydb.mdx new file mode 100644 index 00000000000..181ce935adb --- /dev/null +++ b/docs/docs/integrations/providers/ydb.mdx @@ -0,0 +1,23 @@ +# YDB + +All functionality related to YDB. + +> [YDB](https://ydb.tech/) is a versatile open source Distributed SQL Database that combines +> high availability and scalability with strong consistency and ACID transactions. +> It accommodates transactional (OLTP), analytical (OLAP), and streaming workloads simultaneously. + +## Installation and Setup + +```bash +pip install langchain-ydb +``` + +## Vector Store + +To import YDB vector store: + +```python +from langchain_ydb.vectorstores import YDB +``` + +For a more detailed walkthrough of the YDB vector store, see [this notebook](/docs/integrations/vectorstores/ydb). diff --git a/docs/docs/integrations/vectorstores/ydb.ipynb b/docs/docs/integrations/vectorstores/ydb.ipynb new file mode 100644 index 00000000000..8c9f18fb6fd --- /dev/null +++ b/docs/docs/integrations/vectorstores/ydb.ipynb @@ -0,0 +1,491 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "683953b3", + "metadata": {}, + "source": [ + "# YDB\n", + "\n", + "> [YDB](https://ydb.tech/) is a versatile open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions. It accommodates transactional (OLTP), analytical (OLAP), and streaming workloads simultaneously.\n", + "\n", + "This notebook shows how to use functionality related to the `YDB` vector store.\n", + "\n", + "## Setup\n", + "\n", + "First, set up a local YDB with Docker:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "da75f17c", + "metadata": {}, + "outputs": [], + "source": [ + "! docker run -d -p 2136:2136 --name ydb-langchain -e YDB_USE_IN_MEMORY_PDISKS=true -h localhost ydbplatform/local-ydb:trunk" + ] + }, + { + "cell_type": "markdown", + "id": "0acb2a8d", + "metadata": {}, + "source": [ + "You'll need to install `langchain-ydb` to use this integration" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d454fb7c", + "metadata": {}, + "outputs": [], + "source": [ + "! pip install -qU langchain-ydb" + ] + }, + { + "cell_type": "markdown", + "id": "3df5501b", + "metadata": {}, + "source": [ + "### Credentials\n", + "\n", + "There are no credentials for this notebook, just make sure you have installed the packages as shown above." + ] + }, + { + "cell_type": "markdown", + "id": "54d5276f", + "metadata": {}, + "source": [ + "If you want to get best in-class automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f6fd5b03", + "metadata": {}, + "outputs": [], + "source": [ + "# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n", + "# os.environ[\"LANGSMITH_TRACING\"] = \"true\"" + ] + }, + { + "cell_type": "markdown", + "id": "2b87fe34", + "metadata": {}, + "source": [ + "## Initialization\n", + "\n", + "import EmbeddingTabs from \"@theme/EmbeddingTabs\";\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "60276097", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/ovcharuk/opensource/langchain/.venv/lib/python3.13/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", + " from .autonotebook import tqdm as notebook_tqdm\n" + ] + } + ], + "source": [ + "# | output: false\n", + "# | echo: false\n", + "from langchain_openai import OpenAIEmbeddings\n", + "\n", + "embeddings = OpenAIEmbeddings(model=\"text-embedding-3-large\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aac9563e", + "metadata": { + "ExecuteTime": { + "end_time": "2023-06-03T08:33:31.554934Z", + "start_time": "2023-06-03T08:33:31.549590Z" + }, + "tags": [] + }, + "outputs": [], + "source": [ + "from langchain_ydb.vectorstores import YDB, YDBSearchStrategy, YDBSettings\n", + "\n", + "settings = YDBSettings(\n", + " table=\"ydb_example\",\n", + " strategy=YDBSearchStrategy.COSINE_SIMILARITY,\n", + ")\n", + "vector_store = YDB(embeddings, config=settings)" + ] + }, + { + "cell_type": "markdown", + "id": "32dd3f67", + "metadata": {}, + "source": [ + "## Manage vector store\n", + "\n", + "Once you have created your vector store, you can interact with it by adding and deleting different items.\n", + "\n", + "### Add items to vector store\n", + "\n", + "Prepare documents to work with:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "944743ee", + "metadata": {}, + "outputs": [], + "source": [ + "from uuid import uuid4\n", + "\n", + "from langchain_core.documents import Document\n", + "\n", + "document_1 = Document(\n", + " page_content=\"I had chocalate chip pancakes and scrambled eggs for breakfast this morning.\",\n", + " metadata={\"source\": \"tweet\"},\n", + ")\n", + "\n", + "document_2 = Document(\n", + " page_content=\"The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.\",\n", + " metadata={\"source\": \"news\"},\n", + ")\n", + "\n", + "document_3 = Document(\n", + " page_content=\"Building an exciting new project with LangChain - come check it out!\",\n", + " metadata={\"source\": \"tweet\"},\n", + ")\n", + "\n", + "document_4 = Document(\n", + " page_content=\"Robbers broke into the city bank and stole $1 million in cash.\",\n", + " metadata={\"source\": \"news\"},\n", + ")\n", + "\n", + "document_5 = Document(\n", + " page_content=\"Wow! That was an amazing movie. I can't wait to see it again.\",\n", + " metadata={\"source\": \"tweet\"},\n", + ")\n", + "\n", + "document_6 = Document(\n", + " page_content=\"Is the new iPhone worth the price? Read this review to find out.\",\n", + " metadata={\"source\": \"website\"},\n", + ")\n", + "\n", + "document_7 = Document(\n", + " page_content=\"The top 10 soccer players in the world right now.\",\n", + " metadata={\"source\": \"website\"},\n", + ")\n", + "\n", + "document_8 = Document(\n", + " page_content=\"LangGraph is the best framework for building stateful, agentic applications!\",\n", + " metadata={\"source\": \"tweet\"},\n", + ")\n", + "\n", + "document_9 = Document(\n", + " page_content=\"The stock market is down 500 points today due to fears of a recession.\",\n", + " metadata={\"source\": \"news\"},\n", + ")\n", + "\n", + "document_10 = Document(\n", + " page_content=\"I have a bad feeling I am going to get deleted :(\",\n", + " metadata={\"source\": \"tweet\"},\n", + ")\n", + "\n", + "documents = [\n", + " document_1,\n", + " document_2,\n", + " document_3,\n", + " document_4,\n", + " document_5,\n", + " document_6,\n", + " document_7,\n", + " document_8,\n", + " document_9,\n", + " document_10,\n", + "]\n", + "uuids = [str(uuid4()) for _ in range(len(documents))]" + ] + }, + { + "cell_type": "markdown", + "id": "c616b7f4", + "metadata": {}, + "source": [ + "You can add items to your vector store by using the `add_documents` function." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "3f632996", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Inserting data...: 100%|██████████| 10/10 [00:00<00:00, 14.67it/s]\n" + ] + }, + { + "data": { + "text/plain": [ + "['947be6aa-d489-44c5-910e-62e4d58d2ffb',\n", + " '7a62904d-9db3-412b-83b6-f01b34dd7de3',\n", + " 'e5a49c64-c985-4ed7-ac58-5ffa31ade699',\n", + " '99cf4104-36ab-4bd5-b0da-e210d260e512',\n", + " '5810bcd0-b46e-443e-a663-e888c9e028d1',\n", + " '190c193d-844e-4dbb-9a4b-b8f5f16cfae6',\n", + " 'f8912944-f80a-4178-954e-4595bf59e341',\n", + " '34fc7b09-6000-42c9-95f7-7d49f430b904',\n", + " '0f6b6783-f300-4a4d-bb04-8025c4dfd409',\n", + " '46c37ba9-7cf2-4ac8-9bd1-d84e2cb1155c']" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "vector_store.add_documents(documents=documents, ids=uuids)" + ] + }, + { + "cell_type": "markdown", + "id": "18af81cc", + "metadata": {}, + "source": [ + "### Delete items from vector store\n", + "\n", + "You can delete items from your vector store by ID using the `delete` function." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "12b32762", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "vector_store.delete(ids=[uuids[-1]])" + ] + }, + { + "cell_type": "markdown", + "id": "ada27577", + "metadata": {}, + "source": [ + "## Query vector store\n", + "\n", + "Once your vector store has been created and relevant documents have been added, you will likely want to query it during the execution of your chain or agent.\n", + "\n", + "### Query directly\n", + "\n", + "#### Similarity search\n", + "\n", + "A simple similarity search can be performed as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "015831a3", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]\n", + "* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]\n" + ] + } + ], + "source": [ + "results = vector_store.similarity_search(\n", + " \"LangChain provides abstractions to make working with LLMs easy\", k=2\n", + ")\n", + "for res in results:\n", + " print(f\"* {res.page_content} [{res.metadata}]\")" + ] + }, + { + "cell_type": "markdown", + "id": "623d3b9d", + "metadata": {}, + "source": [ + "#### Similarity search with score\n", + "\n", + "You can also perform a search with a score:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "e7d43430", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* [SIM=0.595] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]\n", + "* [SIM=0.212] I had chocalate chip pancakes and scrambled eggs for breakfast this morning. [{'source': 'tweet'}]\n", + "* [SIM=0.118] Wow! That was an amazing movie. I can't wait to see it again. [{'source': 'tweet'}]\n" + ] + } + ], + "source": [ + "results = vector_store.similarity_search_with_score(\"Will it be hot tomorrow?\", k=3)\n", + "for res, score in results:\n", + " print(f\"* [SIM={score:.3f}] {res.page_content} [{res.metadata}]\")" + ] + }, + { + "cell_type": "markdown", + "id": "f5a90c12", + "metadata": {}, + "source": [ + "### Filtering\n", + "\n", + "You can search with filters as described below:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "169d01d1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* I had chocalate chip pancakes and scrambled eggs for breakfast this morning. [{'source': 'tweet'}]\n", + "* Wow! That was an amazing movie. I can't wait to see it again. [{'source': 'tweet'}]\n", + "* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]\n", + "* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]\n" + ] + } + ], + "source": [ + "results = vector_store.similarity_search_with_score(\n", + " \"What did I eat for breakfast?\",\n", + " k=4,\n", + " filter={\"source\": \"tweet\"},\n", + ")\n", + "for res, _ in results:\n", + " print(f\"* {res.page_content} [{res.metadata}]\")" + ] + }, + { + "cell_type": "markdown", + "id": "afacfd4e", + "metadata": {}, + "source": [ + "### Query by turning into retriever\n", + "\n", + "You can also transform the vector store into a retriever for easier usage in your chains.\n", + "\n", + "Here's how to transform your vector store into a retriever and then invoke the retriever with a simple query and filter." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "97187188", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "* Robbers broke into the city bank and stole $1 million in cash. [{'source': 'news'}]\n", + "* The stock market is down 500 points today due to fears of a recession. [{'source': 'news'}]\n" + ] + } + ], + "source": [ + "retriever = vector_store.as_retriever(\n", + " search_kwargs={\"k\": 2},\n", + ")\n", + "results = retriever.invoke(\n", + " \"Stealing from the bank is a crime\", filter={\"source\": \"news\"}\n", + ")\n", + "for res in results:\n", + " print(f\"* {res.page_content} [{res.metadata}]\")" + ] + }, + { + "cell_type": "markdown", + "id": "57fade30", + "metadata": {}, + "source": [ + "## Usage for retrieval-augmented generation\n", + "\n", + "For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n", + "\n", + "- [Tutorials](/docs/tutorials/)\n", + "- [How-to: Question and answer with RAG](https://python.langchain.com/docs/how_to/#qa-with-rag)\n", + "- [Retrieval conceptual docs](https://python.langchain.com/docs/concepts/retrieval)" + ] + }, + { + "cell_type": "markdown", + "id": "02452d34", + "metadata": {}, + "source": [ + "## API reference\n", + "\n", + "For detailed documentation of all `YDB` features and configurations head to the API reference:https://python.langchain.com/api_reference/community/vectorstores/langchain_community.vectorstores.ydb.YDB.html" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.2" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/libs/packages.yml b/libs/packages.yml index 87c671d04fb..6d2689e924a 100644 --- a/libs/packages.yml +++ b/libs/packages.yml @@ -603,3 +603,6 @@ packages: - name: langchain-cloudflare repo: cloudflare/langchain-cloudflare path: . +- name: langchain-ydb + path: . + repo: ydb-platform/langchain-ydb