diff --git a/docs/docs/integrations/retrievers/arcee.ipynb b/docs/docs/integrations/retrievers/arcee.ipynb index 1f637458fae..1013baf72ca 100644 --- a/docs/docs/integrations/retrievers/arcee.ipynb +++ b/docs/docs/integrations/retrievers/arcee.ipynb @@ -4,8 +4,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Arcee Retriever\n", - "This notebook demonstrates how to use the `ArceeRetriever` class to retrieve relevant document(s) for Arcee's Domain Adapted Language Models (DALMs)." + "# Arcee\n", + "\n", + ">[Arcee](https://www.arcee.ai/about/about-us) helps with the development of SLMs: small, specialized, secure, and scalable language models.\n", + "\n", + "This notebook demonstrates how to use the `ArceeRetriever` class to retrieve relevant document(s) for Arcee's `Domain Adapted Language Models` (`DALMs`)." ] }, { diff --git a/docs/docs/integrations/retrievers/azure_ai_search.ipynb b/docs/docs/integrations/retrievers/azure_ai_search.ipynb index a88120d2a94..6151fc2227a 100644 --- a/docs/docs/integrations/retrievers/azure_ai_search.ipynb +++ b/docs/docs/integrations/retrievers/azure_ai_search.ipynb @@ -7,7 +7,7 @@ "source": [ "# Azure AI Search\n", "\n", - ">[Azure AI Search](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) (formerly known as `Azure Cognitive Search`or Azure Search) is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.\n", + ">[Microsoft Azure AI Search](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) (formerly known as `Azure Cognitive Search` or `Azure Search`) is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.\n", "\n", ">Search is foundational to any app that surfaces text to users, where 
common scenarios include catalog or document search, online retail apps, or data exploration over proprietary content. When you create a search service, you'll work with the following capabilities:\n", ">- A search engine for full text search over a search index containing user-owned content\n", @@ -283,7 +283,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.8" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/retrievers/bm25.ipynb b/docs/docs/integrations/retrievers/bm25.ipynb index 241b3e56391..7f15bb5b9bd 100644 --- a/docs/docs/integrations/retrievers/bm25.ipynb +++ b/docs/docs/integrations/retrievers/bm25.ipynb @@ -7,10 +7,9 @@ "source": [ "# BM25\n", "\n", - "[BM25](https://en.wikipedia.org/wiki/Okapi_BM25) also known as the Okapi BM25, is a ranking function used in information retrieval systems to estimate the relevance of documents to a given search query.\n", "\n", - "This notebook goes over how to use a retriever that under the hood uses BM25 using [`rank_bm25`](https://github.com/dorianbrown/rank_bm25) package.\n", "\n" + ">[BM25 (Wikipedia)](https://en.wikipedia.org/wiki/Okapi_BM25), also known as `Okapi BM25`, is a ranking function used in information retrieval systems to estimate the relevance of documents to a given search query.\n", + ">\n", + ">The `BM25Retriever` uses the [`rank_bm25`](https://github.com/dorianbrown/rank_bm25) package.\n" ] }, { diff --git a/docs/docs/integrations/retrievers/breebs.ipynb b/docs/docs/integrations/retrievers/breebs.ipynb index 5d7e26b7119..f9fa9d84b21 100644 --- a/docs/docs/integrations/retrievers/breebs.ipynb +++ b/docs/docs/integrations/retrievers/breebs.ipynb @@ -6,7 +6,7 @@ "source": [ "# BREEBS (Open Knowledge)\n", "\n", - "[BREEBS](https://www.breebs.com/) is an open collaborative knowledge platform. \n", + ">[BREEBS](https://www.breebs.com/) is an open collaborative knowledge platform. 
\n", "Anybody can create a Breeb, a knowledge capsule, based on PDFs stored on a Google Drive folder.\n", "A breeb can be used by any LLM/chatbot to improve its expertise, reduce hallucinations and give access to sources.\n", "Behind the scenes, Breebs implements several Retrieval Augmented Generation (RAG) models to seamlessly provide useful context at each iteration. \n", diff --git a/docs/docs/integrations/retrievers/chatgpt-plugin.ipynb b/docs/docs/integrations/retrievers/chatgpt-plugin.ipynb index 1e0388606b2..5b00552d80a 100644 --- a/docs/docs/integrations/retrievers/chatgpt-plugin.ipynb +++ b/docs/docs/integrations/retrievers/chatgpt-plugin.ipynb @@ -5,11 +5,11 @@ "id": "1edb9e6b", "metadata": {}, "source": [ - "# ChatGPT Plugin\n", + "# ChatGPT plugin\n", "\n", - ">[OpenAI plugins](https://platform.openai.com/docs/plugins/introduction) connect ChatGPT to third-party applications. These plugins enable ChatGPT to interact with APIs defined by developers, enhancing ChatGPT's capabilities and allowing it to perform a wide range of actions.\n", + ">[OpenAI plugins](https://platform.openai.com/docs/plugins/introduction) connect `ChatGPT` to third-party applications. 
These plugins enable `ChatGPT` to interact with APIs defined by developers, enhancing `ChatGPT's` capabilities and allowing it to perform a wide range of actions.\n", "\n", - ">Plugins can allow ChatGPT to do things like:\n", + ">Plugins allow `ChatGPT` to do things like:\n", ">- Retrieve real-time information; e.g., sports scores, stock prices, the latest news, etc.\n", ">- Retrieve knowledge-base information; e.g., company docs, personal notes, etc.\n", ">- Perform actions on behalf of the user; e.g., booking a flight, ordering food, etc.\n", diff --git a/docs/docs/integrations/retrievers/cohere-reranker.ipynb b/docs/docs/integrations/retrievers/cohere-reranker.ipynb index 5602e66d9f5..2378ccec456 100644 --- a/docs/docs/integrations/retrievers/cohere-reranker.ipynb +++ b/docs/docs/integrations/retrievers/cohere-reranker.ipynb @@ -5,7 +5,7 @@ "id": "fc0db1bc", "metadata": {}, "source": [ - "# Cohere Reranker\n", + "# Cohere reranker\n", "\n", ">[Cohere](https://cohere.ai/about) is a Canadian startup that provides natural language processing models that help companies improve human-machine interactions.\n", "\n", diff --git a/docs/docs/integrations/retrievers/cohere.ipynb b/docs/docs/integrations/retrievers/cohere.ipynb index 867ac192daf..55640d8f6c0 100644 --- a/docs/docs/integrations/retrievers/cohere.ipynb +++ b/docs/docs/integrations/retrievers/cohere.ipynb @@ -5,9 +5,11 @@ "id": "bf733a38-db84-4363-89e2-de6735c37230", "metadata": {}, "source": [ - "# Cohere RAG retriever\n", + "# Cohere RAG\n", "\n", - "This notebook covers how to get started with Cohere RAG retriever. This allows you to leverage the ability to search documents over various connectors or by supplying your own." + ">[Cohere](https://cohere.ai/about) is a Canadian startup that provides natural language processing models that help companies improve human-machine interactions.\n", + "\n", + "This notebook covers how to get started with the `Cohere RAG` retriever. 
This allows you to leverage the ability to search documents over various connectors or by supplying your own." ] }, { @@ -231,7 +233,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.7" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/retrievers/dria_index.ipynb b/docs/docs/integrations/retrievers/dria_index.ipynb index ced1cb822c9..5f6329ec1bd 100644 --- a/docs/docs/integrations/retrievers/dria_index.ipynb +++ b/docs/docs/integrations/retrievers/dria_index.ipynb @@ -8,7 +8,7 @@ "source": [ "# Dria\n", "\n", - "Dria is a hub of public RAG models for developers to both contribute and utilize a shared embedding lake. This notebook demonstrates how to use the Dria API for data retrieval tasks." + ">[Dria](https://dria.co/) is a hub of public RAG models for developers to both contribute and utilize a shared embedding lake. This notebook demonstrates how to use the `Dria API` for data retrieval tasks." ] }, { @@ -169,7 +169,7 @@ "provenance": [] }, "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -183,9 +183,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.x" + "version": "3.10.12" } }, "nbformat": 4, - "nbformat_minor": 0 -} \ No newline at end of file + "nbformat_minor": 4 +} diff --git a/docs/docs/integrations/retrievers/elasticsearch_retriever.ipynb b/docs/docs/integrations/retrievers/elasticsearch_retriever.ipynb index 8c51c8326ce..0b72a998296 100644 --- a/docs/docs/integrations/retrievers/elasticsearch_retriever.ipynb +++ b/docs/docs/integrations/retrievers/elasticsearch_retriever.ipynb @@ -5,11 +5,11 @@ "id": "ab66dd43", "metadata": {}, "source": [ - "# ElasticsearchRetriever\n", + "# Elasticsearch\n", "\n", - "[Elasticsearch](https://www.elastic.co/elasticsearch/) is a distributed, RESTful search and analytics engine. 
It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. It support keyword search, vector search, hybrid search and complex filtering.\n", + ">[Elasticsearch](https://www.elastic.co/elasticsearch/) is a distributed, RESTful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. It supports keyword search, vector search, hybrid search and complex filtering.\n", "\n", - "The `ElasticsearchRetriever` is a generic wrapper to enable flexible access to all Elasticsearch features through the [Query DSL](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html). For most use cases the other classes (`ElasticsearchStore`, `ElasticsearchEmbeddings`, etc.) should suffice, but if they don't you can use `ElasticsearchRetriever`." + "The `ElasticsearchRetriever` is a generic wrapper to enable flexible access to all `Elasticsearch` features through the [Query DSL](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html). For most use cases the other classes (`ElasticsearchStore`, `ElasticsearchEmbeddings`, etc.) should suffice, but if they don't you can use `ElasticsearchRetriever`." ] }, { @@ -561,7 +561,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.7" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/retrievers/embedchain.ipynb b/docs/docs/integrations/retrievers/embedchain.ipynb index 6a1295f3362..97dc8a99b7d 100644 --- a/docs/docs/integrations/retrievers/embedchain.ipynb +++ b/docs/docs/integrations/retrievers/embedchain.ipynb @@ -7,11 +7,11 @@ "source": [ "# Embedchain\n", "\n", - "Embedchain is a RAG framework to create data pipelines. 
It loads, indexes, retrieves and syncs all the data.\n", + ">[Embedchain](https://github.com/embedchain/embedchain) is a RAG framework to create data pipelines. It loads, indexes, retrieves and syncs all the data.\n", + ">\n", + ">It is available as an [open source package](https://github.com/embedchain/embedchain) and as a [hosted platform solution](https://app.embedchain.ai/).\n", "\n", - "It is available as an [open source package](https://github.com/embedchain/embedchain) and as a [hosted platform solution](https://app.embedchain.ai/).\n", - "\n", - "This notebook shows how to use a retriever that uses Embedchain." + "This notebook shows how to use a retriever that uses `Embedchain`." ] }, { diff --git a/docs/docs/integrations/retrievers/flashrank-reranker.ipynb b/docs/docs/integrations/retrievers/flashrank-reranker.ipynb index bdd4ed6d762..f63605526d9 100644 --- a/docs/docs/integrations/retrievers/flashrank-reranker.ipynb +++ b/docs/docs/integrations/retrievers/flashrank-reranker.ipynb @@ -9,7 +9,9 @@ } }, "source": [ - "# Flashrank Reranker\n", + "# FlashRank reranker\n", + "\n", + ">[FlashRank](https://github.com/PrithivirajDamodaran/FlashRank) is the Ultra-lite & Super-fast Python library to add re-ranking to your existing search & retrieval pipelines. It is based on SoTA cross-encoders, with gratitude to all the model owners.\n", "\n", "This notebook shows how to use [flashrank](https://github.com/PrithivirajDamodaran/FlashRank) for document compression and retrieval." 
] @@ -512,7 +514,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.2" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/retrievers/fleet_context.ipynb b/docs/docs/integrations/retrievers/fleet_context.ipynb index b480f09c59b..af85caa0cb0 100644 --- a/docs/docs/integrations/retrievers/fleet_context.ipynb +++ b/docs/docs/integrations/retrievers/fleet_context.ipynb @@ -5,11 +5,13 @@ "id": "a33a03c9-f11d-45ef-a563-9da0652fcf92", "metadata": {}, "source": [ - "# Fleet AI Libraries Context\n", + "# Fleet AI Context\n", "\n", - "The Fleet AI team is on a mission to embed the world's most important data. They've started by embedding the top 1200 Python libraries to enable code generation with up-to-date knowledge. They've been kind enough to share their embeddings of the [LangChain docs](/docs/get_started/introduction) and [API reference](https://api.python.langchain.com/en/latest/api_reference.html).\n", + ">[Fleet AI Context](https://www.fleet.so/context) is a dataset of high-quality embeddings of the top 1200 most popular & permissive Python Libraries & their documentation.\n", + ">\n", + ">The `Fleet AI` team is on a mission to embed the world's most important data. They've started by embedding the top 1200 Python libraries to enable code generation with up-to-date knowledge. They've been kind enough to share their embeddings of the [LangChain docs](/docs/get_started/introduction) and [API reference](https://api.python.langchain.com/en/latest/api_reference.html).\n", "\n", - "Let's take a look at how we can use these embeddings to power a docs retrieval system and ultimately a simple code generating chain!" + "Let's take a look at how we can use these embeddings to power a docs retrieval system and ultimately a simple code-generating chain!" 
] }, { diff --git a/docs/docs/integrations/retrievers/google_vertex_ai_search.ipynb b/docs/docs/integrations/retrievers/google_vertex_ai_search.ipynb index 8c1b8b748c9..4da87c1ce7f 100644 --- a/docs/docs/integrations/retrievers/google_vertex_ai_search.ipynb +++ b/docs/docs/integrations/retrievers/google_vertex_ai_search.ipynb @@ -6,13 +6,13 @@ "source": [ "# Google Vertex AI Search\n", "\n", - "[Vertex AI Search](https://cloud.google.com/enterprise-search) (formerly known as Enterprise Search on Generative AI App Builder) is a part of the [Vertex AI](https://cloud.google.com/vertex-ai) machine learning platform offered by Google Cloud.\n", + ">[Google Vertex AI Search](https://cloud.google.com/enterprise-search) (formerly known as `Enterprise Search` on `Generative AI App Builder`) is a part of the [Vertex AI](https://cloud.google.com/vertex-ai) machine learning platform offered by `Google Cloud`.\n", + ">\n", + ">`Vertex AI Search` lets organizations quickly build generative AI-powered search engines for customers and employees. It's underpinned by a variety of `Google Search` technologies, including semantic search, which helps deliver more relevant results than traditional keyword-based search techniques by using natural language processing and machine learning techniques to infer relationships within the content and intent from the user’s query input. Vertex AI Search also benefits from Google’s expertise in understanding how users search and factors in content relevance to order displayed results.\n", "\n", - "Vertex AI Search lets organizations quickly build generative AI powered search engines for customers and employees. It's underpinned by a variety of Google Search technologies, including semantic search, which helps deliver more relevant results than traditional keyword-based search techniques by using natural language processing and machine learning techniques to infer relationships within the content and intent from the user’s query input. 
Vertex AI Search also benefits from Google’s expertise in understanding how users search and factors in content relevance to order displayed results.\n", + ">`Vertex AI Search` is available in the `Google Cloud Console` and via an API for enterprise workflow integration.\n", "\n", - "Vertex AI Search is available in the Google Cloud Console and via an API for enterprise workflow integration.\n", - "\n", - "This notebook demonstrates how to configure Vertex AI Search and use the Vertex AI Search retriever. The Vertex AI Search retriever encapsulates the [Python client library](https://cloud.google.com/generative-ai-app-builder/docs/libraries#client-libraries-install-python) and uses it to access the [Search Service API](https://cloud.google.com/python/docs/reference/discoveryengine/latest/google.cloud.discoveryengine_v1beta.services.search_service).\n" + "This notebook demonstrates how to configure `Vertex AI Search` and use the Vertex AI Search retriever. The Vertex AI Search retriever encapsulates the [Python client library](https://cloud.google.com/generative-ai-app-builder/docs/libraries#client-libraries-install-python) and uses it to access the [Search Service API](https://cloud.google.com/python/docs/reference/discoveryengine/latest/google.cloud.discoveryengine_v1beta.services.search_service).\n" ] }, { @@ -351,7 +351,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.0" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/retrievers/jaguar.ipynb b/docs/docs/integrations/retrievers/jaguar.ipynb index 3d3287a69ee..e1b56d7732f 100644 --- a/docs/docs/integrations/retrievers/jaguar.ipynb +++ b/docs/docs/integrations/retrievers/jaguar.ipynb @@ -5,16 +5,18 @@ "id": "671e9ec1-fa00-4c92-a2fb-ceb142168ea9", "metadata": {}, "source": [ - "# Jaguar Vector Database\n", + "# JaguarDB Vector Database\n", "\n", - "1. It is a distributed vector database\n", - "2. 
The “ZeroMove” feature of JaguarDB enables instant horizontal scalability\n", - "3. Multimodal: embeddings, text, images, videos, PDFs, audio, time series, and geospatial\n", - "4. All-masters: allows both parallel reads and writes\n", - "5. Anomaly detection capabilities\n", - "6. RAG support: combines LLM with proprietary and real-time data\n", - "7. Shared metadata: sharing of metadata across multiple vector indexes\n", - "8. Distance metrics: Euclidean, Cosine, InnerProduct, Manhatten, Chebyshev, Hamming, Jeccard, Minkowski" + ">[JaguarDB Vector Database](http://www.jaguardb.com/windex.html)\n", + ">\n", + ">1. It is a distributed vector database\n", + ">2. The “ZeroMove” feature of JaguarDB enables instant horizontal scalability\n", + ">3. Multimodal: embeddings, text, images, videos, PDFs, audio, time series, and geospatial\n", + ">4. All-masters: allows both parallel reads and writes\n", + ">5. Anomaly detection capabilities\n", + ">6. RAG support: combines LLM with proprietary and real-time data\n", + ">7. Shared metadata: sharing of metadata across multiple vector indexes\n", + ">8. Distance metrics: Euclidean, Cosine, InnerProduct, Manhattan, Chebyshev, Hamming, Jaccard, Minkowski" ] }, { diff --git a/docs/docs/integrations/retrievers/kay.ipynb b/docs/docs/integrations/retrievers/kay.ipynb index 6af77877204..66d8ed7b730 100644 --- a/docs/docs/integrations/retrievers/kay.ipynb +++ b/docs/docs/integrations/retrievers/kay.ipynb @@ -7,10 +7,9 @@ "source": [ "# Kay.ai\n", "\n", + ">[Kay Data API](https://www.kay.ai/) built for RAG 🕵️ We are curating the world's largest datasets as high-quality embeddings so your AI agents can retrieve context on the fly. Latest models, fast retrieval, and zero infra.\n", "\n", - "> Data API built for RAG 🕵️ We are curating the world's largest datasets as high-quality embeddings so your AI agents can retrieve context on the fly. 
Latest models, fast retrieval, and zero infra.\n", - "\n", - "This notebook shows you how to retrieve datasets supported by [Kay](https://kay.ai/). You can currently search SEC Filings and Press Releases of US companies. Visit [kay.ai](https://kay.ai) for the latest data drops. For any questions, join our [discord](https://discord.gg/hAnE4e5T6M) or [tweet at us](https://twitter.com/vishalrohra_)" + "This notebook shows you how to retrieve datasets supported by [Kay](https://kay.ai/). You can currently search `SEC Filings` and `Press Releases of US companies`. Visit [kay.ai](https://kay.ai) for the latest data drops. For any questions, join our [discord](https://discord.gg/hAnE4e5T6M) or [tweet at us](https://twitter.com/vishalrohra_)" ] }, { @@ -18,10 +17,27 @@ "id": "fc507b8e-ea51-417c-93da-42bf998a1195", "metadata": {}, "source": [ - "Installation\n", - "=\n", + "## Installation\n", "\n", - "First you will need to install the [`kay` package](https://pypi.org/project/kay/). You will also need an API key: you can get one for free at [https://kay.ai](https://kay.ai/). Once you have an API key, you must set it as an environment variable `KAY_API_KEY`.\n", + "First, install the [`kay` package](https://pypi.org/project/kay/). " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ae22ad3e-4643-4314-8dea-a5abff0d87b0", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install kay" + ] + }, + { + "cell_type": "markdown", + "id": "efd317f7-9b7d-4e71-875c-5f0b6efeca05", + "metadata": {}, + "source": [ + "You will also need an API key: you can get one for free at [https://kay.ai](https://kay.ai/). 
Once you have an API key, you must set it as an environment variable `KAY_API_KEY`.\n", "\n", "`KayAiRetriever` has a static `.create()` factory method that takes the following arguments:\n", "\n", @@ -35,11 +51,9 @@ "id": "c923bea0-585a-4f62-8662-efc167e8d793", "metadata": {}, "source": [ - "Examples\n", - "=\n", + "## Examples\n", "\n", - "Basic Retriever Usage\n", - "-" + "### Basic Retriever Usage" ] }, { @@ -111,8 +125,7 @@ "id": "21f6e9e5-478c-4b2c-9d61-f7a84f4d2f8f", "metadata": {}, "source": [ - "Usage in a chain\n", - "-" + "### Usage in a chain" ] }, { diff --git a/docs/docs/integrations/retrievers/knn.ipynb b/docs/docs/integrations/retrievers/knn.ipynb index 0324a1823f8..9eb641ffe82 100644 --- a/docs/docs/integrations/retrievers/knn.ipynb +++ b/docs/docs/integrations/retrievers/knn.ipynb @@ -7,11 +7,11 @@ "source": [ "# kNN\n", "\n", - ">In statistics, the [k-nearest neighbors algorithm (k-NN)](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regression.\n", + ">In statistics, the [k-nearest neighbors algorithm (k-NN)](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) is a non-parametric supervised learning method first developed by `Evelyn Fix` and `Joseph Hodges` in 1951, and later expanded by `Thomas Cover`. It is used for classification and regression.\n", "\n", - "This notebook goes over how to use a retriever that under the hood uses an kNN.\n", + "This notebook goes over how to use a retriever that under the hood uses kNN.\n", "\n", - "Largely based on https://github.com/karpathy/randomfun/blob/master/knn_vs_svm.html" + "Largely based on the code of [Andrej Karpathy](https://github.com/karpathy/randomfun/blob/master/knn_vs_svm.html)."
] }, { diff --git a/docs/docs/integrations/retrievers/merger_retriever.ipynb b/docs/docs/integrations/retrievers/merger_retriever.ipynb index cc6dc2cb45b..b3086839391 100644 --- a/docs/docs/integrations/retrievers/merger_retriever.ipynb +++ b/docs/docs/integrations/retrievers/merger_retriever.ipynb @@ -8,7 +8,7 @@ "source": [ "# LOTR (Merger Retriever)\n", "\n", - "`Lord of the Retrievers`, also known as `MergerRetriever`, takes a list of retrievers as input and merges the results of their get_relevant_documents() methods into a single list. The merged results will be a list of documents that are relevant to the query and that have been ranked by the different retrievers.\n", + ">`Lord of the Retrievers (LOTR)`, also known as `MergerRetriever`, takes a list of retrievers as input and merges the results of their `get_relevant_documents()` methods into a single list. The merged results will be a list of documents that are relevant to the query and that have been ranked by the different retrievers.\n", "\n", "The `MergerRetriever` class can be used to improve the accuracy of document retrieval in a number of ways. First, it can combine the results of multiple retrievers, which can help to reduce the risk of bias in the results. Second, it can rank the results of the different retrievers, which can help to ensure that the most relevant documents are returned first." 
] diff --git a/docs/docs/integrations/retrievers/qdrant-sparse.ipynb b/docs/docs/integrations/retrievers/qdrant-sparse.ipynb index 17b81b543c1..54607f97f43 100644 --- a/docs/docs/integrations/retrievers/qdrant-sparse.ipynb +++ b/docs/docs/integrations/retrievers/qdrant-sparse.ipynb @@ -5,12 +5,12 @@ "id": "ce0f17b9", "metadata": {}, "source": [ - "# Qdrant Sparse Vector Retriever\n", + "# Qdrant Sparse Vector\n", "\n", ">[Qdrant](https://qdrant.tech/) is an open-source, high-performance vector search engine/database.\n", "\n", "\n", - ">`QdrantSparseVectorRetriever` uses [sparse vectors](https://qdrant.tech/articles/sparse-vectors/) introduced in Qdrant [v1.7.0](https://qdrant.tech/articles/qdrant-1.7.x/) for document retrieval.\n" + ">`QdrantSparseVectorRetriever` uses [sparse vectors](https://qdrant.tech/articles/sparse-vectors/) introduced in `Qdrant` [v1.7.0](https://qdrant.tech/articles/qdrant-1.7.x/) for document retrieval.\n" ] }, { diff --git a/docs/docs/integrations/retrievers/ragatouille.ipynb b/docs/docs/integrations/retrievers/ragatouille.ipynb index 350c831c148..868fde5f607 100644 --- a/docs/docs/integrations/retrievers/ragatouille.ipynb +++ b/docs/docs/integrations/retrievers/ragatouille.ipynb @@ -8,9 +8,13 @@ "# RAGatouille\n", "\n", "\n", - "This page covers how to use [RAGatouille](https://github.com/bclavie/RAGatouille) as a retriever in a LangChain chain. RAGatouille makes it as simple as can be to use ColBERT! 
[ColBERT](https://github.com/stanford-futuredata/ColBERT) is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.\n", + ">[RAGatouille](https://github.com/bclavie/RAGatouille) makes it as simple as can be to use `ColBERT`!\n", + ">\n", + ">[ColBERT](https://github.com/stanford-futuredata/ColBERT) is a fast and accurate retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.\n", "\n", - "We can use this as a [retriever](/docs/modules/data_connection/retrievers). It will show functionality specific to this integration. After going through, it may be useful to explore [relevant use-case pages](/docs/use_cases/question_answering) to learn how to use this vectorstore as part of a larger chain.\n", + "We can use this as a [retriever](/docs/modules/data_connection/retrievers). It will show functionality specific to this integration. After going through, it may be useful to explore [relevant use-case pages](/docs/use_cases/question_answering) to learn how to use this vector store as part of a larger chain.\n", + "\n", + "This page covers how to use [RAGatouille](https://github.com/bclavie/RAGatouille) as a retriever in a LangChain chain. \n", "\n", "## Setup\n", "\n", diff --git a/docs/docs/integrations/retrievers/sec_filings.ipynb b/docs/docs/integrations/retrievers/sec_filings.ipynb index 3cfbcddd200..b23cc05cc0a 100644 --- a/docs/docs/integrations/retrievers/sec_filings.ipynb +++ b/docs/docs/integrations/retrievers/sec_filings.ipynb @@ -8,9 +8,9 @@ "# SEC filing\n", "\n", "\n", - ">The SEC filing is a financial statement or other formal document submitted to the U.S. Securities and Exchange Commission (SEC). Public companies, certain insiders, and broker-dealers are required to make regular SEC filings. 
Investors and financial professionals rely on these filings for information about companies they are evaluating for investment purposes.\n", + ">[SEC filing](https://www.sec.gov/edgar) is a financial statement or other formal document submitted to the U.S. Securities and Exchange Commission (SEC). Public companies, certain insiders, and broker-dealers are required to make regular `SEC filings`. Investors and financial professionals rely on these filings for information about companies they are evaluating for investment purposes.\n", ">\n", - ">SEC filings data powered by [Kay.ai](https://kay.ai) and [Cybersyn](https://www.cybersyn.com/) via [Snowflake Marketplace](https://app.snowflake.com/marketplace/providers/GZTSZAS2KCS/Cybersyn%2C%20Inc).\n" + ">`SEC filings` data powered by [Kay.ai](https://kay.ai) and [Cybersyn](https://www.cybersyn.com/) via [Snowflake Marketplace](https://app.snowflake.com/marketplace/providers/GZTSZAS2KCS/Cybersyn%2C%20Inc).\n" ] }, { diff --git a/docs/docs/integrations/retrievers/self_query/astradb.ipynb b/docs/docs/integrations/retrievers/self_query/astradb.ipynb index aa8e81b5e14..a37597cf2e4 100644 --- a/docs/docs/integrations/retrievers/self_query/astradb.ipynb +++ b/docs/docs/integrations/retrievers/self_query/astradb.ipynb @@ -4,9 +4,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Astra DB\n", + "# Astra DB (Cassandra)\n", "\n", - "DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Cassandra and made conveniently available through an easy-to-use JSON API.\n", + ">[DataStax Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on `Cassandra` and made conveniently available through an easy-to-use JSON API.\n", "\n", "In the walkthrough, we'll demo the `SelfQueryRetriever` with an `Astra DB` vector store." 
] @@ -57,6 +57,9 @@ "cell_type": "markdown", "metadata": { "collapsed": false, + "jupyter": { + "outputs_hidden": false + }, "pycharm": { "name": "#%% md\n" } @@ -276,7 +279,10 @@ { "cell_type": "markdown", "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "source": [ "## Cleanup\n", @@ -290,7 +296,10 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false + "collapsed": false, + "jupyter": { + "outputs_hidden": false + } }, "outputs": [], "source": [ @@ -300,7 +309,7 @@ ], "metadata": { "kernelspec": { - "display_name": ".venv", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -314,9 +323,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.5" + "version": "3.10.12" } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/docs/docs/integrations/retrievers/self_query/chroma_self_query.ipynb b/docs/docs/integrations/retrievers/self_query/chroma_self_query.ipynb index e04509af54b..08ee33c5c38 100644 --- a/docs/docs/integrations/retrievers/self_query/chroma_self_query.ipynb +++ b/docs/docs/integrations/retrievers/self_query/chroma_self_query.ipynb @@ -7,7 +7,7 @@ "source": [ "# Chroma\n", "\n", - ">[Chroma](https://docs.trychroma.com/getting-started) is a database for building AI applications with embeddings.\n", + ">[Chroma](https://docs.trychroma.com/getting-started) is a vector database for building AI applications with embeddings.\n", "\n", "In the notebook, we'll demo the `SelfQueryRetriever` wrapped around a `Chroma` vector store. 
" ] diff --git a/docs/docs/integrations/retrievers/self_query/index.mdx b/docs/docs/integrations/retrievers/self_query/index.mdx index 71899a63970..dc438601a21 100644 --- a/docs/docs/integrations/retrievers/self_query/index.mdx +++ b/docs/docs/integrations/retrievers/self_query/index.mdx @@ -2,7 +2,7 @@ sidebar-position: 0 --- -# Self-querying retriever +# Self-querying retrievers Learn about how the self-querying retriever works [here](/docs/modules/data_connection/retrievers/self_query). diff --git a/docs/docs/integrations/retrievers/self_query/mongodb_atlas.ipynb b/docs/docs/integrations/retrievers/self_query/mongodb_atlas.ipynb index cfe0aa6a79e..d7b13b47f06 100644 --- a/docs/docs/integrations/retrievers/self_query/mongodb_atlas.ipynb +++ b/docs/docs/integrations/retrievers/self_query/mongodb_atlas.ipynb @@ -6,8 +6,8 @@ "source": [ "# MongoDB Atlas\n", "\n", - "[MongoDB Atlas](https://www.mongodb.com/) is a document database that can be \n", - "used as a vector databse.\n", + ">[MongoDB Atlas](https://www.mongodb.com/) is a document database that can be \n", + "used as a vector database.\n", "\n", "In the walkthrough, we'll demo the `SelfQueryRetriever` with a `MongoDB Atlas` vector store." 
] @@ -299,7 +299,7 @@ ], "metadata": { "kernelspec": { - "display_name": ".venv", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -313,9 +313,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.5" + "version": "3.10.12" } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/docs/docs/integrations/retrievers/self_query/pgvector_self_query.ipynb b/docs/docs/integrations/retrievers/self_query/pgvector_self_query.ipynb index 8daf192f593..0ea19836738 100644 --- a/docs/docs/integrations/retrievers/self_query/pgvector_self_query.ipynb +++ b/docs/docs/integrations/retrievers/self_query/pgvector_self_query.ipynb @@ -5,9 +5,9 @@ "id": "13afcae7", "metadata": {}, "source": [ - "# PGVector\n", + "# PGVector (Postgres)\n", "\n", - ">[PGVector](https://github.com/pgvector/pgvector) is a vector similarity search for Postgres.\n", + ">[PGVector](https://github.com/pgvector/pgvector) is a vector similarity search package for the `Postgres` database.\n", "\n", "In the notebook, we'll demo the `SelfQueryRetriever` wrapped around a `PGVector` vector store." ] @@ -300,7 +300,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.16" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/retrievers/self_query/supabase_self_query.ipynb b/docs/docs/integrations/retrievers/self_query/supabase_self_query.ipynb index 7477cfec580..d1bed3d9dcf 100644 --- a/docs/docs/integrations/retrievers/self_query/supabase_self_query.ipynb +++ b/docs/docs/integrations/retrievers/self_query/supabase_self_query.ipynb @@ -5,7 +5,7 @@ "id": "13afcae7", "metadata": {}, "source": [ - "# Supabase\n", + "# Supabase (Postgres)\n", "\n", ">[Supabase](https://supabase.com/docs) is an open-source `Firebase` alternative. 
\n", "> `Supabase` is built on top of `PostgreSQL`, which offers strong `SQL` \n", diff --git a/docs/docs/integrations/retrievers/self_query/timescalevector_self_query.ipynb b/docs/docs/integrations/retrievers/self_query/timescalevector_self_query.ipynb index 9dc762d025e..f74fff32553 100644 --- a/docs/docs/integrations/retrievers/self_query/timescalevector_self_query.ipynb +++ b/docs/docs/integrations/retrievers/self_query/timescalevector_self_query.ipynb @@ -6,9 +6,13 @@ "id": "13afcae7", "metadata": {}, "source": [ - "# Timescale Vector (Postgres) self-querying \n", + "# Timescale Vector (Postgres) \n", "\n", - "[Timescale Vector](https://www.timescale.com/ai) is PostgreSQL++ for AI applications. It enables you to efficiently store and query billions of vector embeddings in `PostgreSQL`.\n", + ">[Timescale Vector](https://www.timescale.com/ai) is `PostgreSQL++` for AI applications. It enables you to efficiently store and query billions of vector embeddings in `PostgreSQL`.\n", + ">\n", + ">[PostgreSQL](https://en.wikipedia.org/wiki/PostgreSQL), also known as `Postgres`,\n", + "> is a free and open-source relational database management system (RDBMS) \n", + "> emphasizing extensibility and `SQL` compliance.\n", "\n", "This notebook shows how to use the Postgres vector database (`TimescaleVector`) to perform self-querying. In the notebook we'll demo the `SelfQueryRetriever` wrapped around a TimescaleVector vector store. 
\n", "\n", @@ -528,6 +532,18 @@ "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/retrievers/self_query/vectara_self_query.ipynb b/docs/docs/integrations/retrievers/self_query/vectara_self_query.ipynb index c95fe311df2..807fe75be7b 100644 --- a/docs/docs/integrations/retrievers/self_query/vectara_self_query.ipynb +++ b/docs/docs/integrations/retrievers/self_query/vectara_self_query.ipynb @@ -5,19 +5,15 @@ "id": "13afcae7", "metadata": {}, "source": [ - "# Vectara self-querying \n", + "# Vectara \n", "\n", ">[Vectara](https://vectara.com/) is the trusted GenAI platform that provides an easy-to-use API for document indexing and querying. \n", - "\n", - "Vectara provides an end-to-end managed service for Retrieval Augmented Generation or [RAG](https://vectara.com/grounded-generation/), which includes:\n", - "\n", - "1. A way to extract text from document files and chunk them into sentences.\n", - "\n", - "2. The state-of-the-art [Boomerang](https://vectara.com/how-boomerang-takes-retrieval-augmented-generation-to-the-next-level-via-grounded-generation/) embeddings model. Each text chunk is encoded into a vector embedding using Boomerang, and stored in the Vectara internal knowledge (vector+text) store\n", - "\n", - "3. A query service that automatically encodes the query into embedding, and retrieves the most relevant text segments (including support for [Hybrid Search](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) and [MMR](https://vectara.com/get-diverse-results-and-comprehensive-summaries-with-vectaras-mmr-reranker/))\n", - "\n", - "4. 
An option to create [generative summary](https://docs.vectara.com/docs/learn/grounded-generation/grounded-generation-overview), based on the retrieved documents, including citations.\n", + ">\n", + ">`Vectara` provides an end-to-end managed service for `Retrieval Augmented Generation` or [RAG](https://vectara.com/grounded-generation/), which includes:\n", + ">1. A way to `extract text` from document files and `chunk` them into sentences.\n", + ">2. The state-of-the-art [Boomerang](https://vectara.com/how-boomerang-takes-retrieval-augmented-generation-to-the-next-level-via-grounded-generation/) embeddings model. Each text chunk is encoded into a vector embedding using `Boomerang`, and stored in the Vectara internal knowledge (vector+text) store.\n", + ">3. A query service that automatically encodes the query into an embedding, and retrieves the most relevant text segments (including support for [Hybrid Search](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) and [MMR](https://vectara.com/get-diverse-results-and-comprehensive-summaries-with-vectaras-mmr-reranker/)).\n", + ">4. An option to create a [generative summary](https://docs.vectara.com/docs/learn/grounded-generation/grounded-generation-overview), based on the retrieved documents, including citations.\n", "\n", "See the [Vectara API documentation](https://docs.vectara.com/docs/) for more information on how to use the API.\n", "\n", @@ -31,17 +27,17 @@ "source": [ "# Setup\n", "\n", - "You will need a Vectara account to use Vectara with LangChain. To get started, use the following steps (see our [quickstart](https://docs.vectara.com/docs/quickstart) guide):\n", - "1. [Sign up](https://console.vectara.com/signup) for a Vectara account if you don't already have one. Once you have completed your sign up you will have a Vectara customer ID. You can find your customer ID by clicking on your name, on the top-right of the Vectara console window.\n", - "2. 
Within your account you can create one or more corpora. Each corpus represents an area that stores text data upon ingest from input documents. To create a corpus, use the **\"Create Corpus\"** button. You then provide a name to your corpus as well as a description. Optionally you can define filtering attributes and apply some advanced options. If you click on your created corpus, you can see its name and corpus ID right on the top.\n", + "You will need a `Vectara` account to use `Vectara` with `LangChain`. To get started, use the following steps (see our [quickstart](https://docs.vectara.com/docs/quickstart) guide):\n", + "1. [Sign up](https://console.vectara.com/signup) for a `Vectara` account if you don't already have one. Once you have completed your sign up you will have a Vectara customer ID. You can find your customer ID by clicking on your name, on the top-right of the Vectara console window.\n", + "2. Within your account you can create one or more corpora. Each corpus represents an area that stores text data upon ingestion from input documents. To create a corpus, use the **\"Create Corpus\"** button. You then provide a name to your corpus as well as a description. Optionally you can define filtering attributes and apply some advanced options. If you click on your created corpus, you can see its name and corpus ID right on the top.\n", "3. Next you'll need to create API keys to access the corpus. Click on the **\"Authorization\"** tab in the corpus view and then the **\"Create API Key\"** button. Give your key a name, and choose whether you want query only or query+index for your key. Click \"Create\" and you now have an active API key. Keep this key confidential. \n", "\n", - "To use LangChain with Vectara, you'll need to have these three values: customer ID, corpus ID and api_key.\n", + "To use LangChain with Vectara, you need three values: customer ID, corpus ID, and api_key.\n", "You can provide those to LangChain in two ways:\n", "\n", "1. 
Include in your environment these three variables: `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`.\n", "\n", - "> For example, you can set these variables using os.environ and getpass as follows:\n", + "> For example, you can set these variables using `os.environ` and `getpass` as follows:\n", "\n", "```python\n", "import os\n", @@ -52,7 +48,7 @@ "os.environ[\"VECTARA_API_KEY\"] = getpass.getpass(\"Vectara API Key:\")\n", "```\n", "\n", - "1. Provide them as arguments when creating the Vectara vectorstore object:\n", + "1. Provide them as arguments when creating the `Vectara` vectorstore object:\n", "\n", "```python\n", "vectorstore = Vectara(\n", @@ -398,7 +394,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.9" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/docs/integrations/retrievers/tavily.ipynb b/docs/docs/integrations/retrievers/tavily.ipynb index 8358202612d..6c5c61fb3d7 100644 --- a/docs/docs/integrations/retrievers/tavily.ipynb +++ b/docs/docs/integrations/retrievers/tavily.ipynb @@ -6,7 +6,7 @@ "source": [ "# Tavily Search API\n", "\n", - "[Tavily's Search API](https://tavily.com) is a search engine built specifically for AI agents (LLMs), delivering real-time, accurate, and factual results at speed.\n", + ">[Tavily's Search API](https://tavily.com) is a search engine built specifically for AI agents (LLMs), delivering real-time, accurate, and factual results at speed.\n", "\n", "We can use this as a [retriever](/docs/modules/data_connection/retrievers). It will show functionality specific to this integration. 
After going through, it may be useful to explore [relevant use-case pages](/docs/use_cases/question_answering) to learn how to use this vectorstore as part of a larger chain.\n", "\n", diff --git a/docs/docs/integrations/retrievers/you-retriever.ipynb b/docs/docs/integrations/retrievers/you-retriever.ipynb index d32f167251c..d0f41b4fdf3 100644 --- a/docs/docs/integrations/retrievers/you-retriever.ipynb +++ b/docs/docs/integrations/retrievers/you-retriever.ipynb @@ -5,9 +5,9 @@ "id": "818fc023", "metadata": {}, "source": [ - "# You.com Retriever\n", + "# You.com\n", - "The [you.com API](https://api.you.com) is a suite of tools designed to help developers ground the output of LLMs in the most recent, most accurate, most relevant information that may not have been included in their training dataset." + ">The [you.com API](https://api.you.com) is a suite of tools designed to help developers ground the output of LLMs in the most recent, most accurate, most relevant information that may not have been included in their training dataset." ] }, {