Merge branch 'master' into deepsense/text-to-speech

2025-09-11 16:01:33 +00:00 · 2023-09-08 08:09:01 +02:00
parent f23fed34e8 01e9d7902d
commit 69fe0621d4
60 changed files with 4527 additions and 2056 deletions
--- a/docs/extras/integrations/providers/awadb.md
+++ b/docs/extras/integrations/providers/awadb.md
@@ -9,13 +9,20 @@ pip install awadb
 ```


-## VectorStore
+## Vector Store

-There exists a wrapper around AwaDB vector databases, allowing you to use it as a vectorstore,
-whether for semantic search or example selection.

 ```python
 from langchain.vectorstores import AwaDB
 ```

-For a more detailed walkthrough of the AwaDB wrapper, see [here](/docs/integrations/vectorstores/awadb.html).
+See a [usage example](/docs/integrations/vectorstores/awadb).
+
+
+## Text Embedding Model
+
+```python
+from langchain.embeddings import AwaEmbeddings
+```
+
+See a [usage example](/docs/integrations/text_embedding/awadb).
--- a/docs/extras/integrations/providers/modelscope.mdx
+++ b/docs/extras/integrations/providers/modelscope.mdx
@@ -1,20 +1,24 @@
 # ModelScope

+>[ModelScope](https://www.modelscope.cn/home) is a large repository of models and datasets.
+
 This page covers how to use the modelscope ecosystem within LangChain.
 It is broken into two parts: installation and setup, and then references to specific modelscope wrappers.

 ## Installation and Setup

-* Install the Python SDK with `pip install modelscope`
+Install the `modelscope` package.
+ 
+```bash
+pip install modelscope
+```

-## Wrappers

-### Embeddings
+## Text Embedding Models

-There exists a modelscope Embeddings wrapper, which you can access with 

 ```python
 from langchain.embeddings import ModelScopeEmbeddings
 ```

-For a more detailed walkthrough of this, see [this notebook](/docs/integrations/text_embedding/modelscope_hub.html)
+For a more detailed walkthrough of this, see [this notebook](/docs/integrations/text_embedding/modelscope_hub)
--- a/docs/extras/integrations/providers/nlpcloud.mdx
+++ b/docs/extras/integrations/providers/nlpcloud.mdx
@@ -1,17 +1,31 @@
 # NLPCloud

-This page covers how to use the NLPCloud ecosystem within LangChain.
-It is broken into two parts: installation and setup, and then references to specific NLPCloud wrappers.
+>[NLP Cloud](https://docs.nlpcloud.com/#introduction) is an artificial intelligence platform that allows you to use the most advanced AI engines, and even train your own engines with your own data. 
+

 ## Installation and Setup
- Install the Python SDK with `pip install nlpcloud`
+
+- Install the `nlpcloud` package.
+
+```bash
+pip install nlpcloud
+```
+
 - Get an NLPCloud api key and set it as an environment variable (`NLPCLOUD_API_KEY`)

-## Wrappers

-### LLM
+## LLM
+
+See a [usage example](/docs/integrations/llms/nlpcloud).

-There exists an NLPCloud LLM wrapper, which you can access with 
 ```python
 from langchain.llms import NLPCloud
 ```
+
+## Text Embedding Models
+
+See a [usage example](/docs/integrations/text_embedding/nlp_cloud)
+
+```python
+from langchain.embeddings import NLPCloudEmbeddings
+```
--- a/docs/extras/integrations/providers/spacy.mdx
+++ b/docs/extras/integrations/providers/spacy.mdx
@@ -18,3 +18,11 @@ See a [usage example](/docs/modules/data_connection/document_transformers/text_s
 ```python
 from langchain.text_splitter import SpacyTextSplitter
 ```
+
+## Text Embedding Models
+
+See a [usage example](/docs/integrations/text_embedding/spacy_embedding)
+
+```python
+from langchain.embeddings.spacy_embeddings import SpacyEmbeddings
+```
--- a/docs/extras/integrations/providers/vectara/index.mdx
+++ b/docs/extras/integrations/providers/vectara/index.mdx
@@ -11,9 +11,10 @@ What is Vectara?
 - You can use Vectara's integration with LangChain as a Vector store or using the Retriever abstraction.

 ## Installation and Setup
-To use Vectara with LangChain no special installation steps are required. You just have to provide your customer_id, corpus ID, and an API key created within the Vectara console to enable indexing and searching.
+To use Vectara with LangChain no special installation steps are required. 
+To get started, follow our [quickstart](https://docs.vectara.com/docs/quickstart) guide to create an account, a corpus and an API key. 
+Once you have these, you can provide them as arguments to the Vectara vectorstore, or you can set them as environment variables.

-Alternatively these can be provided as environment variables
 - export `VECTARA_CUSTOMER_ID`="your_customer_id"
 - export `VECTARA_CORPUS_ID`="your_corpus_id"
 - export `VECTARA_API_KEY`="your-vectara-api-key"
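A minimal sketch of checking that the three environment variables listed above are set before the Vectara vectorstore is created. The helper name `missing_vectara_vars` is hypothetical and is not part of LangChain or Vectara; only the variable names come from the text.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = ("VECTARA_CUSTOMER_ID", "VECTARA_CORPUS_ID", "VECTARA_API_KEY")


def missing_vectara_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vectara_vars()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All Vectara settings are present.")
```

Running a check like this first gives a clearer error than a failed request later on.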
--- a/docs/extras/integrations/text_embedding/awadb.ipynb
+++ b/docs/extras/integrations/text_embedding/awadb.ipynb
@@ -5,9 +5,11 @@
   "id": "b14a24db",
   "metadata": {},
   "source": [
-    "# AwaEmbedding\n",
+    "# AwaDB\n",
    "\n",
-    "This notebook explains how to use AwaEmbedding, which is included in [awadb](https://github.com/awa-ai/awadb), to embedding texts in langchain."
+    ">[AwaDB](https://github.com/awa-ai/awadb) is an AI Native database for the search and storage of embedding vectors used by LLM Applications.\n",
+    "\n",
+    "This notebook explains how to use `AwaEmbeddings` in LangChain."
   ]
  },
  {
@@ -101,7 +103,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.11.4"
+   "version": "3.10.12"
  }
 },
 "nbformat": 4,
--- a/docs/extras/integrations/text_embedding/bedrock.ipynb
+++ b/docs/extras/integrations/text_embedding/bedrock.ipynb
@@ -5,7 +5,9 @@
   "id": "75e378f5-55d7-44b6-8e2e-6d7b8b171ec4",
   "metadata": {},
   "source": [
-    "# Bedrock Embeddings"
+    "# Bedrock\n",
+    "\n",
+    ">[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case.\n"
   ]
  },
  {
@@ -91,7 +93,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.9.13"
+   "version": "3.10.12"
  }
 },
 "nbformat": 4,
--- a/docs/extras/integrations/text_embedding/bge_huggingface.ipynb
+++ b/docs/extras/integrations/text_embedding/bge_huggingface.ipynb
@@ -5,26 +5,29 @@
   "id": "719619d3",
   "metadata": {},
   "source": [
-    "# BGE Hugging Face Embeddings\n",
+    "# BGE on Hugging Face\n",
    "\n",
-    "This notebook shows how to use BGE Embeddings through Hugging Face"
+    ">[BGE models on Hugging Face](https://huggingface.co/BAAI/bge-large-en) are [among the best open-source embedding models](https://huggingface.co/spaces/mteb/leaderboard).\n",
+    ">The BGE models were created by the [Beijing Academy of Artificial Intelligence (BAAI)](https://www.baai.ac.cn/english.html). `BAAI` is a private non-profit organization engaged in AI research and development.\n",
+    "\n",
+    "This notebook shows how to use `BGE Embeddings` through `Hugging Face`."
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": null,
   "id": "f7a54279",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
-    "# !pip install sentence_transformers"
+    "#!pip install sentence_transformers"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": null,
   "id": "9e1d5b6b",
   "metadata": {},
   "outputs": [],
@@ -43,12 +46,24 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 5,
   "id": "e59d1a89",
   "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "384"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
   "source": [
-    "embedding = hf.embed_query(\"hi this is harrison\")"
+    "embedding = hf.embed_query(\"hi this is harrison\")\n",
+    "len(embedding)"
   ]
  },
  {
@@ -76,7 +91,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.10.1"
+   "version": "3.10.12"
  }
 },
 "nbformat": 4,
--- a/docs/extras/integrations/text_embedding/google_vertex_ai_palm.ipynb
+++ b/docs/extras/integrations/text_embedding/google_vertex_ai_palm.ipynb
@@ -1,13 +1,14 @@
 {
 "cells": [
  {
-   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# Google Cloud Platform Vertex AI PaLM \n",
+    "# Google Vertex AI PaLM \n",
    "\n",
-    "Note: This is seperate from the Google PaLM integration, it exposes [Vertex AI PaLM API](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview) on Google Cloud. \n",
+    ">[Vertex AI PaLM API](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview) is a service on Google Cloud exposing the embedding models. \n",
+    "\n",
+    "Note: This integration is separate from the Google PaLM integration.\n",
    "\n",
    "By default, Google Cloud [does not use](https://cloud.google.com/vertex-ai/docs/generative-ai/data-governance#foundation_model_development) Customer Data to train its foundation models as part of Google Cloud`s AI/ML Privacy Commitment. More details about how Google processes data can also be found in [Google's Customer Data Processing Addendum (CDPA)](https://cloud.google.com/terms/data-processing-addendum).\n",
    "\n",
@@ -96,7 +97,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.9.1"
+   "version": "3.10.12"
  },
  "vscode": {
   "interpreter": {
--- a/docs/extras/integrations/text_embedding/modelscope_hub.ipynb
+++ b/docs/extras/integrations/text_embedding/modelscope_hub.ipynb
@@ -1,12 +1,13 @@
 {
 "cells": [
  {
-   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# ModelScope\n",
    "\n",
+    ">[ModelScope](https://www.modelscope.cn/home) is a large repository of models and datasets.\n",
+    "\n",
    "Let's load the ModelScope Embedding class."
   ]
  },
@@ -67,16 +68,23 @@
 ],
 "metadata": {
  "kernelspec": {
-   "display_name": "chatgpt",
+   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
   "name": "python",
-   "version": "3.9.15"
-  },
-  "orig_nbformat": 4
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  }
 },
 "nbformat": 4,
- "nbformat_minor": 2
+ "nbformat_minor": 4
 }
--- a/docs/extras/integrations/text_embedding/mosaicml.ipynb
+++ b/docs/extras/integrations/text_embedding/mosaicml.ipynb
@@ -1,15 +1,14 @@
 {
 "cells": [
  {
-   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# MosaicML embeddings\n",
+    "# MosaicML\n",
    "\n",
-    "[MosaicML](https://docs.mosaicml.com/en/latest/inference.html) offers a managed inference service. You can either use a variety of open source models, or deploy your own.\n",
+    ">[MosaicML](https://docs.mosaicml.com/en/latest/inference.html) offers a managed inference service. You can either use a variety of open source models, or deploy your own.\n",
    "\n",
-    "This example goes over how to use LangChain to interact with MosaicML Inference for text embedding."
+    "This example goes over how to use LangChain to interact with `MosaicML` Inference for text embedding."
   ]
  },
  {
@@ -94,6 +93,11 @@
  }
 ],
 "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
@@ -103,9 +107,10 @@
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3"
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
  }
 },
 "nbformat": 4,
- "nbformat_minor": 2
+ "nbformat_minor": 4
 }
--- a/docs/extras/integrations/text_embedding/nlp_cloud.ipynb
+++ b/docs/extras/integrations/text_embedding/nlp_cloud.ipynb
@@ -7,7 +7,7 @@
   "source": [
    "# NLP Cloud\n",
    "\n",
-    "NLP Cloud is an artificial intelligence platform that allows you to use the most advanced AI engines, and even train your own engines with your own data. \n",
+    ">[NLP Cloud](https://docs.nlpcloud.com/#introduction) is an artificial intelligence platform that allows you to use the most advanced AI engines, and even train your own engines with your own data. \n",
    "\n",
    "The [embeddings](https://docs.nlpcloud.com/#embeddings) endpoint offers the following model:\n",
    "\n",
@@ -80,7 +80,7 @@
 ],
 "metadata": {
  "kernelspec": {
-   "display_name": "Python 3.11.2 64-bit",
+   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
@@ -94,7 +94,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.11.2"
+   "version": "3.10.12"
  },
  "vscode": {
   "interpreter": {
--- a/docs/extras/integrations/text_embedding/sagemaker-endpoint.ipynb
+++ b/docs/extras/integrations/text_embedding/sagemaker-endpoint.ipynb
@@ -5,11 +5,13 @@
   "id": "1f83f273",
   "metadata": {},
   "source": [
-    "# SageMaker Endpoint Embeddings\n",
+    "# SageMaker\n",
    "\n",
-    "Let's load the SageMaker Endpoints Embeddings class. The class can be used if you host, e.g. your own Hugging Face model on SageMaker.\n",
+    "Let's load the `SageMaker Endpoints Embeddings` class. The class can be used if you host, e.g. your own Hugging Face model on SageMaker.\n",
    "\n",
-    "For instructions on how to do this, please see [here](https://www.philschmid.de/custom-inference-huggingface-sagemaker). **Note**: In order to handle batched requests, you will need to adjust the return line in the `predict_fn()` function within the custom `inference.py` script:\n",
+    "For instructions on how to do this, please see [here](https://www.philschmid.de/custom-inference-huggingface-sagemaker). \n",
+    "\n",
+    "**Note**: In order to handle batched requests, you will need to adjust the return line in the `predict_fn()` function within the custom `inference.py` script:\n",
    "\n",
    "Change from\n",
    "\n",
@@ -143,7 +145,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.9.1"
+   "version": "3.10.12"
  },
  "vscode": {
   "interpreter": {
--- a/docs/extras/integrations/text_embedding/self-hosted.ipynb
+++ b/docs/extras/integrations/text_embedding/self-hosted.ipynb
@@ -5,8 +5,8 @@
   "id": "eec4efda",
   "metadata": {},
   "source": [
-    "# Self Hosted Embeddings\n",
-    "Let's load the SelfHostedEmbeddings, SelfHostedHuggingFaceEmbeddings, and SelfHostedHuggingFaceInstructEmbeddings classes."
+    "# Self Hosted\n",
+    "Let's load the `SelfHostedEmbeddings`, `SelfHostedHuggingFaceEmbeddings`, and `SelfHostedHuggingFaceInstructEmbeddings` classes."
   ]
  },
  {
@@ -149,9 +149,7 @@
   "cell_type": "code",
   "execution_count": null,
   "id": "fc1bfd0f",
-   "metadata": {
-    "scrolled": false
-   },
+   "metadata": {},
   "outputs": [],
   "source": [
    "query_result = embeddings.embed_query(text)"
@@ -182,7 +180,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.9.1"
+   "version": "3.10.12"
  },
  "vscode": {
   "interpreter": {
--- a/docs/extras/integrations/text_embedding/sentence_transformers.ipynb
+++ b/docs/extras/integrations/text_embedding/sentence_transformers.ipynb
@@ -1,16 +1,15 @@
 {
 "cells": [
  {
-   "attachments": {},
   "cell_type": "markdown",
   "id": "ed47bb62",
   "metadata": {},
   "source": [
-    "# Sentence Transformers Embeddings\n",
+    "# Sentence Transformers\n",
    "\n",
-    "[SentenceTransformers](https://www.sbert.net/) embeddings are called using the `HuggingFaceEmbeddings` integration. We have also added an alias for `SentenceTransformerEmbeddings` for users who are more familiar with directly using that package.\n",
+    ">[SentenceTransformers](https://www.sbert.net/) embeddings are called using the `HuggingFaceEmbeddings` integration. We have also added an alias for `SentenceTransformerEmbeddings` for users who are more familiar with directly using that package.\n",
    "\n",
-    "SentenceTransformers is a python package that can generate text and image embeddings, originating from [Sentence-BERT](https://arxiv.org/abs/1908.10084)"
+    "`SentenceTransformers` is a python package that can generate text and image embeddings, originating from [Sentence-BERT](https://arxiv.org/abs/1908.10084)"
   ]
  },
  {
@@ -109,7 +108,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.8.16"
+   "version": "3.10.12"
  },
  "vscode": {
   "interpreter": {
--- a/docs/extras/integrations/text_embedding/spacy_embedding.ipynb
+++ b/docs/extras/integrations/text_embedding/spacy_embedding.ipynb
@@ -1,21 +1,31 @@
 {
 "cells": [
  {
-   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# Spacy Embedding\n",
+    "# SpaCy\n",
    "\n",
-    "### Loading the Spacy embedding class to generate and query embeddings"
+    ">[spaCy](https://spacy.io/) is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython.\n",
+    " \n",
+    "\n",
+    "## Installation and Setup"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#!pip install spacy"
   ]
  },
  {
-   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "#### Import the necessary classes"
+    "Import the necessary classes"
   ]
  },
  {
@@ -28,11 +38,12 @@
   ]
  },
  {
-   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "#### Initialize SpacyEmbeddings.This will load the Spacy model into memory."
+    "## Example\n",
+    "\n",
+    "Initialize `SpacyEmbeddings`. This will load the spaCy model into memory."
   ]
  },
  {
@@ -45,11 +56,10 @@
   ]
  },
  {
-   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "#### Define some example texts . These could be any documents that you want to analyze - for example, news articles, social media posts, or product reviews."
+    "Define some example texts. These could be any documents that you want to analyze - for example, news articles, social media posts, or product reviews."
   ]
  },
  {
@@ -67,11 +77,10 @@
   ]
  },
  {
-   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "#### Generate and print embeddings for the texts . The SpacyEmbeddings class generates an embedding for each document, which is a numerical representation of the document's content. These embeddings can be used for various natural language processing tasks, such as document similarity comparison or text classification."
+    "Generate and print embeddings for the texts. The `SpacyEmbeddings` class generates an embedding for each document, which is a numerical representation of the document's content. These embeddings can be used for various natural language processing tasks, such as document similarity comparison or text classification."
   ]
  },
  {
@@ -86,11 +95,10 @@
   ]
  },
  {
-   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "#### Generate and print an embedding for a single piece of text. You can also generate an embedding for a single piece of text, such as a search query. This can be useful for tasks like information retrieval, where you want to find documents that are similar to a given query."
+    "Generate and print an embedding for a single piece of text. You can also generate an embedding for a single piece of text, such as a search query. This can be useful for tasks like information retrieval, where you want to find documents that are similar to a given query."
   ]
  },
  {
@@ -106,11 +114,24 @@
  }
 ],
 "metadata": {
-  "language_info": {
-   "name": "python"
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
  },
-  "orig_nbformat": 4
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  }
 },
 "nbformat": 4,
- "nbformat_minor": 2
+ "nbformat_minor": 4
 }
--- a/docs/extras/integrations/vectorstores/supabase.ipynb
+++ b/docs/extras/integrations/vectorstores/supabase.ipynb
@@ -28,43 +28,41 @@
    "The following function determines cosine similarity, but you can adjust to your needs.\n",
    "\n",
    "```sql\n",
-    "       -- Enable the pgvector extension to work with embedding vectors\n",
-    "       create extension vector;\n",
+    "-- Enable the pgvector extension to work with embedding vectors\n",
+    "create extension if not exists vector;\n",
    "\n",
-    "       -- Create a table to store your documents\n",
-    "       create table documents (\n",
-    "       id uuid primary key,\n",
-    "       content text, -- corresponds to Document.pageContent\n",
-    "       metadata jsonb, -- corresponds to Document.metadata\n",
-    "       embedding vector(1536) -- 1536 works for OpenAI embeddings, change if needed\n",
-    "       );\n",
+    "-- Create a table to store your documents\n",
+    "create table\n",
+    "  documents (\n",
+    "    id uuid primary key,\n",
+    "    content text, -- corresponds to Document.pageContent\n",
+    "    metadata jsonb, -- corresponds to Document.metadata\n",
+    "    embedding vector (1536) -- 1536 works for OpenAI embeddings, change if needed\n",
+    "  );\n",
    "\n",
-    "       CREATE FUNCTION match_documents(query_embedding vector(1536), match_count int)\n",
-    "           RETURNS TABLE(\n",
-    "               id uuid,\n",
-    "               content text,\n",
-    "               metadata jsonb,\n",
-    "               -- we return matched vectors to enable maximal marginal relevance searches\n",
-    "               embedding vector(1536),\n",
-    "               similarity float)\n",
-    "           LANGUAGE plpgsql\n",
-    "           AS $$\n",
-    "           # variable_conflict use_column\n",
-    "       BEGIN\n",
-    "           RETURN query\n",
-    "           SELECT\n",
-    "               id,\n",
-    "               content,\n",
-    "               metadata,\n",
-    "               embedding,\n",
-    "               1 -(documents.embedding <=> query_embedding) AS similarity\n",
-    "           FROM\n",
-    "               documents\n",
-    "           ORDER BY\n",
-    "               documents.embedding <=> query_embedding\n",
-    "           LIMIT match_count;\n",
-    "       END;\n",
-    "       $$;\n",
+    "-- Create a function to search for documents\n",
+    "create function match_documents (\n",
+    "  query_embedding vector (1536),\n",
+    "  filter jsonb default '{}'\n",
+    ") returns table (\n",
+    "  id uuid,\n",
+    "  content text,\n",
+    "  metadata jsonb,\n",
+    "  similarity float\n",
+    ") language plpgsql as $$\n",
+    "#variable_conflict use_column\n",
+    "begin\n",
+    "  return query\n",
+    "  select\n",
+    "    id,\n",
+    "    content,\n",
+    "    metadata,\n",
+    "    1 - (documents.embedding <=> query_embedding) as similarity\n",
+    "  from documents\n",
+    "  where metadata @> filter\n",
+    "  order by documents.embedding <=> query_embedding;\n",
+    "end;\n",
+    "$$;\n",
    "```"
   ]
  },
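To make the SQL above easier to follow, here is a small pure-Python sketch of what the new `match_documents` function computes: it keeps rows whose metadata contains the filter, scores them as one minus cosine distance (the quantity pgvector's `<=>` operator returns), and orders them nearest first. The row layout and sample data are assumptions for illustration, and the containment check covers only top-level keys, whereas `jsonb @>` also handles nested values.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def match_documents(rows, query_embedding, metadata_filter=None):
    """Mimic the SQL function: filter by metadata, then rank by similarity."""
    metadata_filter = metadata_filter or {}
    kept = [
        row
        for row in rows
        # Simplified stand-in for `metadata @> filter` (top-level keys only).
        if all(
            key in row["metadata"] and row["metadata"][key] == value
            for key, value in metadata_filter.items()
        )
    ]
    scored = [
        {
            "id": row["id"],
            "content": row["content"],
            "metadata": row["metadata"],
            "similarity": 1.0 - cosine_distance(row["embedding"], query_embedding),
        }
        for row in kept
    ]
    # Ordering by ascending distance equals ordering by descending similarity.
    return sorted(scored, key=lambda r: r["similarity"], reverse=True)


# Hypothetical sample rows with 2-dimensional embeddings.
rows = [
    {"id": "a", "content": "first", "metadata": {"topic": "x"}, "embedding": [1.0, 0.0]},
    {"id": "b", "content": "second", "metadata": {"topic": "y"}, "embedding": [0.0, 1.0]},
    {"id": "c", "content": "third", "metadata": {"topic": "x"}, "embedding": [1.0, 1.0]},
]
results = match_documents(rows, [1.0, 0.0], {"topic": "x"})
```

Like the revised SQL, the sketch returns neither the embedding column nor a row limit, so the caller decides how many results to keep.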
--- a/docs/extras/integrations/vectorstores/vectara.ipynb
+++ b/docs/extras/integrations/vectorstores/vectara.ipynb
@@ -26,7 +26,7 @@
   "source": [
    "# Setup\n",
    "\n",
-    "You will need a Vectara account to use Vectara with LangChain. To get started, use the following steps:\n",
+    "You will need a Vectara account to use Vectara with LangChain. To get started, use the following steps (see our [quickstart](https://docs.vectara.com/docs/quickstart) guide):\n",
    "1. [Sign up](https://console.vectara.com/signup) for a Vectara account if you don't already have one. Once you have completed your sign up you will have a Vectara customer ID. You can find your customer ID by clicking on your name, on the top-right of the Vectara console window.\n",
    "2. Within your account you can create one or more corpora. Each corpus represents an area that stores text data upon ingest from input documents. To create a corpus, use the **\"Create Corpus\"** button. You then provide a name to your corpus as well as a description. Optionally you can define filtering attributes and apply some advanced options. If you click on your created corpus, you can see its name and corpus ID right on the top.\n",
    "3. Next you'll need to create API keys to access the corpus. Click on the **\"Authorization\"** tab in the corpus view and then the **\"Create API Key\"** button. Give your key a name, and choose whether you want query only or query+index for your key. Click \"Create\" and you now have an active API key. Keep this key confidential. \n",
@@ -47,7 +47,7 @@
    "os.environ[\"VECTARA_API_KEY\"] = getpass.getpass(\"Vectara API Key:\")\n",
    "```\n",
    "\n",
-    "2. Add them to the Vectara vectorstore constructor:\n",
+    "1. Provide them as arguments when creating the Vectara vectorstore object:\n",
    "\n",
    "```python\n",
    "vectorstore = Vectara(\n",
@@ -65,13 +65,22 @@
   "source": [
    "## Connecting to Vectara from LangChain\n",
    "\n",
-    "To get started, let's ingest the documents using the from_documents() method.\n",
-    "We assume here that you've added your VECTARA_CUSTOMER_ID, VECTARA_CORPUS_ID and query+indexing VECTARA_API_KEY as environment variables."
+    "In this example, we assume that you've created an account and a corpus, and added your VECTARA_CUSTOMER_ID, VECTARA_CORPUS_ID and VECTARA_API_KEY (created with permissions for both indexing and query) as environment variables.\n",
+    "\n",
+    "The corpus has 3 fields defined as metadata for filtering:\n",
+    "* url: a string field containing the source URL of the document (where relevant)\n",
+    "* speech: a string field containing the name of the speech\n",
+    "* author: the name of the author\n",
+    "\n",
+    "Let's start by ingesting 3 documents into the corpus:\n",
+    "1. The State of the Union speech from 2022, available in the LangChain repository as a text file\n",
+    "2. The \"I have a dream\" speech by Dr. King\n",
+    "3. The \"We Shall Fight on the Beaches\" speech by Winston Churchill"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 2,
   "id": "04a1f1a0",
   "metadata": {},
   "outputs": [],
@@ -79,12 +88,17 @@
    "from langchain.embeddings import FakeEmbeddings\n",
    "from langchain.text_splitter import CharacterTextSplitter\n",
    "from langchain.vectorstores import Vectara\n",
-    "from langchain.document_loaders import TextLoader"
+    "from langchain.document_loaders import TextLoader\n",
+    "\n",
+    "from langchain.llms import OpenAI\n",
+    "from langchain.chains import ConversationalRetrievalChain\n",
+    "from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
+    "from langchain.chains.query_constructor.base import AttributeInfo"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 3,
   "id": "be0a4973",
   "metadata": {},
   "outputs": [],
@@ -97,7 +111,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 4,
   "id": "8429667e",
   "metadata": {
    "ExecuteTime": {
@@ -111,7 +125,7 @@
    "vectara = Vectara.from_documents(\n",
    "    docs,\n",
    "    embedding=FakeEmbeddings(size=768),\n",
-    "    doc_metadata={\"speech\": \"state-of-the-union\"},\n",
+    "    doc_metadata={\"speech\": \"state-of-the-union\", \"author\": \"Biden\"},\n",
    ")"
   ]
  },
@@ -130,7 +144,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 5,
   "id": "85ef3468",
   "metadata": {},
   "outputs": [],
@@ -142,14 +156,16 @@
    "    [\n",
    "        \"https://www.gilderlehrman.org/sites/default/files/inline-pdfs/king.dreamspeech.excerpts.pdf\",\n",
    "        \"I-have-a-dream\",\n",
+    "        \"Dr. King\"\n",
    "    ],\n",
    "    [\n",
    "        \"https://www.parkwayschools.net/cms/lib/MO01931486/Centricity/Domain/1578/Churchill_Beaches_Speech.pdf\",\n",
    "        \"we shall fight on the beaches\",\n",
+    "        \"Churchill\"\n",
    "    ],\n",
    "]\n",
    "files_list = []\n",
-    "for url, _ in urls:\n",
+    "for url, _, _ in urls:\n",
    "    name = tempfile.NamedTemporaryFile().name\n",
    "    urllib.request.urlretrieve(url, name)\n",
    "    files_list.append(name)\n",
@@ -157,7 +173,7 @@
    "docsearch: Vectara = Vectara.from_files(\n",
    "    files=files_list,\n",
    "    embedding=FakeEmbeddings(size=768),\n",
-    "    metadatas=[{\"url\": url, \"speech\": title} for url, title in urls],\n",
+    "    metadatas=[{\"url\": url, \"speech\": title, \"author\": author} for url, title, author in urls],\n",
    ")"
   ]
  },
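The change above extends each entry in `urls` from a pair to a (url, title, author) triple, so every loop that unpacks the entries must name three values. A minimal sketch of the resulting metadata construction follows; the URLs are hypothetical placeholders and `build_metadatas` is an illustrative helper, not a LangChain function.

```python
# Hypothetical stand-ins for the (url, title, author) entries in the notebook.
urls = [
    ["https://example.com/dream.pdf", "I-have-a-dream", "Dr. King"],
    ["https://example.com/beaches.pdf", "we shall fight on the beaches", "Churchill"],
]


def build_metadatas(entries):
    """Turn (url, title, author) triples into one metadata dict per file."""
    return [
        {"url": url, "speech": title, "author": author}
        for url, title, author in entries
    ]


metadatas = build_metadatas(urls)
```

Unpacking a triple into two names raises `ValueError`, which is why the download loop in the same hunk changes to `for url, _, _ in urls`.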
@@ -178,7 +194,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 6,
   "id": "a8c513ab",
   "metadata": {
    "ExecuteTime": {
@@ -197,7 +213,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 7,
   "id": "fc516993",
   "metadata": {
    "ExecuteTime": {
@@ -231,7 +247,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 8,
   "id": "8804a21d",
   "metadata": {
    "ExecuteTime": {
@@ -249,7 +265,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 9,
   "id": "756a6887",
   "metadata": {
    "ExecuteTime": {
@@ -264,7 +280,7 @@
     "text": [
      "Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. A former top litigator in private practice.\n",
      "\n",
-      "Score: 0.786569\n"
+      "Score: 0.8299499\n"
     ]
    }
   ],
@@ -284,7 +300,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 10,
   "id": "47784de5",
   "metadata": {},
   "outputs": [
@@ -307,7 +323,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 10,
+   "execution_count": 11,
   "id": "3e22949f",
   "metadata": {},
   "outputs": [
@@ -315,7 +331,7 @@
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "With this threshold of 0.2 we have 3 documents\n"
+      "With this threshold of 0.2 we have 5 documents\n"
     ]
    }
   ],
@@ -340,7 +356,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": 12,
   "id": "9427195f",
   "metadata": {
    "ExecuteTime": {
@@ -352,10 +368,10 @@
    {
     "data": {
      "text/plain": [
-       "VectaraRetriever(tags=['Vectara'], metadata=None, vectorstore=<langchain.vectorstores.vectara.Vectara object at 0x1586bd330>, search_type='similarity', search_kwargs={'lambda_val': 0.025, 'k': 5, 'filter': '', 'n_sentence_context': '2'})"
+       "VectaraRetriever(tags=['Vectara'], metadata=None, vectorstore=<langchain.vectorstores.vectara.Vectara object at 0x13b15e9b0>, search_type='similarity', search_kwargs={'lambda_val': 0.025, 'k': 5, 'filter': '', 'n_sentence_context': '2'})"
      ]
     },
-     "execution_count": 11,
+     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
@@ -367,7 +383,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": 13,
   "id": "f3c70c31",
   "metadata": {
    "ExecuteTime": {
@@ -379,10 +395,10 @@
    {
     "data": {
      "text/plain": [
-       "Document(page_content='Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. A former top litigator in private practice.', metadata={'source': 'langchain', 'lang': 'eng', 'offset': '596', 'len': '97', 'speech': 'state-of-the-union'})"
+       "Document(page_content='Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. A former top litigator in private practice.', metadata={'source': 'langchain', 'lang': 'eng', 'offset': '596', 'len': '97', 'speech': 'state-of-the-union', 'author': 'Biden'})"
      ]
     },
-     "execution_count": 12,
+     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
@@ -392,10 +408,118 @@
    "retriever.get_relevant_documents(query)[0]"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "id": "e944c26a",
+   "metadata": {},
+   "source": [
+    "## Using Vectara as a SelfQuery Retriever"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "id": "8be674de",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "metadata_field_info = [\n",
+    "    AttributeInfo(\n",
+    "        name=\"speech\",\n",
+    "        description=\"the name of the speech\",\n",
+    "        type=\"string or list[string]\",\n",
+    "    ),\n",
+    "    AttributeInfo(\n",
+    "        name=\"author\",\n",
+    "        description=\"author of the speech\",\n",
+    "        type=\"string or list[string]\",\n",
+    "    ),\n",
+    "]\n",
+    "document_content_description = \"the text of the speech\"\n",
+    "\n",
+    "vectordb = Vectara()\n",
+    "llm = OpenAI(temperature=0)\n",
+    "retriever = SelfQueryRetriever.from_llm(llm, vectordb,\n",
+    "                                        document_content_description, metadata_field_info,\n",
+    "                                        verbose=True)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "id": "f8938999",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/Users/ofer/dev/langchain/libs/langchain/langchain/chains/llm.py:278: UserWarning: The predict_and_parse method is deprecated, instead pass an output parser directly to LLMChain.\n",
+      "  warnings.warn(\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "query='freedom' filter=Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='author', value='Biden') limit=None\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "[Document(page_content='Well I know this nation. We will meet the test. To protect freedom and liberty, to expand fairness and opportunity. We will save democracy. As hard as these times have been, I am more optimistic about America today than I have been my whole life.', metadata={'source': 'langchain', 'lang': 'eng', 'offset': '346', 'len': '67', 'speech': 'state-of-the-union', 'author': 'Biden'}),\n",
+       " Document(page_content='To our fellow Ukrainian Americans who forge a deep bond that connects our two nations we stand with you. Putin may circle Kyiv with tanks, but he will never gain the hearts and souls of the Ukrainian people. He will never extinguish their love of freedom. He will never weaken the resolve of the free world. We meet tonight in an America that has lived through two of the hardest years this nation has ever faced.', metadata={'source': 'langchain', 'lang': 'eng', 'offset': '740', 'len': '47', 'speech': 'state-of-the-union', 'author': 'Biden'}),\n",
+       " Document(page_content='But most importantly as Americans. With a duty to one another to the American people to the Constitution. And with an unwavering resolve that freedom will always triumph over tyranny. Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated.', metadata={'source': 'langchain', 'lang': 'eng', 'offset': '413', 'len': '77', 'speech': 'state-of-the-union', 'author': 'Biden'}),\n",
+       " Document(page_content='We can do this. \\n\\nMy fellow Americans—tonight , we have gathered in a sacred space—the citadel of our democracy. In this Capitol, generation after generation, Americans have debated great questions amid great strife, and have done great things. We have fought for freedom, expanded liberty, defeated totalitarianism and terror. And built the strongest, freest, and most prosperous nation the world has ever known. Now is the hour. \\n\\nOur moment of responsibility.', metadata={'source': 'langchain', 'lang': 'eng', 'offset': '906', 'len': '82', 'speech': 'state-of-the-union', 'author': 'Biden'}),\n",
+       " Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. We cannot let this happen. Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.', metadata={'source': 'langchain', 'lang': 'eng', 'offset': '0', 'len': '63', 'speech': 'state-of-the-union', 'author': 'Biden'})]"
+      ]
+     },
+     "execution_count": 16,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "retriever.get_relevant_documents(\"What did Biden say about freedom?\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "id": "a97037fb",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "query='freedom' filter=Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='author', value='Dr. King') limit=None\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "[Document(page_content='And if America is to be a great nation, this must become true. So\\nlet freedom ring from the prodigious hilltops of New Hampshire. Let freedom ring from the mighty\\nmountains of New York. Let freedom ring from the heightening Alleghenies of Pennsylvania. Let\\nfreedom ring from the snowcapped Rockies of Colorado.', metadata={'lang': 'eng', 'section': '3', 'offset': '1534', 'len': '55', 'CreationDate': '1424880481', 'Producer': 'Adobe PDF Library 10.0', 'Author': 'Sasha Rolon-Pereira', 'Title': 'Martin Luther King Jr.pdf', 'Creator': 'Acrobat PDFMaker 10.1 for Word', 'ModDate': '1424880524', 'url': 'https://www.gilderlehrman.org/sites/default/files/inline-pdfs/king.dreamspeech.excerpts.pdf', 'speech': 'I-have-a-dream', 'author': 'Dr. King', 'title': 'Martin Luther King Jr.pdf'}),\n",
+       " Document(page_content='And if America is to be a great nation, this must become true. So\\nlet freedom ring from the prodigious hilltops of New Hampshire. Let freedom ring from the mighty\\nmountains of New York. Let freedom ring from the heightening Alleghenies of Pennsylvania. Let\\nfreedom ring from the snowcapped Rockies of Colorado.', metadata={'lang': 'eng', 'section': '3', 'offset': '1534', 'len': '55', 'CreationDate': '1424880481', 'Producer': 'Adobe PDF Library 10.0', 'Author': 'Sasha Rolon-Pereira', 'Title': 'Martin Luther King Jr.pdf', 'Creator': 'Acrobat PDFMaker 10.1 for Word', 'ModDate': '1424880524', 'url': 'https://www.gilderlehrman.org/sites/default/files/inline-pdfs/king.dreamspeech.excerpts.pdf', 'speech': 'I-have-a-dream', 'author': 'Dr. King', 'title': 'Martin Luther King Jr.pdf'}),\n",
+       " Document(page_content='Let freedom ring from the curvaceous slopes of\\nCalifornia. But not only that. Let freedom ring from Stone Mountain of Georgia. Let freedom ring from Lookout\\nMountain of Tennessee. Let freedom ring from every hill and molehill of Mississippi, from every\\nmountain side. Let freedom ring . . .\\nWhen we allow freedom to ring—when we let it ring from every city and every hamlet, from every state\\nand every city, we will be able to speed up that day when all of God’s children, black men and white\\nmen, Jews and Gentiles, Protestants and Catholics, will be able to join hands and sing in the words of the\\nold Negro spiritual, “Free at last, Free at last, Great God a-mighty, We are free at last.”', metadata={'lang': 'eng', 'section': '3', 'offset': '1842', 'len': '52', 'CreationDate': '1424880481', 'Producer': 'Adobe PDF Library 10.0', 'Author': 'Sasha Rolon-Pereira', 'Title': 'Martin Luther King Jr.pdf', 'Creator': 'Acrobat PDFMaker 10.1 for Word', 'ModDate': '1424880524', 'url': 'https://www.gilderlehrman.org/sites/default/files/inline-pdfs/king.dreamspeech.excerpts.pdf', 'speech': 'I-have-a-dream', 'author': 'Dr. King', 'title': 'Martin Luther King Jr.pdf'}),\n",
+       " Document(page_content='Let freedom ring from the curvaceous slopes of\\nCalifornia. But not only that. Let freedom ring from Stone Mountain of Georgia. Let freedom ring from Lookout\\nMountain of Tennessee. Let freedom ring from every hill and molehill of Mississippi, from every\\nmountain side. Let freedom ring . . .\\nWhen we allow freedom to ring—when we let it ring from every city and every hamlet, from every state\\nand every city, we will be able to speed up that day when all of God’s children, black men and white\\nmen, Jews and Gentiles, Protestants and Catholics, will be able to join hands and sing in the words of the\\nold Negro spiritual, “Free at last, Free at last, Great God a-mighty, We are free at last.”', metadata={'lang': 'eng', 'section': '3', 'offset': '1842', 'len': '52', 'CreationDate': '1424880481', 'Producer': 'Adobe PDF Library 10.0', 'Author': 'Sasha Rolon-Pereira', 'Title': 'Martin Luther King Jr.pdf', 'Creator': 'Acrobat PDFMaker 10.1 for Word', 'ModDate': '1424880524', 'url': 'https://www.gilderlehrman.org/sites/default/files/inline-pdfs/king.dreamspeech.excerpts.pdf', 'speech': 'I-have-a-dream', 'author': 'Dr. King', 'title': 'Martin Luther King Jr.pdf'}),\n",
+       " Document(page_content='Let freedom ring from the mighty\\nmountains of New York. Let freedom ring from the heightening Alleghenies of Pennsylvania. Let\\nfreedom ring from the snowcapped Rockies of Colorado. Let freedom ring from the curvaceous slopes of\\nCalifornia. But not only that. Let freedom ring from Stone Mountain of Georgia.', metadata={'lang': 'eng', 'section': '3', 'offset': '1657', 'len': '57', 'CreationDate': '1424880481', 'Producer': 'Adobe PDF Library 10.0', 'Author': 'Sasha Rolon-Pereira', 'Title': 'Martin Luther King Jr.pdf', 'Creator': 'Acrobat PDFMaker 10.1 for Word', 'ModDate': '1424880524', 'url': 'https://www.gilderlehrman.org/sites/default/files/inline-pdfs/king.dreamspeech.excerpts.pdf', 'speech': 'I-have-a-dream', 'author': 'Dr. King', 'title': 'Martin Luther King Jr.pdf'})]"
+      ]
+     },
+     "execution_count": 17,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "retriever.get_relevant_documents(\"What did Dr. King say about freedom?\")"
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": null,
-   "id": "2300e785",
+   "id": "f6d17e90",
   "metadata": {},
   "outputs": [],
   "source": []