mirror of
https://github.com/hwchase17/langchain.git
synced 2025-06-22 14:49:29 +00:00
docs: Add doc for Vectorize provider (#30436)
This pull request adds documentation and a tutorial for integrating the [Vectorize](https://vectorize.io/) service with LangChain. The most important changes include adding a new documentation page for Vectorize and creating a Jupyter notebook that demonstrates how to use the Vectorize retriever.

The source code for the langchain-vectorize package can be found [here](https://github.com/vectorize-io/integrations-python/tree/main/langchain).

Previews:
* https://langchain-git-fork-cbornet-vectorize-langchain.vercel.app/docs/integrations/providers/vectorize/
* https://langchain-git-fork-cbornet-vectorize-langchain.vercel.app/docs/integrations/retrievers/vectorize/

Documentation updates:
* [`docs/docs/integrations/providers/vectorize.mdx`](diffhunk://#diff-7e00d4ce4768f73b4d381a7c7b1f94d138f1b27ebd08e3666b942630a0285606R1-R40): Added a new documentation page for Vectorize, including an overview of its features, installation instructions, and a basic usage example.

Tutorial updates:
* [`docs/docs/integrations/retrievers/vectorize.ipynb`](diffhunk://#diff-ba5bb9a1b4586db7740944b001bcfeadc88be357640ded0c82a329b11d8d6e29R1-R294): Created a Jupyter notebook tutorial that shows how to set up the Vectorize environment, create a RAG pipeline, and use the LangChain Vectorize retriever. The notebook includes steps for account creation, token generation, environment setup, and pipeline deployment.
This commit is contained in:
parent
6f8735592b
commit
86beb64b50
40
docs/docs/integrations/providers/vectorize.mdx
Normal file
@ -0,0 +1,40 @@
# Vectorize

> [Vectorize](https://vectorize.io/) helps you build AI apps faster and with less hassle.
> It automates data extraction, finds the best vectorization strategy using RAG evaluation,
> and lets you quickly deploy real-time RAG pipelines for your unstructured data.
> Your vector search indexes stay up-to-date, and it integrates with your existing vector database,
> so you maintain full control of your data.
> Vectorize handles the heavy lifting, freeing you to focus on building robust AI solutions without getting bogged down by data management.

# Installation and Setup

Install the following Python package:
```bash
pip install langchain-vectorize
```

1. Sign up for a free Vectorize account [here](https://platform.vectorize.io/).
2. Generate an access token in the [Access Token](https://docs.vectorize.io/rag-pipelines/retrieval-endpoint#access-tokens) section.
3. Gather your organization ID: in the browser URL, it is the UUID that follows `/organization/`.
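
The last step can be done by hand, but a small helper avoids copy-paste mistakes. The following is a minimal sketch, assuming only that the organization ID is a standard 36-character UUID appearing right after `/organization/` in the URL path. The function name `extract_org_id` and the example URL (including its trailing `/pipelines` segment) are hypothetical and used for illustration only.

```python
import re
import uuid
from urllib.parse import urlparse

# Matches the path segment that directly follows "/organization/".
_ORG_ID_PATTERN = re.compile(r"/organization/([0-9a-fA-F-]{36})(?:/|$)")


def extract_org_id(browser_url: str) -> str:
    """Return the organization UUID found after /organization/ in the URL path."""
    path = urlparse(browser_url).path
    match = _ORG_ID_PATTERN.search(path)
    if match is None:
        raise ValueError(f"No organization ID found in {browser_url!r}")
    # uuid.UUID validates the format and normalizes the value to lowercase.
    return str(uuid.UUID(match.group(1)))


# Hypothetical URL, for illustration only.
example_url = (
    "https://platform.vectorize.io/organization/"
    "123e4567-e89b-12d3-a456-426614174000/pipelines"
)
print(extract_org_id(example_url))
```

Validating with `uuid.UUID` means a malformed ID fails here with a `ValueError` rather than later as an authentication error.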

Set up the following variables:
```python
VECTORIZE_ORG_ID="your-organization-id"
VECTORIZE_API_TOKEN="your-api-token"
```
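
Hard-coding a token in source files is easy to leak, so one option is to read both values from the environment. This is a sketch under the assumption that you export environment variables with the same names as the Python variables above; nothing in the source says the `langchain-vectorize` package reads these variables itself, and `load_vectorize_config` is a hypothetical helper.

```python
import os


def load_vectorize_config(env=None):
    """Read both settings from a mapping (os.environ by default), failing early if one is missing."""
    env = os.environ if env is None else env
    names = ("VECTORIZE_ORG_ID", "VECTORIZE_API_TOKEN")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


# Placeholder values are passed explicitly so the sketch runs without a real account.
# In practice, call load_vectorize_config() with no argument to read os.environ.
config = load_vectorize_config(
    {
        "VECTORIZE_ORG_ID": "your-organization-id",
        "VECTORIZE_API_TOKEN": "your-api-token",
    }
)
VECTORIZE_ORG_ID = config["VECTORIZE_ORG_ID"]
VECTORIZE_API_TOKEN = config["VECTORIZE_API_TOKEN"]
```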

## Retriever

```python
from langchain_vectorize import VectorizeRetriever

retriever = VectorizeRetriever(
    api_token=VECTORIZE_API_TOKEN,
    organization=VECTORIZE_ORG_ID,
    pipeline_id="...",
)
retriever.invoke("query")
```
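
LangChain retrievers return a list of documents, each exposing its text as `page_content`. To inspect results quickly, something like the following sketch may help. `StubDocument` is a hypothetical stand-in so the example runs without a Vectorize account, and `preview` is an illustrative helper, not part of any library.

```python
from dataclasses import dataclass, field


@dataclass
class StubDocument:
    """Hypothetical stand-in for the documents a LangChain retriever returns."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def preview(docs, width=60):
    """Build one numbered, truncated line per retrieved document."""
    lines = []
    for i, doc in enumerate(docs, start=1):
        text = " ".join(doc.page_content.split())  # collapse runs of whitespace
        if len(text) > width:
            text = text[: width - 3] + "..."
        lines.append(f"{i}. {text}")
    return "\n".join(lines)


# Stand-in results; with a real pipeline you would pass retriever.invoke("query").
docs = [StubDocument("First   chunk of text."), StubDocument("x" * 100)]
print(preview(docs))
```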

Learn more in the [example notebook](/docs/integrations/retrievers/vectorize).
425
docs/docs/integrations/retrievers/vectorize.ipynb
Normal file
@ -0,0 +1,425 @@
{
 "cells": [
  {
   "metadata": {},
   "cell_type": "raw",
   "source": [
    "---\n",
    "sidebar_label: Vectorize\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zvHrM3wa7IE1"
   },
   "source": [
    "# VectorizeRetriever\n",
    "\n",
    "This notebook shows how to use the LangChain Vectorize retriever.\n",
    "\n",
    "> [Vectorize](https://vectorize.io/) helps you build AI apps faster and with less hassle.\n",
    "> It automates data extraction, finds the best vectorization strategy using RAG evaluation,\n",
    "> and lets you quickly deploy real-time RAG pipelines for your unstructured data.\n",
    "> Your vector search indexes stay up-to-date, and it integrates with your existing vector database,\n",
    "> so you maintain full control of your data.\n",
    "> Vectorize handles the heavy lifting, freeing you to focus on building robust AI solutions without getting bogged down by data management.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "r-RswOO5o4K_"
   },
   "source": [
    "## Setup\n",
    "\n",
    "In the following steps, we'll set up the Vectorize environment and create a RAG pipeline.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "FhvmvFKh4Rlh"
   },
   "source": [
    "### Create a Vectorize Account & Get Your Access Token\n",
    "\n",
    "1. Sign up for a free Vectorize account [here](https://platform.vectorize.io/).\n",
    "2. Generate an access token in the [Access Token](https://docs.vectorize.io/rag-pipelines/retrieval-endpoint#access-tokens) section.\n",
    "3. Gather your organization ID: in the browser URL, it is the UUID that follows `/organization/`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "L2SULMfWpWFX"
   },
   "source": [
    "### Configure token and organization ID\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "BnF8KoDZpg2O"
   },
   "outputs": [],
   "source": [
    "import getpass\n",
    "\n",
    "VECTORIZE_ORG_ID = getpass.getpass(\"Enter Vectorize organization ID: \")\n",
    "VECTORIZE_API_TOKEN = getpass.getpass(\"Enter Vectorize API Token: \")"
   ]
  },
  {
   "metadata": {
    "id": "JdZ5vlzjoDVr"
   },
   "cell_type": "markdown",
   "source": [
    "### Installation\n",
    "\n",
    "This retriever lives in the `langchain-vectorize` package:"
   ]
  },
  {
   "metadata": {
    "id": "IJFmtvDLn5R3"
   },
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": "!pip install -qU langchain-vectorize"
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Oj10Moznpz67"
   },
   "source": "### Download a PDF file"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "eLbbTPytrgNw"
   },
   "outputs": [],
   "source": "!wget \"https://raw.githubusercontent.com/vectorize-io/vectorize-clients/refs/tags/python-0.1.3/tests/python/tests/research.pdf\""
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7g54J6awtshs"
   },
   "source": "### Initialize the Vectorize client"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "9Fr4yz5CrFWP"
   },
   "outputs": [],
   "source": [
    "import vectorize_client as v\n",
    "\n",
    "api = v.ApiClient(v.Configuration(access_token=VECTORIZE_API_TOKEN))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "wPDoeqETxJrS"
   },
   "source": "### Create a File Upload Source Connector"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "9yEARIcFue5N"
   },
   "outputs": [],
   "source": [
    "import json\n",
    "import os\n",
    "\n",
    "import urllib3\n",
    "\n",
    "connectors_api = v.ConnectorsApi(api)\n",
    "response = connectors_api.create_source_connector(\n",
    "    VECTORIZE_ORG_ID, [{\"type\": \"FILE_UPLOAD\", \"name\": \"From API\"}]\n",
    ")\n",
    "source_connector_id = response.connectors[0].id"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "yU3lS6dpxZnQ"
   },
   "source": "### Upload the PDF file"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "OIiMIZ8ZxUYF"
   },
   "outputs": [],
   "source": [
    "file_path = \"research.pdf\"\n",
    "\n",
    "http = urllib3.PoolManager()\n",
    "uploads_api = v.UploadsApi(api)\n",
    "metadata = {\"created-from-api\": True}\n",
    "\n",
    "upload_response = uploads_api.start_file_upload_to_connector(\n",
    "    VECTORIZE_ORG_ID,\n",
    "    source_connector_id,\n",
    "    v.StartFileUploadToConnectorRequest(\n",
    "        name=file_path.split(\"/\")[-1],\n",
    "        content_type=\"application/pdf\",\n",
    "        # add additional metadata that will be stored along with each chunk in the vector database\n",
    "        metadata=json.dumps(metadata),\n",
    "    ),\n",
    ")\n",
    "\n",
    "with open(file_path, \"rb\") as f:\n",
    "    response = http.request(\n",
    "        \"PUT\",\n",
    "        upload_response.upload_url,\n",
    "        body=f,\n",
    "        headers={\n",
    "            \"Content-Type\": \"application/pdf\",\n",
    "            \"Content-Length\": str(os.path.getsize(file_path)),\n",
    "        },\n",
    "    )\n",
    "\n",
    "if response.status != 200:\n",
    "    print(\"Upload failed: \", response.data)\n",
    "else:\n",
    "    print(\"Upload successful\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "PdJJfOhfxiIo"
   },
   "source": "### Connect to the AI Platform and Vector Database"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "0ZSGhXJfxjBb"
   },
   "outputs": [],
   "source": [
    "ai_platforms = connectors_api.get_ai_platform_connectors(VECTORIZE_ORG_ID)\n",
    "builtin_ai_platform = [\n",
    "    c.id for c in ai_platforms.ai_platform_connectors if c.type == \"VECTORIZE\"\n",
    "][0]\n",
    "\n",
    "vector_databases = connectors_api.get_destination_connectors(VECTORIZE_ORG_ID)\n",
    "builtin_vector_db = [\n",
    "    c.id for c in vector_databases.destination_connectors if c.type == \"VECTORIZE\"\n",
    "][0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "JWoL-kqQxs5H"
   },
   "source": "### Configure and Deploy the Pipeline"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "hze9vJbQxvqA"
   },
   "outputs": [],
   "source": [
    "pipelines = v.PipelinesApi(api)\n",
    "response = pipelines.create_pipeline(\n",
    "    VECTORIZE_ORG_ID,\n",
    "    v.PipelineConfigurationSchema(\n",
    "        source_connectors=[\n",
    "            v.SourceConnectorSchema(\n",
    "                id=source_connector_id, type=\"FILE_UPLOAD\", config={}\n",
    "            )\n",
    "        ],\n",
    "        destination_connector=v.DestinationConnectorSchema(\n",
    "            id=builtin_vector_db, type=\"VECTORIZE\", config={}\n",
    "        ),\n",
    "        ai_platform=v.AIPlatformSchema(\n",
    "            id=builtin_ai_platform, type=\"VECTORIZE\", config={}\n",
    "        ),\n",
    "        pipeline_name=\"My Pipeline From API\",\n",
    "        schedule=v.ScheduleSchema(type=\"manual\"),\n",
    "    ),\n",
    ")\n",
    "pipeline_id = response.data.id"
   ]
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "### Configure tracing (optional)\n",
    "\n",
    "If you want to get automated tracing from individual queries, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:"
   ]
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n",
    "# os.environ[\"LANGSMITH_TRACING\"] = \"true\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5ULion9wyj6T"
   },
   "source": "## Instantiation"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "9D-QfiW7yoe0"
   },
   "outputs": [],
   "source": [
    "from langchain_vectorize.retrievers import VectorizeRetriever\n",
    "\n",
    "retriever = VectorizeRetriever(\n",
    "    api_token=VECTORIZE_API_TOKEN,\n",
    "    organization=VECTORIZE_ORG_ID,\n",
    "    pipeline_id=pipeline_id,\n",
    ")"
   ]
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## Usage\n",
    "\n"
   ]
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "query = \"Apple Shareholders equity\"\n",
    "retriever.invoke(query, num_results=2)"
   ]
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## Use within a chain\n",
    "\n",
    "Like other retrievers, VectorizeRetriever can be incorporated into LLM applications via [chains](/docs/how_to/sequence/).\n",
    "\n",
    "We will need an LLM or chat model:\n",
    "\n",
    "import ChatModelTabs from \"@theme/ChatModelTabs\";\n",
    "\n",
    "<ChatModelTabs customVarName=\"llm\" />"
   ]
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "# | output: false\n",
    "# | echo: false\n",
    "\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "llm = ChatOpenAI(model=\"gpt-3.5-turbo-0125\", temperature=0)"
   ]
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "from langchain_core.runnables import RunnablePassthrough\n",
    "\n",
    "prompt = ChatPromptTemplate.from_template(\n",
    "    \"\"\"Answer the question based only on the context provided.\n",
    "\n",
    "Context: {context}\n",
    "\n",
    "Question: {question}\"\"\"\n",
    ")\n",
    "\n",
    "\n",
    "def format_docs(docs):\n",
    "    return \"\\n\\n\".join(doc.page_content for doc in docs)\n",
    "\n",
    "\n",
    "chain = (\n",
    "    {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n",
    "    | prompt\n",
    "    | llm\n",
    "    | StrOutputParser()\n",
    ")"
   ]
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": "chain.invoke(\"...\")"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "## API reference\n",
    "\n",
    "For detailed documentation of all VectorizeRetriever features and configurations, head to the [API reference](https://python.langchain.com/api_reference/vectorize/langchain_vectorize.retrievers.VectorizeRetriever.html)."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}