mirror of https://github.com/hwchase17/langchain.git
synced 2025-06-19 13:23:35 +00:00

docs: updated the docs for vectara (#30398)

**PR title:** Docs update for Vectara
**Description:** Vectara has moved to a LangChain partner package, and the docs are updated accordingly.

This commit is contained in:
parent f68eaab44f
commit 56629ed87b
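The change in one picture: a minimal migration sketch assembled from the imports and constructor arguments shown in the diffs below (the placeholder key is illustrative, not a real credential):

```python
# Before (langchain-community): three credentials were required.
# from langchain_community.vectorstores import Vectara
# vectara = Vectara(
#     vectara_customer_id=customer_id,
#     vectara_corpus_id=corpus_id,
#     vectara_api_key=api_key,
# )

# After (partner package): only the API key; corpora are addressed
# per call via a corpus_key argument.
from langchain_vectara import Vectara

vectara = Vectara(vectara_api_key="<VECTARA_API_KEY>")
```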
@@ -5,21 +5,38 @@
"id": "134a0785",
"metadata": {},
"source": [
"# Vectara Chat\n",
"## Overview\n",
"\n",
"[Vectara](https://vectara.com/) is the trusted AI Assistant and Agent platform which focuses on enterprise readiness for mission-critical applications.\n",
"\n",
"Vectara serverless RAG-as-a-service provides all the components of RAG behind an easy-to-use API, including:\n",
"1. A way to extract text from files (PDF, PPT, DOCX, etc.)\n",
"2. ML-based chunking that provides state-of-the-art performance.\n",
"3. The [Boomerang](https://vectara.com/how-boomerang-takes-retrieval-augmented-generation-to-the-next-level-via-grounded-generation/) embeddings model.\n",
"4. Its own internal vector database where text chunks and embedding vectors are stored.\n",
"5. A query service that automatically encodes the query into embedding, and retrieves the most relevant text segments (including support for [Hybrid Search](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) as well as multiple reranking options such as the [multi-lingual relevance reranker](https://www.vectara.com/blog/deep-dive-into-vectara-multilingual-reranker-v1-state-of-the-art-reranker-across-100-languages), [MMR](https://vectara.com/get-diverse-results-and-comprehensive-summaries-with-vectaras-mmr-reranker/), [UDF reranker](https://www.vectara.com/blog/rag-with-user-defined-functions-based-reranking). \n",
"5. A query service that automatically encodes the query into an embedding, and retrieves the most relevant text segments, including support for [Hybrid Search](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) as well as multiple reranking options such as the [multi-lingual relevance reranker](https://www.vectara.com/blog/deep-dive-into-vectara-multilingual-reranker-v1-state-of-the-art-reranker-across-100-languages), [MMR](https://vectara.com/get-diverse-results-and-comprehensive-summaries-with-vectaras-mmr-reranker/), [UDF reranker](https://www.vectara.com/blog/rag-with-user-defined-functions-based-reranking). \n",
"6. An LLM for creating a [generative summary](https://docs.vectara.com/docs/learn/grounded-generation/grounded-generation-overview), based on the retrieved documents (context), including citations.\n",
"\n",
"See the [Vectara API documentation](https://docs.vectara.com/docs/) for more information on how to use the API.\n",
"For more information:\n",
"- [Documentation](https://docs.vectara.com/docs/)\n",
"- [API Playground](https://docs.vectara.com/docs/rest-api/)\n",
"- [Quickstart](https://docs.vectara.com/docs/quickstart)\n",
"\n",
"This notebook shows how to use Vectara's [Chat](https://docs.vectara.com/docs/api-reference/chat-apis/chat-apis-overview) functionality, which provides automatic storage of conversation history and ensures follow up questions consider that history."
"\n",
"This notebook shows how to use Vectara's [Chat](https://docs.vectara.com/docs/api-reference/chat-apis/chat-apis-overview) functionality, which provides automatic storage of conversation history and ensures follow-up questions consider that history.\n",
"\n",
"### Setup\n",
"\n",
"To use the `VectaraVectorStore`, you first need to install the partner package.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b4a2f525-4805-4880-8bfa-18fe6f1cd1c7",
"metadata": {},
"outputs": [],
"source": [
"!uv pip install -U pip && uv pip install -qU langchain-vectara"
]
},
{
@@ -27,17 +44,19 @@
"id": "56372c5b",
"metadata": {},
"source": [
"# Getting Started\n",
"## Getting Started\n",
"\n",
"To get started, use the following steps:\n",
"1. If you don't already have one, [Sign up](https://www.vectara.com/integrations/langchain) for your free Vectara trial. Once you have completed your sign up you will have a Vectara customer ID. You can find your customer ID by clicking on your name, on the top-right of the Vectara console window.\n",
"1. If you don't already have one, [Sign up](https://www.vectara.com/integrations/langchain) for your free Vectara trial.\n",
"2. Within your account you can create one or more corpora. Each corpus represents an area that stores text data upon ingest from input documents. To create a corpus, use the **\"Create Corpus\"** button. You then provide a name to your corpus as well as a description. Optionally you can define filtering attributes and apply some advanced options. If you click on your created corpus, you can see its name and corpus ID right on the top.\n",
"3. Next you'll need to create API keys to access the corpus. Click on the **\"Access Control\"** tab in the corpus view and then the **\"Create API Key\"** button. Give your key a name, and choose whether you want query-only or query+index for your key. Click \"Create\" and you now have an active API key. Keep this key confidential. \n",
"\n",
"To use LangChain with Vectara, you'll need to have these three values: `customer ID`, `corpus ID` and `api_key`.\n",
"You can provide those to LangChain in two ways:\n",
"To use LangChain with Vectara, you'll need to have these two values: `corpus_key` and `api_key`.\n",
"You can provide `VECTARA_API_KEY` to LangChain in two ways:\n",
"\n",
"1. Include in your environment these three variables: `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`.\n",
"## Instantiation\n",
"\n",
"1. Include the `VECTARA_API_KEY` variable in your environment.\n",
"\n",
" For example, you can set these variables using `os.environ` and `getpass` as follows:\n",
"\n",
@@ -45,8 +64,6 @@
"import os\n",
"import getpass\n",
"\n",
"os.environ[\"VECTARA_CUSTOMER_ID\"] = getpass.getpass(\"Vectara Customer ID:\")\n",
"os.environ[\"VECTARA_CORPUS_ID\"] = getpass.getpass(\"Vectara Corpus ID:\")\n",
"os.environ[\"VECTARA_API_KEY\"] = getpass.getpass(\"Vectara API Key:\")\n",
"```\n",
"\n",
@@ -54,17 +71,16 @@
"\n",
"```python\n",
"vectara = Vectara(\n",
" vectara_customer_id=vectara_customer_id,\n",
" vectara_corpus_id=vectara_corpus_id,\n",
" vectara_api_key=vectara_api_key\n",
" )\n",
" vectara_api_key=vectara_api_key\n",
")\n",
"```\n",
"\n",
"In this notebook we assume they are provided in the environment."
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"id": "70c4e529",
"metadata": {
"tags": []
@@ -73,14 +89,15 @@
"source": [
"import os\n",
"\n",
"os.environ[\"VECTARA_API_KEY\"] = \"<YOUR_VECTARA_API_KEY>\"\n",
"os.environ[\"VECTARA_CORPUS_ID\"] = \"<YOUR_VECTARA_CORPUS_ID>\"\n",
"os.environ[\"VECTARA_CUSTOMER_ID\"] = \"<YOUR_VECTARA_CUSTOMER_ID>\"\n",
"os.environ[\"VECTARA_API_KEY\"] = \"<VECTARA_API_KEY>\"\n",
"os.environ[\"VECTARA_CORPUS_KEY\"] = \"<VECTARA_CORPUS_KEY>\"\n",
"\n",
"from langchain_community.vectorstores import Vectara\n",
"from langchain_community.vectorstores.vectara import (\n",
" RerankConfig,\n",
" SummaryConfig,\n",
"from langchain_vectara import Vectara\n",
"from langchain_vectara.vectorstores import (\n",
" CorpusConfig,\n",
" GenerationConfig,\n",
" MmrReranker,\n",
" SearchConfig,\n",
" VectaraQueryConfig,\n",
")"
]
@@ -101,7 +118,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 3,
"id": "01c46e92",
"metadata": {
"tags": []
@@ -110,10 +127,11 @@
"source": [
"from langchain_community.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader(\"state_of_the_union.txt\")\n",
"loader = TextLoader(\"../document_loaders/example_data/state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"\n",
"vectara = Vectara.from_documents(documents, embedding=None)"
"corpus_key = os.getenv(\"VECTARA_CORPUS_KEY\")\n",
"vectara = Vectara.from_documents(documents, embedding=None, corpus_key=corpus_key)"
]
},
{
@@ -126,18 +144,29 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"id": "1b41a10b-bf68-4689-8f00-9aed7675e2ab",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"summary_config = SummaryConfig(is_enabled=True, max_results=7, response_lang=\"eng\")\n",
"rerank_config = RerankConfig(reranker=\"mmr\", rerank_k=50, mmr_diversity_bias=0.2)\n",
"config = VectaraQueryConfig(\n",
" k=10, lambda_val=0.005, rerank_config=rerank_config, summary_config=summary_config\n",
"generation_config = GenerationConfig(\n",
" max_used_search_results=7,\n",
" response_language=\"eng\",\n",
" generation_preset_name=\"vectara-summary-ext-24-05-med-omni\",\n",
" enable_factual_consistency_score=True,\n",
")\n",
"search_config = SearchConfig(\n",
" corpora=[CorpusConfig(corpus_key=corpus_key, limit=25)],\n",
" reranker=MmrReranker(diversity_bias=0.2),\n",
")\n",
"\n",
"config = VectaraQueryConfig(\n",
" search=search_config,\n",
" generation=generation_config,\n",
")\n",
"\n",
"\n",
"bot = vectara.as_chat(config)"
]
@@ -147,12 +176,15 @@
"id": "83f38c18-ac82-45f4-a79e-8b37ce1ae115",
"metadata": {},
"source": [
"\n",
"## Invocation\n",
"\n",
"Here's an example of asking a question with no chat history"
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 5,
"id": "bc672290-8a8b-4828-a90c-f1bbdd6b3920",
"metadata": {
"tags": []
@@ -161,10 +193,10 @@
{
"data": {
"text/plain": [
"'The President expressed gratitude to Justice Breyer and highlighted the significance of nominating Ketanji Brown Jackson to the Supreme Court, praising her legal expertise and commitment to upholding excellence [1]. The President also reassured the public about the situation with gas prices and the conflict in Ukraine, emphasizing unity with allies and the belief that the world will emerge stronger from these challenges [2][4]. Additionally, the President shared personal experiences related to economic struggles and the importance of passing the American Rescue Plan to support those in need [3]. The focus was also on job creation and economic growth, acknowledging the impact of inflation on families [5]. While addressing cancer as a significant issue, the President discussed plans to enhance cancer research and support for patients and families [7].'"
"'The president stated that nominating someone to serve on the United States Supreme Court is one of the most serious constitutional responsibilities. He nominated Circuit Court of Appeals Judge Ketanji Brown Jackson, describing her as one of the nation’s top legal minds who will continue Justice Breyer’s legacy of excellence and noting her experience as a former top litigator in private practice [1].'"
]
},
"execution_count": 4,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -183,7 +215,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 6,
"id": "9c95460b-7116-4155-a9d2-c0fb027ee592",
"metadata": {
"tags": []
@@ -192,10 +224,10 @@
{
"data": {
"text/plain": [
"\"In his remarks, the President specified that Ketanji Brown Jackson is succeeding Justice Breyer on the United States Supreme Court[1]. The President praised Jackson as a top legal mind who will continue Justice Breyer's legacy of excellence. The nomination of Jackson was highlighted as a significant constitutional responsibility of the President[1]. The President emphasized the importance of this nomination and the qualities that Jackson brings to the role. The focus was on the transition from Justice Breyer to Judge Ketanji Brown Jackson on the Supreme Court[1].\""
"'Yes, the president mentioned that Ketanji Brown Jackson succeeded Justice Breyer [1].'"
]
},
"execution_count": 5,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -217,7 +249,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 7,
"id": "936dc62f",
"metadata": {
"tags": []
@@ -227,14 +259,14 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Judge Ketanji Brown Jackson is a nominee for the United States Supreme Court, known for her legal expertise and experience as a former litigator. She is praised for her potential to continue the legacy of excellence on the Court[1]. While the search results provide information on various topics like innovation, economic growth, and healthcare initiatives, they do not directly address Judge Ketanji Brown Jackson's specific accomplishments. Therefore, I do not have enough information to answer this question."
"The president acknowledged the significant impact of COVID-19 on the nation, expressing understanding of the public's fatigue and frustration. He emphasized the need to view COVID-19 not as a partisan issue but as a serious disease, urging unity among Americans. The president highlighted the progress made, noting that severe cases have decreased significantly, and mentioned new CDC guidelines allowing most Americans to be mask-free. He also pointed out the efforts to vaccinate the nation and provide economic relief, and the ongoing commitment to vaccinate the world [2], [3], [5]."
]
}
],
"source": [
"output = {}\n",
"curr_key = None\n",
"for chunk in bot.stream(\"what about her accopmlishments?\"):\n",
"for chunk in bot.stream(\"what did he say about the covid?\"):\n",
" for key in chunk:\n",
" if key not in output:\n",
" output[key] = chunk[key]\n",
@@ -244,6 +276,83 @@
" print(chunk[key], end=\"\", flush=True)\n",
" curr_key = key"
]
},
{
"cell_type": "markdown",
"id": "cefdf72b1d90085a",
"metadata": {
"collapsed": false
},
"source": [
"## Chaining\n",
"\n",
"For additional capabilities you can use chaining."
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "167bc806-395e-46bf-80cc-3c5d43164f42",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"So, the president talked about how the COVID-19 sickness has affected a lot of people in the country. He said that it's important for everyone to work together to fight the sickness, no matter what political party they are in. The president also mentioned that they are working hard to give vaccines to people to help protect them from getting sick. They are also giving money and help to people who need it, like food, housing, and cheaper health insurance. The president also said that they are sending vaccines to many other countries to help people all around the world stay healthy.\n"
]
}
],
"source": [
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_openai.chat_models import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(temperature=0)\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You are a helpful assistant that explains things to a five-year-old. Vectara is providing the answer.\",\n",
" ),\n",
" (\"human\", \"{vectara_response}\"),\n",
" ]\n",
")\n",
"\n",
"\n",
"def get_vectara_response(question: dict) -> str:\n",
" \"\"\"\n",
" Calls Vectara as_chat and returns the answer string. This encapsulates\n",
" the Vectara call.\n",
" \"\"\"\n",
" try:\n",
" response = bot.invoke(question[\"question\"])\n",
" return response[\"answer\"]\n",
" except Exception:\n",
" return \"I'm sorry, I couldn't get an answer from Vectara.\"\n",
"\n",
"\n",
"# Create the chain\n",
"chain = get_vectara_response | prompt | llm | StrOutputParser()\n",
"\n",
"\n",
"# Invoke the chain\n",
"result = chain.invoke({\"question\": \"what did he say about the covid?\"})\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"id": "3b8bb761-db4a-436c-8939-41e9f8652083",
"metadata": {
"collapsed": false
},
"source": [
"## API reference\n",
"\n",
"You can look at the [Chat](https://docs.vectara.com/docs/api-reference/chat-apis/chat-apis-overview) documentation for the details."
]
}
],
"metadata": {
@@ -262,7 +371,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
"version": "3.12.0"
}
},
"nbformat": 4,
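Outside the notebook JSON, the new chat flow reads as follows: a minimal sketch using only classes and arguments that appear in the diff above (environment variable names as in the notebook):

```python
import os

from langchain_vectara import Vectara
from langchain_vectara.vectorstores import (
    CorpusConfig,
    GenerationConfig,
    MmrReranker,
    SearchConfig,
    VectaraQueryConfig,
)

vectara = Vectara(vectara_api_key=os.environ["VECTARA_API_KEY"])

config = VectaraQueryConfig(
    search=SearchConfig(
        corpora=[CorpusConfig(corpus_key=os.environ["VECTARA_CORPUS_KEY"], limit=25)],
        reranker=MmrReranker(diversity_bias=0.2),
    ),
    generation=GenerationConfig(max_used_search_results=7, response_language="eng"),
)

# Chat history is stored server-side by Vectara, so follow-up questions
# automatically see earlier turns.
bot = vectara.as_chat(config)
print(bot.invoke("What did the president say about Ketanji Brown Jackson?")["answer"])
```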
348  docs/docs/integrations/providers/vectara.ipynb  Normal file
@@ -0,0 +1,348 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "559f8e0e",
"metadata": {},
"source": [
"# Vectara\n",
"\n",
"[Vectara](https://vectara.com/) is the trusted AI Assistant and Agent platform which focuses on enterprise readiness for mission-critical applications.\n",
"Vectara serverless RAG-as-a-service provides all the components of RAG behind an easy-to-use API, including:\n",
"1. A way to extract text from files (PDF, PPT, DOCX, etc.)\n",
"2. ML-based chunking that provides state-of-the-art performance.\n",
"3. The [Boomerang](https://vectara.com/how-boomerang-takes-retrieval-augmented-generation-to-the-next-level-via-grounded-generation/) embeddings model.\n",
"4. Its own internal vector database where text chunks and embedding vectors are stored.\n",
"5. A query service that automatically encodes the query into an embedding, and retrieves the most relevant text segments, including support for [Hybrid Search](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) as well as multiple reranking options such as the [multi-lingual relevance reranker](https://www.vectara.com/blog/deep-dive-into-vectara-multilingual-reranker-v1-state-of-the-art-reranker-across-100-languages), [MMR](https://vectara.com/get-diverse-results-and-comprehensive-summaries-with-vectaras-mmr-reranker/), [UDF reranker](https://www.vectara.com/blog/rag-with-user-defined-functions-based-reranking). \n",
"6. An LLM for creating a [generative summary](https://docs.vectara.com/docs/learn/grounded-generation/grounded-generation-overview), based on the retrieved documents (context), including citations.\n",
"\n",
"For more information:\n",
"- [Documentation](https://docs.vectara.com/docs/)\n",
"- [API Playground](https://docs.vectara.com/docs/rest-api/)\n",
"- [Quickstart](https://docs.vectara.com/docs/quickstart)\n",
"\n",
"This notebook shows how to use the basic retrieval functionality when utilizing Vectara just as a Vector Store (without summarization), including `similarity_search` and `similarity_search_with_score`, as well as using the LangChain `as_retriever` functionality.\n",
"\n",
"\n",
"## Setup\n",
"\n",
"To use the `VectaraVectorStore`, you first need to install the partner package.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dfdf03ba-d6f5-4b1e-86d3-a65c4bc99aa1",
"metadata": {},
"outputs": [],
"source": [
"!uv pip install -U pip && uv pip install -qU langchain-vectara"
]
},
{
"cell_type": "markdown",
"id": "e97dcf11",
"metadata": {},
"source": [
"## Getting Started\n",
"\n",
"To get started, use the following steps:\n",
"1. If you don't already have one, [Sign up](https://www.vectara.com/integrations/langchain) for your free Vectara trial.\n",
"2. Within your account you can create one or more corpora. Each corpus represents an area that stores text data upon ingest from input documents. To create a corpus, use the **\"Create Corpus\"** button. You then provide a name to your corpus as well as a description. Optionally you can define filtering attributes and apply some advanced options. If you click on your created corpus, you can see its name and corpus ID right on the top.\n",
"3. Next you'll need to create API keys to access the corpus. Click on the **\"Access Control\"** tab in the corpus view and then the **\"Create API Key\"** button. Give your key a name, and choose whether you want query-only or query+index for your key. Click \"Create\" and you now have an active API key. Keep this key confidential. \n",
"\n",
"To use LangChain with Vectara, you'll need to have these two values: `corpus_key` and `api_key`.\n",
"You can provide `VECTARA_API_KEY` to LangChain in two ways:\n",
"\n",
"1. Include the `VECTARA_API_KEY` variable in your environment.\n",
"\n",
" For example, you can set this variable using `os.environ` and `getpass` as follows:\n",
"\n",
"```python\n",
"import os\n",
"import getpass\n",
"\n",
"os.environ[\"VECTARA_API_KEY\"] = getpass.getpass(\"Vectara API Key:\")\n",
"```\n",
"\n",
"2. Add it to the `Vectara` vectorstore constructor:\n",
"\n",
"```python\n",
"vectara = Vectara(\n",
" vectara_api_key=vectara_api_key\n",
")\n",
"```\n",
"\n",
"In this notebook we assume it is provided in the environment."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "aac7a9a6",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"VECTARA_API_KEY\"] = \"<VECTARA_API_KEY>\"\n",
"os.environ[\"VECTARA_CORPUS_KEY\"] = \"<VECTARA_CORPUS_KEY>\"\n",
"\n",
"from langchain_vectara import Vectara\n",
"from langchain_vectara.vectorstores import (\n",
" ChainReranker,\n",
" CorpusConfig,\n",
" CustomerSpecificReranker,\n",
" File,\n",
" GenerationConfig,\n",
" MmrReranker,\n",
" SearchConfig,\n",
" VectaraQueryConfig,\n",
")\n",
"\n",
"vectara = Vectara(vectara_api_key=os.getenv(\"VECTARA_API_KEY\"))"
]
},
{
"cell_type": "markdown",
"id": "875ffb7e",
"metadata": {},
"source": [
"First we load the state-of-the-union text into Vectara.\n",
"\n",
"Note that we use the `add_files` interface, which does not require any local processing or chunking - Vectara receives the file content and performs all the necessary pre-processing, chunking and embedding of the file into its knowledge store.\n",
"\n",
"In this case it uses a .txt file but the same works for many other [file types](https://docs.vectara.com/docs/api-reference/indexing-apis/file-upload/file-upload-filetypes)."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "be0a4973",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['state_of_the_union.txt']"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corpus_key = os.getenv(\"VECTARA_CORPUS_KEY\")\n",
"file_obj = File(\n",
" file_path=\"../document_loaders/example_data/state_of_the_union.txt\",\n",
" metadata={\"source\": \"text_file\"},\n",
")\n",
"vectara.add_files([file_obj], corpus_key)"
]
},
{
"cell_type": "markdown",
"id": "22a6b953",
"metadata": {},
"source": [
"## Vectara RAG (retrieval augmented generation)\n",
"\n",
"We now create a `VectaraQueryConfig` object to control the retrieval and summarization options:\n",
"* We enable summarization, specifying we would like the LLM to pick the top 7 matching chunks and respond in English\n",
"\n",
"Using this configuration, let's create a LangChain `Runnable` object that encapsulates the full Vectara RAG pipeline, using the `as_rag` method:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "9ecda054-96a8-4a91-aeae-32006efb1ac8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"President Biden discussed several key issues in his recent statements. He emphasized the importance of keeping schools open and noted that with a high vaccination rate and reduced hospitalizations, most Americans can safely return to normal activities without masks [1]. He addressed the need to hold social media platforms accountable for their impact on children and called for stronger privacy protections and mental health services [2]. Biden also announced measures against Russia, including preventing its central bank from defending the Ruble and targeting Russian oligarchs' assets, as part of efforts to weaken Russia's economy and military [3]. Additionally, he highlighted the importance of protecting women's rights, specifically the right to choose as affirmed in Roe v. Wade [5]. Lastly, he advocated for funding the police with necessary resources and training to ensure community safety [6].\""
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"generation_config = GenerationConfig(\n",
" max_used_search_results=7,\n",
" response_language=\"eng\",\n",
" generation_preset_name=\"vectara-summary-ext-24-05-med-omni\",\n",
" enable_factual_consistency_score=True,\n",
")\n",
"search_config = SearchConfig(\n",
" corpora=[CorpusConfig(corpus_key=corpus_key)],\n",
" limit=25,\n",
" reranker=ChainReranker(\n",
" rerankers=[\n",
" CustomerSpecificReranker(reranker_id=\"rnk_272725719\", limit=100),\n",
" MmrReranker(diversity_bias=0.2, limit=100),\n",
" ]\n",
" ),\n",
")\n",
"\n",
"config = VectaraQueryConfig(\n",
" search=search_config,\n",
" generation=generation_config,\n",
")\n",
"\n",
"query_str = \"what did Biden say?\"\n",
"\n",
"rag = vectara.as_rag(config)\n",
"rag.invoke(query_str)[\"answer\"]"
]
},
{
"cell_type": "markdown",
"id": "cd825d63-93a0-4e45-a455-bfabb01ee1a1",
"metadata": {},
"source": [
"We can also use the streaming interface like this:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "27f01330-8917-4eff-b603-59ab2571a4d2",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"President Biden emphasized several key points in his statements. He highlighted the importance of keeping schools open and noted that with a high vaccination rate and reduced hospitalizations, most Americans can safely return to normal activities without masks [1]. He addressed the need to hold social media platforms accountable for their impact on children and called for stronger privacy protections and mental health services [2]. Biden also discussed measures against Russia, including preventing their central bank from defending the Ruble and targeting Russian oligarchs' assets [3]. Additionally, he reaffirmed the commitment to protect women's rights, particularly the right to choose as affirmed in Roe v. Wade [5]. Lastly, he advocated for funding the police to ensure community safety [6]."
]
}
],
"source": [
"output = {}\n",
"curr_key = None\n",
"for chunk in rag.stream(query_str):\n",
" for key in chunk:\n",
" if key not in output:\n",
" output[key] = chunk[key]\n",
" else:\n",
" output[key] += chunk[key]\n",
" if key == \"answer\":\n",
" print(chunk[key], end=\"\", flush=True)\n",
" curr_key = key"
]
},
{
"cell_type": "markdown",
"id": "8f16bf8d",
"metadata": {},
"source": [
"For more details about Vectara as a VectorStore [go to this notebook](../vectorstores/vectara.ipynb)."
]
},
{
"cell_type": "markdown",
"id": "d49a91d2-9c53-48cb-8065-a3ba1292e8d0",
"metadata": {},
"source": [
"## Vectara Chat\n",
"\n",
"In most uses of LangChain to create chatbots, one must integrate a special `memory` component that maintains the history of chat sessions and then uses that history to ensure the chatbot is aware of conversation history.\n",
"\n",
"With Vectara Chat, all of that is performed in the backend by Vectara automatically."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "f57264ec-e8b5-4d55-9c16-54898d506f73",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'The president stated that nominating someone to serve on the United States Supreme Court is one of the most serious constitutional responsibilities he has. He nominated Circuit Court of Appeals Judge Ketanji Brown Jackson, describing her as one of the nation’s top legal minds who will continue Justice Breyer’s legacy of excellence [1].'"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"generation_config = GenerationConfig(\n",
" max_used_search_results=7,\n",
" response_language=\"eng\",\n",
" generation_preset_name=\"vectara-summary-ext-24-05-med-omni\",\n",
" enable_factual_consistency_score=True,\n",
")\n",
"search_config = SearchConfig(\n",
" corpora=[CorpusConfig(corpus_key=corpus_key, limit=25)],\n",
" reranker=MmrReranker(diversity_bias=0.2),\n",
")\n",
"\n",
"config = VectaraQueryConfig(\n",
" search=search_config,\n",
" generation=generation_config,\n",
")\n",
"\n",
"\n",
"bot = vectara.as_chat(config)\n",
"\n",
"bot.invoke(\"What did the president say about Ketanji Brown Jackson?\")[\"answer\"]"
]
},
{
"cell_type": "markdown",
"id": "13714687-672d-47af-997a-61bb9dd66923",
"metadata": {},
"source": [
"For more details about Vectara chat [go to this notebook](../chat/vectara.ipynb)."
]
},
{
"cell_type": "markdown",
"id": "baf687dc-08c4-49af-98aa-0359e2591f2e",
"metadata": {},
"source": [
"## Vectara as self-querying retriever\n",
"Vectara offers an Intelligent Query Rewriting option, which enhances search precision by automatically generating metadata filter expressions from natural language queries. This capability analyzes user queries, extracts relevant metadata filters, and rephrases the query to focus on the core information need. For more details [go to this notebook](../retrievers/self_query/vectara_self_query.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8060a423-b291-4166-8fd7-ba0e01692b51",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
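Since `GenerationConfig` above enables `enable_factual_consistency_score`, the score can be read off the RAG response. A minimal sketch, assuming `vectara` and `corpus_key` are defined as in the notebook above and that the `fcs` response key documented on the old provider page (deleted below) is unchanged in the partner package:

```python
from langchain_vectara.vectorstores import (
    CorpusConfig,
    GenerationConfig,
    SearchConfig,
    VectaraQueryConfig,
)

config = VectaraQueryConfig(
    search=SearchConfig(corpora=[CorpusConfig(corpus_key=corpus_key)]),
    generation=GenerationConfig(enable_factual_consistency_score=True),
)

resp = vectara.as_rag(config).invoke("what did Biden say?")
print(resp["answer"])
print(f"Vectara FCS = {resp['fcs']}")  # 'fcs' key assumed unchanged from the old API
```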
@@ -1,181 +0,0 @@
# Vectara

>[Vectara](https://vectara.com/) provides a Trusted Generative AI platform, allowing organizations to rapidly create a ChatGPT-like experience (an AI assistant)
> which is grounded in the data, documents, and knowledge that they have (technically, it is Retrieval-Augmented-Generation-as-a-service).

**Vectara Overview:**
[Vectara](https://vectara.com/) is the trusted AI Assistant and Agent platform which focuses on enterprise readiness for mission-critical applications.
Vectara serverless RAG-as-a-service provides all the components of RAG behind an easy-to-use API, including:
1. A way to extract text from files (PDF, PPT, DOCX, etc)
2. ML-based chunking that provides state of the art performance.
3. The [Boomerang](https://vectara.com/how-boomerang-takes-retrieval-augmented-generation-to-the-next-level-via-grounded-generation/) embeddings model.
4. Its own internal vector database where text chunks and embedding vectors are stored.
5. A query service that automatically encodes the query into embedding, and retrieves the most relevant text segments, including support for [Hybrid Search](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) as well as multiple reranking options such as the [multi-lingual relevance reranker](https://www.vectara.com/blog/deep-dive-into-vectara-multilingual-reranker-v1-state-of-the-art-reranker-across-100-languages), [MMR](https://vectara.com/get-diverse-results-and-comprehensive-summaries-with-vectaras-mmr-reranker/), [UDF reranker](https://www.vectara.com/blog/rag-with-user-defined-functions-based-reranking).
6. An LLM to for creating a [generative summary](https://docs.vectara.com/docs/learn/grounded-generation/grounded-generation-overview), based on the retrieved documents (context), including citations.

For more information:
- [Documentation](https://docs.vectara.com/docs/)
- [API Playground](https://docs.vectara.com/docs/rest-api/)
- [Quickstart](https://docs.vectara.com/docs/quickstart)

## Installation and Setup

To use `Vectara` with LangChain no special installation steps are required.
To get started, [sign up](https://vectara.com/integrations/langchain) for a free Vectara trial,
and follow the [quickstart](https://docs.vectara.com/docs/quickstart) guide to create a corpus and an API key.
Once you have these, you can provide them as arguments to the Vectara `vectorstore`, or you can set them as environment variables.

- export `VECTARA_CUSTOMER_ID`="your_customer_id"
- export `VECTARA_CORPUS_ID`="your_corpus_id"
- export `VECTARA_API_KEY`="your-vectara-api-key"

## Vectara as a Vector Store

There exists a wrapper around the Vectara platform, allowing you to use it as a `vectorstore` in LangChain:

To import this vectorstore:
```python
from langchain_community.vectorstores import Vectara
```

To create an instance of the Vectara vectorstore:
```python
vectara = Vectara(
    vectara_customer_id=customer_id,
    vectara_corpus_id=corpus_id,
    vectara_api_key=api_key
)
```
The `customer_id`, `corpus_id` and `api_key` are optional, and if they are not supplied will be read from
the environment variables `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`, respectively.

### Adding Texts or Files

After you have the vectorstore, you can `add_texts` or `add_documents` as per the standard `VectorStore` interface, for example:

```python
vectara.add_texts(["to be or not to be", "that is the question"])
```

Since Vectara supports file-upload in the platform, we also added the ability to upload files (PDF, TXT, HTML, PPT, DOC, etc) directly.
When using this method, each file is uploaded directly to the Vectara backend, processed and chunked optimally there, so you don't have to use the LangChain document loader or chunking mechanism.

As an example:

```python
vectara.add_files(["path/to/file1.pdf", "path/to/file2.pdf",...])
```

Of course you do not have to add any data, and instead just connect to an existing Vectara corpus where data may already be indexed.

### Querying the VectorStore

To query the Vectara vectorstore, you can use the `similarity_search` method (or `similarity_search_with_score`), which takes a query string and returns a list of results:
```python
results = vectara.similarity_search_with_score("what is LangChain?")
```
The results are returned as a list of relevant documents, and a relevance score of each document.

In this case, we used the default retrieval parameters, but you can also specify the following additional arguments in `similarity_search` or `similarity_search_with_score`:
- `k`: number of results to return (defaults to 5)
- `lambda_val`: the [lexical matching](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) factor for hybrid search (defaults to 0.025)
- `filter`: a [filter](https://docs.vectara.com/docs/common-use-cases/filtering-by-metadata/filter-overview) to apply to the results (default None)
- `n_sentence_context`: number of sentences to include before/after the actual matching segment when returning results. This defaults to 2.
- `rerank_config`: can be used to specify reranker for thr results
  - `reranker`: mmr, rerank_multilingual_v1 or none. Note that "rerank_multilingual_v1" is a Scale only feature
  - `rerank_k`: number of results to use for reranking
  - `mmr_diversity_bias`: 0 = no diversity, 1 = full diversity. This is the lambda parameter in the MMR formula and is in the range 0...1

To get results without the relevance score, you can simply use the 'similarity_search' method:
```python
results = vectara.similarity_search("what is LangChain?")
```

## Vectara for Retrieval Augmented Generation (RAG)

Vectara provides a full RAG pipeline, including generative summarization. To use it as a complete RAG solution, you can use the `as_rag` method.
There are a few additional parameters that can be specified in the `VectaraQueryConfig` object to control retrieval and summarization:
* k: number of results to return
* lambda_val: the lexical matching factor for hybrid search
* summary_config (optional): can be used to request an LLM summary in RAG
  - is_enabled: True or False
  - max_results: number of results to use for summary generation
  - response_lang: language of the response summary, in ISO 639-2 format (e.g. 'en', 'fr', 'de', etc)
* rerank_config (optional): can be used to specify Vectara Reranker of the results
  - reranker: mmr, rerank_multilingual_v1 or none
  - rerank_k: number of results to use for reranking
  - mmr_diversity_bias: 0 = no diversity, 1 = full diversity.
    This is the lambda parameter in the MMR formula and is in the range 0...1

For example:

```python
summary_config = SummaryConfig(is_enabled=True, max_results=7, response_lang='eng')
rerank_config = RerankConfig(reranker="mmr", rerank_k=50, mmr_diversity_bias=0.2)
config = VectaraQueryConfig(k=10, lambda_val=0.005, rerank_config=rerank_config, summary_config=summary_config)
```
Then you can use the `as_rag` method to create a RAG pipeline:

```python
query_str = "what did Biden say?"

rag = vectara.as_rag(config)
rag.invoke(query_str)['answer']
```

The `as_rag` method returns a `VectaraRAG` object, which behaves just like any LangChain Runnable, including the `invoke` or `stream` methods.

## Vectara Chat

The RAG functionality can be used to create a chatbot. For example, you can create a simple chatbot that responds to user input:

```python
summary_config = SummaryConfig(is_enabled=True, max_results=7, response_lang='eng')
rerank_config = RerankConfig(reranker="mmr", rerank_k=50, mmr_diversity_bias=0.2)
config = VectaraQueryConfig(k=10, lambda_val=0.005, rerank_config=rerank_config, summary_config=summary_config)

query_str = "what did Biden say?"
bot = vectara.as_chat(config)
bot.invoke(query_str)['answer']
```

The main difference is the following: with `as_chat` Vectara internally tracks the chat history and conditions each response on the full chat history.
There is no need to keep that history locally to LangChain, as Vectara will manage it internally.

## Vectara as a LangChain retriever only

If you want to use Vectara as a retriever only, you can use the `as_retriever` method, which returns a `VectaraRetriever` object.
```python
retriever = vectara.as_retriever(config=config)
retriever.invoke(query_str)
```

Like with as_rag, you provide a `VectaraQueryConfig` object to control the retrieval parameters.
In most cases you would not enable the summary_config, but it is left as an option for backwards compatibility.
If no summary is requested, the response will be a list of relevant documents, each with a relevance score.
If a summary is requested, the response will be a list of relevant documents as before, plus an additional document that includes the generative summary.

## Hallucination Detection score

Vectara created [HHEM](https://huggingface.co/vectara/hallucination_evaluation_model) - an open source model that can be used to evaluate RAG responses for factual consistency.
As part of the Vectara RAG, the "Factual Consistency Score" (or FCS), which is an improved version of the open source HHEM is made available via the API.
This is automatically included in the output of the RAG pipeline

```python
summary_config = SummaryConfig(is_enabled=True, max_results=7, response_lang='eng')
rerank_config = RerankConfig(reranker="mmr", rerank_k=50, mmr_diversity_bias=0.2)
config = VectaraQueryConfig(k=10, lambda_val=0.005, rerank_config=rerank_config, summary_config=summary_config)

rag = vectara.as_rag(config)
resp = rag.invoke(query_str)
print(resp['answer'])
print(f"Vectara FCS = {resp['fcs']}")
```

## Example Notebooks

For a more detailed examples of using Vectara with LangChain, see the following example notebooks:
* [this notebook](/docs/integrations/vectorstores/vectara) shows how to use Vectara: with full RAG or just as a retriever.
* [this notebook](/docs/integrations/retrievers/self_query/vectara_self_query) shows the self-query capability with Vectara.
* [this notebook](/docs/integrations/providers/vectara/vectara_chat) shows how to build a chatbot with Langchain and Vectara

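The deleted page above also documented a retriever-only mode. A hedged sketch of the same flow against the new partner package, assuming `as_retriever` carried over unchanged and reusing `vectara` and `corpus_key` from the notebooks in this PR:

```python
from langchain_vectara.vectorstores import CorpusConfig, SearchConfig, VectaraQueryConfig

# Retrieval only: generation=None returns scored documents without a summary.
config = VectaraQueryConfig(
    search=SearchConfig(corpora=[CorpusConfig(corpus_key=corpus_key, limit=25)]),
    generation=None,
)

retriever = vectara.as_retriever(config=config)  # method assumed to carry over
docs = retriever.invoke("what did Biden say?")
```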
@ -8,7 +8,6 @@
|
||||
"# Vectara self-querying \n",
|
||||
"\n",
|
||||
"[Vectara](https://vectara.com/) is the trusted AI Assistant and Agent platform which focuses on enterprise readiness for mission-critical applications.\n",
|
||||
"\n",
|
||||
"Vectara serverless RAG-as-a-service provides all the components of RAG behind an easy-to-use API, including:\n",
|
||||
"1. A way to extract text from files (PDF, PPT, DOCX, etc)\n",
|
||||
"2. ML-based chunking that provides state of the art performance.\n",
|
||||
@ -17,9 +16,27 @@
|
||||
"5. A query service that automatically encodes the query into embedding, and retrieves the most relevant text segments, including support for [Hybrid Search](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) as well as multiple reranking options such as the [multi-lingual relevance reranker](https://www.vectara.com/blog/deep-dive-into-vectara-multilingual-reranker-v1-state-of-the-art-reranker-across-100-languages), [MMR](https://vectara.com/get-diverse-results-and-comprehensive-summaries-with-vectaras-mmr-reranker/), [UDF reranker](https://www.vectara.com/blog/rag-with-user-defined-functions-based-reranking). \n",
|
||||
"6. An LLM to for creating a [generative summary](https://docs.vectara.com/docs/learn/grounded-generation/grounded-generation-overview), based on the retrieved documents (context), including citations.\n",
|
||||
"\n",
|
||||
"See the [Vectara API documentation](https://docs.vectara.com/docs/) for more information on how to use the API.\n",
|
||||
"For more information:\n",
|
||||
"- [Documentation](https://docs.vectara.com/docs/)\n",
|
||||
"- [API Playground](https://docs.vectara.com/docs/rest-api/)\n",
|
||||
"- [Quickstart](https://docs.vectara.com/docs/quickstart)\n",
|
||||
"\n",
|
||||
"This notebook shows how to use `SelfQueryRetriever` with Vectara."
|
||||
"\n",
|
||||
"This notebook shows how to use `Vectara` as `SelfQueryRetriever`.\n",
|
||||
"\n",
|
||||
"## Setup\n",
|
||||
"\n",
|
||||
"To use the `VectaraVectorStore` you first need to install the partner package.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "07f3f1a4-f552-4d07-ba48-18fb5d8641c6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!uv pip install -U pip && uv pip install -qU langchain-vectara"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -30,14 +47,14 @@
|
||||
"# Getting Started\n",
|
||||
"\n",
|
||||
"To get started, use the following steps:\n",
|
||||
"1. If you don't already have one, [Sign up](https://www.vectara.com/integrations/langchain) for your free Vectara trial. Once you have completed your sign up you will have a Vectara customer ID. You can find your customer ID by clicking on your name, on the top-right of the Vectara console window.\n",
|
||||
"1. If you don't already have one, [Sign up](https://www.vectara.com/integrations/langchain) for your free Vectara trial.\n",
|
||||
"2. Within your account you can create one or more corpora. Each corpus represents an area that stores text data upon ingest from input documents. To create a corpus, use the **\"Create Corpus\"** button. You then provide a name to your corpus as well as a description. Optionally you can define filtering attributes and apply some advanced options. If you click on your created corpus, you can see its name and corpus ID right on the top.\n",
|
||||
"3. Next you'll need to create API keys to access the corpus. Click on the **\"Access Control\"** tab in the corpus view and then the **\"Create API Key\"** button. Give your key a name, and choose whether you want query-only or query+index for your key. Click \"Create\" and you now have an active API key. Keep this key confidential. \n",
|
||||
"\n",
|
||||
"To use LangChain with Vectara, you'll need to have these three values: `customer ID`, `corpus ID` and `api_key`.\n",
|
||||
"You can provide those to LangChain in two ways:\n",
|
||||
"To use LangChain with Vectara, you'll need to have these two values: `corpus_key` and `api_key`.\n",
|
||||
"You can provide `VECTARA_API_KEY` to LangChain in two ways:\n",
|
||||
"\n",
|
||||
"1. Include in your environment these three variables: `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`.\n",
|
||||
"1. Include in your environment these two variables: `VECTARA_API_KEY`.\n",
|
||||
"\n",
|
||||
" For example, you can set these variables using os.environ and getpass as follows:\n",
|
||||
"\n",
|
||||
@ -45,8 +62,6 @@
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"os.environ[\"VECTARA_CUSTOMER_ID\"] = getpass.getpass(\"Vectara Customer ID:\")\n",
|
||||
"os.environ[\"VECTARA_CORPUS_ID\"] = getpass.getpass(\"Vectara Corpus ID:\")\n",
|
||||
"os.environ[\"VECTARA_API_KEY\"] = getpass.getpass(\"Vectara API Key:\")\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
@ -54,14 +69,11 @@
|
||||
"\n",
|
||||
"```python\n",
|
||||
"vectara = Vectara(\n",
|
||||
" vectara_customer_id=vectara_customer_id,\n",
|
||||
" vectara_corpus_id=vectara_corpus_id,\n",
|
||||
" vectara_api_key=vectara_api_key\n",
|
||||
" )\n",
|
||||
" vectara_api_key=vectara_api_key\n",
|
||||
")\n",
|
||||
"```\n",
|
||||
"In this notebook we assume they are provided in the environment.\n",
|
||||
"\n",
|
||||
"**Notes:** The self-query retriever requires you to have `lark` installed (`pip install lark`). "
|
||||
"In this notebook we assume they are provided in the environment."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -71,14 +83,14 @@
|
||||
"source": [
|
||||
"## Connecting to Vectara from LangChain\n",
|
||||
"\n",
|
||||
"In this example, we assume that you've created an account and a corpus, and added your `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY` (created with permissions for both indexing and query) as environment variables.\n",
|
||||
"In this example, we assume that you've created an account and a corpus, and added your `VECTARA_CORPUS_KEY` and `VECTARA_API_KEY` (created with permissions for both indexing and query) as environment variables.\n",
|
||||
"\n",
|
||||
"We further assume the corpus has 4 fields defined as filterable metadata attributes: `year`, `director`, `rating`, and `genre`"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": 2,
|
||||
"id": "9d3aa44f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@ -87,14 +99,10 @@
|
||||
"\n",
|
||||
"from langchain_core.documents import Document\n",
|
||||
"\n",
|
||||
"os.environ[\"VECTARA_API_KEY\"] = \"<YOUR_VECTARA_API_KEY>\"\n",
|
||||
"os.environ[\"VECTARA_CORPUS_ID\"] = \"<YOUR_VECTARA_CORPUS_ID>\"\n",
|
||||
"os.environ[\"VECTARA_CUSTOMER_ID\"] = \"<YOUR_VECTARA_CUSTOMER_ID>\"\n",
|
||||
"os.environ[\"VECTARA_API_KEY\"] = \"VECTARA_API_KEY\"\n",
|
||||
"os.environ[\"VECTARA_CORPUS_KEY\"] = \"VECTARA_CORPUS_KEY\"\n",
|
||||
"\n",
|
||||
"from langchain.chains.query_constructor.schema import AttributeInfo\n",
|
||||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
|
||||
"from langchain_community.vectorstores import Vectara\n",
|
||||
"from langchain_openai.chat_models import ChatOpenAI"
|
||||
"from langchain_vectara import Vectara"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -109,7 +117,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 6,
|
||||
"id": "bcbe04d9",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@ -148,9 +156,12 @@
|
||||
" ),\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"corpus_key = os.getenv(\"VECTARA_CORPUS_KEY\")\n",
|
||||
"vectara = Vectara()\n",
|
||||
"for doc in docs:\n",
|
||||
" vectara.add_texts([doc.page_content], doc_metadata=doc.metadata)"
|
||||
" vectara.add_texts(\n",
|
||||
" [doc.page_content], corpus_key=corpus_key, doc_metadata=doc.metadata\n",
|
||||
" )"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -158,45 +169,32 @@
|
||||
"id": "5ecaab6d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Creating the self-querying retriever\n",
|
||||
"Now we can instantiate our retriever. To do this we'll need to provide some information upfront about the metadata fields that our documents support and a short description of the document contents.\n",
|
||||
"## Self-query with Vectara\n",
|
||||
" You don't need self-query via the LangChain mechanism—enabling `intelligent_query_rewriting` on the Vectara platform achieves the same result.\n",
|
||||
"Vectara offers Intelligent Query Rewriting option which enhances search precision by automatically generating metadata filter expressions from natural language queries. This capability analyzes user queries, extracts relevant metadata filters, and rephrases the query to focus on the core information need. For more [details](https://docs.vectara.com/docs/search-and-retrieval/intelligent-query-rewriting).\n",
|
||||
"\n",
|
||||
"We then provide an llm (in this case OpenAI) and the `vectara` vectorstore as arguments:"
|
||||
"Enable intelligent query rewriting on a per-query basis by setting the `intelligent_query_rewriting` parameter to `true` in `VectaraQueryConfig`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 7,
|
||||
"id": "86e34dbf",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"metadata_field_info = [\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"genre\",\n",
|
||||
" description=\"The genre of the movie\",\n",
|
||||
" type=\"string or list[string]\",\n",
|
||||
" ),\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"year\",\n",
|
||||
" description=\"The year the movie was released\",\n",
|
||||
" type=\"integer\",\n",
|
||||
" ),\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"director\",\n",
|
||||
" description=\"The name of the movie director\",\n",
|
||||
" type=\"string\",\n",
|
||||
" ),\n",
|
||||
" AttributeInfo(\n",
|
||||
" name=\"rating\", description=\"A 1-10 rating for the movie\", type=\"float\"\n",
|
||||
" ),\n",
|
||||
"]\n",
|
||||
"document_content_description = \"Brief summary of a movie\"\n",
|
||||
"llm = ChatOpenAI(temperature=0, model=\"gpt-4o\", max_tokens=4069)\n",
|
||||
"retriever = SelfQueryRetriever.from_llm(\n",
|
||||
" llm, vectara, document_content_description, metadata_field_info, verbose=True\n",
|
||||
"from langchain_vectara.vectorstores import (\n",
|
||||
" CorpusConfig,\n",
|
||||
" SearchConfig,\n",
|
||||
" VectaraQueryConfig,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"config = VectaraQueryConfig(\n",
|
||||
" search=SearchConfig(corpora=[CorpusConfig(corpus_key=corpus_key)]),\n",
|
||||
" generation=None,\n",
|
||||
" intelligent_query_rewriting=True,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@ -205,116 +203,31 @@
|
||||
"id": "ea9df8d4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Self-retrieval Queries\n",
|
||||
"And now we can try actually using our retriever!"
|
||||
"## Queries\n",
|
||||
"And now we can try actually using our vectara_queries method!"
|
||||
]
|
||||
},
|
||||
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 8,
"id": "38a126e9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'lang': 'eng', 'offset': '0', 'len': '66', 'year': '1993', 'rating': '7.7', 'genre': 'science fiction', 'source': 'langchain'}),\n",
" Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'lang': 'eng', 'offset': '0', 'len': '116', 'year': '2006', 'director': 'Satoshi Kon', 'rating': '8.6', 'source': 'langchain'}),\n",
" Document(page_content='Toys come alive and have a blast doing so', metadata={'lang': 'eng', 'offset': '0', 'len': '41', 'year': '1995', 'genre': 'animated', 'source': 'langchain'}),\n",
" Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'lang': 'eng', 'offset': '0', 'len': '60', 'year': '1979', 'rating': '9.9', 'director': 'Andrei Tarkovsky', 'genre': 'science fiction', 'source': 'langchain'}),\n",
" Document(page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them', metadata={'lang': 'eng', 'offset': '0', 'len': '82', 'year': '2019', 'director': 'Greta Gerwig', 'rating': '8.3', 'source': 'langchain'}),\n",
" Document(page_content='Leo DiCaprio gets lost in a dream within a dream within a dream within a ...', metadata={'lang': 'eng', 'offset': '0', 'len': '76', 'year': '2010', 'director': 'Christopher Nolan', 'rating': '8.2', 'source': 'langchain'})]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# This example only specifies a relevant query\n",
"retriever.invoke(\"What are movies about scientists\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "fc3f1e6e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'lang': 'eng', 'offset': '0', 'len': '116', 'year': '2006', 'director': 'Satoshi Kon', 'rating': '8.6', 'source': 'langchain'}),\n",
" Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'lang': 'eng', 'offset': '0', 'len': '60', 'year': '1979', 'rating': '9.9', 'director': 'Andrei Tarkovsky', 'genre': 'science fiction', 'source': 'langchain'})]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# This example only specifies a filter\n",
"retriever.invoke(\"I want to watch a movie rated higher than 8.5\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b19d4da0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them', metadata={'lang': 'eng', 'offset': '0', 'len': '82', 'year': '2019', 'director': 'Greta Gerwig', 'rating': '8.3', 'source': 'langchain'})]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# This example specifies a query and a filter\n",
"retriever.invoke(\"Has Greta Gerwig directed any movies about women\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "f900e40e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'lang': 'eng', 'offset': '0', 'len': '116', 'year': '2006', 'director': 'Satoshi Kon', 'rating': '8.6', 'source': 'langchain'}),\n",
" Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'lang': 'eng', 'offset': '0', 'len': '60', 'year': '1979', 'rating': '9.9', 'director': 'Andrei Tarkovsky', 'genre': 'science fiction', 'source': 'langchain'})]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# This example specifies a composite filter\n",
"retriever.invoke(\"What's a highly rated (above 8.5) science fiction film?\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "12a51522",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Toys come alive and have a blast doing so', metadata={'lang': 'eng', 'offset': '0', 'len': '41', 'year': '1995', 'genre': 'animated', 'source': 'langchain'}),\n",
" Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'lang': 'eng', 'offset': '0', 'len': '66', 'year': '1993', 'rating': '7.7', 'genre': 'science fiction', 'source': 'langchain'})]"
"[(Document(metadata={'year': 1995, 'genre': 'animated', 'source': 'langchain'}, page_content='Toys come alive and have a blast doing so'),\n",
" 0.4141285717487335),\n",
" (Document(metadata={'year': 1979, 'rating': 9.9, 'director': 'Andrei Tarkovsky', 'genre': 'science fiction', 'source': 'langchain'}, page_content='Three men walk into the Zone, three men walk out of the Zone'),\n",
" 0.4046250879764557),\n",
" (Document(metadata={'year': 2010, 'director': 'Christopher Nolan', 'rating': 8.2, 'source': 'langchain'}, page_content='Leo DiCaprio gets lost in a dream within a dream within a dream within a ...'),\n",
" 0.227469339966774),\n",
" (Document(metadata={'year': 2019, 'director': 'Greta Gerwig', 'rating': 8.3, 'source': 'langchain'}, page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them'),\n",
" 0.19208428263664246),\n",
" (Document(metadata={'year': 1993, 'rating': 7.7, 'genre': 'science fiction', 'source': 'langchain'}, page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose'),\n",
" 0.1902722418308258),\n",
" (Document(metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.6, 'source': 'langchain'}, page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea'),\n",
" 0.08151976019144058)]"
]
},
"execution_count": 8,
@ -323,74 +236,107 @@
}
],
"source": [
"# This example specifies a query and composite filter\n",
"retriever.invoke(\n",
" \"What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "39bd1de1-b9fe-4a98-89da-58d8a7a6ae51",
"metadata": {},
"source": [
"## Filter k\n",
"\n",
"We can also use the self query retriever to specify `k`: the number of documents to fetch.\n",
"\n",
"We can do this by passing `enable_limit=True` to the constructor."
"# This example only specifies a relevant query\n",
"vectara.vectara_query(\"What are movies about scientists\", config)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "bff36b88-b506-4877-9c63-e5a1a8d78e64",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"retriever = SelfQueryRetriever.from_llm(\n",
" llm,\n",
" vectara,\n",
" document_content_description,\n",
" metadata_field_info,\n",
" enable_limit=True,\n",
" verbose=True,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "00e8baad-a9d7-4498-bd8d-ca41d0691386",
"id": "fc3f1e6e",
"metadata": {},
"source": [
"This is cool, we can include the number of results we would like to see in the query and the self retriever would correctly understand it. For example, let's look for "
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "2758d229-4f97-499c-819f-888acaf8ee10",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'lang': 'eng', 'offset': '0', 'len': '116', 'year': '2006', 'director': 'Satoshi Kon', 'rating': '8.6', 'source': 'langchain'}),\n",
" Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'lang': 'eng', 'offset': '0', 'len': '60', 'year': '1979', 'rating': '9.9', 'director': 'Andrei Tarkovsky', 'genre': 'science fiction', 'source': 'langchain'})]"
"[(Document(metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.6, 'source': 'langchain'}, page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea'),\n",
" 0.34279149770736694),\n",
" (Document(metadata={'year': 1979, 'rating': 9.9, 'director': 'Andrei Tarkovsky', 'genre': 'science fiction', 'source': 'langchain'}, page_content='Three men walk into the Zone, three men walk out of the Zone'),\n",
" 0.242923304438591)]"
]
},
"execution_count": 10,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# This example only specifies a relevant query\n",
"retriever.invoke(\"what are two movies with a rating above 8.5\")"
"# This example only specifies a filter\n",
"vectara.vectara_query(\"I want to watch a movie rated higher than 8.5\", config)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "b19d4da0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(Document(metadata={'year': 2019, 'director': 'Greta Gerwig', 'rating': 8.3, 'source': 'langchain'}, page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them'),\n",
" 0.10141132771968842)]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# This example specifies a query and a filter\n",
"vectara.vectara_query(\"Has Greta Gerwig directed any movies about women\", config)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "f900e40e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(Document(metadata={'year': 1979, 'rating': 9.9, 'director': 'Andrei Tarkovsky', 'genre': 'science fiction', 'source': 'langchain'}, page_content='Three men walk into the Zone, three men walk out of the Zone'),\n",
" 0.9508692026138306)]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# This example specifies a composite filter\n",
"vectara.vectara_query(\"What's a highly rated (above 8.5) science fiction film?\", config)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "12a51522",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(Document(metadata={'year': 1995, 'genre': 'animated', 'source': 'langchain'}, page_content='Toys come alive and have a blast doing so'),\n",
" 0.7290377616882324),\n",
" (Document(metadata={'year': 1993, 'rating': 7.7, 'genre': 'science fiction', 'source': 'langchain'}, page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose'),\n",
" 0.4838160574436188)]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# This example specifies a query and composite filter\n",
"vectara.vectara_query(\n",
" \"What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated\",\n",
" config,\n",
")"
]
},
{
@ -418,7 +364,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
"version": "3.12.0"
}
},
"nbformat": 4,
@ -8,20 +8,35 @@
"# Vectara\n",
"\n",
"[Vectara](https://vectara.com/) is the trusted AI Assistant and Agent platform which focuses on enterprise readiness for mission-critical applications.\n",
"\n",
"Vectara serverless RAG-as-a-service provides all the components of RAG behind an easy-to-use API, including:\n",
"1. A way to extract text from files (PDF, PPT, DOCX, etc)\n",
"2. ML-based chunking that provides state of the art performance.\n",
"3. The [Boomerang](https://vectara.com/how-boomerang-takes-retrieval-augmented-generation-to-the-next-level-via-grounded-generation/) embeddings model.\n",
"4. Its own internal vector database where text chunks and embedding vectors are stored.\n",
"5. A query service that automatically encodes the query into embedding, and retrieves the most relevant text segments (including support for [Hybrid Search](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) as well as multiple reranking options such as the [multi-lingual relevance reranker](https://www.vectara.com/blog/deep-dive-into-vectara-multilingual-reranker-v1-state-of-the-art-reranker-across-100-languages), [MMR](https://vectara.com/get-diverse-results-and-comprehensive-summaries-with-vectaras-mmr-reranker/), [UDF reranker](https://www.vectara.com/blog/rag-with-user-defined-functions-based-reranking). \n",
"5. A query service that automatically encodes the query into an embedding and retrieves the most relevant text segments, including support for [Hybrid Search](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) as well as multiple reranking options such as the [multi-lingual relevance reranker](https://www.vectara.com/blog/deep-dive-into-vectara-multilingual-reranker-v1-state-of-the-art-reranker-across-100-languages), [MMR](https://vectara.com/get-diverse-results-and-comprehensive-summaries-with-vectaras-mmr-reranker/), [UDF reranker](https://www.vectara.com/blog/rag-with-user-defined-functions-based-reranking).\n",
"6. An LLM for creating a [generative summary](https://docs.vectara.com/docs/learn/grounded-generation/grounded-generation-overview), based on the retrieved documents (context), including citations.\n",
"\n",
"See the [Vectara API documentation](https://docs.vectara.com/docs/) for more information on how to use the API.\n",
"For more information:\n",
"- [Documentation](https://docs.vectara.com/docs/)\n",
"- [API Playground](https://docs.vectara.com/docs/rest-api/)\n",
"- [Quickstart](https://docs.vectara.com/docs/quickstart)\n",
"\n",
"This notebook shows how to use the basic retrieval functionality, when utilizing Vectara just as a Vector Store (without summarization), including `similarity_search` and `similarity_search_with_score` as well as using the LangChain `as_retriever` functionality.\n",
"\n",
"You'll need to install `langchain-community` with `pip install -qU langchain-community` to use this integration"
"\n",
"## Setup\n",
"\n",
"To use the `VectaraVectorStore`, you first need to install the partner package.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dfdf03ba-d6f5-4b1e-86d3-a65c4bc99aa1",
"metadata": {},
"outputs": [],
"source": [
"!uv pip install -U pip && uv pip install -qU langchain-vectara"
]
},
{
@ -32,14 +47,14 @@
"# Getting Started\n",
"\n",
"To get started, use the following steps:\n",
"1. If you don't already have one, [Sign up](https://www.vectara.com/integrations/langchain) for your free Vectara trial. Once you have completed your sign up you will have a Vectara customer ID. You can find your customer ID by clicking on your name, on the top-right of the Vectara console window.\n",
"1. If you don't already have one, [Sign up](https://www.vectara.com/integrations/langchain) for your free Vectara trial.\n",
"2. Within your account you can create one or more corpora. Each corpus represents an area where text data is stored upon ingestion from input documents. To create a corpus, use the **\"Create Corpus\"** button. You then provide a name for your corpus as well as a description. Optionally you can define filtering attributes and apply some advanced options. If you click on your created corpus, you can see its name and corpus ID right on the top.\n",
"3. Next you'll need to create API keys to access the corpus. Click on the **\"Access Control\"** tab in the corpus view and then the **\"Create API Key\"** button. Give your key a name, and choose whether you want query-only or query+index for your key. Click \"Create\" and you now have an active API key. Keep this key confidential.\n",
"\n",
"To use LangChain with Vectara, you'll need to have these three values: `customer ID`, `corpus ID` and `api_key`.\n",
"You can provide those to LangChain in two ways:\n",
"To use LangChain with Vectara, you'll need to have these two values: `corpus_key` and `api_key`.\n",
"You can provide `VECTARA_API_KEY` to LangChain in two ways:\n",
"\n",
"1. Include in your environment these three variables: `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`.\n",
"1. Include in your environment this variable: `VECTARA_API_KEY`.\n",
"\n",
" For example, you can set these variables using os.environ and getpass as follows:\n",
"\n",
@ -47,8 +62,6 @@
"import os\n",
"import getpass\n",
"\n",
"os.environ[\"VECTARA_CUSTOMER_ID\"] = getpass.getpass(\"Vectara Customer ID:\")\n",
"os.environ[\"VECTARA_CORPUS_ID\"] = getpass.getpass(\"Vectara Corpus ID:\")\n",
"os.environ[\"VECTARA_API_KEY\"] = getpass.getpass(\"Vectara API Key:\")\n",
"```\n",
"\n",
@ -56,10 +69,8 @@
"\n",
"```python\n",
"vectara = Vectara(\n",
" vectara_customer_id=vectara_customer_id,\n",
" vectara_corpus_id=vectara_corpus_id,\n",
" vectara_api_key=vectara_api_key\n",
" )\n",
" vectara_api_key=vectara_api_key\n",
")\n",
"```\n",
"\n",
"In this notebook we assume they are provided in the environment."
@ -67,23 +78,29 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 4,
"id": "aac7a9a6",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"VECTARA_API_KEY\"] = \"<YOUR_VECTARA_API_KEY>\"\n",
"os.environ[\"VECTARA_CORPUS_ID\"] = \"<YOUR_VECTARA_CORPUS_ID>\"\n",
"os.environ[\"VECTARA_CUSTOMER_ID\"] = \"<YOUR_VECTARA_CUSTOMER_ID>\"\n",
"os.environ[\"VECTARA_API_KEY\"] = \"<VECTARA_API_KEY>\"\n",
"os.environ[\"VECTARA_CORPUS_KEY\"] = \"<VECTARA_CORPUS_KEY>\"\n",
"\n",
"from langchain_community.vectorstores import Vectara\n",
"from langchain_community.vectorstores.vectara import (\n",
" RerankConfig,\n",
" SummaryConfig,\n",
"from langchain_vectara import Vectara\n",
"from langchain_vectara.vectorstores import (\n",
" ChainReranker,\n",
" CorpusConfig,\n",
" CustomerSpecificReranker,\n",
" File,\n",
" GenerationConfig,\n",
" MmrReranker,\n",
" SearchConfig,\n",
" VectaraQueryConfig,\n",
")"
")\n",
"\n",
"vectara = Vectara(vectara_api_key=os.getenv(\"VECTARA_API_KEY\"))"
]
},
{
@ -91,21 +108,37 @@
"id": "875ffb7e",
"metadata": {},
"source": [
"First we load the state-of-the-union text into Vectara. \n",
"First we load the state-of-the-union text into Vectara.\n",
"\n",
"Note that we use the `from_files` interface which does not require any local processing or chunking - Vectara receives the file content and performs all the necessary pre-processing, chunking and embedding of the file into its knowledge store.\n",
"Note that we use the `add_files` interface, which does not require any local processing or chunking; Vectara receives the file content and performs all the necessary pre-processing, chunking and embedding of the file into its knowledge store.\n",
"\n",
"In this case it uses a `.txt` file but the same works for many other [file types](https://docs.vectara.com/docs/api-reference/indexing-apis/file-upload/file-upload-filetypes)."
"In this case it uses a `.txt` file but the same works for many other [file types](https://docs.vectara.com/docs/api-reference/indexing-apis/file-upload/file-upload-filetypes)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 5,
"id": "be0a4973",
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"['state_of_the_union.txt']"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vectara = Vectara.from_files([\"state_of_the_union.txt\"])"
"corpus_key = os.getenv(\"VECTARA_CORPUS_KEY\")\n",
"file_obj = File(\n",
" file_path=\"../document_loaders/example_data/state_of_the_union.txt\",\n",
" metadata={\"source\": \"text_file\"},\n",
")\n",
"vectara.add_files([file_obj], corpus_key)"
]
},
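The same `File`/`add_files` pattern extends to batches. A minimal sketch, assuming the local paths below exist (they are illustrative, not part of the repository), with per-document metadata attached to each file:

```python
# A minimal sketch: these paths and metadata values are hypothetical examples.
files = [
    File(file_path="report_2023.pdf", metadata={"source": "pdf_file", "year": 2023}),
    File(file_path="notes.docx", metadata={"source": "docx_file"}),
]

# add_files uploads the raw files; Vectara performs extraction, chunking and embedding.
vectara.add_files(files, corpus_key)
```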
{
@ -113,38 +146,52 @@
"id": "22a6b953",
"metadata": {},
"source": [
"## Basic Vectara RAG (retrieval augmented generation)\n",
"## Vectara RAG (retrieval augmented generation)\n",
"\n",
"We now create a `VectaraQueryConfig` object to control the retrieval and summarization options:\n",
"* We enable summarization, specifying we would like the LLM to pick the top 7 matching chunks and respond in English\n",
"* We enable MMR (max marginal relevance) in the retrieval process, with a 0.2 diversity bias factor\n",
"* We want the top-10 results, with hybrid search configured with a value of 0.025\n",
"\n",
"Using this configuration, let's create a LangChain `Runnable` object that encapsulates the full Vectara RAG pipeline, using the `as_rag` method:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 6,
"id": "9ecda054-96a8-4a91-aeae-32006efb1ac8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Biden addressed various topics in his statements. He highlighted the need to confront Putin by building a coalition of nations[1]. He also expressed commitment to investigating the impact of burn pits on soldiers' health, including his son's case[2]. Additionally, Biden outlined a plan to fight inflation by cutting prescription drug costs[3]. He emphasized the importance of continuing to combat COVID-19 and not just accepting living with it[4]. Furthermore, he discussed measures to weaken Russia economically and target Russian oligarchs[6]. Biden also advocated for passing the Equality Act to support LGBTQ+ Americans and condemned state laws targeting transgender individuals[7].\""
"\"President Biden discussed several key issues in his recent statements. He emphasized the importance of keeping schools open and noted that with a high vaccination rate and reduced hospitalizations, most Americans can safely return to normal activities without masks [1]. He addressed the need to hold social media platforms accountable for their impact on children and called for stronger privacy protections and mental health services [2]. Biden also announced measures against Russian oligarchs, including closing American airspace to Russian flights and targeting their assets, as part of efforts to weaken Russia's economy [3], [7]. Additionally, he reaffirmed the need to protect women's rights, particularly the right to choose as affirmed in Roe v. Wade [5].\""
]
},
"execution_count": 3,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"summary_config = SummaryConfig(is_enabled=True, max_results=7, response_lang=\"eng\")\n",
"rerank_config = RerankConfig(reranker=\"mmr\", rerank_k=50, mmr_diversity_bias=0.2)\n",
"generation_config = GenerationConfig(\n",
" max_used_search_results=7,\n",
" response_language=\"eng\",\n",
" generation_preset_name=\"vectara-summary-ext-24-05-med-omni\",\n",
" enable_factual_consistency_score=True,\n",
")\n",
"search_config = SearchConfig(\n",
" corpora=[CorpusConfig(corpus_key=corpus_key)],\n",
" limit=25,\n",
" reranker=ChainReranker(\n",
" rerankers=[\n",
" CustomerSpecificReranker(reranker_id=\"rnk_272725719\", limit=100),\n",
" MmrReranker(diversity_bias=0.2, limit=100),\n",
" ]\n",
" ),\n",
")\n",
"\n",
"config = VectaraQueryConfig(\n",
" k=10, lambda_val=0.005, rerank_config=rerank_config, summary_config=summary_config\n",
" search=search_config,\n",
" generation=generation_config,\n",
")\n",
"\n",
"query_str = \"what did Biden say?\"\n",
@ -163,7 +210,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 7,
"id": "27f01330-8917-4eff-b603-59ab2571a4d2",
"metadata": {},
"outputs": [
@ -171,7 +218,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Biden addressed various topics in his statements. He highlighted the importance of building coalitions to confront global challenges [1]. He also expressed commitment to investigating the impact of burn pits on soldiers' health, including his son's case [2, 4]. Additionally, Biden outlined his plan to combat inflation by cutting prescription drug costs and reducing the deficit, with support from Nobel laureates and business leaders [3]. He emphasized the ongoing fight against COVID-19 and the need to continue combating the virus [5]. Furthermore, Biden discussed measures taken to weaken Russia's economic and military strength, targeting Russian oligarchs and corrupt leaders [6]. He also advocated for passing the Equality Act to support LGBTQ+ Americans and address discriminatory state laws [7]."
"President Biden discussed several key issues in his recent statements. He emphasized the importance of keeping schools open and noted that with a high vaccination rate and reduced hospitalizations, most Americans can safely return to normal activities without masks [1]. He addressed the need to hold social media platforms accountable for their impact on children and called for stronger privacy protections and mental health services [2]. Biden also announced measures against Russia, including preventing its central bank from defending the Ruble and targeting Russian oligarchs' assets, as part of efforts to weaken Russia's economy and military [3]. Additionally, he reaffirmed the commitment to protect women's rights, particularly the right to choose as affirmed in Roe v. Wade [5]. Lastly, he advocated for funding the police with necessary resources and training to ensure community safety [6]."
]
}
],
@ -203,7 +250,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 8,
"id": "b2e0aa2c-7c8e-4d79-8abc-66f5a1f961b3",
"metadata": {},
"outputs": [
@ -211,19 +258,12 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Biden addressed various topics in his statements. He highlighted the need to confront Putin by building a coalition of nations[1]. He also expressed his commitment to investigating the impact of burn pits on soldiers' health, referencing his son's experience[2]. Additionally, Biden discussed his plan to fight inflation by cutting prescription drug costs and garnering support from Nobel laureates and business leaders[4]. Furthermore, he emphasized the importance of continuing to combat COVID-19 and not merely accepting living with the virus[5]. Biden's remarks encompassed international relations, healthcare challenges faced by soldiers, economic strategies, and the ongoing battle against the pandemic.\n",
"Vectara FCS = 0.41796625\n"
"President Biden discussed several key topics in his recent statements. He emphasized the importance of keeping schools open and noted that with a high vaccination rate and reduced hospitalizations, most Americans can safely return to normal activities without masks [1]. He addressed the need to hold social media platforms accountable for their impact on children and called for stronger privacy protections and mental health services [2]. Biden also announced measures against Russian oligarchs, including closing American airspace to Russian flights and targeting their assets, as part of efforts to weaken Russia's economy [3], [7]. Additionally, he reaffirmed the need to protect women's rights, particularly the right to choose as affirmed in Roe v. Wade [5].\n",
"Vectara FCS = 0.61621094\n"
]
}
],
"source": [
"summary_config = SummaryConfig(is_enabled=True, max_results=5, response_lang=\"eng\")\n",
"rerank_config = RerankConfig(reranker=\"mmr\", rerank_k=50, mmr_diversity_bias=0.1)\n",
"config = VectaraQueryConfig(\n",
" k=10, lambda_val=0.005, rerank_config=rerank_config, summary_config=summary_config\n",
")\n",
"\n",
"rag = vectara.as_rag(config)\n",
"resp = rag.invoke(query_str)\n",
"print(resp[\"answer\"])\n",
"print(f\"Vectara FCS = {resp['fcs']}\")"
@ -243,26 +283,28 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 9,
"id": "19cd2f86",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='He thought the West and NATO wouldn’t respond. And he thought he could divide us at home. We were ready. Here is what we did. We prepared extensively and carefully. We spent months building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin.', metadata={'lang': 'eng', 'section': '1', 'offset': '2160', 'len': '36', 'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'vectara'}),\n",
" Document(page_content='When they came home, many of the world’s fittest and best trained warriors were never the same. Dizziness. \\n\\nA cancer that would put them in a flag-draped coffin. I know. \\n\\nOne of those soldiers was my son Major Beau Biden. We don’t know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. But I’m committed to finding out everything we can.', metadata={'lang': 'eng', 'section': '1', 'offset': '34652', 'len': '60', 'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'vectara'}),\n",
" Document(page_content='But cancer from prolonged exposure to burn pits ravaged Heath’s lungs and body. Danielle says Heath was a fighter to the very end. He didn’t know how to stop fighting, and neither did she. Through her pain she found purpose to demand we do better. Tonight, Danielle—we are.', metadata={'lang': 'eng', 'section': '1', 'offset': '35442', 'len': '57', 'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'vectara'})]"
"[Document(metadata={'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'X-TIKA:detectedEncoding': 'UTF-8', 'X-TIKA:encodingDetector': 'UniversalEncodingDetector', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'text_file', 'framework': 'langchain'}, page_content='The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs. We are joining with our European allies to find and seize your yachts your luxury apartments your private jets. We are coming for your ill-begotten gains. And tonight I am announcing that we will join our allies in closing off American air space to all Russian flights – further isolating Russia – and adding an additional squeeze –on their economy. The Ruble has lost 30% of its value.'),\n",
" Document(metadata={'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'X-TIKA:detectedEncoding': 'UTF-8', 'X-TIKA:encodingDetector': 'UniversalEncodingDetector', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'text_file', 'framework': 'langchain'}, page_content='When they came home, many of the world’s fittest and best trained warriors were never the same. Dizziness. \\n\\nA cancer that would put them in a flag-draped coffin. I know. \\n\\nOne of those soldiers was my son Major Beau Biden. We don’t know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. But I’m committed to finding out everything we can.'),\n",
" Document(metadata={'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'X-TIKA:detectedEncoding': 'UTF-8', 'X-TIKA:encodingDetector': 'UniversalEncodingDetector', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'text_file', 'framework': 'langchain'}, page_content='He rejected repeated efforts at diplomacy. He thought the West and NATO wouldn’t respond. And he thought he could divide us at home. We were ready. Here is what we did. We prepared extensively and carefully.'),\n",
" Document(metadata={'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'X-TIKA:detectedEncoding': 'UTF-8', 'X-TIKA:encodingDetector': 'UniversalEncodingDetector', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'text_file', 'framework': 'langchain'}, page_content='And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson.'),\n",
" Document(metadata={'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'X-TIKA:detectedEncoding': 'UTF-8', 'X-TIKA:encodingDetector': 'UniversalEncodingDetector', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'text_file', 'framework': 'langchain'}, page_content='Putin’s latest attack on Ukraine was premeditated and unprovoked. He rejected repeated efforts at diplomacy. He thought the West and NATO wouldn’t respond. And he thought he could divide us at home. We were ready. Here is what we did.')]"
]
},
"execution_count": 6,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"config.summary_config.is_enabled = False\n",
"config.k = 3\n",
"config.generation = None\n",
"config.search.limit = 5\n",
"retriever = vectara.as_retriever(config=config)\n",
"retriever.invoke(query_str)"
]
@ -277,27 +319,34 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 10,
"id": "59268e9a-6089-4bb2-8c61-1ea6b956f83c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='He thought the West and NATO wouldn’t respond. And he thought he could divide us at home. We were ready. Here is what we did. We prepared extensively and carefully. We spent months building a coalition of other freedom-loving nations from Europe and the Americas to Asia and Africa to confront Putin.', metadata={'lang': 'eng', 'section': '1', 'offset': '2160', 'len': '36', 'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'vectara'}),\n",
" Document(page_content='When they came home, many of the world’s fittest and best trained warriors were never the same. Dizziness. \\n\\nA cancer that would put them in a flag-draped coffin. I know. \\n\\nOne of those soldiers was my son Major Beau Biden. We don’t know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. But I’m committed to finding out everything we can.', metadata={'lang': 'eng', 'section': '1', 'offset': '34652', 'len': '60', 'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'vectara'}),\n",
" Document(page_content='But cancer from prolonged exposure to burn pits ravaged Heath’s lungs and body. Danielle says Heath was a fighter to the very end. He didn’t know how to stop fighting, and neither did she. Through her pain she found purpose to demand we do better. Tonight, Danielle—we are.', metadata={'lang': 'eng', 'section': '1', 'offset': '35442', 'len': '57', 'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'vectara'}),\n",
" Document(page_content=\"Biden discussed various topics in his statements. He highlighted the importance of unity and preparation to confront challenges, such as building coalitions to address global issues [1]. Additionally, he shared personal stories about the impact of health issues on soldiers, including his son's experience with brain cancer possibly linked to burn pits [2]. Biden also outlined his plans to combat inflation by cutting prescription drug costs and emphasized the ongoing efforts to combat COVID-19, rejecting the idea of merely living with the virus [4, 5]. Overall, Biden's messages revolved around unity, healthcare challenges faced by soldiers, economic plans, and the ongoing fight against COVID-19.\", metadata={'summary': True, 'fcs': 0.54751414})]"
"[Document(metadata={'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'X-TIKA:detectedEncoding': 'UTF-8', 'X-TIKA:encodingDetector': 'UniversalEncodingDetector', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'text_file', 'framework': 'langchain'}, page_content='We won’t be able to compete for the jobs of the 21st Century if we don’t fix that. That’s why it was so important to pass the Bipartisan Infrastructure Law—the most sweeping investment to rebuild America in history. This was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen. We’re done talking about infrastructure weeks. We’re going to have an infrastructure decade.'),\n",
" Document(metadata={'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'X-TIKA:detectedEncoding': 'UTF-8', 'X-TIKA:encodingDetector': 'UniversalEncodingDetector', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'text_file', 'framework': 'langchain'}, page_content='The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs. We are joining with our European allies to find and seize your yachts your luxury apartments your private jets. We are coming for your ill-begotten gains. And tonight I am announcing that we will join our allies in closing off American air space to all Russian flights – further isolating Russia – and adding an additional squeeze –on their economy. The Ruble has lost 30% of its value.'),\n",
" Document(metadata={'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'X-TIKA:detectedEncoding': 'UTF-8', 'X-TIKA:encodingDetector': 'UniversalEncodingDetector', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'text_file', 'framework': 'langchain'}, page_content='When they came home, many of the world’s fittest and best trained warriors were never the same. Dizziness. \\n\\nA cancer that would put them in a flag-draped coffin. I know. \\n\\nOne of those soldiers was my son Major Beau Biden. We don’t know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. But I’m committed to finding out everything we can.'),\n",
" Document(metadata={'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'X-TIKA:detectedEncoding': 'UTF-8', 'X-TIKA:encodingDetector': 'UniversalEncodingDetector', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'text_file', 'framework': 'langchain'}, page_content='Preventing Russia’s central bank from defending the Russian Ruble making Putin’s $630 Billion “war fund” worthless. We are choking off Russia’s access to technology that will sap its economic strength and weaken its military for years to come. Tonight I say to the Russian oligarchs and corrupt leaders who have bilked billions of dollars off this violent regime no more. The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs. We are joining with our European allies to find and seize your yachts your luxury apartments your private jets.'),\n",
" Document(metadata={'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'X-TIKA:detectedEncoding': 'UTF-8', 'X-TIKA:encodingDetector': 'UniversalEncodingDetector', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'text_file', 'framework': 'langchain'}, page_content='He rejected repeated efforts at diplomacy. He thought the West and NATO wouldn’t respond. And he thought he could divide us at home. We were ready. Here is what we did. We prepared extensively and carefully.'),\n",
" Document(metadata={'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'X-TIKA:detectedEncoding': 'UTF-8', 'X-TIKA:encodingDetector': 'UniversalEncodingDetector', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'text_file', 'framework': 'langchain'}, page_content='It delivered immediate economic relief for tens of millions of Americans. Helped put food on their table, keep a roof over their heads, and cut the cost of health insurance. And as my Dad used to say, it gave people a little breathing room. And unlike the $2 Trillion tax cut passed in the previous administration that benefitted the top 1% of Americans, the American Rescue Plan helped working people—and left no one behind. Lots of jobs. \\n\\nIn fact—our economy created over 6.5 Million new jobs just last year, more jobs created in one year \\nthan ever before in the history of America.'),\n",
" Document(metadata={'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'X-TIKA:detectedEncoding': 'UTF-8', 'X-TIKA:encodingDetector': 'UniversalEncodingDetector', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'text_file', 'framework': 'langchain'}, page_content='And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson.'),\n",
" Document(metadata={'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'X-TIKA:detectedEncoding': 'UTF-8', 'X-TIKA:encodingDetector': 'UniversalEncodingDetector', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'text_file', 'framework': 'langchain'}, page_content='All told, we created 369,000 new manufacturing jobs in America just last year. Powered by people I’ve met like JoJo Burgess, from generations of union steelworkers from Pittsburgh, who’s here with us tonight. As Ohio Senator Sherrod Brown says, “It’s time to bury the label “Rust Belt.” It’s time. \\n\\nBut with all the bright spots in our economy, record job growth and higher wages, too many families are struggling to keep up with the bills. Inflation is robbing them of the gains they might otherwise feel.'),\n",
" Document(metadata={'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'X-TIKA:detectedEncoding': 'UTF-8', 'X-TIKA:encodingDetector': 'UniversalEncodingDetector', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'text_file', 'framework': 'langchain'}, page_content='Putin’s latest attack on Ukraine was premeditated and unprovoked. He rejected repeated efforts at diplomacy. He thought the West and NATO wouldn’t respond. And he thought he could divide us at home. We were ready. Here is what we did.'),\n",
" Document(metadata={'X-TIKA:Parsed-By': 'org.apache.tika.parser.csv.TextAndCSVParser', 'Content-Encoding': 'UTF-8', 'X-TIKA:detectedEncoding': 'UTF-8', 'X-TIKA:encodingDetector': 'UniversalEncodingDetector', 'Content-Type': 'text/plain; charset=UTF-8', 'source': 'text_file', 'framework': 'langchain'}, page_content='Danielle says Heath was a fighter to the very end. He didn’t know how to stop fighting, and neither did she. Through her pain she found purpose to demand we do better. Tonight, Danielle—we are. The VA is pioneering new ways of linking toxic exposures to diseases, already helping more veterans get benefits.'),\n",
" Document(metadata={'summary': True, 'fcs': (0.54785156,)}, page_content='President Biden spoke about several key issues. He emphasized the importance of the Bipartisan Infrastructure Law, calling it the most significant investment to rebuild America and highlighting it as a bipartisan effort [1]. He also announced measures against Russian oligarchs, including assembling a task force to seize their assets and closing American airspace to Russian flights, further isolating Russia economically [2]. Additionally, he expressed a commitment to investigating the health impacts of burn pits on military personnel, referencing his son, Major Beau Biden, who suffered from brain cancer [3].')]"
]
},
"execution_count": 7,
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"config.summary_config.is_enabled = True\n",
"config.k = 3\n",
"config.generation = GenerationConfig()\n",
"config.search.limit = 10\n",
"retriever = vectara.as_retriever(config=config)\n",
"retriever.invoke(query_str)"
]
@ -316,17 +365,17 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 14,
"id": "e14325b9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Biden's statement highlighted his efforts to unite freedom-loving nations against Putin's aggression, sharing information in advance to counter Russian lies and hold Putin accountable[1]. Additionally, he emphasized his commitment to military families, like Danielle Robinson, and outlined plans for more affordable housing, Pre-K for 3- and 4-year-olds, and ensuring no additional taxes for those earning less than $400,000 a year[2][3]. The statement also touched on the readiness of the West and NATO to respond to Putin's actions, showcasing extensive preparation and coalition-building efforts[4]. Heath Robinson's story, a combat medic who succumbed to cancer from burn pits, was used to illustrate the resilience and fight for better conditions[5].\""
"'The remarks made by Biden include his emphasis on the importance of the Bipartisan Infrastructure Law, which he describes as the most significant investment to rebuild America in history. He highlights the bipartisan effort involved in passing this law and expresses gratitude to members of both parties for their collaboration. Biden also mentions the transition from \"infrastructure weeks\" to an \"infrastructure decade\" [1]. Additionally, he shares a personal story about his father having to leave their home in Scranton, Pennsylvania, to find work, which influenced his decision to fight for the American Rescue Plan to help those in need [2].'"
]
},
"execution_count": 8,
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
@ -371,7 +420,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
"version": "3.12.0"
}
},
"nbformat": 4,