Konko fix dependency

This commit is contained in:
Bagatur
2023-09-08 10:06:37 -07:00
parent c6b27b3692
commit 9095dc69ac
138 changed files with 10707 additions and 2862 deletions

View File

@@ -18,7 +18,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
@@ -93,8 +93,22 @@
}
],
"metadata": {
"kernelspec": {
"display_name": "langchain",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python"
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5"
},
"orig_nbformat": 4
},

View File

@@ -31,11 +31,16 @@
"outputs": [],
"source": [
"# get new tokens: https://app.banana.dev/\n",
"# We need two tokens, not just an `api_key`: `BANANA_API_KEY` and `YOUR_MODEL_KEY`\n",
"# We need three parameters to make a Banana.dev API call:\n",
"# * a team api key\n",
"# * the model's unique key\n",
"# * the model's url slug\n",
"\n",
"import os\n",
"from getpass import getpass\n",
"\n",
"# You can get this from the main dashboard\n",
"# at https://app.banana.dev\n",
"os.environ[\"BANANA_API_KEY\"] = \"YOUR_API_KEY\"\n",
"# OR\n",
"# BANANA_API_KEY = getpass()"
@@ -70,7 +75,9 @@
"metadata": {},
"outputs": [],
"source": [
"llm = Banana(model_key=\"YOUR_MODEL_KEY\")"
"# Both of these are found in your model's \n",
"# detail page in https://app.banana.dev\n",
"llm = Banana(model_key=\"YOUR_MODEL_KEY\", model_url_slug=\"YOUR_MODEL_URL_SLUG\")"
]
},
{

View File

@@ -9,13 +9,20 @@ pip install awadb
```
## VectorStore
## Vector Store
There exists a wrapper around AwaDB vector databases, allowing you to use it as a vectorstore,
whether for semantic search or example selection.
```python
from langchain.vectorstores import AwaDB
```
For a more detailed walkthrough of the AwaDB wrapper, see [here](/docs/integrations/vectorstores/awadb.html).
See a [usage example](/docs/integrations/vectorstores/awadb).
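As a quick illustration, here is a minimal sketch of the usual vector-store flow, assuming the standard `from_texts` interface (the example texts are placeholders):
```python
from langchain.embeddings import AwaEmbeddings
from langchain.vectorstores import AwaDB

# Build a store from a few texts, then run a semantic search over them.
db = AwaDB.from_texts(
    ["AwaDB is an AI-native vector database", "LangChain supports many vector stores"],
    embedding=AwaEmbeddings(),
)
docs = db.similarity_search("What is AwaDB?")
print(docs[0].page_content)
```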
## Text Embedding Model
```python
from langchain.embeddings import AwaEmbeddings
```
See a [usage example](/docs/integrations/text_embedding/awadb).
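A minimal sketch, assuming the default constructor:
```python
from langchain.embeddings import AwaEmbeddings

embedding = AwaEmbeddings()
vector = embedding.embed_query("hello world")
```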

View File

@@ -1,79 +1,72 @@
# Banana
This page covers how to use the Banana ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Banana wrappers.
Banana provides serverless GPU inference for AI models, including a CI/CD build pipeline and a simple Python framework (Potassium) to serve your models.
This page covers how to use the [Banana](https://www.banana.dev) ecosystem within LangChain.
It is broken into two parts:
* installation and setup,
* and then references to specific Banana wrappers.
## Installation and Setup
- Install with `pip install banana-dev`
- Get a Banana API key and set it as an environment variable (`BANANA_API_KEY`)
- Get a Banana API key from the [Banana.dev dashboard](https://app.banana.dev) and set it as an environment variable (`BANANA_API_KEY`)
- Get your model's key and URL slug from the model's details page, as in the sketch below
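For example, a minimal setup sketch (all values are placeholders):
```python
import os

# The API key comes from the Banana dashboard; the model key and
# URL slug come from your model's details page.
os.environ["BANANA_API_KEY"] = "YOUR_API_KEY"
```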
## Define your Banana Template
If you want to use an available language model template you can find one [here](https://app.banana.dev/templates/conceptofmind/serverless-template-palmyra-base).
This template uses the Palmyra-Base model by [Writer](https://writer.com/product/api/).
You can check out an example Banana repository [here](https://github.com/conceptofmind/serverless-template-palmyra-base).
You'll need to set up a GitHub repo for your Banana app. You can get started in 5 minutes using [this guide](https://docs.banana.dev/banana-docs/).
Alternatively, for a ready-to-go LLM example, you can check out Banana's [CodeLlama-7B-Instruct-GPTQ](https://github.com/bananaml/demo-codellama-7b-instruct-gptq) GitHub repository. Just fork it and deploy it within Banana.
Other starter repos are available [here](https://github.com/orgs/bananaml/repositories?q=demo-&type=all&language=&sort=).
## Build the Banana app
Banana Apps must include the "output" key in the return json.
There is a rigid response structure.
To use Banana apps within LangChain, they must include the `outputs` key
in the returned JSON, and the value must be a string.
```python
# Return the results as a dictionary
result = {'output': result}
result = {'outputs': result}
```
An example inference function would be:
```python
def inference(model_inputs:dict) -> dict:
global model
global tokenizer
# Parse out your arguments
prompt = model_inputs.get('prompt', None)
if prompt == None:
return {'message': "No prompt provided"}
# Run the model
input_ids = tokenizer.encode(prompt, return_tensors='pt').cuda()
output = model.generate(
input_ids,
max_length=100,
do_sample=True,
top_k=50,
top_p=0.95,
num_return_sequences=1,
temperature=0.9,
early_stopping=True,
no_repeat_ngram_size=3,
num_beams=5,
length_penalty=1.5,
repetition_penalty=1.5,
bad_words_ids=[[tokenizer.encode(' ', add_prefix_space=True)[0]]]
)
result = tokenizer.decode(output[0], skip_special_tokens=True)
# Return the results as a dictionary
result = {'output': result}
return result
@app.handler("/")
def handler(context: dict, request: Request) -> Response:
"""Handle a request to generate code from a prompt."""
model = context.get("model")
tokenizer = context.get("tokenizer")
max_new_tokens = request.json.get("max_new_tokens", 512)
temperature = request.json.get("temperature", 0.7)
prompt = request.json.get("prompt")
prompt_template=f'''[INST] Write code to solve the following coding problem that obeys the constraints and passes the example test cases. Please wrap your code answer using ```:
{prompt}
[/INST]
'''
input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=temperature, max_new_tokens=max_new_tokens)
result = tokenizer.decode(output[0])
return Response(json={"outputs": result}, status=200)
```
You can find a full example of a Banana app [here](https://github.com/conceptofmind/serverless-template-palmyra-base/blob/main/app.py).
This example is from the `app.py` file in [CodeLlama-7B-Instruct-GPTQ](https://github.com/bananaml/demo-codellama-7b-instruct-gptq).
## Wrappers
### LLM
There exists a Banana LLM wrapper, which you can access with
Within LangChain, there exists a Banana LLM wrapper, which you can access with
```python
from langchain.llms import Banana
```
You need to provide a model key located in the dashboard:
You need to provide a model key and model URL slug, which you can get from the model's details page in the [Banana.dev dashboard](https://app.banana.dev).
```python
llm = Banana(model_key="YOUR_MODEL_KEY")
llm = Banana(model_key="YOUR_MODEL_KEY", model_url_slug="YOUR_MODEL_URL_SLUG")
```
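Once created, the wrapper can be called like any other LangChain LLM; a minimal usage sketch (the prompt is illustrative):
```python
response = llm("Tell me a joke")
```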

View File

@@ -1,20 +1,24 @@
# ModelScope
>[ModelScope](https://www.modelscope.cn/home) is a large repository of models and datasets.
This page covers how to use the modelscope ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific modelscope wrappers.
## Installation and Setup
* Install the Python SDK with `pip install modelscope`
Install the `modelscope` package.
```bash
pip install modelscope
```
## Wrappers
### Embeddings
## Text Embedding Models
There exists a modelscope Embeddings wrapper, which you can access with
```python
from langchain.embeddings import ModelScopeEmbeddings
```
For a more detailed walkthrough of this, see [this notebook](/docs/integrations/text_embedding/modelscope_hub.html)
For a more detailed walkthrough of this, see [this notebook](/docs/integrations/text_embedding/modelscope_hub)
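As a quick sketch, the wrapper takes a ModelScope model ID (the `model_id` below is one example; substitute the embedding model you want):
```python
from langchain.embeddings import ModelScopeEmbeddings

embeddings = ModelScopeEmbeddings(model_id="damo/nlp_corom_sentence-embedding_english-base")
vector = embeddings.embed_query("hello world")
```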

View File

@@ -1,17 +1,31 @@
# NLPCloud
This page covers how to use the NLPCloud ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific NLPCloud wrappers.
>[NLP Cloud](https://docs.nlpcloud.com/#introduction) is an artificial intelligence platform that allows you to use the most advanced AI engines, and even train your own engines with your own data.
## Installation and Setup
- Install the Python SDK with `pip install nlpcloud`
- Install the `nlpcloud` package.
```bash
pip install nlpcloud
```
- Get an NLPCloud API key and set it as an environment variable (`NLPCLOUD_API_KEY`)
## Wrappers
### LLM
## LLM
See a [usage example](/docs/integrations/llms/nlpcloud).
There exists an NLPCloud LLM wrapper, which you can access with
```python
from langchain.llms import NLPCloud
```
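A minimal usage sketch, assuming `NLPCLOUD_API_KEY` is set and relying on the wrapper's default model:
```python
from langchain.llms import NLPCloud

llm = NLPCloud()  # uses the wrapper's default model
print(llm("Explain vector embeddings in one sentence."))
```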
## Text Embedding Models
See a [usage example](/docs/integrations/text_embedding/nlp_cloud)
```python
from langchain.embeddings import NLPCloudEmbeddings
```
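A minimal sketch, assuming `NLPCLOUD_API_KEY` is set and relying on the wrapper's default embedding model:
```python
from langchain.embeddings import NLPCloudEmbeddings

embeddings = NLPCloudEmbeddings()
vector = embeddings.embed_query("hello world")
```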

View File

@@ -1,4 +1,10 @@
# Portkey
>[Portkey](https://docs.portkey.ai/overview/introduction) is a platform designed to streamline the deployment
> and management of Generative AI applications.
> It provides comprehensive features for monitoring, managing models,
> and improving the performance of your AI applications.
## LLMOps for LangChain
Portkey brings production readiness to LangChain. With Portkey, you can

View File

@@ -1,19 +1,14 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Log, Trace, and Monitor Langchain LLM Calls\n",
"# Log, Trace, and Monitor\n",
"\n",
"When building apps or agents using Langchain, you end up making multiple API calls to fulfill a single user request. However, these requests are not chained when you want to analyse them. With [**Portkey**](/docs/ecosystem/integrations/portkey), all the embeddings, completion, and other requests from a single user request will get logged and traced to a common ID, enabling you to gain full visibility of user interactions.\n",
"\n",
"This notebook serves as a step-by-step guide on how to integrate and use Portkey in your Langchain app."
"This notebook serves as a step-by-step guide on how to log, trace, and monitor Langchain LLM calls using `Portkey` in your Langchain app."
]
},
{
@@ -234,9 +229,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

View File

@@ -18,3 +18,11 @@ See a [usage example](/docs/modules/data_connection/document_transformers/text_s
```python
from langchain.text_splitter import SpacyTextSplitter
```
## Text Embedding Models
See a [usage example](/docs/integrations/text_embedding/spacy_embedding)
```python
from langchain.embeddings.spacy_embeddings import SpacyEmbeddings
```
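A minimal sketch, assuming the default constructor (the spaCy pipeline it loads must be installed locally):
```python
from langchain.embeddings.spacy_embeddings import SpacyEmbeddings

embedder = SpacyEmbeddings()
vector = embedder.embed_query("hello world")
```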

View File

@@ -11,9 +11,10 @@ What is Vectara?
- You can use Vectara's integration with LangChain as a vector store or via the Retriever abstraction.
## Installation and Setup
To use Vectara with LangChain, no special installation steps are required. You just have to provide your customer_id, corpus ID, and an API key created within the Vectara console to enable indexing and searching.
To use Vectara with LangChain, no special installation steps are required.
To get started, follow our [quickstart](https://docs.vectara.com/docs/quickstart) guide to create an account, a corpus and an API key.
Once you have these, you can provide them as arguments to the Vectara vectorstore, or you can set them as environment variables.
Alternatively, these can be provided as environment variables:
- export `VECTARA_CUSTOMER_ID`="your_customer_id"
- export `VECTARA_CORPUS_ID`="your_corpus_id"
- export `VECTARA_API_KEY`="your-vectara-api-key"
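For example, a minimal sketch of passing them directly as constructor arguments (the keyword names follow the Vectara vectorstore; all values are placeholders):
```python
from langchain.vectorstores import Vectara

vectara = Vectara(
    vectara_customer_id="your_customer_id",
    vectara_corpus_id="your_corpus_id",
    vectara_api_key="your-vectara-api-key",
)
```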

View File

@@ -5,9 +5,11 @@
"id": "b14a24db",
"metadata": {},
"source": [
"# AwaEmbedding\n",
"# AwaDB\n",
"\n",
"This notebook explains how to use AwaEmbedding, which is included in [awadb](https://github.com/awa-ai/awadb), to embedding texts in langchain."
">[AwaDB](https://github.com/awa-ai/awadb) is an AI Native database for the search and storage of embedding vectors used by LLM Applications.\n",
"\n",
"This notebook explains how to use `AwaEmbeddings` in LangChain."
]
},
{
@@ -101,7 +103,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.10.12"
}
},
"nbformat": 4,

View File

@@ -5,7 +5,9 @@
"id": "75e378f5-55d7-44b6-8e2e-6d7b8b171ec4",
"metadata": {},
"source": [
"# Bedrock Embeddings"
"# Bedrock\n",
"\n",
">[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that makes FMs from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case.\n"
]
},
{
@@ -91,7 +93,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
"version": "3.10.12"
}
},
"nbformat": 4,

View File

@@ -5,26 +5,29 @@
"id": "719619d3",
"metadata": {},
"source": [
"# BGE Hugging Face Embeddings\n",
"# BGE on Hugging Face\n",
"\n",
"This notebook shows how to use BGE Embeddings through Hugging Face"
">[BGE models on the HuggingFace](https://huggingface.co/BAAI/bge-large-en) are [the best open-source embedding models](https://huggingface.co/spaces/mteb/leaderboard).\n",
">BGE model is created by the [Beijing Academy of Artificial Intelligence (BAAI)](https://www.baai.ac.cn/english.html). `BAAI` is a private non-profit organization engaged in AI research and development.\n",
"\n",
"This notebook shows how to use `BGE Embeddings` through `Hugging Face`"
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"id": "f7a54279",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# !pip install sentence_transformers"
"#!pip install sentence_transformers"
]
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"id": "9e1d5b6b",
"metadata": {},
"outputs": [],
@@ -43,12 +46,24 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 5,
"id": "e59d1a89",
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"384"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"embedding = hf.embed_query(\"hi this is harrison\")"
"embedding = hf.embed_query(\"hi this is harrison\")\n",
"len(embedding)"
]
},
{
@@ -76,7 +91,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.1"
"version": "3.10.12"
}
},
"nbformat": 4,

View File

@@ -1,13 +1,14 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Google Cloud Platform Vertex AI PaLM \n",
"# Google Vertex AI PaLM \n",
"\n",
"Note: This is seperate from the Google PaLM integration, it exposes [Vertex AI PaLM API](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview) on Google Cloud. \n",
">[Vertex AI PaLM API](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview) is a service on Google Cloud exposing the embedding models. \n",
"\n",
"Note: This integration is seperate from the Google PaLM integration.\n",
"\n",
"By default, Google Cloud [does not use](https://cloud.google.com/vertex-ai/docs/generative-ai/data-governance#foundation_model_development) Customer Data to train its foundation models as part of Google Cloud`s AI/ML Privacy Commitment. More details about how Google processes data can also be found in [Google's Customer Data Processing Addendum (CDPA)](https://cloud.google.com/terms/data-processing-addendum).\n",
"\n",
@@ -96,7 +97,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.12"
},
"vscode": {
"interpreter": {

View File

@@ -5,13 +5,23 @@
"id": "ed47bb62",
"metadata": {},
"source": [
"# Hugging Face Hub\n",
"# Hugging Face\n",
"Let's load the Hugging Face Embedding class."
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"id": "16b20335-da1d-46ba-aa23-fbf3e2c6fe60",
"metadata": {},
"outputs": [],
"source": [
"!pip install langchain sentence_transformers"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "861521a9",
"metadata": {},
"outputs": [],
@@ -21,7 +31,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 3,
"id": "ff9be586",
"metadata": {},
"outputs": [],
@@ -31,7 +41,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 3,
"id": "d0a98ae9",
"metadata": {},
"outputs": [],
@@ -41,7 +51,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 5,
"id": "5d6c682b",
"metadata": {},
"outputs": [],
@@ -51,7 +61,28 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 6,
"id": "b57b8ce9-ef7d-4e63-979e-aa8763d1f9a8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[-0.04895168915390968, -0.03986193612217903, -0.021562768146395683]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query_result[:3]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "bb5e74c0",
"metadata": {},
"outputs": [],
@@ -60,19 +91,71 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aaad49f8",
"cell_type": "markdown",
"id": "92019ef1-5d30-4985-b4e6-c0d98bdfe265",
"metadata": {},
"outputs": [],
"source": []
"source": [
"## Hugging Face Inference API\n",
"We can also access embedding models via the Hugging Face Inference API, which does not require us to install ``sentence_transformers`` and download models locally."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "66f5c6ba-1446-43e1-b012-800d17cef300",
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
"Enter your HF Inference API Key:\n",
"\n",
" ········\n"
]
}
],
"source": [
"import getpass\n",
"\n",
"inference_api_key = getpass.getpass(\"Enter your HF Inference API Key:\\n\\n\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "d0623c1f-cd82-4862-9bce-3655cb9b66ac",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[-0.038338541984558105, 0.1234646737575531, -0.028642963618040085]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.embeddings import HuggingFaceInferenceAPIEmbeddings\n",
"\n",
"embeddings = HuggingFaceInferenceAPIEmbeddings(\n",
" api_key=inference_api_key,\n",
" model_name=\"sentence-transformers/all-MiniLM-l6-v2\"\n",
")\n",
"\n",
"query_result = embeddings.embed_query(text)\n",
"query_result[:3]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "poetry-venv",
"language": "python",
"name": "python3"
"name": "poetry-venv"
},
"language_info": {
"codemirror_mode": {

View File

@@ -1,12 +1,13 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# ModelScope\n",
"\n",
">[ModelScope](https://www.modelscope.cn/home) is big repository of the models and datasets.\n",
"\n",
"Let's load the ModelScope Embedding class."
]
},
@@ -67,16 +68,23 @@
],
"metadata": {
"kernelspec": {
"display_name": "chatgpt",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"version": "3.9.15"
},
"orig_nbformat": 4
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

View File

@@ -1,15 +1,14 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# MosaicML embeddings\n",
"# MosaicML\n",
"\n",
"[MosaicML](https://docs.mosaicml.com/en/latest/inference.html) offers a managed inference service. You can either use a variety of open source models, or deploy your own.\n",
">[MosaicML](https://docs.mosaicml.com/en/latest/inference.html) offers a managed inference service. You can either use a variety of open source models, or deploy your own.\n",
"\n",
"This example goes over how to use LangChain to interact with MosaicML Inference for text embedding."
"This example goes over how to use LangChain to interact with `MosaicML` Inference for text embedding."
]
},
{
@@ -94,6 +93,11 @@
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
@@ -103,9 +107,10 @@
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

View File

@@ -7,7 +7,7 @@
"source": [
"# NLP Cloud\n",
"\n",
"NLP Cloud is an artificial intelligence platform that allows you to use the most advanced AI engines, and even train your own engines with your own data. \n",
">[NLP Cloud](https://docs.nlpcloud.com/#introduction) is an artificial intelligence platform that allows you to use the most advanced AI engines, and even train your own engines with your own data. \n",
"\n",
"The [embeddings](https://docs.nlpcloud.com/#embeddings) endpoint offers the following model:\n",
"\n",
@@ -80,7 +80,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.11.2 64-bit",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -94,7 +94,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
"version": "3.10.12"
},
"vscode": {
"interpreter": {

View File

@@ -5,11 +5,13 @@
"id": "1f83f273",
"metadata": {},
"source": [
"# SageMaker Endpoint Embeddings\n",
"# SageMaker\n",
"\n",
"Let's load the SageMaker Endpoints Embeddings class. The class can be used if you host, e.g. your own Hugging Face model on SageMaker.\n",
"Let's load the `SageMaker Endpoints Embeddings` class. The class can be used if you host, e.g. your own Hugging Face model on SageMaker.\n",
"\n",
"For instructions on how to do this, please see [here](https://www.philschmid.de/custom-inference-huggingface-sagemaker). **Note**: In order to handle batched requests, you will need to adjust the return line in the `predict_fn()` function within the custom `inference.py` script:\n",
"For instructions on how to do this, please see [here](https://www.philschmid.de/custom-inference-huggingface-sagemaker). \n",
"\n",
"**Note**: In order to handle batched requests, you will need to adjust the return line in the `predict_fn()` function within the custom `inference.py` script:\n",
"\n",
"Change from\n",
"\n",
@@ -143,7 +145,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.12"
},
"vscode": {
"interpreter": {

View File

@@ -5,8 +5,8 @@
"id": "eec4efda",
"metadata": {},
"source": [
"# Self Hosted Embeddings\n",
"Let's load the SelfHostedEmbeddings, SelfHostedHuggingFaceEmbeddings, and SelfHostedHuggingFaceInstructEmbeddings classes."
"# Self Hosted\n",
"Let's load the `SelfHostedEmbeddings`, `SelfHostedHuggingFaceEmbeddings`, and `SelfHostedHuggingFaceInstructEmbeddings` classes."
]
},
{
@@ -149,9 +149,7 @@
"cell_type": "code",
"execution_count": null,
"id": "fc1bfd0f",
"metadata": {
"scrolled": false
},
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(text)"
@@ -182,7 +180,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.12"
},
"vscode": {
"interpreter": {

View File

@@ -1,16 +1,15 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "ed47bb62",
"metadata": {},
"source": [
"# Sentence Transformers Embeddings\n",
"# Sentence Transformers\n",
"\n",
"[SentenceTransformers](https://www.sbert.net/) embeddings are called using the `HuggingFaceEmbeddings` integration. We have also added an alias for `SentenceTransformerEmbeddings` for users who are more familiar with directly using that package.\n",
">[SentenceTransformers](https://www.sbert.net/) embeddings are called using the `HuggingFaceEmbeddings` integration. We have also added an alias for `SentenceTransformerEmbeddings` for users who are more familiar with directly using that package.\n",
"\n",
"SentenceTransformers is a python package that can generate text and image embeddings, originating from [Sentence-BERT](https://arxiv.org/abs/1908.10084)"
"`SentenceTransformers` is a python package that can generate text and image embeddings, originating from [Sentence-BERT](https://arxiv.org/abs/1908.10084)"
]
},
{
@@ -109,7 +108,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.16"
"version": "3.10.12"
},
"vscode": {
"interpreter": {

View File

@@ -1,21 +1,31 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Spacy Embedding\n",
"# SpaCy\n",
"\n",
"### Loading the Spacy embedding class to generate and query embeddings"
">[spaCy](https://spacy.io/) is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython.\n",
" \n",
"\n",
"## Installation and Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#!pip install spacy"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Import the necessary classes"
"Import the necessary classes"
]
},
{
@@ -28,11 +38,12 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Initialize SpacyEmbeddings.This will load the Spacy model into memory."
"## Example\n",
"\n",
"Initialize SpacyEmbeddings.This will load the Spacy model into memory."
]
},
{
@@ -45,11 +56,10 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Define some example texts . These could be any documents that you want to analyze - for example, news articles, social media posts, or product reviews."
"Define some example texts . These could be any documents that you want to analyze - for example, news articles, social media posts, or product reviews."
]
},
{
@@ -67,11 +77,10 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Generate and print embeddings for the texts . The SpacyEmbeddings class generates an embedding for each document, which is a numerical representation of the document's content. These embeddings can be used for various natural language processing tasks, such as document similarity comparison or text classification."
"Generate and print embeddings for the texts . The SpacyEmbeddings class generates an embedding for each document, which is a numerical representation of the document's content. These embeddings can be used for various natural language processing tasks, such as document similarity comparison or text classification."
]
},
{
@@ -86,11 +95,10 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Generate and print an embedding for a single piece of text. You can also generate an embedding for a single piece of text, such as a search query. This can be useful for tasks like information retrieval, where you want to find documents that are similar to a given query."
"Generate and print an embedding for a single piece of text. You can also generate an embedding for a single piece of text, such as a search query. This can be useful for tasks like information retrieval, where you want to find documents that are similar to a given query."
]
},
{
@@ -106,11 +114,24 @@
}
],
"metadata": {
"language_info": {
"name": "python"
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"orig_nbformat": 4
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

View File

@@ -1,7 +0,0 @@
# SQL Database Chain
This example demonstrates the use of the `SQLDatabaseChain` for answering questions over a SQL database.
import Example from "@snippets/modules/chains/popular/sqlite.mdx"
<Example/>

View File

@@ -0,0 +1,126 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# NucliaDB\n",
"\n",
"You can use a local NucliaDB instance or use [Nuclia Cloud](https://nuclia.cloud).\n",
"\n",
"When using a local instance, you need a Nuclia Understanding API key, so your texts are properly vectorized and indexed. You can get a key by creating a free account at [https://nuclia.cloud](https://nuclia.cloud), and then [create a NUA key](https://docs.nuclia.dev/docs/docs/using/understanding/intro)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#!pip install langchain nuclia"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage with nuclia.cloud"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.vectorstores.nucliadb import NucliaDB\n",
"API_KEY = \"YOUR_API_KEY\"\n",
"\n",
"ndb = NucliaDB(knowledge_box=\"YOUR_KB_ID\", local=False, api_key=API_KEY)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage with a local instance\n",
"\n",
"Note: By default `backend` is set to `http://localhost:8080`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.vectorstores.nucliadb import NucliaDB\n",
"\n",
"ndb = NucliaDB(knowledge_box=\"YOUR_KB_ID\", local=True, backend=\"http://my-local-server\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Add and delete texts to your Knowledge Box"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ids = ndb.add_texts([\"This is a new test\", \"This is a second test\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ndb.delete(ids=ids)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Search in your Knowledge Box"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"results = ndb.similarity_search(\"Who was inspired by Ada Lovelace?\")\n",
"print(res.page_content)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,207 @@
{
"cells": [
{
"cell_type": "markdown",
"source": [
"# sqlite-vss\n",
"\n",
">[sqlite-vss](https://alexgarcia.xyz/sqlite-vss/) is an SQLite extension designed for vector search, emphasizing local-first operations and easy integration into applications without external servers. Leveraging the Faiss library, it offers efficient similarity search and clustering capabilities.\n",
"\n",
"This notebook shows how to use the `SQLiteVSS` vector database."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"# You need to install sqlite-vss as a dependency.\n",
"%pip install sqlite-vss"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"### Quickstart"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 2,
"outputs": [
{
"data": {
"text/plain": "'Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.'"
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import SQLiteVSS\n",
"from langchain.document_loaders import TextLoader\n",
"\n",
"# load the document and split it into chunks\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"\n",
"# split it into chunks\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"texts = [doc.page_content for doc in docs]\n",
"\n",
"\n",
"# create the open-source embedding function\n",
"embedding_function = SentenceTransformerEmbeddings(model_name=\"all-MiniLM-L6-v2\")\n",
"\n",
"\n",
"# load it in sqlite-vss in a table named state_union.\n",
"# the db_file parameter is the name of the file you want\n",
"# as your sqlite database.\n",
"db = SQLiteVSS.from_texts(\n",
" texts=texts,\n",
" embedding=embedding_function,\n",
" table=\"state_union\",\n",
" db_file=\"/tmp/vss.db\"\n",
")\n",
"\n",
"# query it\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"data = db.similarity_search(query)\n",
"\n",
"# print results\n",
"data[0].page_content"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-09-06T14:55:55.370351Z",
"start_time": "2023-09-06T14:55:53.547755Z"
}
}
},
{
"cell_type": "markdown",
"source": [
"### Using existing sqlite connection"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 7,
"outputs": [
{
"data": {
"text/plain": "'Ketanji Brown Jackson is awesome'"
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import SQLiteVSS\n",
"from langchain.document_loaders import TextLoader\n",
"\n",
"# load the document and split it into chunks\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"\n",
"# split it into chunks\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"texts = [doc.page_content for doc in docs]\n",
"\n",
"\n",
"# create the open-source embedding function\n",
"embedding_function = SentenceTransformerEmbeddings(model_name=\"all-MiniLM-L6-v2\")\n",
"connection = SQLiteVSS.create_connection(db_file=\"/tmp/vss.db\")\n",
"\n",
"db1 = SQLiteVSS(\n",
" table=\"state_union\",\n",
" embedding=embedding_function,\n",
" connection=connection\n",
")\n",
"\n",
"db1.add_texts([\"Ketanji Brown Jackson is awesome\"])\n",
"# query it again\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"data = db1.similarity_search(query)\n",
"\n",
"# print results\n",
"data[0].page_content"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-09-06T14:59:22.086252Z",
"start_time": "2023-09-06T14:59:21.693237Z"
}
}
},
{
"cell_type": "code",
"execution_count": 13,
"outputs": [],
"source": [
"# Cleaning up\n",
"import os\n",
"os.remove(\"/tmp/vss.db\")"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-09-06T15:01:15.550318Z",
"start_time": "2023-09-06T15:01:15.546428Z"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [],
"metadata": {
"collapsed": false
}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

View File

@@ -28,43 +28,41 @@
"The following function determines cosine similarity, but you can adjust to your needs.\n",
"\n",
"```sql\n",
" -- Enable the pgvector extension to work with embedding vectors\n",
" create extension vector;\n",
"-- Enable the pgvector extension to work with embedding vectors\n",
"create extension if not exists vector;\n",
"\n",
" -- Create a table to store your documents\n",
" create table documents (\n",
" id uuid primary key,\n",
" content text, -- corresponds to Document.pageContent\n",
" metadata jsonb, -- corresponds to Document.metadata\n",
" embedding vector(1536) -- 1536 works for OpenAI embeddings, change if needed\n",
" );\n",
"-- Create a table to store your documents\n",
"create table\n",
" documents (\n",
" id uuid primary key,\n",
" content text, -- corresponds to Document.pageContent\n",
" metadata jsonb, -- corresponds to Document.metadata\n",
" embedding vector (1536) -- 1536 works for OpenAI embeddings, change if needed\n",
" );\n",
"\n",
" CREATE FUNCTION match_documents(query_embedding vector(1536), match_count int)\n",
" RETURNS TABLE(\n",
" id uuid,\n",
" content text,\n",
" metadata jsonb,\n",
" -- we return matched vectors to enable maximal marginal relevance searches\n",
" embedding vector(1536),\n",
" similarity float)\n",
" LANGUAGE plpgsql\n",
" AS $$\n",
" # variable_conflict use_column\n",
" BEGIN\n",
" RETURN query\n",
" SELECT\n",
" id,\n",
" content,\n",
" metadata,\n",
" embedding,\n",
" 1 -(documents.embedding <=> query_embedding) AS similarity\n",
" FROM\n",
" documents\n",
" ORDER BY\n",
" documents.embedding <=> query_embedding\n",
" LIMIT match_count;\n",
" END;\n",
" $$;\n",
"-- Create a function to search for documents\n",
"create function match_documents (\n",
" query_embedding vector (1536),\n",
" filter jsonb default '{}'\n",
") returns table (\n",
" id uuid,\n",
" content text,\n",
" metadata jsonb,\n",
" similarity float\n",
") language plpgsql as $$\n",
"#variable_conflict use_column\n",
"begin\n",
" return query\n",
" select\n",
" id,\n",
" content,\n",
" metadata,\n",
" 1 - (documents.embedding <=> query_embedding) as similarity\n",
" from documents\n",
" where metadata @> filter\n",
" order by documents.embedding <=> query_embedding;\n",
"end;\n",
"$$;\n",
"```"
]
},

View File

@@ -26,7 +26,7 @@
"source": [
"# Setup\n",
"\n",
"You will need a Vectara account to use Vectara with LangChain. To get started, use the following steps:\n",
"You will need a Vectara account to use Vectara with LangChain. To get started, use the following steps (see our [quickstart](https://docs.vectara.com/docs/quickstart) guide):\n",
"1. [Sign up](https://console.vectara.com/signup) for a Vectara account if you don't already have one. Once you have completed your sign up you will have a Vectara customer ID. You can find your customer ID by clicking on your name, on the top-right of the Vectara console window.\n",
"2. Within your account you can create one or more corpora. Each corpus represents an area that stores text data upon ingest from input documents. To create a corpus, use the **\"Create Corpus\"** button. You then provide a name to your corpus as well as a description. Optionally you can define filtering attributes and apply some advanced options. If you click on your created corpus, you can see its name and corpus ID right on the top.\n",
"3. Next you'll need to create API keys to access the corpus. Click on the **\"Authorization\"** tab in the corpus view and then the **\"Create API Key\"** button. Give your key a name, and choose whether you want query only or query+index for your key. Click \"Create\" and you now have an active API key. Keep this key confidential. \n",
@@ -47,7 +47,7 @@
"os.environ[\"VECTARA_API_KEY\"] = getpass.getpass(\"Vectara API Key:\")\n",
"```\n",
"\n",
"2. Add them to the Vectara vectorstore constructor:\n",
"1. Provide them as arguments when creating the Vectara vectorstore object:\n",
"\n",
"```python\n",
"vectorstore = Vectara(\n",
@@ -65,13 +65,22 @@
"source": [
"## Connecting to Vectara from LangChain\n",
"\n",
"To get started, let's ingest the documents using the from_documents() method.\n",
"We assume here that you've added your VECTARA_CUSTOMER_ID, VECTARA_CORPUS_ID and query+indexing VECTARA_API_KEY as environment variables."
"In this example, we assume that you've created an account and a corpus, and added your VECTARA_CUSTOMER_ID, VECTARA_CORPUS_ID and VECTARA_API_KEY (created with permissions for both indexing and query) as environment variables.\n",
"\n",
"The corpus has 3 fields defined as metadata for filtering:\n",
"* url: a string field containing the source URL of the document (where relevant)\n",
"* speech: a string field containing the name of the speech\n",
"* author: the name of the author\n",
"\n",
"Let's start by ingesting 3 documents into the corpus:\n",
"1. The State of the Union speech from 2022, available in the LangChain repository as a text file\n",
"2. The \"I have a dream\" speech by Dr. Kind\n",
"3. The \"We shall Fight on the Beaches\" speech by Winston Churchil"
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"id": "04a1f1a0",
"metadata": {},
"outputs": [],
@@ -79,12 +88,17 @@
"from langchain.embeddings import FakeEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import Vectara\n",
"from langchain.document_loaders import TextLoader"
"from langchain.document_loaders import TextLoader\n",
"\n",
"from langchain.llms import OpenAI\n",
"from langchain.chains import ConversationalRetrievalChain\n",
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
"from langchain.chains.query_constructor.base import AttributeInfo"
]
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 3,
"id": "be0a4973",
"metadata": {},
"outputs": [],
@@ -97,7 +111,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"id": "8429667e",
"metadata": {
"ExecuteTime": {
@@ -111,7 +125,7 @@
"vectara = Vectara.from_documents(\n",
" docs,\n",
" embedding=FakeEmbeddings(size=768),\n",
" doc_metadata={\"speech\": \"state-of-the-union\"},\n",
" doc_metadata={\"speech\": \"state-of-the-union\", \"author\": \"Biden\"},\n",
")"
]
},
@@ -130,7 +144,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 5,
"id": "85ef3468",
"metadata": {},
"outputs": [],
@@ -142,14 +156,16 @@
" [\n",
" \"https://www.gilderlehrman.org/sites/default/files/inline-pdfs/king.dreamspeech.excerpts.pdf\",\n",
" \"I-have-a-dream\",\n",
" \"Dr. King\"\n",
" ],\n",
" [\n",
" \"https://www.parkwayschools.net/cms/lib/MO01931486/Centricity/Domain/1578/Churchill_Beaches_Speech.pdf\",\n",
" \"we shall fight on the beaches\",\n",
" \"Churchil\"\n",
" ],\n",
"]\n",
"files_list = []\n",
"for url, _ in urls:\n",
"for url, _, _ in urls:\n",
" name = tempfile.NamedTemporaryFile().name\n",
" urllib.request.urlretrieve(url, name)\n",
" files_list.append(name)\n",
@@ -157,7 +173,7 @@
"docsearch: Vectara = Vectara.from_files(\n",
" files=files_list,\n",
" embedding=FakeEmbeddings(size=768),\n",
" metadatas=[{\"url\": url, \"speech\": title} for url, title in urls],\n",
" metadatas=[{\"url\": url, \"speech\": title, \"author\": author} for url, title, author in urls],\n",
")"
]
},
@@ -178,7 +194,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 6,
"id": "a8c513ab",
"metadata": {
"ExecuteTime": {
@@ -197,7 +213,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 7,
"id": "fc516993",
"metadata": {
"ExecuteTime": {
@@ -231,7 +247,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 8,
"id": "8804a21d",
"metadata": {
"ExecuteTime": {
@@ -249,7 +265,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 9,
"id": "756a6887",
"metadata": {
"ExecuteTime": {
@@ -264,7 +280,7 @@
"text": [
"Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence. A former top litigator in private practice.\n",
"\n",
"Score: 0.786569\n"
"Score: 0.8299499\n"
]
}
],
@@ -284,7 +300,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 10,
"id": "47784de5",
"metadata": {},
"outputs": [
@@ -307,7 +323,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 11,
"id": "3e22949f",
"metadata": {},
"outputs": [
@@ -315,7 +331,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"With this threshold of 0.2 we have 3 documents\n"
"With this threshold of 0.2 we have 5 documents\n"
]
}
],
@@ -340,7 +356,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 12,
"id": "9427195f",
"metadata": {
"ExecuteTime": {
@@ -352,10 +368,10 @@
{
"data": {
"text/plain": [
"VectaraRetriever(tags=['Vectara'], metadata=None, vectorstore=<langchain.vectorstores.vectara.Vectara object at 0x1586bd330>, search_type='similarity', search_kwargs={'lambda_val': 0.025, 'k': 5, 'filter': '', 'n_sentence_context': '2'})"
"VectaraRetriever(tags=['Vectara'], metadata=None, vectorstore=<langchain.vectorstores.vectara.Vectara object at 0x13b15e9b0>, search_type='similarity', search_kwargs={'lambda_val': 0.025, 'k': 5, 'filter': '', 'n_sentence_context': '2'})"
]
},
"execution_count": 11,
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
@@ -367,7 +383,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 13,
"id": "f3c70c31",
"metadata": {
"ExecuteTime": {
@@ -379,10 +395,10 @@
{
"data": {
"text/plain": [
"Document(page_content='Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence. A former top litigator in private practice.', metadata={'source': 'langchain', 'lang': 'eng', 'offset': '596', 'len': '97', 'speech': 'state-of-the-union'})"
"Document(page_content='Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence. A former top litigator in private practice.', metadata={'source': 'langchain', 'lang': 'eng', 'offset': '596', 'len': '97', 'speech': 'state-of-the-union', 'author': 'Biden'})"
]
},
"execution_count": 12,
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
@@ -392,10 +408,118 @@
"retriever.get_relevant_documents(query)[0]"
]
},
{
"cell_type": "markdown",
"id": "e944c26a",
"metadata": {},
"source": [
"## Using Vectara as a SelfQuery Retriever"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "8be674de",
"metadata": {},
"outputs": [],
"source": [
"metadata_field_info = [\n",
" AttributeInfo(\n",
" name=\"speech\",\n",
" description=\"what name of the speech\",\n",
" type=\"string or list[string]\",\n",
" ),\n",
" AttributeInfo(\n",
" name=\"author\",\n",
" description=\"author of the speech\",\n",
" type=\"string or list[string]\",\n",
" ),\n",
"]\n",
"document_content_description = \"the text of the speech\"\n",
"\n",
"vectordb = Vectara()\n",
"llm = OpenAI(temperature=0)\n",
"retriever = SelfQueryRetriever.from_llm(llm, vectara, \n",
" document_content_description, metadata_field_info, \n",
" verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "f8938999",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/ofer/dev/langchain/libs/langchain/langchain/chains/llm.py:278: UserWarning: The predict_and_parse method is deprecated, instead pass an output parser directly to LLMChain.\n",
" warnings.warn(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"query='freedom' filter=Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='author', value='Biden') limit=None\n"
]
},
{
"data": {
"text/plain": [
"[Document(page_content='Well I know this nation. We will meet the test. To protect freedom and liberty, to expand fairness and opportunity. We will save democracy. As hard as these times have been, I am more optimistic about America today than I have been my whole life.', metadata={'source': 'langchain', 'lang': 'eng', 'offset': '346', 'len': '67', 'speech': 'state-of-the-union', 'author': 'Biden'}),\n",
" Document(page_content='To our fellow Ukrainian Americans who forge a deep bond that connects our two nations we stand with you. Putin may circle Kyiv with tanks, but he will never gain the hearts and souls of the Ukrainian people. He will never extinguish their love of freedom. He will never weaken the resolve of the free world. We meet tonight in an America that has lived through two of the hardest years this nation has ever faced.', metadata={'source': 'langchain', 'lang': 'eng', 'offset': '740', 'len': '47', 'speech': 'state-of-the-union', 'author': 'Biden'}),\n",
" Document(page_content='But most importantly as Americans. With a duty to one another to the American people to the Constitution. And with an unwavering resolve that freedom will always triumph over tyranny. Six days ago, Russias Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated.', metadata={'source': 'langchain', 'lang': 'eng', 'offset': '413', 'len': '77', 'speech': 'state-of-the-union', 'author': 'Biden'}),\n",
" Document(page_content='We can do this. \\n\\nMy fellow Americans—tonight , we have gathered in a sacred space—the citadel of our democracy. In this Capitol, generation after generation, Americans have debated great questions amid great strife, and have done great things. We have fought for freedom, expanded liberty, defeated totalitarianism and terror. And built the strongest, freest, and most prosperous nation the world has ever known. Now is the hour. \\n\\nOur moment of responsibility.', metadata={'source': 'langchain', 'lang': 'eng', 'offset': '906', 'len': '82', 'speech': 'state-of-the-union', 'author': 'Biden'}),\n",
" Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. We cannot let this happen. Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections.', metadata={'source': 'langchain', 'lang': 'eng', 'offset': '0', 'len': '63', 'speech': 'state-of-the-union', 'author': 'Biden'})]"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever.get_relevant_documents(\"what did Biden say about the freedom?\")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "a97037fb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"query='freedom' filter=Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='author', value='Dr. King') limit=None\n"
]
},
{
"data": {
"text/plain": [
"[Document(page_content='And if America is to be a great nation, this must become true. So\\nlet freedom ring from the prodigious hilltops of New Hampshire. Let freedom ring from the mighty\\nmountains of New York. Let freedom ring from the heightening Alleghenies of Pennsylvania. Let\\nfreedom ring from the snowcapped Rockies of Colorado.', metadata={'lang': 'eng', 'section': '3', 'offset': '1534', 'len': '55', 'CreationDate': '1424880481', 'Producer': 'Adobe PDF Library 10.0', 'Author': 'Sasha Rolon-Pereira', 'Title': 'Martin Luther King Jr.pdf', 'Creator': 'Acrobat PDFMaker 10.1 for Word', 'ModDate': '1424880524', 'url': 'https://www.gilderlehrman.org/sites/default/files/inline-pdfs/king.dreamspeech.excerpts.pdf', 'speech': 'I-have-a-dream', 'author': 'Dr. King', 'title': 'Martin Luther King Jr.pdf'}),\n",
" Document(page_content='And if America is to be a great nation, this must become true. So\\nlet freedom ring from the prodigious hilltops of New Hampshire. Let freedom ring from the mighty\\nmountains of New York. Let freedom ring from the heightening Alleghenies of Pennsylvania. Let\\nfreedom ring from the snowcapped Rockies of Colorado.', metadata={'lang': 'eng', 'section': '3', 'offset': '1534', 'len': '55', 'CreationDate': '1424880481', 'Producer': 'Adobe PDF Library 10.0', 'Author': 'Sasha Rolon-Pereira', 'Title': 'Martin Luther King Jr.pdf', 'Creator': 'Acrobat PDFMaker 10.1 for Word', 'ModDate': '1424880524', 'url': 'https://www.gilderlehrman.org/sites/default/files/inline-pdfs/king.dreamspeech.excerpts.pdf', 'speech': 'I-have-a-dream', 'author': 'Dr. King', 'title': 'Martin Luther King Jr.pdf'}),\n",
" Document(page_content='Let freedom ring from the curvaceous slopes of\\nCalifornia. But not only that. Let freedom ring from Stone Mountain of Georgia. Let freedom ring from Lookout\\nMountain of Tennessee. Let freedom ring from every hill and molehill of Mississippi, from every\\nmountain side. Let freedom ring . . .\\nWhen we allow freedom to ring—when we let it ring from every city and every hamlet, from every state\\nand every city, we will be able to speed up that day when all of Gods children, black men and white\\nmen, Jews and Gentiles, Protestants and Catholics, will be able to join hands and sing in the words of the\\nold Negro spiritual, “Free at last, Free at last, Great God a-mighty, We are free at last.”', metadata={'lang': 'eng', 'section': '3', 'offset': '1842', 'len': '52', 'CreationDate': '1424880481', 'Producer': 'Adobe PDF Library 10.0', 'Author': 'Sasha Rolon-Pereira', 'Title': 'Martin Luther King Jr.pdf', 'Creator': 'Acrobat PDFMaker 10.1 for Word', 'ModDate': '1424880524', 'url': 'https://www.gilderlehrman.org/sites/default/files/inline-pdfs/king.dreamspeech.excerpts.pdf', 'speech': 'I-have-a-dream', 'author': 'Dr. King', 'title': 'Martin Luther King Jr.pdf'}),\n",
" Document(page_content='Let freedom ring from the curvaceous slopes of\\nCalifornia. But not only that. Let freedom ring from Stone Mountain of Georgia. Let freedom ring from Lookout\\nMountain of Tennessee. Let freedom ring from every hill and molehill of Mississippi, from every\\nmountain side. Let freedom ring . . .\\nWhen we allow freedom to ring—when we let it ring from every city and every hamlet, from every state\\nand every city, we will be able to speed up that day when all of Gods children, black men and white\\nmen, Jews and Gentiles, Protestants and Catholics, will be able to join hands and sing in the words of the\\nold Negro spiritual, “Free at last, Free at last, Great God a-mighty, We are free at last.”', metadata={'lang': 'eng', 'section': '3', 'offset': '1842', 'len': '52', 'CreationDate': '1424880481', 'Producer': 'Adobe PDF Library 10.0', 'Author': 'Sasha Rolon-Pereira', 'Title': 'Martin Luther King Jr.pdf', 'Creator': 'Acrobat PDFMaker 10.1 for Word', 'ModDate': '1424880524', 'url': 'https://www.gilderlehrman.org/sites/default/files/inline-pdfs/king.dreamspeech.excerpts.pdf', 'speech': 'I-have-a-dream', 'author': 'Dr. King', 'title': 'Martin Luther King Jr.pdf'}),\n",
" Document(page_content='Let freedom ring from the mighty\\nmountains of New York. Let freedom ring from the heightening Alleghenies of Pennsylvania. Let\\nfreedom ring from the snowcapped Rockies of Colorado. Let freedom ring from the curvaceous slopes of\\nCalifornia. But not only that. Let freedom ring from Stone Mountain of Georgia.', metadata={'lang': 'eng', 'section': '3', 'offset': '1657', 'len': '57', 'CreationDate': '1424880481', 'Producer': 'Adobe PDF Library 10.0', 'Author': 'Sasha Rolon-Pereira', 'Title': 'Martin Luther King Jr.pdf', 'Creator': 'Acrobat PDFMaker 10.1 for Word', 'ModDate': '1424880524', 'url': 'https://www.gilderlehrman.org/sites/default/files/inline-pdfs/king.dreamspeech.excerpts.pdf', 'speech': 'I-have-a-dream', 'author': 'Dr. King', 'title': 'Martin Luther King Jr.pdf'})]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever.get_relevant_documents(\"what did Dr. King say about the freedom?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2300e785",
"id": "f6d17e90",
"metadata": {},
"outputs": [],
"source": []