langchain/docs/docs/integrations/text_embedding/elasticsearch.ipynb
Leonid Ganeline 18aba7fdef
docs: notebook linting (#14366)
Many jupyter notebooks didn't pass linting. List of these files are
presented in the [tool.ruff.lint.per-file-ignores] section of the
pyproject.toml . Addressed these bugs:
- fixed bugs; added missed imports; updated pyproject.toml
 Only the `document_loaders/tensorflow_datasets.ipyn`,
`cookbook/gymnasium_agent_simulation.ipynb` are not completely fixed.
I'm not sure about imports.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-07 15:47:48 -08:00

270 lines
5.9 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"id": "72644940",
"metadata": {
"id": "1eZl1oaVUNeC"
},
"source": [
"# Elasticsearch\n",
"Walkthrough of how to generate embeddings using a hosted embedding model in Elasticsearch\n",
"\n",
"The easiest way to instantiate the `ElasticsearchEmbeddings` class it either\n",
"- using the `from_credentials` constructor if you are using Elastic Cloud\n",
"- or using the `from_es_connection` constructor with any Elasticsearch cluster"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "298759cb",
"metadata": {
"id": "6dJxqebov4eU"
},
"outputs": [],
"source": [
"!pip -q install elasticsearch langchain"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "76489aff",
"metadata": {
"id": "RV7C3DUmv4aq"
},
"outputs": [],
"source": [
"from langchain.embeddings.elasticsearch import ElasticsearchEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "57bfdc82",
"metadata": {
"id": "MrT3jplJvp09"
},
"outputs": [],
"source": [
"# Define the model ID\n",
"model_id = \"your_model_id\""
]
},
{
"cell_type": "markdown",
"id": "0ffad1ec",
"metadata": {
"id": "j5F-nwLVS_Zu"
},
"source": [
"## Testing with `from_credentials`\n",
"This required an Elastic Cloud `cloud_id`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc2e9dcb",
"metadata": {
"id": "svtdnC-dvpxR"
},
"outputs": [],
"source": [
"# Instantiate ElasticsearchEmbeddings using credentials\n",
"embeddings = ElasticsearchEmbeddings.from_credentials(\n",
" model_id,\n",
" es_cloud_id=\"your_cloud_id\",\n",
" es_user=\"your_user\",\n",
" es_password=\"your_password\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8ee7f1fc",
"metadata": {
"id": "7DXZAK7Kvpth"
},
"outputs": [],
"source": [
"# Create embeddings for multiple documents\n",
"documents = [\n",
" \"This is an example document.\",\n",
" \"Another example document to generate embeddings for.\",\n",
"]\n",
"document_embeddings = embeddings.embed_documents(documents)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0b9d8471",
"metadata": {
"id": "K8ra75W_vpqy"
},
"outputs": [],
"source": [
"# Print document embeddings\n",
"for i, embedding in enumerate(document_embeddings):\n",
" print(f\"Embedding for document {i+1}: {embedding}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3989ab23",
"metadata": {
"id": "V4Q5kQo9vpna"
},
"outputs": [],
"source": [
"# Create an embedding for a single query\n",
"query = \"This is a single query.\"\n",
"query_embedding = embeddings.embed_query(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0da6d2bf",
"metadata": {
"id": "O0oQDzGKvpkz"
},
"outputs": [],
"source": [
"# Print query embedding\n",
"print(f\"Embedding for query: {query_embedding}\")"
]
},
{
"cell_type": "markdown",
"id": "32700096",
"metadata": {
"id": "rHN03yV6TJ5q"
},
"source": [
"## Testing with Existing Elasticsearch client connection\n",
"This can be used with any Elasticsearch deployment"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0bc60465",
"metadata": {
"id": "GMQcJDwBTJFm"
},
"outputs": [],
"source": [
"# Create Elasticsearch connection\n",
"from elasticsearch import Elasticsearch\n",
"\n",
"es_connection = Elasticsearch(\n",
" hosts=[\"https://es_cluster_url:port\"], basic_auth=(\"user\", \"password\")\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8085843b",
"metadata": {
"id": "WTYIU4u3TJO1"
},
"outputs": [],
"source": [
"# Instantiate ElasticsearchEmbeddings using es_connection\n",
"embeddings = ElasticsearchEmbeddings.from_es_connection(\n",
" model_id,\n",
" es_connection,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "59a90bf3",
"metadata": {
"id": "4gdAUHwoTJO3"
},
"outputs": [],
"source": [
"# Create embeddings for multiple documents\n",
"documents = [\n",
" \"This is an example document.\",\n",
" \"Another example document to generate embeddings for.\",\n",
"]\n",
"document_embeddings = embeddings.embed_documents(documents)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "54b18673",
"metadata": {
"id": "RC_-tov6TJO3"
},
"outputs": [],
"source": [
"# Print document embeddings\n",
"for i, embedding in enumerate(document_embeddings):\n",
" print(f\"Embedding for document {i+1}: {embedding}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a4812d5e",
"metadata": {
"id": "6GEnHBqETJO3"
},
"outputs": [],
"source": [
"# Create an embedding for a single query\n",
"query = \"This is a single query.\"\n",
"query_embedding = embeddings.embed_query(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c6c69916",
"metadata": {
"id": "-kyUQAXDTJO4"
},
"outputs": [],
"source": [
"# Print query embedding\n",
"print(f\"Embedding for query: {query_embedding}\")"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}