Add SpacyEmbeddings class (#6967)

- Description: Added a new SpacyEmbeddings class for generating embeddings using the Spacy library.
- Issue: Sentencebert/Bert/Spacy/Doc2vec embedding support #6952
- Dependencies: This change requires the Spacy library and the 'en_core_web_sm' Spacy model.
- Tag maintainer: @dev2049
- Twitter handle: N/A

This change includes a new SpacyEmbeddings class, but does not include a test or an example notebook.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2025-09-08 22:42:05 +00:00 · 2023-07-03 21:08:31 +05:30
parent 16fbd528c5
commit e2d61ab85a
3 changed files with 242 additions and 0 deletions
--- a/docs/extras/modules/data_connection/text_embedding/integrations/spacy_embedding.ipynb
+++ b/docs/extras/modules/data_connection/text_embedding/integrations/spacy_embedding.ipynb
@@ -0,0 +1,126 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# spaCy Embedding\n",
+    "\n",
+    "### Load the SpacyEmbeddings class to generate document and query embeddings"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Import the necessary classes"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
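+    "# Requires the spaCy library and the 'en_core_web_sm' model (see the PR description).\n",
+    "# SpacyEmbeddings exposes embed_documents() and embed_query().\n",
+    "\n",
+    "from langchain.embeddings.spacy_embeddings import SpacyEmbeddings\n"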
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Initialize SpacyEmbeddings. This loads the spaCy model into memory."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Instantiating the class loads the spaCy model into memory.\n",
+    "embedder = SpacyEmbeddings()\n"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Define some example texts. These could be any documents you want to analyze, such as news articles, social media posts, or product reviews."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Example documents to embed.\n",
+    "# Any list of strings works here.\n",
+    "texts = [\n",
+    "    \"The quick brown fox jumps over the lazy dog.\",\n",
+    "    \"Pack my box with five dozen liquor jugs.\",\n",
+    "    \"How vexingly quick daft zebras jump!\",\n",
+    "    \"Bright vixens jump; dozy fowl quack.\"\n",
+    "]\n",
+    "\n"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Generate and print embeddings for the texts. The SpacyEmbeddings class generates one embedding per document: a numerical vector that represents the document's content. These embeddings can be used for natural language processing tasks such as document similarity comparison or text classification."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# embed_documents returns one vector (a list of floats) per input text.\n",
+    "embeddings = embedder.embed_documents(texts)\n",
+    "for i, embedding in enumerate(embeddings):\n",
+    "    print(f\"Embedding for document {i+1}: {embedding}\")\n",
+    "\n"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Generate and print an embedding for a single piece of text, such as a search query. This is useful for tasks like information retrieval, where you want to find documents that are similar to a given query."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# embed_query returns a single vector (a list of floats) for one string.\n",
+    "query = \"Quick foxes and lazy dogs.\"\n",
+    "query_embedding = embedder.embed_query(query)\n",
+    "print(f\"Embedding for query: {query_embedding}\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  },
+  "orig_nbformat": 4
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
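
The notebook's markdown cells mention document similarity and information retrieval as uses for these vectors, but the notebook only prints them. A minimal sketch of that follow-on step, which is not part of this PR, might look like the block below. The helper names `cosine_similarity` and `rank_by_similarity` and the three-dimensional toy vectors are assumptions made for illustration; real spaCy vectors are much longer, and the choice of cosine similarity is one common option rather than anything the PR prescribes.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[int]:
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, vec) for vec in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Hypothetical 3-dimensional stand-ins for real embeddings.
# With the notebook's objects you would pass `query_embedding` and `embeddings`.
toy_query = [1.0, 0.0, 1.0]
toy_docs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.9], [1.0, 1.0, 1.0]]
print(rank_by_similarity(toy_query, toy_docs))  # prints [1, 2, 0]
```

In the notebook's terms, `rank_by_similarity(query_embedding, embeddings)` would return the indices of `texts` ordered by closeness to the query, assuming both calls return plain lists of floats of equal length.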