pinecone: Add embedding Inference Support (#24515)

**Description**

Add support for Pinecone hosted embedding models as
`PineconeEmbeddings`. Replacement for #22890

**Dependencies**
Add `aiohttp` to support async embeddings call against REST directly

- [x] **Add tests and docs**: If you're adding a new integration, please
include

Added `docs/docs/integrations/text_embedding/pinecone.ipynb`


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Twitter: `gdjdg17`

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
This commit is contained in:
Gareth
2024-07-23 18:50:28 -04:00
committed by GitHub
parent aaf788b7cb
commit ac41c97d21
9 changed files with 947 additions and 71 deletions

View File

@@ -0,0 +1,155 @@
{
"cells": [
{
"cell_type": "markdown",
"source": [
"# Pinecone Embeddings\n",
"\n",
"Pinecone's inference API can be accessed via `PineconeEmbeddings`. Providing text embeddings via the Pinecone service. We start by installing prerequisite libraries:"
],
"metadata": {
"collapsed": false
},
"id": "f4b5d823fee826c2"
},
{
"cell_type": "code",
"outputs": [],
"source": [
"!pip install -qU \"langchain-pinecone>=0.2.0\" "
],
"metadata": {
"collapsed": false
},
"id": "3bc5d3a5ed7f5ce3",
"execution_count": null
},
{
"cell_type": "markdown",
"source": [
"Next, we [sign up / log in to Pinecone](https://app.pinecone.io) to get our API key:"
],
"metadata": {
"collapsed": false
},
"id": "62a77d25c3fd8bd5"
},
{
"cell_type": "code",
"outputs": [],
"source": [
"import os\n",
"from getpass import getpass\n",
"\n",
"os.environ[\"PINECONE_API_KEY\"] = os.getenv(\"PINECONE_API_KEY\") or getpass(\n",
" \"Enter your Pinecone API key: \"\n",
")"
],
"metadata": {
"collapsed": false
},
"id": "8162dbcbcf7d3d55",
"execution_count": null
},
{
"cell_type": "markdown",
"source": [
"Check the document for available [models](https://docs.pinecone.io/models/overview). Now we initialize our embedding model like so:"
],
"metadata": {
"collapsed": false
},
"id": "98d860a0a2d8b907"
},
{
"cell_type": "code",
"outputs": [],
"source": [
"from langchain_pinecone import PineconeEmbeddings\n",
"\n",
"embeddings = PineconeEmbeddings(model=\"multilingual-e5-large\")"
],
"metadata": {
"collapsed": false
},
"id": "2b3adb72786a5275",
"execution_count": null
},
{
"cell_type": "markdown",
"source": [
"From here we can create embeddings either sync or async, let's start with sync! We embed a single text as a query embedding (ie what we search with in RAG) using `embed_query`:"
],
"metadata": {
"collapsed": false
},
"id": "11e24da855517230"
},
{
"cell_type": "code",
"outputs": [],
"source": [
"docs = [\n",
" \"Apple is a popular fruit known for its sweetness and crisp texture.\",\n",
" \"The tech company Apple is known for its innovative products like the iPhone.\",\n",
" \"Many people enjoy eating apples as a healthy snack.\",\n",
" \"Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces.\",\n",
" \"An apple a day keeps the doctor away, as the saying goes.\",\n",
"]"
],
"metadata": {
"collapsed": false
},
"id": "2da515e2a61ef7e9",
"execution_count": null
},
{
"cell_type": "code",
"outputs": [],
"source": [
"doc_embeds = embeddings.embed_documents(docs)\n",
"doc_embeds"
],
"metadata": {
"collapsed": false
},
"id": "2897e0d570c90b2f",
"execution_count": null
},
{
"cell_type": "code",
"outputs": [],
"source": [
"query = \"Tell me about the tech company known as Apple\"\n",
"query_embed = embeddings.embed_query(query)\n",
"query_embed"
],
"metadata": {
"collapsed": false
},
"id": "510784963c0e17a",
"execution_count": null
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}