From fdba711d28375e86b23cfbad10a17feb67276ef5 Mon Sep 17 00:00:00 2001 From: Leonid Ganeline Date: Thu, 7 Sep 2023 19:53:33 -0700 Subject: [PATCH] docs `integrations/embeddings` consistency (#10302) Updated `integrations/embeddings`: fixed titles; added links, descriptions Updated `integrations/providers`. --- docs/docs_skeleton/vercel.json | 4 ++ docs/extras/integrations/providers/awadb.md | 15 +++-- .../integrations/providers/modelscope.mdx | 14 +++-- .../integrations/providers/nlpcloud.mdx | 26 +++++++-- docs/extras/integrations/providers/spacy.mdx | 8 +++ .../text_embedding/{Awa.ipynb => awadb.ipynb} | 8 ++- .../integrations/text_embedding/bedrock.ipynb | 6 +- .../text_embedding/bge_huggingface.ipynb | 33 ++++++++--- .../google_vertex_ai_palm.ipynb | 9 +-- .../text_embedding/modelscope_hub.ipynb | 20 +++++-- .../text_embedding/mosaicml.ipynb | 17 ++++-- .../text_embedding/nlp_cloud.ipynb | 6 +- .../text_embedding/sagemaker-endpoint.ipynb | 10 ++-- .../text_embedding/self-hosted.ipynb | 10 ++-- .../sentence_transformers.ipynb | 9 ++- .../text_embedding/spacy_embedding.ipynb | 55 +++++++++++++------ 16 files changed, 170 insertions(+), 80 deletions(-) rename docs/extras/integrations/text_embedding/{Awa.ipynb => awadb.ipynb} (89%) diff --git a/docs/docs_skeleton/vercel.json b/docs/docs_skeleton/vercel.json index 2f560db73a0..47e08936b49 100644 --- a/docs/docs_skeleton/vercel.json +++ b/docs/docs_skeleton/vercel.json @@ -2216,6 +2216,10 @@ "source": "/docs/modules/data_connection/text_embedding/integrations/tensorflowhub", "destination": "/docs/integrations/text_embedding/tensorflowhub" }, + { + "source": "/docs/integrations/text_embedding/Awa", + "destination": "/docs/integrations/text_embedding/awadb" + }, { "source": "/en/latest/modules/indexes/vectorstores/examples/analyticdb.html", "destination": "/docs/integrations/vectorstores/analyticdb" diff --git a/docs/extras/integrations/providers/awadb.md b/docs/extras/integrations/providers/awadb.md index 
7c2e9943f54..be6d4d66fe1 100644 --- a/docs/extras/integrations/providers/awadb.md +++ b/docs/extras/integrations/providers/awadb.md @@ -9,13 +9,20 @@ pip install awadb ``` -## VectorStore +## Vector Store -There exists a wrapper around AwaDB vector databases, allowing you to use it as a vectorstore, -whether for semantic search or example selection. ```python from langchain.vectorstores import AwaDB ``` -For a more detailed walkthrough of the AwaDB wrapper, see [here](/docs/integrations/vectorstores/awadb.html). +See a [usage example](/docs/integrations/vectorstores/awadb). + + +## Text Embedding Model + +```python +from langchain.embeddings import AwaEmbeddings +``` + +See a [usage example](/docs/integrations/text_embedding/awadb). diff --git a/docs/extras/integrations/providers/modelscope.mdx b/docs/extras/integrations/providers/modelscope.mdx index c37c5f60c43..df6add2bb1b 100644 --- a/docs/extras/integrations/providers/modelscope.mdx +++ b/docs/extras/integrations/providers/modelscope.mdx @@ -1,20 +1,24 @@ # ModelScope +>[ModelScope](https://www.modelscope.cn/home) is a big repository of the models and datasets. + This page covers how to use the modelscope ecosystem within LangChain. It is broken into two parts: installation and setup, and then references to specific modelscope wrappers. ## Installation and Setup -* Install the Python SDK with `pip install modelscope` +Install the `modelscope` package. 
+ +```bash +pip install modelscope +``` -## Wrappers -### Embeddings +## Text Embedding Models -There exists a modelscope Embeddings wrapper, which you can access with ```python from langchain.embeddings import ModelScopeEmbeddings ``` -For a more detailed walkthrough of this, see [this notebook](/docs/integrations/text_embedding/modelscope_hub.html) +For a more detailed walkthrough of this, see [this notebook](/docs/integrations/text_embedding/modelscope_hub) diff --git a/docs/extras/integrations/providers/nlpcloud.mdx b/docs/extras/integrations/providers/nlpcloud.mdx index 050da5af047..e401faeb5aa 100644 --- a/docs/extras/integrations/providers/nlpcloud.mdx +++ b/docs/extras/integrations/providers/nlpcloud.mdx @@ -1,17 +1,31 @@ # NLPCloud -This page covers how to use the NLPCloud ecosystem within LangChain. -It is broken into two parts: installation and setup, and then references to specific NLPCloud wrappers. +>[NLP Cloud](https://docs.nlpcloud.com/#introduction) is an artificial intelligence platform that allows you to use the most advanced AI engines, and even train your own engines with your own data. + ## Installation and Setup -- Install the Python SDK with `pip install nlpcloud` + +- Install the `nlpcloud` package. + +```bash +pip install nlpcloud +``` + - Get an NLPCloud api key and set it as an environment variable (`NLPCLOUD_API_KEY`) -## Wrappers -### LLM +## LLM + +See a [usage example](/docs/integrations/llms/nlpcloud). 
-There exists an NLPCloud LLM wrapper, which you can access with ```python from langchain.llms import NLPCloud ``` + +## Text Embedding Models + +See a [usage example](/docs/integrations/text_embedding/nlp_cloud) + +```python +from langchain.embeddings import NLPCloudEmbeddings +``` diff --git a/docs/extras/integrations/providers/spacy.mdx b/docs/extras/integrations/providers/spacy.mdx index f4d49497dd5..ab9b6858985 100644 --- a/docs/extras/integrations/providers/spacy.mdx +++ b/docs/extras/integrations/providers/spacy.mdx @@ -18,3 +18,11 @@ See a [usage example](/docs/modules/data_connection/document_transformers/text_s ```python from langchain.text_splitter import SpacyTextSplitter ``` + +## Text Embedding Models + +See a [usage example](/docs/integrations/text_embedding/spacy_embedding) + +```python +from langchain.embeddings.spacy_embeddings import SpacyEmbeddings +``` diff --git a/docs/extras/integrations/text_embedding/Awa.ipynb b/docs/extras/integrations/text_embedding/awadb.ipynb similarity index 89% rename from docs/extras/integrations/text_embedding/Awa.ipynb rename to docs/extras/integrations/text_embedding/awadb.ipynb index 1fb7ddca6f6..f2c1e733923 100644 --- a/docs/extras/integrations/text_embedding/Awa.ipynb +++ b/docs/extras/integrations/text_embedding/awadb.ipynb @@ -5,9 +5,11 @@ "id": "b14a24db", "metadata": {}, "source": [ - "# AwaEmbedding\n", + "# AwaDB\n", "\n", - "This notebook explains how to use AwaEmbedding, which is included in [awadb](https://github.com/awa-ai/awadb), to embedding texts in langchain." + ">[AwaDB](https://github.com/awa-ai/awadb) is an AI Native database for the search and storage of embedding vectors used by LLM Applications.\n", + "\n", + "This notebook explains how to use `AwaEmbeddings` in LangChain." 
] }, { @@ -101,7 +103,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.4" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/extras/integrations/text_embedding/bedrock.ipynb b/docs/extras/integrations/text_embedding/bedrock.ipynb index 7c16cb8ead4..0dbbcd080f4 100644 --- a/docs/extras/integrations/text_embedding/bedrock.ipynb +++ b/docs/extras/integrations/text_embedding/bedrock.ipynb @@ -5,7 +5,9 @@ "id": "75e378f5-55d7-44b6-8e2e-6d7b8b171ec4", "metadata": {}, "source": [ - "# Bedrock Embeddings" + "# Bedrock\n", + "\n", + ">[Amazon Bedrock](https://aws.amazon.com/bedrock/) is a fully managed service that makes FMs from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case.\n" ] }, { @@ -91,7 +93,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.13" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/extras/integrations/text_embedding/bge_huggingface.ipynb b/docs/extras/integrations/text_embedding/bge_huggingface.ipynb index bcf196fc205..923ba928746 100644 --- a/docs/extras/integrations/text_embedding/bge_huggingface.ipynb +++ b/docs/extras/integrations/text_embedding/bge_huggingface.ipynb @@ -5,26 +5,29 @@ "id": "719619d3", "metadata": {}, "source": [ - "# BGE Hugging Face Embeddings\n", + "# BGE on Hugging Face\n", "\n", - "This notebook shows how to use BGE Embeddings through Hugging Face" + ">[BGE models on the HuggingFace](https://huggingface.co/BAAI/bge-large-en) are [the best open-source embedding models](https://huggingface.co/spaces/mteb/leaderboard).\n", + ">BGE model is created by the [Beijing Academy of Artificial Intelligence (BAAI)](https://www.baai.ac.cn/english.html). 
`BAAI` is a private non-profit organization engaged in AI research and development.\n", + "\n", + "This notebook shows how to use `BGE Embeddings` through `Hugging Face`" ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "id": "f7a54279", "metadata": { "scrolled": true }, "outputs": [], "source": [ - "# !pip install sentence_transformers" + "#!pip install sentence_transformers" ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "id": "9e1d5b6b", "metadata": {}, "outputs": [], @@ -43,12 +46,24 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 5, "id": "e59d1a89", "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "384" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "embedding = hf.embed_query(\"hi this is harrison\")" + "embedding = hf.embed_query(\"hi this is harrison\")\n", + "len(embedding)" ] }, { @@ -76,7 +91,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.1" + "version": "3.10.12" } }, "nbformat": 4, diff --git a/docs/extras/integrations/text_embedding/google_vertex_ai_palm.ipynb b/docs/extras/integrations/text_embedding/google_vertex_ai_palm.ipynb index ea607467fb0..4c0c515e806 100644 --- a/docs/extras/integrations/text_embedding/google_vertex_ai_palm.ipynb +++ b/docs/extras/integrations/text_embedding/google_vertex_ai_palm.ipynb @@ -1,13 +1,14 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "# Google Cloud Platform Vertex AI PaLM \n", + "# Google Vertex AI PaLM \n", "\n", - "Note: This is seperate from the Google PaLM integration, it exposes [Vertex AI PaLM API](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview) on Google Cloud. 
\n", + ">[Vertex AI PaLM API](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview) is a service on Google Cloud exposing the embedding models. \n", + "\n", + "Note: This integration is separate from the Google PaLM integration.\n", "\n", "By default, Google Cloud [does not use](https://cloud.google.com/vertex-ai/docs/generative-ai/data-governance#foundation_model_development) Customer Data to train its foundation models as part of Google Cloud`s AI/ML Privacy Commitment. More details about how Google processes data can also be found in [Google's Customer Data Processing Addendum (CDPA)](https://cloud.google.com/terms/data-processing-addendum).\n", "\n", @@ -96,7 +97,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.1" + "version": "3.10.12" }, "vscode": { "interpreter": { diff --git a/docs/extras/integrations/text_embedding/modelscope_hub.ipynb b/docs/extras/integrations/text_embedding/modelscope_hub.ipynb index 765d46769ca..e2f47c4f3a4 100644 --- a/docs/extras/integrations/text_embedding/modelscope_hub.ipynb +++ b/docs/extras/integrations/text_embedding/modelscope_hub.ipynb @@ -1,12 +1,13 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# ModelScope\n", "\n", + ">[ModelScope](https://www.modelscope.cn/home) is a big repository of the models and datasets.\n", + "\n", "Let's load the ModelScope Embedding class." 
] }, @@ -67,16 +68,23 @@ ], "metadata": { "kernelspec": { - "display_name": "chatgpt", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", "name": "python", - "version": "3.9.15" - }, - "orig_nbformat": 4 + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/docs/extras/integrations/text_embedding/mosaicml.ipynb b/docs/extras/integrations/text_embedding/mosaicml.ipynb index 2d91c8d9c5c..24d7aecb724 100644 --- a/docs/extras/integrations/text_embedding/mosaicml.ipynb +++ b/docs/extras/integrations/text_embedding/mosaicml.ipynb @@ -1,15 +1,14 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "# MosaicML embeddings\n", + "# MosaicML\n", "\n", - "[MosaicML](https://docs.mosaicml.com/en/latest/inference.html) offers a managed inference service. You can either use a variety of open source models, or deploy your own.\n", + ">[MosaicML](https://docs.mosaicml.com/en/latest/inference.html) offers a managed inference service. You can either use a variety of open source models, or deploy your own.\n", "\n", - "This example goes over how to use LangChain to interact with MosaicML Inference for text embedding." + "This example goes over how to use LangChain to interact with `MosaicML` Inference for text embedding." 
] }, { @@ -94,6 +93,11 @@ } ], "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, "language_info": { "codemirror_mode": { "name": "ipython", @@ -103,9 +107,10 @@ "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython3" + "pygments_lexer": "ipython3", + "version": "3.10.12" } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/docs/extras/integrations/text_embedding/nlp_cloud.ipynb b/docs/extras/integrations/text_embedding/nlp_cloud.ipynb index 73ae71fe0f1..9567d59c4be 100644 --- a/docs/extras/integrations/text_embedding/nlp_cloud.ipynb +++ b/docs/extras/integrations/text_embedding/nlp_cloud.ipynb @@ -7,7 +7,7 @@ "source": [ "# NLP Cloud\n", "\n", - "NLP Cloud is an artificial intelligence platform that allows you to use the most advanced AI engines, and even train your own engines with your own data. \n", + ">[NLP Cloud](https://docs.nlpcloud.com/#introduction) is an artificial intelligence platform that allows you to use the most advanced AI engines, and even train your own engines with your own data. 
\n", "\n", "The [embeddings](https://docs.nlpcloud.com/#embeddings) endpoint offers the following model:\n", "\n", @@ -80,7 +80,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3.11.2 64-bit", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -94,7 +94,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.2" + "version": "3.10.12" }, "vscode": { "interpreter": { diff --git a/docs/extras/integrations/text_embedding/sagemaker-endpoint.ipynb b/docs/extras/integrations/text_embedding/sagemaker-endpoint.ipynb index fe5299ae6f2..ec80112e101 100644 --- a/docs/extras/integrations/text_embedding/sagemaker-endpoint.ipynb +++ b/docs/extras/integrations/text_embedding/sagemaker-endpoint.ipynb @@ -5,11 +5,13 @@ "id": "1f83f273", "metadata": {}, "source": [ - "# SageMaker Endpoint Embeddings\n", + "# SageMaker\n", "\n", - "Let's load the SageMaker Endpoints Embeddings class. The class can be used if you host, e.g. your own Hugging Face model on SageMaker.\n", + "Let's load the `SageMaker Endpoints Embeddings` class. The class can be used if you host, e.g. your own Hugging Face model on SageMaker.\n", "\n", - "For instructions on how to do this, please see [here](https://www.philschmid.de/custom-inference-huggingface-sagemaker). **Note**: In order to handle batched requests, you will need to adjust the return line in the `predict_fn()` function within the custom `inference.py` script:\n", + "For instructions on how to do this, please see [here](https://www.philschmid.de/custom-inference-huggingface-sagemaker). 
\n", + "\n", + "**Note**: In order to handle batched requests, you will need to adjust the return line in the `predict_fn()` function within the custom `inference.py` script:\n", "\n", "Change from\n", "\n", @@ -143,7 +145,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.1" + "version": "3.10.12" }, "vscode": { "interpreter": { diff --git a/docs/extras/integrations/text_embedding/self-hosted.ipynb b/docs/extras/integrations/text_embedding/self-hosted.ipynb index 00c497220e0..47faa6bf2d7 100644 --- a/docs/extras/integrations/text_embedding/self-hosted.ipynb +++ b/docs/extras/integrations/text_embedding/self-hosted.ipynb @@ -5,8 +5,8 @@ "id": "eec4efda", "metadata": {}, "source": [ - "# Self Hosted Embeddings\n", - "Let's load the SelfHostedEmbeddings, SelfHostedHuggingFaceEmbeddings, and SelfHostedHuggingFaceInstructEmbeddings classes." + "# Self Hosted\n", + "Let's load the `SelfHostedEmbeddings`, `SelfHostedHuggingFaceEmbeddings`, and `SelfHostedHuggingFaceInstructEmbeddings` classes." 
] }, { @@ -149,9 +149,7 @@ "cell_type": "code", "execution_count": null, "id": "fc1bfd0f", - "metadata": { - "scrolled": false - }, + "metadata": {}, "outputs": [], "source": [ "query_result = embeddings.embed_query(text)" @@ -182,7 +180,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.1" + "version": "3.10.12" }, "vscode": { "interpreter": { diff --git a/docs/extras/integrations/text_embedding/sentence_transformers.ipynb b/docs/extras/integrations/text_embedding/sentence_transformers.ipynb index 67eb83ab7cd..e4649e6b719 100644 --- a/docs/extras/integrations/text_embedding/sentence_transformers.ipynb +++ b/docs/extras/integrations/text_embedding/sentence_transformers.ipynb @@ -1,16 +1,15 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "id": "ed47bb62", "metadata": {}, "source": [ - "# Sentence Transformers Embeddings\n", + "# Sentence Transformers\n", "\n", - "[SentenceTransformers](https://www.sbert.net/) embeddings are called using the `HuggingFaceEmbeddings` integration. We have also added an alias for `SentenceTransformerEmbeddings` for users who are more familiar with directly using that package.\n", + ">[SentenceTransformers](https://www.sbert.net/) embeddings are called using the `HuggingFaceEmbeddings` integration. 
We have also added an alias for `SentenceTransformerEmbeddings` for users who are more familiar with directly using that package.\n", "\n", - "SentenceTransformers is a python package that can generate text and image embeddings, originating from [Sentence-BERT](https://arxiv.org/abs/1908.10084)" + "`SentenceTransformers` is a Python package that can generate text and image embeddings, originating from [Sentence-BERT](https://arxiv.org/abs/1908.10084)" ] }, { @@ -109,7 +108,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.16" + "version": "3.10.12" }, "vscode": { "interpreter": { diff --git a/docs/extras/integrations/text_embedding/spacy_embedding.ipynb b/docs/extras/integrations/text_embedding/spacy_embedding.ipynb index bfea82d5d45..edda4828b47 100644 --- a/docs/extras/integrations/text_embedding/spacy_embedding.ipynb +++ b/docs/extras/integrations/text_embedding/spacy_embedding.ipynb @@ -1,21 +1,31 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "# Spacy Embedding\n", + "# SpaCy\n", "\n", - "### Loading the Spacy embedding class to generate and query embeddings" + ">[spaCy](https://spacy.io/) is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython.\n", + " \n", + "\n", + "## Installation and Setup" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#!pip install spacy" ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "#### Import the necessary classes" + "Import the necessary classes" ] }, { @@ -28,11 +38,12 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "#### Initialize SpacyEmbeddings.This will load the Spacy model into memory." + "## Example\n", + "\n", + "Initialize SpacyEmbeddings. This will load the Spacy model into memory." 
] }, { @@ -45,11 +56,10 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "#### Define some example texts . These could be any documents that you want to analyze - for example, news articles, social media posts, or product reviews." + "Define some example texts. These could be any documents that you want to analyze - for example, news articles, social media posts, or product reviews." ] }, { @@ -67,11 +77,10 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "#### Generate and print embeddings for the texts . The SpacyEmbeddings class generates an embedding for each document, which is a numerical representation of the document's content. These embeddings can be used for various natural language processing tasks, such as document similarity comparison or text classification." + "Generate and print embeddings for the texts. The SpacyEmbeddings class generates an embedding for each document, which is a numerical representation of the document's content. These embeddings can be used for various natural language processing tasks, such as document similarity comparison or text classification." ] }, { @@ -86,11 +95,10 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "#### Generate and print an embedding for a single piece of text. You can also generate an embedding for a single piece of text, such as a search query. This can be useful for tasks like information retrieval, where you want to find documents that are similar to a given query." + "Generate and print an embedding for a single piece of text. You can also generate an embedding for a single piece of text, such as a search query. This can be useful for tasks like information retrieval, where you want to find documents that are similar to a given query." 
] }, { @@ -106,11 +114,24 @@ } ], "metadata": { - "language_info": { - "name": "python" + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" }, - "orig_nbformat": 4 + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 }
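
The rename of `Awa.ipynb` to `awadb.ipynb` depends on the redirect entry this patch adds to `docs/docs_skeleton/vercel.json`, so that the old URL keeps resolving. A minimal sketch of checking such an entry follows. It assumes the rules live under a top-level `redirects` list (the patch shows only the `source`/`destination` pair, not the enclosing key), and `find_redirect` is a hypothetical helper, not part of the repository.

```python
import json
from typing import Optional

# Excerpt mirroring the entry added by this patch. The enclosing
# "redirects" key is an assumption about the file's overall layout.
VERCEL_JSON = """
{
  "redirects": [
    {
      "source": "/docs/integrations/text_embedding/Awa",
      "destination": "/docs/integrations/text_embedding/awadb"
    }
  ]
}
"""


def find_redirect(config: dict, source: str) -> Optional[str]:
    """Return the destination for an exact-match source path, or None.

    Hypothetical helper: only literal paths are compared; wildcard or
    pattern-based rules are not interpreted.
    """
    for rule in config.get("redirects", []):
        if rule.get("source") == source:
            return rule.get("destination")
    return None


config = json.loads(VERCEL_JSON)
old_path = "/docs/integrations/text_embedding/Awa"
new_path = find_redirect(config, old_path)
print(f"{old_path} -> {new_path}")
```

Run against the real file by replacing `VERCEL_JSON` with the contents of `vercel.json`; a `None` result for a renamed page would suggest a missing redirect.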