langchain[minor]: Adding infinity embedding integration. (#13928)

This adds an integration for https://github.com/michaelfeil/infinity. Users requested it in https://github.com/michaelfeil/infinity/issues/36 @saatvikshah It follows my implementation of the gradient.ai integration.

Feedback 1: Well done - I love your CI / repo / poetry setup - I adapted a lot of it in https://github.com/michaelfeil/infinity.

Feedback 2: Not so good: The OpenAI integration contains too much reverse engineering. In general, projects such as michaelfeil/infinity and huggingface/text-embeddings-inference are compatible with the `pip install openai` package. Reverse engineering like the following really hinders my use of it:
- 8e88ba16a8/libs/langchain/langchain/embeddings/openai.py (L347)
- 8e88ba16a8/libs/langchain/langchain/embeddings/openai.py (L351)

It is about preventing third-party providers from using the same URL, and it uses OpenAI interfaces that are not publicly documented.
2025-08-15 07:36:08 +00:00 · 2023-11-28 01:43:47 +01:00 · 2023-11-28 01:43:47 +01:00 · 686162670e
commit 686162670e
parent 10a6e7cbb6
6 changed files with 629 additions and 0 deletions
--- a/docs/docs/integrations/providers/infinity.mdx
+++ b/docs/docs/integrations/providers/infinity.mdx
@@ -0,0 +1,11 @@
 # Infinity
 >[Infinity](https://github.com/michaelfeil/infinity) allows the creation of text embeddings.
 ## Text Embedding Model
 There is an Infinity embedding model, which you can import with:
 ```python
 from langchain.embeddings import InfinityEmbeddings
 ```
 For a more detailed walkthrough of this, see [this notebook](/docs/integrations/text_embedding/infinity)
--- a/docs/docs/integrations/text_embedding/infinity.ipynb
+++ b/docs/docs/integrations/text_embedding/infinity.ipynb
@@ -0,0 +1,191 @@
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Infinity\n",
    "\n",
    "`Infinity` lets you create `Embeddings` using an MIT-licensed embedding server.\n",
    "\n",
    "This notebook shows how to use LangChain embeddings with the [Infinity GitHub project](https://github.com/michaelfeil/infinity).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.embeddings import InfinityEmbeddings"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optional: Make sure to start the Infinity instance\n",
    "\n",
    "To install Infinity, use the following command. For further details, see the [docs on GitHub](https://github.com/michaelfeil/infinity).\n",
    "```bash\n",
    "pip install infinity_emb[all]\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: infinity_emb[cli] in /home/michi/langchain/.venv/lib/python3.10/site-packages (0.0.8)\n",
      "\u001b[33mWARNING: infinity-emb 0.0.8 does not provide the extra 'cli'\u001b[0m\u001b[33m\n",
      "\u001b[0mRequirement already satisfied: numpy>=1.20.0 in /home/michi/langchain/.venv/lib/python3.10/site-packages (from infinity_emb[cli]) (1.24.4)\n",
      "\u001b[33mWARNING: There was an error checking the latest version of pip.\u001b[0m\u001b[33m\n",
      "\u001b[0m"
     ]
    }
   ],
   "source": [
    "# Install the infinity package\n",
    "!pip install infinity_emb[cli,torch]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Start the server. This is best done from a separate terminal, not from inside the Jupyter notebook.\n",
    "\n",
    "```bash\n",
    "model=sentence-transformers/all-MiniLM-L6-v2\n",
    "port=7797\n",
    "infinity_emb --port $port --model-name-or-path $model\n",
    "```\n",
    "\n",
    "Alternatively, use Docker:\n",
    "```bash\n",
    "model=sentence-transformers/all-MiniLM-L6-v2\n",
    "port=7797\n",
    "docker run -it --gpus all -p $port:$port michaelf34/infinity:latest --model-name-or-path $model --port $port\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Embed your documents using your Infinity instance "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "documents = [\n",
    "    \"Baguette is a dish.\",\n",
    "    \"Paris is the capital of France.\",\n",
    "    \"numpy is a lib for linear algebra\",\n",
    "    \"You escaped what I've escaped - You'd be in Paris getting fucked up too\",\n",
    "]\n",
    "query = \"Where is Paris?\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "embeddings created successfully\n"
     ]
    }
   ],
   "source": [
    "# URL of the Infinity server started above\n",
    "infinity_api_url = \"http://localhost:7797/v1\"\n",
    "# The model name is currently not validated.\n",
    "embeddings = InfinityEmbeddings(\n",
    "    model=\"sentence-transformers/all-MiniLM-L6-v2\", infinity_api_url=infinity_api_url\n",
    ")\n",
    "try:\n",
    "    documents_embedded = embeddings.embed_documents(documents)\n",
    "    query_result = embeddings.embed_query(query)\n",
    "    print(\"embeddings created successfully\")\n",
    "except Exception as ex:\n",
    "    print(\n",
    "        \"Make sure the infinity instance is running. Verify by clicking on \"\n",
    "        f\"{infinity_api_url.replace('v1','docs')} Exception: {ex}. \"\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'Baguette is a dish.': 0.31344215908661155,\n",
       " 'Paris is the capital of France.': 0.8148670296896388,\n",
       " 'numpy is a lib for linear algebra': 0.004429399861302009,\n",
       " \"You escaped what I've escaped - You'd be in Paris getting fucked up too\": 0.5088476180154582}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# (demo) compute similarity\n",
    "import numpy as np\n",
    "\n",
    "scores = np.array(documents_embedded) @ np.array(query_result).T\n",
    "dict(zip(documents, scores))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  },
  "vscode": {
   "interpreter": {
    "hash": "a0a0263b650d907a3bfe41c0f8d6a63a071b884df3cfdc1579f00cdc1aed6b03"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
 }
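Review note on the similarity demo in the notebook: it ranks documents by a raw dot product, which matches cosine similarity only when the embeddings have unit length. The following is a minimal sketch that does not assume the server normalizes its output. The helper names `cosine_similarity` and `rank_documents` and the toy vectors are illustrative and not part of this commit.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_documents(
    doc_vectors: Sequence[Sequence[float]], query_vector: Sequence[float]
) -> List[int]:
    """Return document indices ordered from most to least similar."""
    scores = [cosine_similarity(v, query_vector) for v in doc_vectors]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for embed_documents / embed_query output.
toy_docs = [[1.0, 0.0], [3.0, 3.0], [0.0, 2.0]]
toy_query = [2.0, 0.0]
ranking = rank_documents(toy_docs, toy_query)
```

With these toy vectors a raw dot product would put the second document first because of its larger norm, while the cosine ranking puts the first document first.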
--- a/libs/langchain/langchain/embeddings/__init__.py
+++ b/libs/langchain/langchain/embeddings/__init__.py
@@ -43,6 +43,7 @@ from langchain.embeddings.huggingface import (
    HuggingFaceInstructEmbeddings,
 )
 from langchain.embeddings.huggingface_hub import HuggingFaceHubEmbeddings
 from langchain.embeddings.infinity import InfinityEmbeddings
 from langchain.embeddings.javelin_ai_gateway import JavelinAIGatewayEmbeddings
 from langchain.embeddings.jina import JinaEmbeddings
 from langchain.embeddings.johnsnowlabs import JohnSnowLabsEmbeddings
@@ -81,6 +82,7 @@ __all__ = [
    "FastEmbedEmbeddings",
    "HuggingFaceEmbeddings",
    "HuggingFaceInferenceAPIEmbeddings",
    "InfinityEmbeddings",
    "GradientEmbeddings",
    "JinaEmbeddings",
    "LlamaCppEmbeddings",
--- a/libs/langchain/langchain/embeddings/infinity.py
+++ b/libs/langchain/langchain/embeddings/infinity.py
@@ -0,0 +1,323 @@
 """written under MIT Licence, Michael Feil 2023."""
 import asyncio
 from concurrent.futures import ThreadPoolExecutor
 from typing import Any, Callable, Dict, List, Optional, Tuple
 import aiohttp
 import numpy as np
 import requests
 from langchain_core.embeddings import Embeddings
 from langchain_core.pydantic_v1 import BaseModel, Extra, root_validator
 from langchain.utils import get_from_dict_or_env
 __all__ = ["InfinityEmbeddings"]
 class InfinityEmbeddings(BaseModel, Embeddings):
    """Embedding models served by a self-hosted Infinity instance,
    see https://github.com/michaelfeil/infinity
    This should also work for text-embeddings-inference and other
    self-hosted OpenAI-compatible servers.
    Example:
        .. code-block:: python
            from langchain.embeddings import InfinityEmbeddings
            InfinityEmbeddings(
                model="BAAI/bge-small",
                infinity_api_url="http://localhost:7797/v1",
            )
    """
    model: str
    "Underlying Infinity model id."
    infinity_api_url: str = "http://localhost:7797/v1"
    """Endpoint URL to use."""
    client: Any = None  #: :meta private:
    """Infinity client."""
    # LLM call kwargs
    class Config:
        """Configuration for this pydantic object."""
        extra = Extra.forbid
    @root_validator(allow_reuse=True)
    def validate_environment(cls, values: Dict) -> Dict:
        """Resolve the API URL from arguments or environment and create the client."""
        values["infinity_api_url"] = get_from_dict_or_env(
            values, "infinity_api_url", "INFINITY_API_URL"
        )
        values["client"] = TinyAsyncOpenAIInfinityEmbeddingClient(
            host=values["infinity_api_url"],
        )
        return values
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Call out to Infinity's embedding endpoint.
        Args:
            texts: The list of texts to embed.
        Returns:
            List of embeddings, one for each text.
        """
        embeddings = self.client.embed(
            model=self.model,
            texts=texts,
        )
        return embeddings
    async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
        """Async call out to Infinity's embedding endpoint.
        Args:
            texts: The list of texts to embed.
        Returns:
            List of embeddings, one for each text.
        """
        embeddings = await self.client.aembed(
            model=self.model,
            texts=texts,
        )
        return embeddings
    def embed_query(self, text: str) -> List[float]:
        """Call out to Infinity's embedding endpoint.
        Args:
            text: The text to embed.
        Returns:
            Embeddings for the text.
        """
        return self.embed_documents([text])[0]
    async def aembed_query(self, text: str) -> List[float]:
        """Async call out to Infinity's embedding endpoint.
        Args:
            text: The text to embed.
        Returns:
            Embeddings for the text.
        """
        embeddings = await self.aembed_documents([text])
        return embeddings[0]
 class TinyAsyncOpenAIInfinityEmbeddingClient:  #: :meta private:
    """A helper tool to embed Infinity. Not part of Langchain's stable API,
    direct use discouraged.
    Example:
        .. code-block:: python
            mini_client = TinyAsyncOpenAIInfinityEmbeddingClient()
            embeds = mini_client.embed(
                model="BAAI/bge-small",
                texts=["doc1", "doc2"]
            )
            # or
            embeds = await mini_client.aembed(
                model="BAAI/bge-small",
                texts=["doc1", "doc2"]
            )
    """
    def __init__(
        self,
        host: str = "http://localhost:7797/v1",
        aiosession: Optional[aiohttp.ClientSession] = None,
    ) -> None:
        self.host = host
        self.aiosession = aiosession
        if self.host is None or len(self.host) < 3:
            raise ValueError(" param `host` must be set to a valid url")
        self._batch_size = 128
    @staticmethod
    def _permute(
        texts: List[str], sorter: Callable = len
    ) -> Tuple[List[str], Callable]:
        """Sort texts in descending order of length (longest first) and
        return a function that restores the original order for any
        list of the same length, e.g. the resulting embeddings.
        https://github.com/UKPLab/sentence-transformers/blob/
        c5f93f70eca933c78695c5bc686ceda59651ae3b/sentence_transformers/SentenceTransformer.py#L156
        Args:
            texts (List[str]): Texts to sort.
            sorter (Callable, optional): Function giving the sort value
                of a text. Defaults to len.
        Returns:
            Tuple[List[str], Callable]: The sorted texts and a function
                that undoes the permutation.
        Example:
            ```
            texts = ["one","three","four"]
            perm_texts, undo = self._permute(texts)
            texts == undo(perm_texts)
            ```
        """
        if len(texts) == 1:
            # special case query
            return texts, lambda t: t
        length_sorted_idx = np.argsort([-sorter(sen) for sen in texts])
        texts_sorted = [texts[idx] for idx in length_sorted_idx]
        return texts_sorted, lambda unsorted_embeddings: [  # noqa E731
            unsorted_embeddings[idx] for idx in np.argsort(length_sorted_idx)
        ]
    def _batch(self, texts: List[str]) -> List[List[str]]:
        """
        Split a list of texts into batches of at most `self._batch_size` items,
        so that each request stays small, e.g. when encoding a vector database.
        Args:
            texts (List[str]): List of sentences
        Returns:
            List[List[str]]: Batches of List of sentences
        """
        if len(texts) == 1:
            # special case query
            return [texts]
        batches = []
        for start_index in range(0, len(texts), self._batch_size):
            batches.append(texts[start_index : start_index + self._batch_size])
        return batches
    @staticmethod
    def _unbatch(batch_of_texts: List[List[Any]]) -> List[Any]:
        if len(batch_of_texts) == 1 and len(batch_of_texts[0]) == 1:
            # special case query
            return batch_of_texts[0]
        texts = []
        for sublist in batch_of_texts:
            texts.extend(sublist)
        return texts
    def _kwargs_post_request(self, model: str, texts: List[str]) -> Dict[str, Any]:
        """Build the kwargs for the POST request, used by sync and async.
        Args:
            model (str): Name of the embedding model.
            texts (List[str]): Texts to embed in this request.
        Returns:
            Dict[str, Any]: Keyword arguments (url, headers, json) for the request.
        """
        return dict(
            url=f"{self.host}/embeddings",
            headers={
                # "accept": "application/json",
                "content-type": "application/json",
            },
            json=dict(
                input=texts,
                model=model,
            ),
        )
    def _sync_request_embed(
        self, model: str, batch_texts: List[str]
    ) -> List[List[float]]:
        response = requests.post(
            **self._kwargs_post_request(model=model, texts=batch_texts)
        )
        if response.status_code != 200:
            raise Exception(
                f"Infinity returned an unexpected response with status "
                f"{response.status_code}: {response.text}"
            )
        return [e["embedding"] for e in response.json()["data"]]
    def embed(self, model: str, texts: List[str]) -> List[List[float]]:
        """Embed texts with the given model.
        Args:
            model (str): Name of the embedding model.
            texts (List[str]): List of sentences to embed.
        Returns:
            List[List[float]]: List of vectors, one per sentence
        """
        perm_texts, unpermute_func = self._permute(texts)
        perm_texts_batched = self._batch(perm_texts)
        # Request
        map_args = (
            self._sync_request_embed,
            [model] * len(perm_texts_batched),
            perm_texts_batched,
        )
        if len(perm_texts_batched) == 1:
            embeddings_batch_perm = list(map(*map_args))
        else:
            with ThreadPoolExecutor(32) as p:
                embeddings_batch_perm = list(p.map(*map_args))
        embeddings_perm = self._unbatch(embeddings_batch_perm)
        embeddings = unpermute_func(embeddings_perm)
        return embeddings
    async def _async_request(
        self, session: aiohttp.ClientSession, kwargs: Dict[str, Any]
    ) -> List[List[float]]:
        async with session.post(**kwargs) as response:
            if response.status != 200:
                raise Exception(
                    f"Infinity returned an unexpected response with status "
                    f"{response.status}: {await response.text()}"
                )
            # Same schema as the sync path: {"data": [{"embedding": [...]}]}
            embedding = (await response.json())["data"]
            return [e["embedding"] for e in embedding]
    async def aembed(self, model: str, texts: List[str]) -> List[List[float]]:
        """Embed texts with the given model, asynchronously.
        Args:
            model (str): Name of the embedding model.
            texts (List[str]): List of sentences to embed.
        Returns:
            List[List[float]]: List of vectors, one per sentence
        """
        perm_texts, unpermute_func = self._permute(texts)
        perm_texts_batched = self._batch(perm_texts)
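        # Request
        # `async with` closes the session on exit, so do not cache a session
        # created here on `self`; a closed session would break the next call.
        session_ctx = self.aiosession
        if session_ctx is None:
            session_ctx = aiohttp.ClientSession(
                trust_env=True, connector=aiohttp.TCPConnector(limit=32)
            )
        async with session_ctx as session: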
            embeddings_batch_perm = await asyncio.gather(
                *[
                    self._async_request(
                        session=session,
                        kwargs=self._kwargs_post_request(model=model, texts=t),
                    )
                    for t in perm_texts_batched
                ]
            )
        embeddings_perm = self._unbatch(embeddings_batch_perm)
        embeddings = unpermute_func(embeddings_perm)
        return embeddings
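Review note on the client in `infinity.py`: both `embed` and `aembed` sort the texts longest-first, split them into batches, send each batch, and then restore the caller's order. The following is a minimal, dependency-free sketch of that flow. The function names and `fake_embed` are hypothetical stand-ins, not the commit's code, and `fake_embed` replaces the HTTP request.

```python
from typing import Callable, List, Sequence, Tuple


def permute_by_length(texts: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Sort texts longest-first and return the index order that was used."""
    order = sorted(range(len(texts)), key=lambda i: -len(texts[i]))
    return [texts[i] for i in order], order


def make_batches(items: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split items into consecutive chunks of at most batch_size."""
    return [list(items[i : i + batch_size]) for i in range(0, len(items), batch_size)]


def restore_order(results: Sequence[List[float]], order: Sequence[int]) -> List[List[float]]:
    """Place each result back at the index its text had in the input."""
    restored: List[List[float]] = [[] for _ in results]
    for position, original_index in enumerate(order):
        restored[original_index] = results[position]
    return restored


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 2,
) -> List[List[float]]:
    """Sort, batch, embed each batch, flatten, and undo the sort."""
    sorted_texts, order = permute_by_length(texts)
    flat: List[List[float]] = []
    for chunk in make_batches(sorted_texts, batch_size):
        flat.extend(embed_batch(chunk))
    return restore_order(flat, order)


def fake_embed(chunk: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for one request: encodes the text length."""
    return [[float(len(text))] for text in chunk]


texts = ["pizza", "another pizza", "a document"]
vectors = embed_in_batches(texts, fake_embed)
```

Grouping texts of similar length presumably reduces padding per batch on the server side, which is the motivation in the sentence-transformers code the docstring links to.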
--- a/libs/langchain/tests/unit_tests/embeddings/test_imports.py
+++ b/libs/langchain/tests/unit_tests/embeddings/test_imports.py
@@ -10,6 +10,7 @@ EXPECTED_ALL = [
    "FastEmbedEmbeddings",
    "HuggingFaceEmbeddings",
    "HuggingFaceInferenceAPIEmbeddings",
    "InfinityEmbeddings",
    "GradientEmbeddings",
    "JinaEmbeddings",
    "LlamaCppEmbeddings",
--- a/libs/langchain/tests/unit_tests/embeddings/test_infinity.py
+++ b/libs/langchain/tests/unit_tests/embeddings/test_infinity.py
@@ -0,0 +1,101 @@
 from typing import Dict
 from pytest_mock import MockerFixture
 from langchain.embeddings import InfinityEmbeddings
 _MODEL_ID = "BAAI/bge-small"
 _INFINITY_BASE_URL = "https://localhost/api"
 _DOCUMENTS = [
    "pizza",
    "another pizza",
    "a document",
    "another pizza",
    "super long document with many tokens",
 ]
 class MockResponse:
    def __init__(self, json_data: Dict, status_code: int):
        self.json_data = json_data
        self.status_code = status_code
    def json(self) -> Dict:
        return self.json_data
 def mocked_requests_post(
    url: str,
    headers: dict,
    json: dict,
 ) -> MockResponse:
    assert url.startswith(_INFINITY_BASE_URL)
    assert "model" in json and _MODEL_ID in json["model"]
    assert json
    assert headers
    assert "input" in json and isinstance(json["input"], list)
    embeddings = []
    for inp in json["input"]:
        # verify correct ordering
        if "pizza" in inp:
            v = [1.0, 0.0, 0.0]
        elif "document" in inp:
            v = [0.0, 0.9, 0.0]
        else:
            v = [0.0, 0.0, -1.0]
        if len(inp) > 10:
            v[2] += 0.1
        embeddings.append({"embedding": v})
    return MockResponse(
        json_data={"data": embeddings},
        status_code=200,
    )
 def test_infinity_emb_sync(
    mocker: MockerFixture,
 ) -> None:
    mocker.patch("requests.post", side_effect=mocked_requests_post)
    embedder = InfinityEmbeddings(model=_MODEL_ID, infinity_api_url=_INFINITY_BASE_URL)
    assert embedder.infinity_api_url == _INFINITY_BASE_URL
    assert embedder.model == _MODEL_ID
    response = embedder.embed_documents(_DOCUMENTS)
    want = [
        [1.0, 0.0, 0.0],  # pizza
        [1.0, 0.0, 0.1],  # pizza  + long
        [0.0, 0.9, 0.0],  # doc
        [1.0, 0.0, 0.1],  # pizza + long
        [0.0, 0.9, 0.1],  # doc + long
    ]
    assert response == want
 def test_infinity_large_batch_size(
    mocker: MockerFixture,
 ) -> None:
    mocker.patch("requests.post", side_effect=mocked_requests_post)
    embedder = InfinityEmbeddings(
        infinity_api_url=_INFINITY_BASE_URL,
        model=_MODEL_ID,
    )
    assert embedder.infinity_api_url == _INFINITY_BASE_URL
    assert embedder.model == _MODEL_ID
    response = embedder.embed_documents(_DOCUMENTS * 1024)
    want = [
        [1.0, 0.0, 0.0],  # pizza
        [1.0, 0.0, 0.1],  # pizza  + long
        [0.0, 0.9, 0.0],  # doc
        [1.0, 0.0, 0.1],  # pizza + long
        [0.0, 0.9, 0.1],  # doc + long
    ] * 1024
    assert response == want
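Review note on the tests: the mock above encodes the request and response shapes the client relies on, namely a JSON body with `input` and `model`, and a response whose `data` list holds one `embedding` per input. The following is a minimal sketch of that contract using only the standard library. `build_embedding_request` and `parse_embedding_response` are hypothetical helper names, and the response body is hand-written rather than taken from a real server.

```python
import json
from typing import Any, Dict, List


def build_embedding_request(host: str, model: str, texts: List[str]) -> Dict[str, Any]:
    """Assemble url, headers, and JSON body for an embeddings POST request."""
    return {
        "url": f"{host}/embeddings",
        "headers": {"content-type": "application/json"},
        "json": {"input": texts, "model": model},
    }


def parse_embedding_response(body: str) -> List[List[float]]:
    """Extract the vectors from a response shaped like {"data": [{"embedding": ...}]}."""
    payload = json.loads(body)
    return [item["embedding"] for item in payload["data"]]


request = build_embedding_request("https://localhost/api", "BAAI/bge-small", ["pizza"])
# Hand-written body mirroring what the mock in the test returns.
fake_body = json.dumps({"data": [{"embedding": [1.0, 0.0, 0.0]}]})
parsed = parse_embedding_response(fake_body)
```

A test for the async path could reuse the same response shape, since `aembed` should parse the same `data` field as `embed`.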