LangChain relies on NumPy to compute cosine distances, which becomes a bottleneck as the dimensionality and the number of embeddings grow. To avoid this bottleneck, in our libraries at [Unum](https://github.com/unum-cloud), we have created a specialized package - [SimSIMD](https://github.com/ashvardanian/simsimd) - that knows how to use newer hardware capabilities. Compared to SciPy and NumPy, it reaches 3x-200x higher performance for various data types. Since publication, several LangChain users have asked me if I can integrate it into LangChain to accelerate their workflows, so here I am 🤗

## Benchmarking

To conduct benchmarks locally, run this in your Jupyter:

```py
import numpy as np
import scipy.spatial.distance as spd
import simsimd as simd
import timeit as tt


def cosine_similarity_np(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    # Pure NumPy: scale the dot products by the outer product of the row norms.
    X_norm = np.linalg.norm(X, axis=1)
    Y_norm = np.linalg.norm(Y, axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        similarity = np.dot(X, Y.T) / np.outer(X_norm, Y_norm)
    similarity[np.isnan(similarity) | np.isinf(similarity)] = 0.0
    return similarity


def cosine_similarity_sp(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    # SciPy returns cosine *distances*; flip them into similarities.
    return 1 - spd.cdist(X, Y, metric="cosine")


def cosine_similarity_simd(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    # SimSIMD mirrors SciPy's cdist interface.
    return 1 - simd.cdist(X, Y, metric="cosine")


X = np.random.randn(1, 1536).astype(np.float32)
Y = np.random.randn(1, 1536).astype(np.float32)
repeat = 1000

print("NumPy: {:,.0f} ops/s, SciPy: {:,.0f} ops/s, SimSIMD: {:,.0f} ops/s".format(
    repeat / tt.timeit(lambda: cosine_similarity_np(X, Y), number=repeat),
    repeat / tt.timeit(lambda: cosine_similarity_sp(X, Y), number=repeat),
    repeat / tt.timeit(lambda: cosine_similarity_simd(X, Y), number=repeat),
))
```

## Results

I ran this on an M2 Pro MacBook for various data types and numbers of rows in `X`, and reformatted the results as a table for readability:

| Data Type | NumPy | SciPy | SimSIMD |
| :--- | ---: | ---: | ---: |
| `f32, 1` | 59,114 ops/s | 80,330 ops/s | 475,351 ops/s |
| `f16, 1` | 32,880 ops/s | 82,420 ops/s | 650,177 ops/s |
| `i8, 1` | 47,916 ops/s | 115,084 ops/s | 866,958 ops/s |
| `f32, 10` | 40,135 ops/s | 24,305 ops/s | 185,373 ops/s |
| `f16, 10` | 7,041 ops/s | 17,596 ops/s | 192,058 ops/s |
| `i8, 10` | 21,989 ops/s | 25,064 ops/s | 619,131 ops/s |
| `f32, 100` | 3,536 ops/s | 3,094 ops/s | 24,206 ops/s |
| `f16, 100` | 900 ops/s | 2,014 ops/s | 23,364 ops/s |
| `i8, 100` | 5,510 ops/s | 3,214 ops/s | 143,922 ops/s |

It's important to note that SimSIMD will underperform if both matrices are huge. That, however, seems to be an uncommon usage pattern for LangChain users. You can find a much more detailed performance report for different hardware models here:

- [Apple M2 Pro](https://ashvardanian.com/posts/simsimd-faster-scipy/#appendix-1-performance-on-apple-m2-pro)
- [4th Gen Intel Xeon Platinum](https://ashvardanian.com/posts/simsimd-faster-scipy/#appendix-2-performance-on-4th-gen-intel-xeon-platinum-8480)
- [AWS Graviton 3](https://ashvardanian.com/posts/simsimd-faster-scipy/#appendix-3-performance-on-aws-graviton-3)

## Additional Notes

1. The previous version used `X = np.array(X)` to repackage lists of lists. That is an anti-pattern, as it silently produces double-precision floating-point numbers, which are slow on both CPUs and GPUs. I have replaced it with `X = np.array(X, dtype=np.float32)`, but a more selective approach should be discussed.
2. In numerical computations, it's recommended to define tolerance levels explicitly, which the previous `np.allclose(expected, actual)` calls avoided. For now, I've set the absolute tolerance for distance-computation errors to 0.01: `np.allclose(expected, actual, atol=1e-2)`. A short sketch illustrating both notes follows below.

---

- **Dependencies:** adds the `simsimd` dependency
- **Tag maintainer:** @hwchase17
- **Twitter handle:** @ashvardanian

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
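As a footnote to the two notes above, here is a minimal, illustrative sketch (not part of the PR diff) that contrasts implicit `float64` repackaging with an explicit `float32` cast and checks the results under the loosened tolerance. The random data and the SciPy reference are assumptions made for the demo:

```py
import numpy as np
import scipy.spatial.distance as spd
import simsimd as simd

# Embeddings often arrive as plain Python lists of lists.
X = np.random.randn(8, 1536).tolist()

# Anti-pattern: np.array(X) silently produces float64.
# Preferred: repackage with an explicit single-precision dtype.
X_f32 = np.array(X, dtype=np.float32)

# Double-precision reference, computed via SciPy.
expected = 1 - spd.cdist(X, X, metric="cosine")

# Accelerated single-precision result, computed via SimSIMD.
actual = 1 - np.asarray(simd.cdist(X_f32, X_f32, metric="cosine"))

# Single-precision distances drift slightly, so compare with an explicit
# absolute tolerance instead of np.allclose's tight default.
assert np.allclose(expected, actual, atol=1e-2)
```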
# 🦜️🔗 LangChain
⚡ Building applications with LLMs through composability ⚡
Looking for the JS/TS version? Check out LangChain.js.
**Production Support:** As you move your LangChains into production, we'd love to offer more hands-on support. Fill out this form to share more about what you're building, and our team will get in touch.
## 🚨 Breaking Changes for select chains (SQLDatabase) on 7/28/23
In an effort to make `langchain` leaner and safer, we are moving select chains to `langchain_experimental`. This migration has already started, but we are remaining backwards compatible until 7/28. On that date, we will remove functionality from `langchain`.

Read more about the motivation and the progress here. Read how to migrate your code here.
## Quick Install

```bash
pip install langchain
```

or

```bash
pip install langsmith && conda install langchain -c conda-forge
```
## 🤔 What is this?
Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. However, using these LLMs in isolation is often insufficient for creating a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge.
This library aims to assist in the development of those types of applications. Common examples of these applications include the following (a minimal code sketch follows the list):
**❓ Question Answering over specific documents**
- Documentation
- End-to-end Example: Question Answering over Notion Database
**💬 Chatbots**
- Documentation
- End-to-end Example: Chat-LangChain
**🤖 Agents**
- Documentation
- End-to-end Example: GPT+WolframAlpha
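To make the composability idea concrete, here is a minimal, illustrative sketch that chains a prompt template with an LLM. The model choice, the prompt text, and the `OPENAI_API_KEY` environment variable are assumptions for the example, not requirements of the library:

```py
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

# Assumes the OPENAI_API_KEY environment variable is set.
llm = OpenAI(temperature=0.9)
prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
)

# Compose the model and the prompt into a reusable chain.
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run("colorful socks"))
```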
## 📖 Documentation
Please see here for full documentation on:
- Getting started (installation, setting up the environment, simple examples)
- How-To examples (demos, integrations, helper functions)
- Reference (full API docs)
- Resources (high-level explanation of core concepts)
## 🚀 What can this help with?
There are six main areas that LangChain is designed to help with. These are, in increasing order of complexity:
**📃 LLMs and Prompts:**
This includes prompt management, prompt optimization, a generic interface for all LLMs, and common utilities for working with LLMs.
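As a small, hedged illustration of prompt management, here is a template whose text is invented for the example:

```py
from langchain.prompts import PromptTemplate

# A template keeps the prompt text and its variables separate from any model call.
prompt = PromptTemplate(
    input_variables=["adjective", "content"],
    template="Tell me a {adjective} joke about {content}.",
)

print(prompt.format(adjective="funny", content="chickens"))
# -> Tell me a funny joke about chickens.
```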
**🔗 Chains:**
Chains go beyond a single LLM call and involve sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.
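Here is a sketch of a two-step sequence, where the first chain's output feeds the second chain's prompt; the prompts and model settings are illustrative assumptions:

```py
from langchain.chains import LLMChain, SimpleSequentialChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

llm = OpenAI(temperature=0.7)

# Step 1: propose a company name for a given product.
name_chain = LLMChain(llm=llm, prompt=PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
))

# Step 2: write a slogan for that company name.
slogan_chain = LLMChain(llm=llm, prompt=PromptTemplate(
    input_variables=["company"],
    template="Write a short slogan for the company {company}.",
))

# The sequential chain pipes the first output into the second prompt.
overall = SimpleSequentialChain(chains=[name_chain, slogan_chain])
print(overall.run("colorful socks"))
```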
**📚 Data Augmented Generation:**
Data Augmented Generation involves specific types of chains that first interact with an external data source to fetch data for use in the generation step. Examples include summarization of long pieces of text and question/answering over specific data sources.
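A hedged sketch of question answering over an external data source, assuming OpenAI credentials and the `faiss-cpu` package are available; the two indexed strings are made up for the example:

```py
from langchain.chains import RetrievalQA
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import FAISS

# A toy corpus standing in for real documents.
texts = [
    "LangChain composes LLMs with external sources of data.",
    "SimSIMD accelerates similarity computations on modern CPUs.",
]

# Embed the texts and index them for similarity search.
store = FAISS.from_texts(texts, OpenAIEmbeddings())

# The chain first retrieves relevant texts, then feeds them to the LLM.
qa = RetrievalQA.from_chain_type(llm=OpenAI(), retriever=store.as_retriever())
print(qa.run("What does SimSIMD accelerate?"))
```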
**🤖 Agents:**
Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end-to-end agents.
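A hedged sketch of that decide-act-observe loop, assuming `OPENAI_API_KEY` and `SERPAPI_API_KEY` are set and the search and math tool dependencies are installed:

```py
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)

# Tools the agent may choose between: web search and a calculator.
tools = load_tools(["serpapi", "llm-math"], llm=llm)

# The agent picks an Action, observes the result, and repeats until done.
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run("Who is the CEO of OpenAI, and what is 2 raised to the 10th power?")
```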
**🧠 Memory:**
Memory refers to persisting state between calls of a chain/agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.
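A minimal sketch of persisted state between calls, using the buffer memory implementation; the dialogue is illustrative:

```py
from langchain.chains import ConversationChain
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory

# The memory object carries the dialogue history across calls.
conversation = ConversationChain(llm=OpenAI(), memory=ConversationBufferMemory())

conversation.predict(input="Hi, my name is Ash.")
print(conversation.predict(input="What is my name?"))  # The history lets the model answer "Ash".
```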
**🧐 Evaluation:**
[BETA] Generative models are notoriously hard to evaluate with traditional metrics. One new way of evaluating them is using language models themselves to do the evaluation. LangChain provides some prompts/chains for assisting in this.
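A hedged sketch of model-graded evaluation, where an LLM judges predictions against reference answers; the example pair is invented:

```py
from langchain.evaluation.qa import QAEvalChain
from langchain.llms import OpenAI

examples = [{"query": "What is 2 + 2?", "answer": "4"}]
predictions = [{"result": "2 + 2 equals 4."}]

# A second LLM call grades each prediction against its reference answer.
eval_chain = QAEvalChain.from_llm(OpenAI(temperature=0))
graded = eval_chain.evaluate(examples, predictions)
print(graded)  # Each entry contains the grader's verdict, e.g. CORRECT.
```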
For more information on these concepts, please see our full documentation.
## 💁 Contributing
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.
For detailed information on how to contribute, see here.