
# Deep Lake

This page covers how to use the Deep Lake ecosystem within LangChain.

## Why Deep Lake?

- More than just a (multi-modal) vector store. You can later use the dataset to fine-tune your own LLM models.
- Not only stores embeddings, but also the original data with automatic version control.
- Truly serverless. Doesn't require another service and can be used with major cloud providers (AWS S3, GCS, etc.).

## More Resources

1. Ultimate Guide to LangChain & Deep Lake: Build ChatGPT to Answer Questions on Your Financial Data
2. Twitter the-algorithm codebase analysis with Deep Lake
3. Whitepaper and academic paper for Deep Lake
4. Additional resources available for review: Deep Lake, Getting Started and Tutorials

## Installation and Setup

- Install the Python package with `pip install deeplake`

## Wrappers

### VectorStore

There exists a wrapper around Deep Lake, a data lake for Deep Learning applications, allowing you to use it as a vector store (for now), whether for semantic search or example selection.

To import this vectorstore:

```python
from langchain.vectorstores import DeepLake
```
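
As a minimal sketch of how the wrapper can be used for semantic search (the embedding model, example texts, and the local `./my_deeplake/` path are illustrative assumptions, and an `OPENAI_API_KEY` is assumed to be set):

```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake

# Embedding model used to vectorize the texts.
embeddings = OpenAIEmbeddings()

# Create a local Deep Lake dataset and index a few example texts.
db = DeepLake.from_texts(
    [
        "Deep Lake stores embeddings together with the original data.",
        "It supports version control and serverless deployment.",
    ],
    embedding=embeddings,
    dataset_path="./my_deeplake/",
)

# Semantic search over the stored texts.
docs = db.similarity_search("What does Deep Lake store?", k=1)
print(docs[0].page_content)
```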

For a more detailed walkthrough of the Deep Lake wrapper, see this notebook.
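
The vector store can also be exposed as a retriever for use in higher-level chains. A minimal sketch, continuing from the `db` object above (the choice of chain and LLM here is an illustrative assumption, not the only option):

```python
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Wrap the Deep Lake vector store as a retriever and answer questions over it.
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(),  # assumes OPENAI_API_KEY is set
    chain_type="stuff",
    retriever=db.as_retriever(),
)
print(qa.run("What does Deep Lake store alongside embeddings?"))
```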