If the document loader recieves Pathlib path instead of str, it reads the file correctly, but the problem begins when the document is added to Deeplake. This problem arises from casting the path to str in the metadata. ```python deeplake = True fname = Path('./lorem_ipsum.txt') loader = TextLoader(fname, encoding="utf-8") docs = loader.load_and_split() text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100) chunks= text_splitter.split_documents(docs) if deeplake: db = DeepLake(dataset_path=ds_path, embedding=embeddings, token=activeloop_token) db.add_documents(chunks) else: db = Chroma.from_documents(docs, embeddings) ``` So using this snippet of code the error message for deeplake looks like this: ``` [part of error message omitted] Traceback (most recent call last): File "/home/mwm/repositories/sources/fixing_langchain/main.py", line 53, in <module> db.add_documents(chunks) File "/home/mwm/repositories/sources/langchain/libs/core/langchain_core/vectorstores.py", line 139, in add_documents return self.add_texts(texts, metadatas, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/deeplake.py", line 258, in add_texts return self.vectorstore.add( ^^^^^^^^^^^^^^^^^^^^^ File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/deeplake_vectorstore.py", line 226, in add return self.dataset_handler.add( ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/dataset_handlers/client_side_dataset_handler.py", line 139, in add dataset_utils.extend_or_ingest_dataset( File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/vector_search/dataset/dataset.py", line 544, in extend_or_ingest_dataset extend( File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/vectorstore/vector_search/dataset/dataset.py", line 505, in extend dataset.extend(batched_processed_tensors, progressbar=False) File "/home/mwm/anaconda3/envs/langchain/lib/python3.11/site-packages/deeplake/core/dataset/dataset.py", line 3247, in extend raise SampleExtendError(str(e)) from e.__cause__ deeplake.util.exceptions.SampleExtendError: Failed to append a sample to the tensor 'metadata'. See more details in the traceback. If you wish to skip the samples that cause errors, please specify `ignore_errors=True`. ``` Which is does not explain the error well enough. The same error for chroma looks like this ``` During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/mwm/repositories/sources/fixing_langchain/main.py", line 56, in <module> db = Chroma.from_documents(docs, embeddings) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/chroma.py", line 778, in from_documents return cls.from_texts( ^^^^^^^^^^^^^^^ File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/chroma.py", line 736, in from_texts chroma_collection.add_texts( File "/home/mwm/repositories/sources/langchain/libs/community/langchain_community/vectorstores/chroma.py", line 309, in add_texts raise ValueError(e.args[0] + "\n\n" + msg) ValueError: Expected metadata value to be a str, int, float or bool, got lorem_ipsum.txt which is a <class 'pathlib.PosixPath'> Try filtering complex metadata from the document using langchain_community.vectorstores.utils.filter_complex_metadata. ``` Which is way more user friendly, so I just added information about possible mismatch of the type in the error message, the same way it is covered in chroma https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/vectorstores/chroma.py#L224 |
||
---|---|---|
.devcontainer | ||
.github | ||
cookbook | ||
docker | ||
docs | ||
libs | ||
templates | ||
.gitattributes | ||
.gitignore | ||
.readthedocs.yaml | ||
CITATION.cff | ||
LICENSE | ||
Makefile | ||
MIGRATE.md | ||
poetry.lock | ||
poetry.toml | ||
pyproject.toml | ||
README.md | ||
SECURITY.md |
🦜️🔗 LangChain
⚡ Build context-aware reasoning applications ⚡
Looking for the JS/TS library? Check out LangChain.js.
To help you ship LangChain apps to production faster, check out LangSmith. LangSmith is a unified developer platform for building, testing, and monitoring LLM applications. Fill out this form to speak with our sales team.
Quick Install
With pip:
pip install langchain
With conda:
conda install langchain -c conda-forge
🤔 What is LangChain?
LangChain is a framework for developing applications powered by language models. It enables applications that:
- Are context-aware: connect a language model to sources of context (prompt instructions, few shot examples, content to ground its response in, etc.)
- Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)
This framework consists of several parts.
- LangChain Libraries: The Python and JavaScript libraries. Contains interfaces and integrations for a myriad of components, a basic run time for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.
- LangChain Templates: A collection of easily deployable reference architectures for a wide variety of tasks.
- LangServe: A library for deploying LangChain chains as a REST API.
- LangSmith: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.
- LangGraph: LangGraph is a library for building stateful, multi-actor applications with LLMs, built on top of (and intended to be used with) LangChain. It extends the LangChain Expression Language with the ability to coordinate multiple chains (or actors) across multiple steps of computation in a cyclic manner.
The LangChain libraries themselves are made up of several different packages.
langchain-core
: Base abstractions and LangChain Expression Language.langchain-community
: Third party integrations.langchain
: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
🧱 What can you build with LangChain?
❓ Retrieval augmented generation
- Documentation
- End-to-end Example: Chat LangChain and repo
💬 Analyzing structured data
- Documentation
- End-to-end Example: SQL Llama2 Template
🤖 Chatbots
- Documentation
- End-to-end Example: Web LangChain (web researcher chatbot) and repo
And much more! Head to the Use cases section of the docs for more.
🚀 How does LangChain help?
The main value props of the LangChain libraries are:
- Components: composable tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not
- Off-the-shelf chains: built-in assemblages of components for accomplishing higher-level tasks
Off-the-shelf chains make it easy to get started. Components make it easy to customize existing chains and build new ones.
Components fall into the following modules:
📃 Model I/O:
This includes prompt management, prompt optimization, a generic interface for all LLMs, and common utilities for working with LLMs.
📚 Retrieval:
Data Augmented Generation involves specific types of chains that first interact with an external data source to fetch data for use in the generation step. Examples include summarization of long pieces of text and question/answering over specific data sources.
🤖 Agents:
Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end-to-end agents.
📖 Documentation
Please see here for full documentation, which includes:
- Getting started: installation, setting up the environment, simple examples
- Overview of the interfaces, modules, and integrations
- Use case walkthroughs and best practice guides
- LangSmith, LangServe, and LangChain Template overviews
- Reference: full API docs
💁 Contributing
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.
For detailed information on how to contribute, see here.