mirror of https://github.com/hwchase17/langchain.git synced 2025-09-14 05:56:40 +00:00

Go to file

HackHuang 80ca310c15 langchain : Add the full code snippet in rag.ipynb (#29820 )

docs(rag.ipynb) : Add the `full code` snippet, it’s necessary and useful
for beginners to demonstrate.

Preview the change :
https://langchain-git-fork-googtech-patch-3-langchain.vercel.app/docs/tutorials/rag/

Two `full code` snippets are added as below :
<details>
<summary>Full Code:</summary>

```python
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chat_models import init_chat_model
from langchain_openai import OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore
from google.colab import userdata
from langchain_core.prompts import PromptTemplate
from langchain_core.documents import Document
from typing_extensions import List, TypedDict
from langgraph.graph import START, StateGraph

#################################################
# 1.Initialize the ChatModel and EmbeddingModel #
#################################################
llm = init_chat_model(
    model="gpt-4o-mini",
    model_provider="openai",
    openai_api_key=userdata.get('OPENAI_API_KEY'),
    base_url=userdata.get('BASE_URL'),
)
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
    openai_api_key=userdata.get('OPENAI_API_KEY'),
    base_url=userdata.get('BASE_URL'),
)

#######################
# 2.Loading documents #
#######################
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        # Only keep post title, headers, and content from the full HTML.
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

#########################
# 3.Splitting documents #
#########################
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # chunk size (characters)
    chunk_overlap=200,  # chunk overlap (characters)
    add_start_index=True,  # track index in original document
)
all_splits = text_splitter.split_documents(docs)

###########################################################
# 4.Embedding documents and storing them in a vectorstore #
###########################################################
vector_store = InMemoryVectorStore(embeddings)
_ = vector_store.add_documents(documents=all_splits)

##########################################################
# 5.Customizing the prompt or loading it from Prompt Hub #
##########################################################
# prompt = hub.pull("rlm/rag-prompt") # load the prompt from the prompt-hub
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.

{context}

Question: {question}

Helpful Answer:"""
prompt = PromptTemplate.from_template(template)

##################################################################################################
# 5.Using LangGraph to tie together the retrieval and generation steps into a single application #                               #
##################################################################################################
# 5.1.Define the state of application, which controls the application datas
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

# 5.2.1.Define the node of application, which signifies the application steps
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}

# 5.2.2.Define the node of application, which signifies the application steps
def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}

# 6.Define the "control flow" of application, which signifies the ordering of the application steps
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
```

</details>

<details>
<summary>Full Code:</summary>

```python
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chat_models import init_chat_model
from langchain_openai import OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore
from google.colab import userdata
from langchain_core.prompts import PromptTemplate
from langchain_core.documents import Document
from typing_extensions import List, TypedDict
from langgraph.graph import START, StateGraph
from typing import Literal
from typing_extensions import Annotated

#################################################
# 1.Initialize the ChatModel and EmbeddingModel #
#################################################
llm = init_chat_model(
    model="gpt-4o-mini",
    model_provider="openai",
    openai_api_key=userdata.get('OPENAI_API_KEY'),
    base_url=userdata.get('BASE_URL'),
)
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
    openai_api_key=userdata.get('OPENAI_API_KEY'),
    base_url=userdata.get('BASE_URL'),
)

#######################
# 2.Loading documents #
#######################
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        # Only keep post title, headers, and content from the full HTML.
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

#########################
# 3.Splitting documents #
#########################
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # chunk size (characters)
    chunk_overlap=200,  # chunk overlap (characters)
    add_start_index=True,  # track index in original document
)
all_splits = text_splitter.split_documents(docs)

# Search analysis: Add some metadata to the documents in our vector store,
# so that we can filter on section later. 
total_documents = len(all_splits)
third = total_documents // 3
for i, document in enumerate(all_splits):
    if i < third:
        document.metadata["section"] = "beginning"
    elif i < 2 * third:
        document.metadata["section"] = "middle"
    else:
        document.metadata["section"] = "end"

# Search analysis: Define the schema for our search query
class Search(TypedDict):
    query: Annotated[str, ..., "Search query to run."]
    section: Annotated[
        Literal["beginning", "middle", "end"], ..., "Section to query."]

###########################################################
# 4.Embedding documents and storing them in a vectorstore #
###########################################################
vector_store = InMemoryVectorStore(embeddings)
_ = vector_store.add_documents(documents=all_splits)

##########################################################
# 5.Customizing the prompt or loading it from Prompt Hub #
##########################################################
# prompt = hub.pull("rlm/rag-prompt") # load the prompt from the prompt-hub
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.

{context}

Question: {question}

Helpful Answer:"""
prompt = PromptTemplate.from_template(template)

###################################################################
# 5.Using LangGraph to tie together the analyze_query, retrieval  #
# and generation steps into a single application                  #
###################################################################
# 5.1.Define the state of application, which controls the application datas
class State(TypedDict):
    question: str
    query: Search
    context: List[Document]
    answer: str

# Search analysis: Define the node of application, 
# which be used to generate a query from the user's raw input
def analyze_query(state: State):
    structured_llm = llm.with_structured_output(Search)
    query = structured_llm.invoke(state["question"])
    return {"query": query}

# 5.2.1.Define the node of application, which signifies the application steps
def retrieve(state: State):
    query = state["query"]
    retrieved_docs = vector_store.similarity_search(
        query["query"],
        filter=lambda doc: doc.metadata.get("section") == query["section"],
    )
    return {"context": retrieved_docs}

# 5.2.2.Define the node of application, which signifies the application steps
def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}

# 6.Define the "control flow" of application, which signifies the ordering of the application steps
graph_builder = StateGraph(State).add_sequence([analyze_query, retrieve, generate]) 
graph_builder.add_edge(START, "analyze_query")
graph = graph_builder.compile()
```

</details>

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>

2025-02-15 21:37:58 -05:00

.devcontainer

community[minor]: Add ApertureDB as a vectorstore (#24088 )

2024-07-16 09:32:59 -07:00

.github

pinecone[patch]: add support for python 3.13 (#29737 )

2025-02-11 11:20:21 -08:00

cookbook

Deprecating sql_database access for creating UC functions for agent tools (#29745 )

2025-02-13 02:24:44 +00:00

docs

langchain : Add the full code snippet in rag.ipynb (#29820 )

2025-02-15 21:37:58 -05:00

libs

langchain_community: add image support to DuckDuckGoSearchAPIWrapper (#29816 )

2025-02-15 21:32:14 -05:00

scripts

…

.gitattributes

…

.gitignore

infra: gitignore api_ref mds (#25705 )

2024-08-23 09:50:30 -07:00

.pre-commit-config.yaml

all: Add pre-commit hook (#26993 )

2024-12-18 22:22:58 +00:00

.readthedocs.yaml

…

CITATION.cff

…

LICENSE

…

Makefile

packages: update counts, add command (#29789 )

2025-02-13 20:45:25 +00:00

MIGRATE.md

Proofreading and Editing Report for Migration Guide (#28084 )

2024-11-13 11:03:09 -05:00

poetry.toml

…

pyproject.toml

multiple: fix uv path deps (#29790 )

2025-02-13 21:32:34 +00:00

README.md

docs: update README/intro (#29492 )

2025-01-29 22:50:00 +00:00

SECURITY.md

docs: single security doc (#28515 )

2024-12-04 18:15:34 +00:00

uv.lock

multiple: fix uv path deps (#29790 )

2025-02-13 21:32:34 +00:00

yarn.lock

box: add langchain box package and DocumentLoader (#25506 )

2024-08-21 02:23:43 +00:00

README.md

🦜️🔗 LangChain

⚡ Build context-aware reasoning applications ⚡

Looking for the JS/TS library? Check out LangChain.js.

To help you ship LangChain apps to production faster, check out LangSmith. LangSmith is a unified developer platform for building, testing, and monitoring LLM applications. Fill out this form to speak with our sales team.

Quick Install

With pip:

pip install langchain

With conda:

conda install langchain -c conda-forge

🤔 What is LangChain?

LangChain is a framework for developing applications powered by large language models (LLMs).

For these applications, LangChain simplifies the entire application lifecycle:

Open-source libraries: Build your applications using LangChain's open-source components and third-party integrations. Use LangGraph to build stateful agents with first-class streaming and human-in-the-loop support.
Productionization: Inspect, monitor, and evaluate your apps with LangSmith so that you can constantly optimize and deploy with confidence.
Deployment: Turn your LangGraph applications into production-ready APIs and Assistants with LangGraph Platform.

Open-source libraries

langchain-core: Base abstractions.
Integration packages (e.g. langchain-openai, langchain-anthropic, etc.): Important integrations have been split into lightweight packages that are co-maintained by the LangChain team and the integration developers.
langchain: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
langchain-community: Third-party integrations that are community maintained.
LangGraph: LangGraph powers production-grade agents, trusted by Linkedin, Uber, Klarna, GitLab, and many more. Build robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph. Integrates smoothly with LangChain, but can be used without it. To learn more about LangGraph, check out our first LangChain Academy course, Introduction to LangGraph, available here.

Productionization:

LangSmith: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.

Deployment:

LangGraph Platform: Turn your LangGraph applications into production-ready APIs and Assistants.

🧱 What can you build with LangChain?

❓ Question answering with RAG

Documentation
End-to-end Example: Chat LangChain and repo

🧱 Extracting structured output

Documentation
End-to-end Example: LangChain Extract

🤖 Chatbots

Documentation
End-to-end Example: Web LangChain (web researcher chatbot) and repo

And much more! Head to the Tutorials section of the docs for more.

🚀 How does LangChain help?

The main value props of the LangChain libraries are:

Components: composable building blocks, tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not.
Easy orchestration with LangGraph: LangGraph, built on top of langchain-core, has built-in support for messages, tools, and other LangChain abstractions. This makes it easy to combine components into production-ready applications with persistence, streaming, and other key features. Check out the LangChain tutorials page for examples.

Components

Components fall into the following modules:

📃 Model I/O

This includes prompt management and a generic interface for chat models, including a consistent interface for tool-calling and structured output across model providers.

📚 Retrieval

Retrieval Augmented Generation involves loading data from a variety of sources, preparing it, then searching over (a.k.a. retrieving from) it for use in the generation step.

🤖 Agents

Agents allow an LLM autonomy over how a task is accomplished. Agents make decisions about which Actions to take, then take that Action, observe the result, and repeat until the task is complete. LangGraph makes it easy to use LangChain components to build both custom and built-in LLM agents.

📖 Documentation

Please see here for full documentation, which includes:

Introduction: Overview of the framework and the structure of the docs.
Tutorials: If you're looking to build something specific or are more of a hands-on learner, check out our tutorials. This is the best place to get started.
How-to guides: Answers to “How do I….?” type questions. These guides are goal-oriented and concrete; they're meant to help you complete a specific task.
Conceptual guide: Conceptual explanations of the key parts of the framework.
API Reference: Thorough documentation of every class and method.

🌐 Ecosystem

🦜🛠️ LangSmith: Trace and evaluate your language model applications and intelligent agents to help you move from prototype to production.
🦜🕸️ LangGraph: Create stateful, multi-actor applications with LLMs. Integrates smoothly with LangChain, but can be used without it.
🦜🕸️ LangGraph Platform: Deploy LLM applications built with LangGraph into production.

💁 Contributing

As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.

For detailed information on how to contribute, see here.

🌟 Contributors

Description

⚡ Building applications with LLMs through composability ⚡

Readme MIT Cite this repository 4.8 GiB

Languages

Jupyter Notebook 73.9%

Python 21%

omnetpp-msg 4.8%

Makefile 0.1%

MDX 0.1%

README.md Unescape Escape

🦜️🔗 LangChain

Quick Install

🤔 What is LangChain?

Open-source libraries

Productionization:

Deployment:

🧱 What can you build with LangChain?

🚀 How does LangChain help?

Components

📖 Documentation

🌐 Ecosystem

💁 Contributing

🌟 Contributors

README.md