docs(rag.ipynb) : Add the `full code` snippet, it’s necessary and useful
for beginners to demonstrate.
Preview the change :
https://langchain-git-fork-googtech-patch-3-langchain.vercel.app/docs/tutorials/rag/
Two `full code` snippets are added as below :
<details>
<summary>Full Code:</summary>
```python
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chat_models import init_chat_model
from langchain_openai import OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore
from google.colab import userdata
from langchain_core.prompts import PromptTemplate
from langchain_core.documents import Document
from typing_extensions import List, TypedDict
from langgraph.graph import START, StateGraph
#################################################
# 1.Initialize the ChatModel and EmbeddingModel #
#################################################
llm = init_chat_model(
model="gpt-4o-mini",
model_provider="openai",
openai_api_key=userdata.get('OPENAI_API_KEY'),
base_url=userdata.get('BASE_URL'),
)
embeddings = OpenAIEmbeddings(
model="text-embedding-3-large",
openai_api_key=userdata.get('OPENAI_API_KEY'),
base_url=userdata.get('BASE_URL'),
)
#######################
# 2.Loading documents #
#######################
loader = WebBaseLoader(
web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
bs_kwargs=dict(
# Only keep post title, headers, and content from the full HTML.
parse_only=bs4.SoupStrainer(
class_=("post-content", "post-title", "post-header")
)
),
)
docs = loader.load()
#########################
# 3.Splitting documents #
#########################
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000, # chunk size (characters)
chunk_overlap=200, # chunk overlap (characters)
add_start_index=True, # track index in original document
)
all_splits = text_splitter.split_documents(docs)
###########################################################
# 4.Embedding documents and storing them in a vectorstore #
###########################################################
vector_store = InMemoryVectorStore(embeddings)
_ = vector_store.add_documents(documents=all_splits)
##########################################################
# 5.Customizing the prompt or loading it from Prompt Hub #
##########################################################
# prompt = hub.pull("rlm/rag-prompt") # load the prompt from the prompt-hub
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
prompt = PromptTemplate.from_template(template)
##################################################################################################
# 5.Using LangGraph to tie together the retrieval and generation steps into a single application # #
##################################################################################################
# 5.1.Define the state of application, which controls the application datas
class State(TypedDict):
question: str
context: List[Document]
answer: str
# 5.2.1.Define the node of application, which signifies the application steps
def retrieve(state: State):
retrieved_docs = vector_store.similarity_search(state["question"])
return {"context": retrieved_docs}
# 5.2.2.Define the node of application, which signifies the application steps
def generate(state: State):
docs_content = "\n\n".join(doc.page_content for doc in state["context"])
messages = prompt.invoke({"question": state["question"], "context": docs_content})
response = llm.invoke(messages)
return {"answer": response.content}
# 6.Define the "control flow" of application, which signifies the ordering of the application steps
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
```
</details>
<details>
<summary>Full Code:</summary>
```python
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chat_models import init_chat_model
from langchain_openai import OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore
from google.colab import userdata
from langchain_core.prompts import PromptTemplate
from langchain_core.documents import Document
from typing_extensions import List, TypedDict
from langgraph.graph import START, StateGraph
from typing import Literal
from typing_extensions import Annotated
#################################################
# 1.Initialize the ChatModel and EmbeddingModel #
#################################################
llm = init_chat_model(
model="gpt-4o-mini",
model_provider="openai",
openai_api_key=userdata.get('OPENAI_API_KEY'),
base_url=userdata.get('BASE_URL'),
)
embeddings = OpenAIEmbeddings(
model="text-embedding-3-large",
openai_api_key=userdata.get('OPENAI_API_KEY'),
base_url=userdata.get('BASE_URL'),
)
#######################
# 2.Loading documents #
#######################
loader = WebBaseLoader(
web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
bs_kwargs=dict(
# Only keep post title, headers, and content from the full HTML.
parse_only=bs4.SoupStrainer(
class_=("post-content", "post-title", "post-header")
)
),
)
docs = loader.load()
#########################
# 3.Splitting documents #
#########################
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000, # chunk size (characters)
chunk_overlap=200, # chunk overlap (characters)
add_start_index=True, # track index in original document
)
all_splits = text_splitter.split_documents(docs)
# Search analysis: Add some metadata to the documents in our vector store,
# so that we can filter on section later.
total_documents = len(all_splits)
third = total_documents // 3
for i, document in enumerate(all_splits):
if i < third:
document.metadata["section"] = "beginning"
elif i < 2 * third:
document.metadata["section"] = "middle"
else:
document.metadata["section"] = "end"
# Search analysis: Define the schema for our search query
class Search(TypedDict):
query: Annotated[str, ..., "Search query to run."]
section: Annotated[
Literal["beginning", "middle", "end"], ..., "Section to query."]
###########################################################
# 4.Embedding documents and storing them in a vectorstore #
###########################################################
vector_store = InMemoryVectorStore(embeddings)
_ = vector_store.add_documents(documents=all_splits)
##########################################################
# 5.Customizing the prompt or loading it from Prompt Hub #
##########################################################
# prompt = hub.pull("rlm/rag-prompt") # load the prompt from the prompt-hub
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""
prompt = PromptTemplate.from_template(template)
###################################################################
# 5.Using LangGraph to tie together the analyze_query, retrieval #
# and generation steps into a single application #
###################################################################
# 5.1.Define the state of application, which controls the application datas
class State(TypedDict):
question: str
query: Search
context: List[Document]
answer: str
# Search analysis: Define the node of application,
# which be used to generate a query from the user's raw input
def analyze_query(state: State):
structured_llm = llm.with_structured_output(Search)
query = structured_llm.invoke(state["question"])
return {"query": query}
# 5.2.1.Define the node of application, which signifies the application steps
def retrieve(state: State):
query = state["query"]
retrieved_docs = vector_store.similarity_search(
query["query"],
filter=lambda doc: doc.metadata.get("section") == query["section"],
)
return {"context": retrieved_docs}
# 5.2.2.Define the node of application, which signifies the application steps
def generate(state: State):
docs_content = "\n\n".join(doc.page_content for doc in state["context"])
messages = prompt.invoke({"question": state["question"], "context": docs_content})
response = llm.invoke(messages)
return {"answer": response.content}
# 6.Define the "control flow" of application, which signifies the ordering of the application steps
graph_builder = StateGraph(State).add_sequence([analyze_query, retrieve, generate])
graph_builder.add_edge(START, "analyze_query")
graph = graph_builder.compile()
```
</details>
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
The .dict() method is deprecated inf Pydantic V2.0 and use `model_dump`
method instead.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "infra: ..."
for CI changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Issue**:
This tutorial depends on langgraph, however Langgraph is not mentioned
on the installation section for the tutorial, which raises an error when
copying and pasting the code snippets as following:

**Solution**:
Just adding langgraph package to installation section, for both pip and
Conda tabs as this tutorial requires it.
This is a simple change for a variable name in the Embeddings section of
this document:
https://python.langchain.com/docs/tutorials/retrievers/#embeddings .
The variable name `embeddings_model` seems to be wrong and doesn't match
what follows in the template: `embeddings`.
Adds deprecation notices for Neo4j components moving to the
`langchain_neo4j` partner package.
- Adds deprecation warnings to all Neo4j-related classes and functions
that have been migrated to the new `langchain_neo4j` partner package
- Updates documentation to reference the new `langchain_neo4j` package
instead of `langchain_community`
**Description:** some of the required packages are missing in the
installation cell in tutorial notebooks. So I added required packages to
installation cell or created latter one if it was not presented in the
notebook at all.
Tested in colab: "Kernel" -> "Run all cells". All the notebooks under
`docs/tutorials` run as expected without `ModuleNotFoundError` error.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Updated the documentation to fix some grammar errors
- **Description:** Some language errors exist in the documentation
- **Issue:** the issue # Changed the structure of some sentences
Edited various notebooks in the tutorial section to fix:
* Grammatical Errors
* Improve Readability by changing the sentence structure or reducing
repeated words which bears the same meaning
* Edited a code block to follow the PEP 8 Standard
* Added more information in some sentences to make the concept more
clear and reduce ambiguity
---------
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
docs:docs:tutorials:sql_qa.ipynb: fix typo
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
Fix typo in docs:docs:tutorials:sql_qa.ipynb
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Async invocation:
remove : from at the end of line
line 441 because there is not any structure block after it.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Docs Chatbot Tutorial**
The docs state that you can omit the language parameter, but the code
sample to demonstrate, still contains it.
Co-authored-by: Erick Friis <erick@langchain.dev>
Thank you for contributing to LangChain!
- [x] **PR title**: Fix typo in conda environment code block in
rag.ipynb
- In docs/tutorials/rag.ipynb
Co-authored-by: Erick Friis <erick@langchain.dev>
While going through the chatbot tutorial, I noticed a couple of typos
and grammatical issues. Also, the pip install command for
langchain_community was commented out, but the document mentions
installing it.
---------
Co-authored-by: Erick Friis <erickfriis@gmail.com>
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
docs:tutorials:llm_chain:fix typo
- [ ] **PR message**:
fix typo in llm chain tutorial
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
Co-authored-by: Erick Friis <erick@langchain.dev>