docs: added template to arxiv page (#21846)

Updated the `arXiv` page with the arXiv references found in Templates (previously the page listed only references from the Docs and the API Reference). Re #21450. CC @eyurtsev
This commit is contained in:
parent e6207ad4f3, commit 6a59f76f2b
@@ -1,54 +1,146 @@
# arXiv

LangChain implements the latest research in the field of Natural Language Processing.

This page contains `arXiv` papers referenced in the LangChain Documentation, API Reference,
and Templates.

## Summary
| arXiv id / Title | Authors | Published date 🔻 | LangChain Documentation and API Reference |
| arXiv id / Title | Authors | Published date 🔻 | LangChain Documentation|
|------------------|---------|-------------------|------------------------|
| `2307.03172v3` [Lost in the Middle: How Language Models Use Long Contexts](http://arxiv.org/abs/2307.03172v3) | Nelson F. Liu, Kevin Lin, John Hewitt, et al. | 2023-07-06 | `Docs:` [docs/modules/data_connection/retrievers/long_context_reorder](https://python.langchain.com/docs/modules/data_connection/retrievers/long_context_reorder)
| `2312.06648v2` [Dense X Retrieval: What Retrieval Granularity Should We Use?](http://arxiv.org/abs/2312.06648v2) | Tong Chen, Hongwei Wang, Sihao Chen, et al. | 2023-12-11 | `Template:` [propositional-retrieval](https://python.langchain.com/docs/templates/propositional-retrieval)
| `2311.09210v1` [Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models](http://arxiv.org/abs/2311.09210v1) | Wenhao Yu, Hongming Zhang, Xiaoman Pan, et al. | 2023-11-15 | `Template:` [chain-of-note-wiki](https://python.langchain.com/docs/templates/chain-of-note-wiki)
| `2310.06117v2` [Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models](http://arxiv.org/abs/2310.06117v2) | Huaixiu Steven Zheng, Swaroop Mishra, Xinyun Chen, et al. | 2023-10-09 | `Template:` [stepback-qa-prompting](https://python.langchain.com/docs/templates/stepback-qa-prompting)
| `2305.14283v3` [Query Rewriting for Retrieval-Augmented Large Language Models](http://arxiv.org/abs/2305.14283v3) | Xinbei Ma, Yeyun Gong, Pengcheng He, et al. | 2023-05-23 | `Template:` [rewrite-retrieve-read](https://python.langchain.com/docs/templates/rewrite-retrieve-read)
| `2305.08291v1` [Large Language Model Guided Tree-of-Thought](http://arxiv.org/abs/2305.08291v1) | Jieyi Long | 2023-05-15 | `API:` [langchain_experimental.tot](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.tot)
| `2305.06983v2` [Active Retrieval Augmented Generation](http://arxiv.org/abs/2305.06983v2) | Zhengbao Jiang, Frank F. Xu, Luyu Gao, et al. | 2023-05-11 | `Docs:` [docs/modules/chains](https://python.langchain.com/docs/modules/chains)
| `2303.17580v4` [HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face](http://arxiv.org/abs/2303.17580v4) | Yongliang Shen, Kaitao Song, Xu Tan, et al. | 2023-03-30 | `API:` [langchain_experimental.autonomous_agents](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.autonomous_agents)
| `2303.08774v6` [GPT-4 Technical Report](http://arxiv.org/abs/2303.08774v6) | OpenAI, Josh Achiam, Steven Adler, et al. | 2023-03-15 | `Docs:` [docs/integrations/vectorstores/mongodb_atlas](https://python.langchain.com/docs/integrations/vectorstores/mongodb_atlas)
| `2301.10226v4` [A Watermark for Large Language Models](http://arxiv.org/abs/2301.10226v4) | John Kirchenbauer, Jonas Geiping, Yuxin Wen, et al. | 2023-01-24 | `API:` [langchain_community.llms...HuggingFaceTextGenInference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference.html#langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference), [langchain_community.llms...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint), [langchain_community.llms...OCIModelDeploymentTGI](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.oci_data_science_model_deployment_endpoint.OCIModelDeploymentTGI.html#langchain_community.llms.oci_data_science_model_deployment_endpoint.OCIModelDeploymentTGI)
| `2301.10226v4` [A Watermark for Large Language Models](http://arxiv.org/abs/2301.10226v4) | John Kirchenbauer, Jonas Geiping, Yuxin Wen, et al. | 2023-01-24 | `API:` [langchain_community.llms...OCIModelDeploymentTGI](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.oci_data_science_model_deployment_endpoint.OCIModelDeploymentTGI.html#langchain_community.llms.oci_data_science_model_deployment_endpoint.OCIModelDeploymentTGI), [langchain_community.llms...HuggingFaceTextGenInference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference.html#langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference), [langchain_community.llms...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint)
| `2212.10496v1` [Precise Zero-Shot Dense Retrieval without Relevance Labels](http://arxiv.org/abs/2212.10496v1) | Luyu Gao, Xueguang Ma, Jimmy Lin, et al. | 2022-12-20 | `Docs:` [docs/use_cases/query_analysis/techniques/hyde](https://python.langchain.com/docs/use_cases/query_analysis/techniques/hyde), `API:` [langchain.chains...HypotheticalDocumentEmbedder](https://api.python.langchain.com/en/latest/chains/langchain.chains.hyde.base.HypotheticalDocumentEmbedder.html#langchain.chains.hyde.base.HypotheticalDocumentEmbedder)
| `2212.10496v1` [Precise Zero-Shot Dense Retrieval without Relevance Labels](http://arxiv.org/abs/2212.10496v1) | Luyu Gao, Xueguang Ma, Jimmy Lin, et al. | 2022-12-20 | `API:` [langchain.chains...HypotheticalDocumentEmbedder](https://api.python.langchain.com/en/latest/chains/langchain.chains.hyde.base.HypotheticalDocumentEmbedder.html#langchain.chains.hyde.base.HypotheticalDocumentEmbedder), `Template:` [hyde](https://python.langchain.com/docs/templates/hyde)
| `2212.08073v1` [Constitutional AI: Harmlessness from AI Feedback](http://arxiv.org/abs/2212.08073v1) | Yuntao Bai, Saurav Kadavath, Sandipan Kundu, et al. | 2022-12-15 | `Docs:` [docs/guides/productionization/evaluation/string/criteria_eval_chain](https://python.langchain.com/docs/guides/productionization/evaluation/string/criteria_eval_chain)
| `2212.07425v3` [Robust and Explainable Identification of Logical Fallacies in Natural Language Arguments](http://arxiv.org/abs/2212.07425v3) | Zhivar Sourati, Vishnu Priya Prasanna Venkatesh, Darshan Deshpande, et al. | 2022-12-12 | `API:` [langchain_experimental.fallacy_removal](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.fallacy_removal)
| `2211.13892v2` [Complementary Explanations for Effective In-Context Learning](http://arxiv.org/abs/2211.13892v2) | Xi Ye, Srinivasan Iyer, Asli Celikyilmaz, et al. | 2022-11-25 | `API:` [langchain_core.example_selectors...MaxMarginalRelevanceExampleSelector](https://api.python.langchain.com/en/latest/example_selectors/langchain_core.example_selectors.semantic_similarity.MaxMarginalRelevanceExampleSelector.html#langchain_core.example_selectors.semantic_similarity.MaxMarginalRelevanceExampleSelector)
| `2211.10435v2` [PAL: Program-aided Language Models](http://arxiv.org/abs/2211.10435v2) | Luyu Gao, Aman Madaan, Shuyan Zhou, et al. | 2022-11-18 | `API:` [langchain_experimental.pal_chain...PALChain](https://api.python.langchain.com/en/latest/pal_chain/langchain_experimental.pal_chain.base.PALChain.html#langchain_experimental.pal_chain.base.PALChain), [langchain_experimental.pal_chain](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.pal_chain)
| `2209.10785v2` [Deep Lake: a Lakehouse for Deep Learning](http://arxiv.org/abs/2209.10785v2) | Sasun Hambardzumyan, Abhinav Tuli, Levon Ghukasyan, et al. | 2022-09-22 | `Docs:` [docs/integrations/providers/activeloop_deeplake](https://python.langchain.com/docs/integrations/providers/activeloop_deeplake)
| `2205.12654v1` [Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages](http://arxiv.org/abs/2205.12654v1) | Kevin Heffernan, Onur Çelebi, Holger Schwenk | 2022-05-25 | `API:` [langchain_community.embeddings...LaserEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain_community.embeddings.laser.LaserEmbeddings.html#langchain_community.embeddings.laser.LaserEmbeddings)
| `2204.00498v1` [Evaluating the Text-to-SQL Capabilities of Large Language Models](http://arxiv.org/abs/2204.00498v1) | Nitarshan Rajkumar, Raymond Li, Dzmitry Bahdanau | 2022-03-15 | `Docs:` [docs/use_cases/sql/quickstart](https://python.langchain.com/docs/use_cases/sql/quickstart), `API:` [langchain_community.utilities...SQLDatabase](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.sql_database.SQLDatabase.html#langchain_community.utilities.sql_database.SQLDatabase), [langchain_community.utilities...SparkSQL](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.spark_sql.SparkSQL.html#langchain_community.utilities.spark_sql.SparkSQL)
| `2204.00498v1` [Evaluating the Text-to-SQL Capabilities of Large Language Models](http://arxiv.org/abs/2204.00498v1) | Nitarshan Rajkumar, Raymond Li, Dzmitry Bahdanau | 2022-03-15 | `API:` [langchain_community.utilities...SQLDatabase](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.sql_database.SQLDatabase.html#langchain_community.utilities.sql_database.SQLDatabase), [langchain_community.utilities...SparkSQL](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.spark_sql.SparkSQL.html#langchain_community.utilities.spark_sql.SparkSQL)
| `2202.00666v5` [Locally Typical Sampling](http://arxiv.org/abs/2202.00666v5) | Clara Meister, Tiago Pimentel, Gian Wiher, et al. | 2022-02-01 | `API:` [langchain_community.llms...HuggingFaceTextGenInference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference.html#langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference), [langchain_community.llms...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint)
| `2103.00020v1` [Learning Transferable Visual Models From Natural Language Supervision](http://arxiv.org/abs/2103.00020v1) | Alec Radford, Jong Wook Kim, Chris Hallacy, et al. | 2021-02-26 | `API:` [langchain_experimental.open_clip](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.open_clip)
| `1909.05858v2` [CTRL: A Conditional Transformer Language Model for Controllable Generation](http://arxiv.org/abs/1909.05858v2) | Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, et al. | 2019-09-11 | `API:` [langchain_community.llms...HuggingFaceTextGenInference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference.html#langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference), [langchain_community.llms...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint)
| `1908.10084v1` [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](http://arxiv.org/abs/1908.10084v1) | Nils Reimers, Iryna Gurevych | 2019-08-27 | `Docs:` [docs/integrations/text_embedding/sentence_transformers](https://python.langchain.com/docs/integrations/text_embedding/sentence_transformers)
## Lost in the Middle: How Language Models Use Long Contexts

- **arXiv id:** 2307.03172v3
- **Title:** Lost in the Middle: How Language Models Use Long Contexts
- **Authors:** Nelson F. Liu, Kevin Lin, John Hewitt, et al.
- **Published Date:** 2023-07-06
- **URL:** http://arxiv.org/abs/2307.03172v3
- **LangChain Documentation:** [docs/modules/data_connection/retrievers/long_context_reorder](https://python.langchain.com/docs/modules/data_connection/retrievers/long_context_reorder)

**Abstract:** While recent language models have the ability to take long contexts as input,
relatively little is known about how well they use longer context. We analyze
the performance of language models on two tasks that require identifying
relevant information in their input contexts: multi-document question answering
and key-value retrieval. We find that performance can degrade significantly
when changing the position of relevant information, indicating that current
language models do not robustly make use of information in long input contexts.
In particular, we observe that performance is often highest when relevant
information occurs at the beginning or end of the input context, and
significantly degrades when models must access relevant information in the
middle of long contexts, even for explicitly long-context models. Our analysis
provides a better understanding of how language models use their input context
and provides new evaluation protocols for future long-context language models.

## Dense X Retrieval: What Retrieval Granularity Should We Use?

- **arXiv id:** 2312.06648v2
- **Title:** Dense X Retrieval: What Retrieval Granularity Should We Use?
- **Authors:** Tong Chen, Hongwei Wang, Sihao Chen, et al.
- **Published Date:** 2023-12-11
- **URL:** http://arxiv.org/abs/2312.06648v2
- **LangChain:**
  - **Template:** [propositional-retrieval](https://python.langchain.com/docs/templates/propositional-retrieval)

**Abstract:** Dense retrieval has become a prominent method to obtain relevant context or
world knowledge in open-domain NLP tasks. When we use a learned dense retriever
on a retrieval corpus at inference time, an often-overlooked design choice is
the retrieval unit in which the corpus is indexed, e.g. document, passage, or
sentence. We discover that the retrieval unit choice significantly impacts the
performance of both retrieval and downstream tasks. Distinct from the typical
approach of using passages or sentences, we introduce a novel retrieval unit,
proposition, for dense retrieval. Propositions are defined as atomic
expressions within text, each encapsulating a distinct factoid and presented in
a concise, self-contained natural language format. We conduct an empirical
comparison of different retrieval granularity. Our results reveal that
proposition-based retrieval significantly outperforms traditional passage or
sentence-based methods in dense retrieval. Moreover, retrieval by proposition
also enhances the performance of downstream QA tasks, since the retrieved texts
are more condensed with question-relevant information, reducing the need for
lengthy input tokens and minimizing the inclusion of extraneous, irrelevant
information.

## Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models
- **arXiv id:** 2311.09210v1
- **Title:** Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models
- **Authors:** Wenhao Yu, Hongming Zhang, Xiaoman Pan, et al.
- **Published Date:** 2023-11-15
- **URL:** http://arxiv.org/abs/2311.09210v1
- **LangChain:**
  - **Template:** [chain-of-note-wiki](https://python.langchain.com/docs/templates/chain-of-note-wiki)

**Abstract:** Retrieval-augmented language models (RALMs) represent a substantial
advancement in the capabilities of large language models, notably in reducing
factual hallucination by leveraging external knowledge sources. However, the
reliability of the retrieved information is not always guaranteed. The
retrieval of irrelevant data can lead to misguided responses, and potentially
causing the model to overlook its inherent knowledge, even when it possesses
adequate information to address the query. Moreover, standard RALMs often
struggle to assess whether they possess adequate knowledge, both intrinsic and
retrieved, to provide an accurate answer. In situations where knowledge is
lacking, these systems should ideally respond with "unknown" when the answer is
unattainable. In response to these challenges, we introduces Chain-of-Noting
(CoN), a novel approach aimed at improving the robustness of RALMs in facing
noisy, irrelevant documents and in handling unknown scenarios. The core idea of
CoN is to generate sequential reading notes for retrieved documents, enabling a
thorough evaluation of their relevance to the given question and integrating
this information to formulate the final answer. We employed ChatGPT to create
training data for CoN, which was subsequently trained on an LLaMa-2 7B model.
Our experiments across four open-domain QA benchmarks show that RALMs equipped
with CoN significantly outperform standard RALMs. Notably, CoN achieves an
average improvement of +7.9 in EM score given entirely noisy retrieved
documents and +10.5 in rejection rates for real-time questions that fall
outside the pre-training knowledge scope.

## Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models

- **arXiv id:** 2310.06117v2
- **Title:** Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
- **Authors:** Huaixiu Steven Zheng, Swaroop Mishra, Xinyun Chen, et al.
- **Published Date:** 2023-10-09
- **URL:** http://arxiv.org/abs/2310.06117v2
- **LangChain:**
  - **Template:** [stepback-qa-prompting](https://python.langchain.com/docs/templates/stepback-qa-prompting)

**Abstract:** We present Step-Back Prompting, a simple prompting technique that enables
LLMs to do abstractions to derive high-level concepts and first principles from
instances containing specific details. Using the concepts and principles to
guide reasoning, LLMs significantly improve their abilities in following a
correct reasoning path towards the solution. We conduct experiments of
Step-Back Prompting with PaLM-2L, GPT-4 and Llama2-70B models, and observe
substantial performance gains on various challenging reasoning-intensive tasks
including STEM, Knowledge QA, and Multi-Hop Reasoning. For instance, Step-Back
Prompting improves PaLM-2L performance on MMLU (Physics and Chemistry) by 7%
and 11% respectively, TimeQA by 27%, and MuSiQue by 7%.

## Query Rewriting for Retrieval-Augmented Large Language Models

- **arXiv id:** 2305.14283v3
- **Title:** Query Rewriting for Retrieval-Augmented Large Language Models
- **Authors:** Xinbei Ma, Yeyun Gong, Pengcheng He, et al.
- **Published Date:** 2023-05-23
- **URL:** http://arxiv.org/abs/2305.14283v3
- **LangChain:**
  - **Template:** [rewrite-retrieve-read](https://python.langchain.com/docs/templates/rewrite-retrieve-read)

**Abstract:** Large Language Models (LLMs) play powerful, black-box readers in the
retrieve-then-read pipeline, making remarkable progress in knowledge-intensive
tasks. This work introduces a new framework, Rewrite-Retrieve-Read instead of
the previous retrieve-then-read for the retrieval-augmented LLMs from the
perspective of the query rewriting. Unlike prior studies focusing on adapting
either the retriever or the reader, our approach pays attention to the
adaptation of the search query itself, for there is inevitably a gap between
the input text and the needed knowledge in retrieval. We first prompt an LLM to
generate the query, then use a web search engine to retrieve contexts.
Furthermore, to better align the query to the frozen modules, we propose a
trainable scheme for our pipeline. A small language model is adopted as a
trainable rewriter to cater to the black-box LLM reader. The rewriter is
trained using the feedback of the LLM reader by reinforcement learning.
Evaluation is conducted on downstream tasks, open-domain QA and multiple-choice
QA. Experiments results show consistent performance improvement, indicating
that our framework is proven effective and scalable, and brings a new framework
for retrieval-augmented LLM.

## Large Language Model Guided Tree-of-Thought
@@ -57,8 +149,9 @@ and provides new evaluation protocols for future long-context language models.
- **Authors:** Jieyi Long
- **Published Date:** 2023-05-15
- **URL:** http://arxiv.org/abs/2305.08291v1
- **LangChain:**
  - **API Reference:** [langchain_experimental.tot](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.tot)

**Abstract:** In this paper, we introduce the Tree-of-Thought (ToT) framework, a novel
approach aimed at improving the problem-solving capabilities of auto-regressive
@@ -78,35 +171,6 @@ significantly increase the success rate of Sudoku puzzle solving. Our
implementation of the ToT-based Sudoku solver is available on GitHub:
\url{https://github.com/jieyilong/tree-of-thought-puzzle-solver}.

## Active Retrieval Augmented Generation

- **arXiv id:** 2305.06983v2
- **Title:** Active Retrieval Augmented Generation
- **Authors:** Zhengbao Jiang, Frank F. Xu, Luyu Gao, et al.
- **Published Date:** 2023-05-11
- **URL:** http://arxiv.org/abs/2305.06983v2
- **LangChain Documentation:** [docs/modules/chains](https://python.langchain.com/docs/modules/chains)

**Abstract:** Despite the remarkable ability of large language models (LMs) to comprehend
and generate language, they have a tendency to hallucinate and create factually
inaccurate output. Augmenting LMs by retrieving information from external
knowledge resources is one promising solution. Most existing retrieval
augmented LMs employ a retrieve-and-generate setup that only retrieves
information once based on the input. This is limiting, however, in more general
scenarios involving generation of long texts, where continually gathering
information throughout generation is essential. In this work, we provide a
generalized view of active retrieval augmented generation, methods that
actively decide when and what to retrieve across the course of the generation.
We propose Forward-Looking Active REtrieval augmented generation (FLARE), a
generic method which iteratively uses a prediction of the upcoming sentence to
anticipate future content, which is then utilized as a query to retrieve
relevant documents to regenerate the sentence if it contains low-confidence
tokens. We test FLARE along with baselines comprehensively over 4 long-form
knowledge-intensive generation tasks/datasets. FLARE achieves superior or
competitive performance on all tasks, demonstrating the effectiveness of our
method. Code and datasets are available at https://github.com/jzbjyb/FLARE.

## HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

- **arXiv id:** 2303.17580v4
@@ -114,8 +178,9 @@ method. Code and datasets are available at https://github.com/jzbjyb/FLARE.
- **Authors:** Yongliang Shen, Kaitao Song, Xu Tan, et al.
- **Published Date:** 2023-03-30
- **URL:** http://arxiv.org/abs/2303.17580v4
- **LangChain:**
  - **API Reference:** [langchain_experimental.autonomous_agents](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.autonomous_agents)

**Abstract:** Solving complicated AI tasks with different domains and modalities is a key
step toward artificial general intelligence. While there are numerous AI models
@@ -144,8 +209,9 @@ realization of artificial general intelligence.
- **Authors:** OpenAI, Josh Achiam, Steven Adler, et al.
- **Published Date:** 2023-03-15
- **URL:** http://arxiv.org/abs/2303.08774v6
- **LangChain:**
  - **Documentation:** [docs/integrations/vectorstores/mongodb_atlas](https://python.langchain.com/docs/integrations/vectorstores/mongodb_atlas)

**Abstract:** We report the development of GPT-4, a large-scale, multimodal model which can
accept image and text inputs and produce text outputs. While less capable than
@@ -167,8 +233,9 @@ more than 1/1,000th the compute of GPT-4.
- **Authors:** John Kirchenbauer, Jonas Geiping, Yuxin Wen, et al.
- **Published Date:** 2023-01-24
- **URL:** http://arxiv.org/abs/2301.10226v4
- **LangChain:**
  - **API Reference:** [langchain_community.llms...OCIModelDeploymentTGI](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.oci_data_science_model_deployment_endpoint.OCIModelDeploymentTGI.html#langchain_community.llms.oci_data_science_model_deployment_endpoint.OCIModelDeploymentTGI), [langchain_community.llms...HuggingFaceTextGenInference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference.html#langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference), [langchain_community.llms...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint)

**Abstract:** Potential harms of large language models can be mitigated by watermarking
model output, i.e., embedding signals into generated text that are invisible to
@@ -191,8 +258,10 @@ family, and discuss robustness and security.
- **Authors:** Luyu Gao, Xueguang Ma, Jimmy Lin, et al.
- **Published Date:** 2022-12-20
- **URL:** http://arxiv.org/abs/2212.10496v1
- **LangChain Documentation:** [docs/use_cases/query_analysis/techniques/hyde](https://python.langchain.com/docs/use_cases/query_analysis/techniques/hyde)
- **LangChain:**
  - **API Reference:** [langchain.chains...HypotheticalDocumentEmbedder](https://api.python.langchain.com/en/latest/chains/langchain.chains.hyde.base.HypotheticalDocumentEmbedder.html#langchain.chains.hyde.base.HypotheticalDocumentEmbedder)
  - **Template:** [hyde](https://python.langchain.com/docs/templates/hyde)

**Abstract:** While dense retrieval has been shown effective and efficient across tasks and
languages, it remains difficult to create effective fully zero-shot dense
@@ -212,35 +281,6 @@ state-of-the-art unsupervised dense retriever Contriever and shows strong
performance comparable to fine-tuned retrievers, across various tasks (e.g. web
search, QA, fact verification) and languages~(e.g. sw, ko, ja).

## Constitutional AI: Harmlessness from AI Feedback

- **arXiv id:** 2212.08073v1
- **Title:** Constitutional AI: Harmlessness from AI Feedback
- **Authors:** Yuntao Bai, Saurav Kadavath, Sandipan Kundu, et al.
- **Published Date:** 2022-12-15
- **URL:** http://arxiv.org/abs/2212.08073v1
- **LangChain Documentation:** [docs/guides/productionization/evaluation/string/criteria_eval_chain](https://python.langchain.com/docs/guides/productionization/evaluation/string/criteria_eval_chain)

**Abstract:** As AI systems become more capable, we would like to enlist their help to
supervise other AIs. We experiment with methods for training a harmless AI
assistant through self-improvement, without any human labels identifying
harmful outputs. The only human oversight is provided through a list of rules
or principles, and so we refer to the method as 'Constitutional AI'. The
process involves both a supervised learning and a reinforcement learning phase.
In the supervised phase we sample from an initial model, then generate
self-critiques and revisions, and then finetune the original model on revised
responses. In the RL phase, we sample from the finetuned model, use a model to
evaluate which of the two samples is better, and then train a preference model
from this dataset of AI preferences. We then train with RL using the preference
model as the reward signal, i.e. we use 'RL from AI Feedback' (RLAIF). As a
result we are able to train a harmless but non-evasive AI assistant that
engages with harmful queries by explaining its objections to them. Both the SL
and RL methods can leverage chain-of-thought style reasoning to improve the
human-judged performance and transparency of AI decision making. These methods
make it possible to control AI behavior more precisely and with far fewer human
labels.

## Robust and Explainable Identification of Logical Fallacies in Natural Language Arguments

- **arXiv id:** 2212.07425v3
@@ -248,8 +288,9 @@ labels.
- **Authors:** Zhivar Sourati, Vishnu Priya Prasanna Venkatesh, Darshan Deshpande, et al.
- **Published Date:** 2022-12-12
- **URL:** http://arxiv.org/abs/2212.07425v3
- **LangChain:**
  - **API Reference:** [langchain_experimental.fallacy_removal](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.fallacy_removal)

**Abstract:** The spread of misinformation, propaganda, and flawed argumentation has been
amplified in the Internet era. Given the volume of data and the subtlety of
@@ -280,8 +321,9 @@ further work on logical fallacy identification.
- **Authors:** Xi Ye, Srinivasan Iyer, Asli Celikyilmaz, et al.
- **Published Date:** 2022-11-25
- **URL:** http://arxiv.org/abs/2211.13892v2
- **LangChain:**
  - **API Reference:** [langchain_core.example_selectors...MaxMarginalRelevanceExampleSelector](https://api.python.langchain.com/en/latest/example_selectors/langchain_core.example_selectors.semantic_similarity.MaxMarginalRelevanceExampleSelector.html#langchain_core.example_selectors.semantic_similarity.MaxMarginalRelevanceExampleSelector)

**Abstract:** Large language models (LLMs) have exhibited remarkable capabilities in
learning from explanations in prompts, but there has been limited understanding
@@ -307,8 +349,9 @@ performance across three real-world tasks on multiple LLMs.
- **Authors:** Luyu Gao, Aman Madaan, Shuyan Zhou, et al.
- **Published Date:** 2022-11-18
- **URL:** http://arxiv.org/abs/2211.10435v2
- **LangChain:**
  - **API Reference:** [langchain_experimental.pal_chain...PALChain](https://api.python.langchain.com/en/latest/pal_chain/langchain_experimental.pal_chain.base.PALChain.html#langchain_experimental.pal_chain.base.PALChain), [langchain_experimental.pal_chain](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.pal_chain)

**Abstract:** Large language models (LLMs) have recently demonstrated an impressive ability
to perform arithmetic and symbolic reasoning tasks, when provided with a few
@@ -340,8 +383,9 @@ publicly available at http://reasonwithpal.com/ .
- **Authors:** Sasun Hambardzumyan, Abhinav Tuli, Levon Ghukasyan, et al.
- **Published Date:** 2022-09-22
- **URL:** http://arxiv.org/abs/2209.10785v2
- **LangChain:**
  - **Documentation:** [docs/integrations/providers/activeloop_deeplake](https://python.langchain.com/docs/integrations/providers/activeloop_deeplake)

**Abstract:** Traditional data lakes provide critical data infrastructure for analytical
workloads by enabling time travel, running SQL queries, ingesting data with
@@ -367,8 +411,9 @@ TensorFlow, JAX, and integrate with numerous MLOps tools.
- **Authors:** Kevin Heffernan, Onur Çelebi, Holger Schwenk
- **Published Date:** 2022-05-25
- **URL:** http://arxiv.org/abs/2205.12654v1
- **LangChain:**
  - **API Reference:** [langchain_community.embeddings...LaserEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain_community.embeddings.laser.LaserEmbeddings.html#langchain_community.embeddings.laser.LaserEmbeddings)

**Abstract:** Scaling multilingual representation learning beyond the hundred most frequent
languages is challenging, in particular to cover the long tail of low-resource
@@ -395,8 +440,9 @@ encoders, mine bitexts, and validate the bitexts by training NMT systems.
- **Authors:** Nitarshan Rajkumar, Raymond Li, Dzmitry Bahdanau
- **Published Date:** 2022-03-15
- **URL:** http://arxiv.org/abs/2204.00498v1
- **LangChain Documentation:** [docs/use_cases/sql/quickstart](https://python.langchain.com/docs/use_cases/sql/quickstart)
- **LangChain:**
  - **API Reference:** [langchain_community.utilities...SQLDatabase](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.sql_database.SQLDatabase.html#langchain_community.utilities.sql_database.SQLDatabase), [langchain_community.utilities...SparkSQL](https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.spark_sql.SparkSQL.html#langchain_community.utilities.spark_sql.SparkSQL)

**Abstract:** We perform an empirical evaluation of Text-to-SQL capabilities of the Codex
language model. We find that, without any finetuning, Codex is a strong
@@ -413,8 +459,9 @@ few-shot examples.
- **Authors:** Clara Meister, Tiago Pimentel, Gian Wiher, et al.
- **Published Date:** 2022-02-01
- **URL:** http://arxiv.org/abs/2202.00666v5
- **LangChain:**
  - **API Reference:** [langchain_community.llms...HuggingFaceTextGenInference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference.html#langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference), [langchain_community.llms...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint)

**Abstract:** Today's probabilistic language generators fall short when it comes to
producing coherent and fluent text despite the fact that the underlying models
@@ -444,8 +491,9 @@ reducing degenerate repetitions.
- **Authors:** Alec Radford, Jong Wook Kim, Chris Hallacy, et al.
- **Published Date:** 2021-02-26
- **URL:** http://arxiv.org/abs/2103.00020v1
- **LangChain:**
  - **API Reference:** [langchain_experimental.open_clip](https://api.python.langchain.com/en/latest/experimental_api_reference.html#module-langchain_experimental.open_clip)

**Abstract:** State-of-the-art computer vision systems are trained to predict a fixed set
of predetermined object categories. This restricted form of supervision limits
@@ -475,8 +523,9 @@ https://github.com/OpenAI/CLIP.
- **Authors:** Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, et al.
- **Published Date:** 2019-09-11
- **URL:** http://arxiv.org/abs/1909.05858v2
- **LangChain:**
  - **API Reference:** [langchain_community.llms...HuggingFaceTextGenInference](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference.html#langchain_community.llms.huggingface_text_gen_inference.HuggingFaceTextGenInference), [langchain_community.llms...HuggingFaceEndpoint](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html#langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint)

**Abstract:** Large-scale language models show promising text generation capabilities, but
users cannot easily control particular aspects of the generated text. We
@@ -497,8 +546,9 @@ full-sized, pretrained versions of CTRL at https://github.com/salesforce/ctrl.
- **Authors:** Nils Reimers, Iryna Gurevych
- **Published Date:** 2019-08-27
- **URL:** http://arxiv.org/abs/1908.10084v1
- **LangChain:**
  - **Documentation:** [docs/integrations/text_embedding/sentence_transformers](https://python.langchain.com/docs/integrations/text_embedding/sentence_transformers)

**Abstract:** BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) has set a new
state-of-the-art performance on sentence-pair regression tasks like semantic
@@ -11,15 +11,14 @@ from typing import Any, Dict, List, Set
from pydantic.v1 import BaseModel, root_validator

# TODO parse docstrings for arXiv references
# TODO Generate a page with a table of the references with correspondent modules/classes/functions.

logger = logging.getLogger(__name__)

_ROOT_DIR = Path(os.path.abspath(__file__)).parents[2]
DOCS_DIR = _ROOT_DIR / "docs" / "docs"
CODE_DIR = _ROOT_DIR / "libs"
TEMPLATES_DIR = _ROOT_DIR / "templates"
ARXIV_ID_PATTERN = r"https://arxiv\.org/(abs|pdf)/(\d+\.\d+)"
LANGCHAIN_PYTHON_URL = "python.langchain.com"


@dataclass
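As an aside, here is a minimal sketch (not taken from the commit) of how a pattern like `ARXIV_ID_PATTERN` above extracts an arXiv identifier from a referencing line; the sample line and the use of capture group 2 follow directly from the pattern as shown:

```python
import re

# Same pattern as ARXIV_ID_PATTERN above: group 1 matches "abs" or "pdf",
# group 2 captures the arXiv identifier itself.
ARXIV_ID_PATTERN = r"https://arxiv\.org/(abs|pdf)/(\d+\.\d+)"

line = "See https://arxiv.org/abs/2312.06648 for the propositional retrieval paper."
match = re.search(ARXIV_ID_PATTERN, line)
if match:
    print(match.group(2))  # prints: 2312.06648
```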
@@ -27,8 +26,9 @@ class ArxivPaper:
     """ArXiv paper information."""
 
     arxiv_id: str
-    referencing_docs: list[str]  # TODO: Add the referencing docs
-    referencing_api_refs: list[str]  # TODO: Add the referencing docs
+    referencing_doc2url: dict[str, str]
+    referencing_api_ref2url: dict[str, str]
+    referencing_template2url: dict[str, str]
     title: str
     authors: list[str]
     abstract: str
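To illustrate the new field shapes, a hand-written example (values are illustrative, borrowed from the generated page, and it assumes the `ArxivPaper` dataclass above is in scope): each `referencing_*2url` mapping now goes from a human-readable key to its URL instead of a bare list.

paper = ArxivPaper(
    arxiv_id="2312.06648",
    referencing_doc2url={},
    referencing_api_ref2url={},
    referencing_template2url={
        "propositional-retrieval": "https://python.langchain.com/docs/templates/propositional-retrieval"
    },
    title="Dense X Retrieval: What Retrieval Granularity Should We Use?",
    authors=["Tong Chen", "Hongwei Wang", "Sihao Chen"],
    abstract="...",
    url="http://arxiv.org/abs/2312.06648v2",
    published_date="2023-12-11",
)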
@@ -218,6 +218,35 @@ def search_code_for_arxiv_references(code_dir: Path) -> dict[str, set[str]]:
     return arxiv_id2module_name_and_members_reduced
 
 
+def search_templates_for_arxiv_references(templates_dir: Path) -> dict[str, set[str]]:
+    arxiv_url_pattern = re.compile(ARXIV_ID_PATTERN)
+    # exclude_strings = {"file_path", "metadata", "link", "loader", "PyPDFLoader"}
+
+    # loop all the Readme.md files since they are parsed into LangChain documentation
+    # exclude the Readme.md in the root folder
+    files = (
+        p.resolve()
+        for p in Path(templates_dir).glob("**/*")
+        if p.name.lower() in {"readme.md"} and p.parent.name != "templates"
+    )
+    arxiv_id2template_names: dict[str, set[str]] = {}
+    for file in files:
+        with open(file, "r", encoding="utf-8") as f:
+            lines = f.readlines()
+        for line in lines:
+            # if any(exclude_string in line for exclude_string in exclude_strings):
+            # continue
+            matches = arxiv_url_pattern.search(line)
+            if matches:
+                arxiv_id = matches.group(2)
+                template_name = file.parent.name
+                if arxiv_id not in arxiv_id2template_names:
+                    arxiv_id2template_names[arxiv_id] = {template_name}
+                else:
+                    arxiv_id2template_names[arxiv_id].add(template_name)
+    return arxiv_id2template_names
+
+
 def _get_doc_path(file_parts: tuple[str, ...], file_extension) -> str:
     """Get the relative path to the documentation page
     from the absolute path of the file.
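A rough usage sketch for the new helper (assuming the function and `TEMPLATES_DIR` above are in scope; the ids and template names are examples taken from the summary table):

# Scan templates/*/README.md files and collect which templates cite which arXiv papers.
arxiv_id2templates = search_templates_for_arxiv_references(TEMPLATES_DIR)
# e.g. {"2312.06648": {"propositional-retrieval"}, "2311.09210": {"chain-of-note-wiki"}}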
@@ -257,58 +286,70 @@ def _get_module_name(file_parts: tuple[str, ...]) -> str:
 
 
 def compound_urls(
-    arxiv_id2file_names: dict[str, set[str]], arxiv_id2code_urls: dict[str, set[str]]
+    arxiv_id2file_names: dict[str, set[str]],
+    arxiv_id2code_urls: dict[str, set[str]],
+    arxiv_id2templates: dict[str, set[str]],
 ) -> dict[str, dict[str, set[str]]]:
-    arxiv_id2urls = dict()
-    for arxiv_id, code_urls in arxiv_id2code_urls.items():
-        arxiv_id2urls[arxiv_id] = {"api": code_urls}
-        # intersection of the two sets
-        if arxiv_id in arxiv_id2file_names:
-            arxiv_id2urls[arxiv_id]["docs"] = arxiv_id2file_names[arxiv_id]
+    # format urls and verify that the urls are correct
+    arxiv_id2file_names_new = {}
     for arxiv_id, file_names in arxiv_id2file_names.items():
-        if arxiv_id not in arxiv_id2code_urls:
-            arxiv_id2urls[arxiv_id] = {"docs": file_names}
-    # reverse sort by the arxiv_id (the newest papers first)
-    ret = dict(sorted(arxiv_id2urls.items(), key=lambda item: item[0], reverse=True))
-    return ret
-
-
-def _format_doc_link(doc_paths: list[str]) -> list[str]:
-    return [
-        f"[{doc_path}](https://python.langchain.com/{doc_path})"
-        for doc_path in doc_paths
-    ]
-
-
-def _format_api_ref_link(
-    doc_paths: list[str], compact: bool = False
-) -> list[str]:  # TODO
-    # agents/langchain_core.agents.AgentAction.html#langchain_core.agents.AgentAction
-    ret = []
-    for doc_path in doc_paths:
-        module = doc_path.split("#")[1].replace("module-", "")
-        if compact and module.count(".") > 2:
-            # langchain_community.llms.oci_data_science_model_deployment_endpoint.OCIModelDeploymentTGI
-            # -> langchain_community.llms...OCIModelDeploymentTGI
-            module_parts = module.split(".")
-            module = f"{module_parts[0]}.{module_parts[1]}...{module_parts[-1]}"
-        ret.append(
-            f"[{module}](https://api.python.langchain.com/en/latest/{doc_path.split('langchain.com/')[-1]})"
-        )
-    return ret
-
-
-def log_results(arxiv_id2urls):
-    arxiv_ids = arxiv_id2urls.keys()
-    doc_number, api_number = 0, 0
-    for urls in arxiv_id2urls.values():
-        if "docs" in urls:
-            doc_number += len(urls["docs"])
-        if "api" in urls:
-            api_number += len(urls["api"])
-    logger.info(
-        f"Found {len(arxiv_ids)} arXiv references in the {doc_number} docs and in {api_number} API Refs."
+        key2urls = {
+            key: _format_doc_url(key)
+            for key in file_names
+            if _is_url_ok(_format_doc_url(key))
+        }
+        if key2urls:
+            arxiv_id2file_names_new[arxiv_id] = key2urls
+
+    arxiv_id2code_urls_new = {}
+    for arxiv_id, code_urls in arxiv_id2code_urls.items():
+        key2urls = {
+            key: _format_api_ref_url(key)
+            for key in code_urls
+            if _is_url_ok(_format_api_ref_url(key))
+        }
+        if key2urls:
+            arxiv_id2code_urls_new[arxiv_id] = key2urls
+
+    arxiv_id2templates_new = {}
+    for arxiv_id, templates in arxiv_id2templates.items():
+        key2urls = {
+            key: _format_template_url(key)
+            for key in templates
+            if _is_url_ok(_format_template_url(key))
+        }
+        if key2urls:
+            arxiv_id2templates_new[arxiv_id] = key2urls
+
+    arxiv_id2type2key2urls = dict.fromkeys(
+        arxiv_id2file_names_new | arxiv_id2code_urls_new | arxiv_id2templates_new
     )
+    arxiv_id2type2key2urls = {k: {} for k in arxiv_id2type2key2urls}
+    for arxiv_id, key2urls in arxiv_id2file_names_new.items():
+        arxiv_id2type2key2urls[arxiv_id]["docs"] = key2urls
+    for arxiv_id, key2urls in arxiv_id2code_urls_new.items():
+        arxiv_id2type2key2urls[arxiv_id]["apis"] = key2urls
+    for arxiv_id, key2urls in arxiv_id2templates_new.items():
+        arxiv_id2type2key2urls[arxiv_id]["templates"] = key2urls
+
+    # reverse sort by the arxiv_id (the newest papers first)
+    ret = dict(
+        sorted(arxiv_id2type2key2urls.items(), key=lambda item: item[0], reverse=True)
+    )
+    return ret
+
+
+def _is_url_ok(url: str) -> bool:
+    """Check if the url page is open without error."""
+    import requests
+
+    try:
+        response = requests.get(url)
+        response.raise_for_status()
+    except requests.exceptions.RequestException as ex:
+        logger.warning(f"Could not open the {url}.")
+        return False
+    return True
 
 
 class ArxivAPIWrapper(BaseModel):
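For orientation, the nested mapping `compound_urls` now builds looks roughly like this (a hand-written illustration; the keys and URLs below are taken from entries elsewhere on the generated page):

# arxiv_id -> reference type ("docs" / "apis" / "templates") -> human-readable key -> verified URL
{
    "2312.06648": {
        "templates": {
            "propositional-retrieval": "https://python.langchain.com/docs/templates/propositional-retrieval"
        }
    },
    "1908.10084": {
        "docs": {
            "docs/integrations/text_embedding/sentence_transformers": "https://python.langchain.com/docs/integrations/text_embedding/sentence_transformers"
        }
    },
}

Note that `_is_url_ok` issues an HTTP GET per candidate link, so entries whose pages cannot be opened are dropped from the generated page (with a warning in the log).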
@@ -335,7 +376,7 @@ class ArxivAPIWrapper(BaseModel):
         return values
 
     def get_papers(
-        self, arxiv_id2urls: dict[str, dict[str, set[str]]]
+        self, arxiv_id2type2key2urls: dict[str, dict[str, dict[str, str]]]
     ) -> list[ArxivPaper]:
         """
         Performs an arxiv search and returns information about the papers found.
@@ -343,8 +384,8 @@ class ArxivAPIWrapper(BaseModel):
         If an error occurs or no documents found, error text
         is returned instead.
         Args:
-            arxiv_id2urls: Dictionary with arxiv_id as key and dictionary
-                with sets of doc file names and API Ref urls.
+            arxiv_id2type2key2urls: Dictionary with arxiv_id as key and dictionary
+                with dicts of doc file names/API objects/templates to urls.
 
         Returns:
             List of ArxivPaper objects.
@@ -356,10 +397,10 @@ class ArxivAPIWrapper(BaseModel):
             else:
                 return [str(a) for a in authors]
 
-        if not arxiv_id2urls:
+        if not arxiv_id2type2key2urls:
             return []
         try:
-            arxiv_ids = list(arxiv_id2urls.keys())
+            arxiv_ids = list(arxiv_id2type2key2urls.keys())
             results = self.arxiv_search(
                 id_list=arxiv_ids,
                 max_results=len(arxiv_ids),
@@ -374,38 +415,99 @@ class ArxivAPIWrapper(BaseModel):
                     abstract=result.summary,
                     url=result.entry_id,
                     published_date=str(result.published.date()),
-                    referencing_docs=urls["docs"] if "docs" in urls else [],
-                    referencing_api_refs=urls["api"] if "api" in urls else [],
+                    referencing_doc2url=type2key2urls["docs"]
+                    if "docs" in type2key2urls
+                    else {},
+                    referencing_api_ref2url=type2key2urls["apis"]
+                    if "apis" in type2key2urls
+                    else {},
+                    referencing_template2url=type2key2urls["templates"]
+                    if "templates" in type2key2urls
+                    else {},
                 )
-                for result, urls in zip(results, arxiv_id2urls.values())
+                for result, type2key2urls in zip(results, arxiv_id2type2key2urls.values())
             ]
             return papers
 
 
-def generate_arxiv_references_page(file_name: str, papers: list[ArxivPaper]) -> None:
+def _format_doc_url(doc_path: str) -> str:
+    return f"https://{LANGCHAIN_PYTHON_URL}/{doc_path}"
+
+
+def _format_api_ref_url(doc_path: str, compact: bool = False) -> str:
+    # agents/langchain_core.agents.AgentAction.html#langchain_core.agents.AgentAction
+    return f"https://api.{LANGCHAIN_PYTHON_URL}/en/latest/{doc_path.split('langchain.com/')[-1]}"
+
+
+def _format_template_url(template_name: str) -> str:
+    return f"https://{LANGCHAIN_PYTHON_URL}/docs/templates/{template_name}"
+
+
+def _compact_module_full_name(doc_path: str) -> str:
+    # agents/langchain_core.agents.AgentAction.html#langchain_core.agents.AgentAction
+    module = doc_path.split("#")[1].replace("module-", "")
+    if module.count(".") > 2:
+        # langchain_community.llms.oci_data_science_model_deployment_endpoint.OCIModelDeploymentTGI
+        # -> langchain_community.llms...OCIModelDeploymentTGI
+        module_parts = module.split(".")
+        module = f"{module_parts[0]}.{module_parts[1]}...{module_parts[-1]}"
+    return module
+
+
+def log_results(arxiv_id2type2key2urls):
+    arxiv_ids = arxiv_id2type2key2urls.keys()
+    doc_number, api_number, templates_number = 0, 0, 0
+    for type2key2url in arxiv_id2type2key2urls.values():
+        if "docs" in type2key2url:
+            doc_number += len(type2key2url["docs"])
+        if "apis" in type2key2url:
+            api_number += len(type2key2url["apis"])
+        if "templates" in type2key2url:
+            templates_number += len(type2key2url["templates"])
+    logger.warning(
+        f"Found {len(arxiv_ids)} arXiv references in the {doc_number} docs, {api_number} API Refs,"
+        f" and {templates_number} Templates."
+    )
+
+
+def generate_arxiv_references_page(file_name: Path, papers: list[ArxivPaper]) -> None:
     with open(file_name, "w") as f:
         # Write the table headers
         f.write("""# arXiv
 
 LangChain implements the latest research in the field of Natural Language Processing.
-This page contains `arXiv` papers referenced in the LangChain Documentation and API Reference.
+This page contains `arXiv` papers referenced in the LangChain Documentation, API Reference,
+and Templates.
 
 ## Summary
 
-| arXiv id / Title | Authors | Published date 🔻 | LangChain Documentation and API Reference |
-|------------------|---------|-------------------|-------------------------|
+| arXiv id / Title | Authors | Published date 🔻 | LangChain Documentation|
+|------------------|---------|-------------------|------------------------|
 """)
         for paper in papers:
             refs = []
-            if paper.referencing_docs:
+            if paper.referencing_doc2url:
                 refs += [
-                    "`Docs:` " + ", ".join(_format_doc_link(paper.referencing_docs))
+                    "`Docs:` "
+                    + ", ".join(
+                        f"[{key}]({url})"
+                        for key, url in paper.referencing_doc2url.items()
+                    )
                 ]
-            if paper.referencing_api_refs:
+            if paper.referencing_api_ref2url:
                 refs += [
                     "`API:` "
                     + ", ".join(
-                        _format_api_ref_link(paper.referencing_api_refs, compact=True)
+                        f"[{_compact_module_full_name(key)}]({url})"
+                        for key, url in paper.referencing_api_ref2url.items()
+                    )
+                ]
+            if paper.referencing_template2url:
+                refs += [
+                    "`Template:` "
+                    + ", ".join(
+                        f"[{key}]({url})"
+                        for key, url in paper.referencing_template2url.items()
                     )
                 ]
             refs_str = ", ".join(refs)
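A small sketch of what `_compact_module_full_name` does (assuming the helper above is in scope; the first path comes from the code comment, the second uses a hypothetical `.html` prefix around the long module name mentioned there):

_compact_module_full_name(
    "agents/langchain_core.agents.AgentAction.html#langchain_core.agents.AgentAction"
)
# -> "langchain_core.agents.AgentAction"  (only two dots, so it is left as-is)

_compact_module_full_name(
    "llms/x.html#langchain_community.llms.oci_data_science_model_deployment_endpoint.OCIModelDeploymentTGI"
)
# -> "langchain_community.llms...OCIModelDeploymentTGI"  (more than two dots, so the middle is elided)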
@@ -417,15 +519,23 @@ This page contains `arXiv` papers referenced in the LangChain Documentation and
 
         for paper in papers:
             docs_refs = (
-                f"- **LangChain Documentation:** {', '.join(_format_doc_link(paper.referencing_docs))}"
-                if paper.referencing_docs
+                f" - **Documentation:** {', '.join(f'[{key}]({url})' for key, url in paper.referencing_doc2url.items())}"
+                if paper.referencing_doc2url
                 else ""
             )
             api_ref_refs = (
-                f"- **LangChain API Reference:** {', '.join(_format_api_ref_link(paper.referencing_api_refs))}"
-                if paper.referencing_api_refs
+                f" - **API Reference:** {', '.join(f'[{_compact_module_full_name(key)}]({url})' for key, url in paper.referencing_api_ref2url.items())}"
+                if paper.referencing_api_ref2url
                 else ""
             )
+            template_refs = (
+                f" - **Template:** {', '.join(f'[{key}]({url})' for key, url in paper.referencing_template2url.items())}"
+                if paper.referencing_template2url
+                else ""
+            )
+            refs = "\n".join(
+                [el for el in [docs_refs, api_ref_refs, template_refs] if el]
+            )
             f.write(f"""
 ## {paper.title}
 
@@ -434,13 +544,14 @@ This page contains `arXiv` papers referenced in the LangChain Documentation and
 - **Authors:** {', '.join(paper.authors)}
 - **Published Date:** {paper.published_date}
 - **URL:** {paper.url}
-{docs_refs}
-{api_ref_refs}
+- **LangChain:**
+
+{refs}
 
 **Abstract:** {paper.abstract}
 """)
 
-    logger.info(f"Created the {file_name} file with {len(papers)} arXiv references.")
+    logger.warning(f"Created the {file_name} file with {len(papers)} arXiv references.")
 
 
 def main():
@@ -450,14 +561,17 @@ def main():
         arxiv_id2module_name_and_members
     )
     arxiv_id2file_names = search_documentation_for_arxiv_references(DOCS_DIR)
-    arxiv_id2urls = compound_urls(arxiv_id2file_names, arxiv_id2code_urls)
-    log_results(arxiv_id2urls)
+    arxiv_id2templates = search_templates_for_arxiv_references(TEMPLATES_DIR)
+    arxiv_id2type2key2urls = compound_urls(
+        arxiv_id2file_names, arxiv_id2code_urls, arxiv_id2templates
+    )
+    log_results(arxiv_id2type2key2urls)
 
     # get the arXiv paper information
-    papers = ArxivAPIWrapper().get_papers(arxiv_id2urls)
+    papers = ArxivAPIWrapper().get_papers(arxiv_id2type2key2urls)
 
     # generate the arXiv references page
-    output_file = str(DOCS_DIR / "additional_resources" / "arxiv_references.mdx")
+    output_file = DOCS_DIR / "additional_resources" / "arxiv_references.mdx"
     generate_arxiv_references_page(output_file, papers)
 
 
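One small side effect worth noting: `output_file` is now a `pathlib.Path` instead of a `str`, which works because `open()` (used inside `generate_arxiv_references_page`) accepts any path-like object. A minimal, self-contained check with a hypothetical file name:

from pathlib import Path

output_file = Path("arxiv_references.mdx")  # hypothetical location
with open(output_file, "w") as f:  # open() accepts os.PathLike, so no str() wrapper is needed
    f.write("# arXiv\n")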