Merge branch 'master' into bagatur/locals_in_config

This commit is contained in:
Bagatur 2023-08-21 17:31:39 -07:00
commit fa478638a9
97 changed files with 4825 additions and 840 deletions

View File

@ -33,7 +33,7 @@ best way to get our attention.
### 🚩GitHub Issues
Our [issues](https://github.com/hwchase17/langchain/issues) page is kept up to date
with bugs, improvements, and feature requests.
There is a taxonomy of labels to help with sorting and discovery of issues of interest. Please use these to help
organize issues.
@ -61,11 +61,11 @@ we do not want these to get in the way of getting good code into the codebase.
> **Note:** You can run this repository locally (which is described below) or in a [development container](https://containers.dev/) (which is described in the [.devcontainer folder](https://github.com/hwchase17/langchain/tree/master/.devcontainer)).
This project uses [Poetry](https://python-poetry.org/) as a dependency manager. Check out Poetry's [documentation on how to install it](https://python-poetry.org/docs/#installation) on your system before proceeding.
This project uses [Poetry](https://python-poetry.org/) v1.5.1 as a dependency manager. Check out Poetry's [documentation on how to install it](https://python-poetry.org/docs/#installation) on your system before proceeding.
❗Note: If you use `Conda` or `Pyenv` as your environment / package manager, avoid dependency conflicts by doing the following first:
1. *Before installing Poetry*, create and activate a new Conda env (e.g. `conda create -n langchain python=3.9`)
2. Install Poetry (see above)
2. Install Poetry v1.5.1 (see above)
3. Tell Poetry to use the virtualenv python environment (`poetry config virtualenvs.prefer-active-python true`)
4. Continue with the following steps.
@ -73,7 +73,7 @@ There are two separate projects in this repository:
- `langchain`: core langchain code, abstractions, and use cases
- `langchain.experimental`: more experimental code
Each of these has its OWN development environment.
In order to run any of the commands below, please move into their respective directories.
For example, to contribute to `langchain` run `cd libs/langchain` before getting started with the below.
@ -85,7 +85,7 @@ poetry install -E all
This will install all requirements for running the package, examples, linting, formatting, tests, and coverage. Note the `-E all` flag will install all optional dependencies necessary for integration testing.
❗Note: If you're running Poetry 1.4.1 and receive a `WheelFileValidationError` for `debugpy` during installation, you can try either downgrading to Poetry 1.4.0 or disabling "modern installation" (`poetry config installer.modern-installation false`) and re-install requirements. See [this `debugpy` issue](https://github.com/microsoft/debugpy/issues/1246) for more details.
❗Note: If during installation you receive a `WheelFileValidationError` for `debugpy`, please make sure you are running Poetry v1.5.1. This bug was present in older versions of Poetry (e.g. 1.4.1) and has been resolved in newer releases. If you are still seeing this bug on v1.5.1, you may also try disabling "modern installation" (`poetry config installer.modern-installation false`) and re-installing requirements. See [this `debugpy` issue](https://github.com/microsoft/debugpy/issues/1246) for more details.
Now, you should be able to run the common tasks in the following section. To double-check, run `make test`; all tests should pass. If they don't, you may need to pip install additional dependencies, such as `numexpr` and `openapi_schema_pydantic`.
@ -175,9 +175,9 @@ If you're adding a new dependency to Langchain, assume that it will be an option
that most users won't have it installed.
Users that do not have the dependency installed should be able to **import** your code without
any side effects (no warnings, no errors, no exceptions).
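A common way to satisfy this is to import the optional package lazily, inside the initializer or function that needs it, so that merely importing your module never touches the dependency. A minimal sketch of that pattern (the package name `somelib` and the wrapper class are hypothetical):

```python
from typing import Any


def _import_somelib() -> Any:
    # `somelib` stands in for the optional dependency: importing *this* module
    # must succeed even when `somelib` is not installed.
    try:
        import somelib
    except ImportError:
        raise ImportError(
            "Could not import somelib. Please install it with `pip install somelib`."
        )
    return somelib


class SomeLibWrapper:
    """Hypothetical integration that only needs `somelib` when instantiated."""

    def __init__(self) -> None:
        self._somelib = _import_somelib()  # fails here, not at module import time
```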
To introduce the dependency to the pyproject.toml file correctly, please do the following:
1. Add the dependency to the main group as an optional dependency
```bash
@ -220,7 +220,7 @@ If you add new logic, please add a unit test.
Integration tests cover logic that requires making calls to outside APIs (often integration with other services).
**warning** Almost no tests should be integration tests.
Tests that require making network connections make it difficult for other
developers to test the code.
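For example, a unit test can exercise your logic without any network access by substituting the fake LLM that ships with LangChain; a minimal sketch, runnable with `pytest`:

```python
from langchain.llms.fake import FakeListLLM


def test_fake_llm_returns_canned_response() -> None:
    # FakeListLLM returns canned responses in order, so no API call is made.
    llm = FakeListLLM(responses=["hello"])
    assert llm("any prompt") == "hello"
```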
@ -307,4 +307,3 @@ even patch releases may contain [non-backwards-compatible changes](https://semve
If your contribution has made its way into a release, we will want to give you credit on Twitter (only if you want though)!
If you have a Twitter account you would like us to mention, please let us know in the PR or in another manner.

View File

@ -1,5 +1,5 @@
name: "\U0001F41B Bug Report"
description: Submit a bug report to help us improve LangChain
description: Submit a bug report to help us improve LangChain. To report a security issue, please instead use the security option below.
labels: ["02 Bug Report"]
body:
- type: markdown

View File

@ -47,8 +47,12 @@ runs:
~/.cache/pip
key: pip-${{ runner.os }}-${{ runner.arch }}-py-${{ inputs.python-version }}
- run: pipx install poetry==${{ inputs.poetry-version }} --python python${{ inputs.python-version }}
- name: Install poetry
shell: bash
env:
POETRY_VERSION: ${{ inputs.poetry-version }}
PYTHON_VERSION: ${{ inputs.python-version }}
run: pipx install "poetry==$POETRY_VERSION" --python "python$PYTHON_VERSION" --verbose
- name: Check Poetry File
shell: bash
@ -62,30 +66,18 @@ runs:
run: |
poetry lock --check
- name: Set proper Poetry.lock file
shell: bash
env:
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
run: |
if [ -f "$WORKDIR/poetry.lock" ]; then
echo 'Using working directory poetry.lock in cache key'
cp "$WORKDIR/poetry.lock" poetry-lock.cache-key
else
echo 'Using the top-level poetry.lock in cache key'
cp poetry.lock poetry-lock.cache-key
fi
- uses: actions/cache@v3
id: cache-poetry
env:
SEGMENT_DOWNLOAD_TIMEOUT_MIN: "15"
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
with:
path: |
~/.cache/pypoetry/virtualenvs
~/.cache/pypoetry/cache
~/.cache/pypoetry/artifacts
${{ inputs.working-directory == '' && '.' || inputs.working-directory }}/.venv
key: poetry-${{ runner.os }}-${{ runner.arch }}-py-${{ inputs.python-version }}-poetry-${{ inputs.poetry-version }}-${{ inputs.cache-key }}-${{ hashFiles('poetry-lock.cache-key') }}
${{ env.WORKDIR }}/.venv
key: poetry-${{ runner.os }}-${{ runner.arch }}-py-${{ inputs.python-version }}-poetry-${{ inputs.poetry-version }}-${{ inputs.cache-key }}-${{ hashFiles(format('{0}/poetry.lock', env.WORKDIR)) }}
- run: ${{ inputs.install-command }}
working-directory: ${{ inputs.working-directory }}

View File

@ -9,7 +9,8 @@ on:
description: "From which folder this pipeline executes"
env:
POETRY_VERSION: "1.4.2"
POETRY_VERSION: "1.5.1"
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
jobs:
build:
@ -20,13 +21,17 @@ jobs:
# and also as small as possible since increasing the number makes
# the initial `git fetch` slower.
FETCH_DEPTH: 50
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
strategy:
matrix:
# Only lint on the min and max supported Python versions.
# It's extremely unlikely that there's a lint issue on any version in between
# that doesn't show up on the min or max versions.
#
# GitHub rate-limits how many jobs can be running at any one time.
# Starting new jobs is also relatively slow,
# so linting on fewer versions makes CI faster.
python-version:
- "3.8"
- "3.9"
- "3.10"
- "3.11"
steps:
- uses: actions/checkout@v3
@ -42,8 +47,6 @@ jobs:
# since the previous action step just created them.
# This command resets the mtime to the last time the files were modified in git instead,
# which is a high-quality and stable representation of the last modification date.
env:
WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
run: |
# Important considerations:
# - These commands run at base of the repo, since we never `cd` to the `WORKDIR`.
@ -88,7 +91,7 @@ jobs:
key: pip-editable-langchain-deps-${{ runner.os }}-${{ runner.arch }}-py-${{ matrix.python-version }}
- name: Install poetry
run: |
pipx install poetry==$POETRY_VERSION
pipx install "poetry==$POETRY_VERSION"
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
env:
@ -97,14 +100,14 @@ jobs:
python-version: ${{ matrix.python-version }}
cache: poetry
cache-dependency-path: |
${{ inputs.working-directory == '' && '.' || inputs.working-directory }}/**/poetry.lock
${{ env.WORKDIR }}/**/poetry.lock
- name: Install dependencies
working-directory: ${{ inputs.working-directory }}
run: |
poetry install
- name: Install langchain editable
working-directory: ${{ inputs.working-directory }}
if: ${{ inputs.working-directory != 'langchain' }}
if: ${{ inputs.working-directory != 'libs/langchain' }}
run: |
pip install -e ../langchain

View File

@ -9,21 +9,27 @@ on:
description: "From which folder this pipeline executes"
env:
POETRY_VERSION: "1.4.2"
POETRY_VERSION: "1.5.1"
jobs:
if_release:
if: |
${{ github.event.pull_request.merged == true }}
&& ${{ contains(github.event.pull_request.labels.*.name, 'release') }}
# Disallow publishing from branches that aren't `master`.
if: github.ref == 'refs/heads/master'
runs-on: ubuntu-latest
permissions:
# This permission is used for trusted publishing:
# https://blog.pypi.org/posts/2023-04-20-introducing-trusted-publishers/
#
# Trusted publishing has to also be configured on PyPI for each package:
# https://docs.pypi.org/trusted-publishers/adding-a-publisher/
id-token: write
defaults:
run:
working-directory: ${{ inputs.working-directory }}
steps:
- uses: actions/checkout@v3
- name: Install poetry
run: pipx install poetry==$POETRY_VERSION
run: pipx install "poetry==$POETRY_VERSION"
- name: Set up Python 3.10
uses: actions/setup-python@v4
with:
@ -45,8 +51,9 @@ jobs:
generateReleaseNotes: true
tag: v${{ steps.check-version.outputs.version }}
commit: master
- name: Publish to PyPI
env:
POETRY_PYPI_TOKEN_PYPI: ${{ secrets.PYPI_API_TOKEN }}
run: |
poetry publish
- name: Publish package distributions to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
with:
packages-dir: ${{ inputs.working-directory }}/dist/
verbose: true
print-hash: true

View File

@ -13,7 +13,7 @@ on:
default: '["core", "extended", "core-pydantic-2"]'
env:
POETRY_VERSION: "1.4.2"
POETRY_VERSION: "1.5.1"
jobs:
build:
@ -37,7 +37,7 @@ jobs:
with:
python-version: ${{ matrix.python-version }}
working-directory: ${{ inputs.working-directory }}
poetry-version: "1.4.2"
poetry-version: ${{ env.POETRY_VERSION }}
cache-key: ${{ matrix.test_type }}
install-command: |
if [ "${{ matrix.test_type }}" == "core" ]; then

View File

@ -2,13 +2,6 @@
name: libs/experimental Release
on:
pull_request:
types:
- closed
branches:
- master
paths:
- 'libs/experimental/pyproject.toml'
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
jobs:
@ -17,4 +10,4 @@ jobs:
./.github/workflows/_release.yml
with:
working-directory: libs/experimental
secrets: inherit

View File

@ -2,13 +2,6 @@
name: libs/langchain Release
on:
pull_request:
types:
- closed
branches:
- master
paths:
- 'libs/langchain/pyproject.toml'
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
jobs:
@ -17,4 +10,4 @@ jobs:
./.github/workflows/_release.yml
with:
working-directory: libs/langchain
secrets: inherit

View File

@ -6,7 +6,7 @@ on:
- cron: '0 13 * * *'
env:
POETRY_VERSION: "1.4.2"
POETRY_VERSION: "1.5.1"
jobs:
build:
@ -29,7 +29,7 @@ jobs:
uses: "./.github/actions/poetry_setup"
with:
python-version: ${{ matrix.python-version }}
poetry-version: "1.4.2"
poetry-version: ${{ env.POETRY_VERSION }}
working-directory: libs/langchain
install-command: |
echo "Running scheduled tests, installing dependencies with poetry..."

View File

@ -2,18 +2,18 @@
⚡ Building applications with LLMs through composability ⚡
[![Release Notes](https://img.shields.io/github/release/hwchase17/langchain)](https://github.com/hwchase17/langchain/releases)
[![CI](https://github.com/hwchase17/langchain/actions/workflows/langchain_ci.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/langchain_ci.yml)
[![Experimental CI](https://github.com/hwchase17/langchain/actions/workflows/langchain_experimental_ci.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/langchain_experimental_ci.yml)
[![Release Notes](https://img.shields.io/github/release/langchain-ai/langchain)](https://github.com/langchain-ai/langchain/releases)
[![CI](https://github.com/langchain-ai/langchain/actions/workflows/langchain_ci.yml/badge.svg)](https://github.com/langchain-ai/langchain/actions/workflows/langchain_ci.yml)
[![Experimental CI](https://github.com/langchain-ai/langchain/actions/workflows/langchain_experimental_ci.yml/badge.svg)](https://github.com/langchain-ai/langchain/actions/workflows/langchain_experimental_ci.yml)
[![Downloads](https://static.pepy.tech/badge/langchain/month)](https://pepy.tech/project/langchain)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Twitter](https://img.shields.io/twitter/url/https/twitter.com/langchainai.svg?style=social&label=Follow%20%40LangChainAI)](https://twitter.com/langchainai)
[![](https://dcbadge.vercel.app/api/server/6adMQxSpJS?compact=true&style=flat)](https://discord.gg/6adMQxSpJS)
[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/hwchase17/langchain)
[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/hwchase17/langchain)
[![GitHub star chart](https://img.shields.io/github/stars/hwchase17/langchain?style=social)](https://star-history.com/#hwchase17/langchain)
[![Open in Dev Containers](https://img.shields.io/static/v1?label=Dev%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/langchain-ai/langchain)
[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/langchain-ai/langchain)
[![GitHub star chart](https://img.shields.io/github/stars/langchain-ai/langchain?style=social)](https://star-history.com/#langchain-ai/langchain)
[![Dependency Status](https://img.shields.io/librariesio/github/langchain-ai/langchain)](https://libraries.io/github/langchain-ai/langchain)
[![Open Issues](https://img.shields.io/github/issues-raw/hwchase17/langchain)](https://github.com/hwchase17/langchain/issues)
[![Open Issues](https://img.shields.io/github/issues-raw/langchain-ai/langchain)](https://github.com/langchain-ai/langchain/issues)
Looking for the JS/TS version? Check out [LangChain.js](https://github.com/hwchase17/langchainjs).

SECURITY.md (new file, 6 additions)
View File

@ -0,0 +1,6 @@
# Security Policy
## Reporting a Vulnerability
Please report security vulnerabilities by email to `security@langchain.dev`.
This email is an alias to a subset of our maintainers, and will ensure the issue is promptly triaged and acted upon as needed.

View File

@ -1,5 +1,6 @@
-e libs/langchain
-e libs/experimental
pydantic<2
autodoc_pydantic==1.8.0
myst_parser
nbsphinx==0.8.9

View File

@ -28,7 +28,7 @@ LangChain provides standard, extendable interfaces and external integrations for
#### [Model I/O](/docs/modules/model_io/)
Interface with language models
#### [Data connection](/docs/modules/data_connection/)
#### [Retrieval](/docs/modules/data_connection/)
Interface with application-specific data
#### [Chains](/docs/modules/chains/)
Construct sequences of calls

View File

@ -2,15 +2,60 @@
sidebar_position: 1
---
# Data connection
# Retrieval
Many LLM applications require user-specific data that is not part of the model's training set. LangChain gives you the
building blocks to load, transform, store and query your data via:
Many LLM applications require user-specific data that is not part of the model's training set.
The primary way of accomplishing this is through Retrieval Augmented Generation (RAG).
In this process, external data is *retrieved* and then passed to the LLM when doing the *generation* step.
- [Document loaders](/docs/modules/data_connection/document_loaders/): Load documents from many different sources
- [Document transformers](/docs/modules/data_connection/document_transformers/): Split documents, convert documents into Q&A format, drop redundant documents, and more
- [Text embedding models](/docs/modules/data_connection/text_embedding/): Take unstructured text and turn it into a list of floating point numbers
- [Vector stores](/docs/modules/data_connection/vectorstores/): Store and search over embedded data
- [Retrievers](/docs/modules/data_connection/retrievers/): Query your data
LangChain provides all the building blocks for RAG applications - from simple to complex.
This section of the documentation covers everything related to the *retrieval* step - i.e. the fetching of the data.
Although this sounds simple, it can be subtly complex.
This encompasses several key modules.
![data_connection_diagram](/img/data_connection.jpg)
**[Document loaders](/docs/modules/data_connection/document_loaders/)**
Load documents from many different sources.
LangChain provides over 100 different document loaders as well as integrations with other major providers in the space,
like AirByte and Unstructured.
We provide integrations to load all types of documents (HTML, PDF, code) from all types of locations (private S3 buckets, public websites).
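As a minimal sketch, loading a plain-text file with one of the built-in loaders (the file name is illustrative):

```python
from langchain.document_loaders import TextLoader

loader = TextLoader("./state_of_the_union.txt")
docs = loader.load()  # a list of Documents, each with page_content and metadata
```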
**[Document transformers](/docs/modules/data_connection/document_transformers/)**
A key part of retrieval is fetching only the relevant parts of documents.
This involves several transformation steps in order to best prepare the documents for retrieval.
One of the primary ones here is splitting (or chunking) a large document into smaller chunks.
LangChain provides several different algorithms for doing this, as well as logic optimized for specific document types (code, markdown, etc.).
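For instance, a sketch of the commonly used recursive character splitter:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
chunks = splitter.split_text("some long document text ... " * 50)  # list of strings
```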
**[Text embedding models](/docs/modules/data_connection/text_embedding/)**
Another key part of retrieval is creating embeddings for documents.
Embeddings capture the semantic meaning of text, allowing you to quickly and
efficiently find other pieces of text that are similar.
LangChain provides integrations with over 25 different embedding providers and methods,
from open-source models to proprietary APIs,
allowing you to choose the one best suited for your needs.
LangChain exposes a standard interface, allowing you to easily swap between models.
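That shared interface boils down to two methods, `embed_query` and `embed_documents`; a minimal sketch (assuming the `openai` package is installed and `OPENAI_API_KEY` is set):

```python
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
query_vector = embeddings.embed_query("What is LangChain?")             # one text
doc_vectors = embeddings.embed_documents(["first doc", "second doc"])   # many texts
```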
**[Vector stores](/docs/modules/data_connection/vectorstores/)**
With the rise of embeddings, there has emerged a need for databases to support efficient storage and searching of these embeddings.
LangChain provides integrations with over 50 different vectorstores, from open-source local ones to cloud-hosted proprietary ones,
allowing you to choose the one best suited for your needs.
LangChain exposes a standard interface, allowing you to easily swap between vector stores.
**[Retrievers](/docs/modules/data_connection/retrievers/)**
Once the data is in the database, you still need to retrieve it.
LangChain supports many different retrieval algorithms and is one of the places where we add the most value.
We support basic methods that are easy to get started with - namely simple semantic search (see the sketch after this list).
However, we have also added a collection of algorithms on top of this to increase performance.
These include:
- [Parent Document Retriever](/docs/modules/data_connection/retrievers/parent_document_retriever): This allows you to create multiple embeddings per parent document, allowing you to look up smaller chunks but return larger context.
- [Self Query Retriever](/docs/modules/data_connection/retrievers/self_query): User questions often contain a reference to something that isn't just semantic, but rather expresses some logic that can best be represented as a metadata filter. Self-query allows you to parse out the *semantic* part of a query from the other *metadata filters* present in the query.
- [Ensemble Retriever](/docs/modules/data_connection/retrievers/ensemble): Sometimes you may want to retrieve documents from multiple different sources, or using multiple different algorithms. The ensemble retriever allows you to easily do this.
- And more!
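Putting the modules together, a minimal end-to-end sketch of load, split, embed, store, and retrieve (assuming `faiss-cpu` and `openai` are installed, with `OPENAI_API_KEY` set; the file name is illustrative):

```python
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

docs = TextLoader("./state_of_the_union.txt").load()       # 1. load
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
).split_documents(docs)                                     # 2. split
db = FAISS.from_documents(chunks, OpenAIEmbeddings())       # 3. embed + store
retriever = db.as_retriever()                               # 4. expose as retriever
relevant = retriever.get_relevant_documents("What did the president say?")
```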

View File

@ -8,7 +8,7 @@ LangChain provides standard, extendable interfaces and external integrations for
#### [Model I/O](/docs/modules/model_io/)
Interface with language models
#### [Data connection](/docs/modules/data_connection/)
#### [Retrieval](/docs/modules/data_connection/)
Interface with application-specific data
#### [Chains](/docs/modules/chains/)
Construct sequences of calls

View File

@ -1,15 +1,15 @@
# Tutorials
Below are links to video tutorials and courses on LangChain. For written guides on common use cases for LangChain, check out the [use cases guides](/docs/use_cases).
Below are links to tutorials and courses on LangChain. For written guides on common use cases for LangChain, check out the [use cases guides](/docs/use_cases).
⛓ icon marks a new addition [last update 2023-07-05]
⛓ icon marks a new addition [last update 2023-08-20]
---------------------
### DeepLearning.AI courses
by [Harrison Chase](https://github.com/hwchase17) and [Andrew Ng](https://en.wikipedia.org/wiki/Andrew_Ng)
- [LangChain for LLM Application Development](https://learn.deeplearning.ai/langchain)
- [LangChain Chat with Your Data](https://learn.deeplearning.ai/langchain-chat-with-your-data)
### Handbook
[LangChain AI Handbook](https://www.pinecone.io/learn/langchain/) By **James Briggs** and **Francisco Ingham**
@ -36,7 +36,7 @@ Below are links to video tutorials and courses on LangChain. For written guides
- #8 [Create Custom Tools for Chatbots in LangChain](https://youtu.be/q-HNphrWsDE)
- #9 [Build Conversational Agents with Vector DBs](https://youtu.be/H6bCqqw9xyI)
- [Using NEW `MPT-7B` in Hugging Face and LangChain](https://youtu.be/DXpk9K7DgMo)
- [`MPT-30B` Chatbot with LangChain](https://youtu.be/pnem-EhT6VI)
### [LangChain 101](https://www.youtube.com/playlist?list=PLqZXAkvF1bPNQER9mLmDbntNfSpzdDIU5) by [Greg Kamradt (Data Indy)](https://www.youtube.com/@DataIndependent)
@ -63,7 +63,7 @@ Below are links to video tutorials and courses on LangChain. For written guides
- [Build Your Own `AI Twitter Bot` Using LLMs](https://youtu.be/yLWLDjT01q8)
- [ChatGPT made my interview questions for me (`Streamlit` + LangChain)](https://youtu.be/zvoAMx0WKkw)
- [Function Calling via ChatGPT API - First Look With LangChain](https://youtu.be/0-zlUy7VUjg)
- [Extract Topics From Video/Audio With LLMs (Topic Modeling w/ LangChain)](https://youtu.be/pEkxRQFNAs4)
### [LangChain How to and guides](https://www.youtube.com/playlist?list=PL8motc6AQftk1Bs42EW45kwYbyJ4jOdiZ) by [Sam Witteveen](https://www.youtube.com/@samwitteveenai)
@ -99,7 +99,7 @@ Below are links to video tutorials and courses on LangChain. For written guides
- [`OpenAI Functions` + LangChain : Building a Multi Tool Agent](https://youtu.be/4KXK6c6TVXQ)
- [What can you do with 16K tokens in LangChain?](https://youtu.be/z2aCZBAtWXs)
- [Tagging and Extraction - Classification using `OpenAI Functions`](https://youtu.be/a8hMgIcUEnE)
- [HOW to Make Conversational Form with LangChain](https://youtu.be/IT93On2LB5k)
### [LangChain](https://www.youtube.com/playlist?list=PLVEEucA9MYhOu89CX8H3MBZqayTbcCTMr) by [Prompt Engineering](https://www.youtube.com/@engineerprompt)
@ -121,5 +121,9 @@ Below are links to video tutorials and courses on LangChain. For written guides
- [LangChain Agents: Build Personal Assistants For Your Data (Q&A with Harrison Chase and Mayo Oshin)](https://youtu.be/gVkF8cwfBLI)
### Codebase Analysis
- ⛓ [Codebase Analysis: Langchain Agents](https://carbonated-yacht-2c5.notion.site/Codebase-Analysis-Langchain-Agents-0b0587acd50647ca88aaae7cff5df1f2)
---------------------
⛓ icon marks a new addition [last update 2023-07-05]
⛓ icon marks a new addition [last update 2023-08-20]

View File

@ -19,9 +19,9 @@
"source": [
"## Handling LLM API Errors\n",
"\n",
"This is maybe the most common use case for fallbacks. A request to an LLM API can fail for a variety of reasons - the API could be down, you could have hit rate limits, any number of things. Therefor, using fallbacks can help protect against these types of things.\n",
"This is maybe the most common use case for fallbacks. A request to an LLM API can fail for a variety of reasons - the API could be down, you could have hit rate limits, any number of things. Therefore, using fallbacks can help protect against these types of things.\n",
"\n",
"IMPORTANT: By default, a lot of the LLM wrappers catch errors and retry. You will most likely want to turn those off when working with fallbacks. Otherwise the first wrapper will keep on retying and not failing."
"IMPORTANT: By default, a lot of the LLM wrappers catch errors and retry. You will most likely want to turn those off when working with fallbacks. Otherwise the first wrapper will keep on retrying and not failing."
]
},
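A minimal sketch of the pattern described above (assuming the `openai` and `anthropic` integrations are installed and their API keys are set):

```python
from langchain.chat_models import ChatAnthropic, ChatOpenAI

# Turn off the wrapper's internal retries so errors surface immediately
# and the fallback actually gets a chance to run.
openai_llm = ChatOpenAI(max_retries=0)
anthropic_llm = ChatAnthropic()

# Try OpenAI first; fall back to Anthropic if the call raises.
llm = openai_llm.with_fallbacks([anthropic_llm])
llm.invoke("Why did the chicken cross the road?")
```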
{

View File

@ -4,23 +4,13 @@
- v2 contains a number of breaking changes (https://docs.pydantic.dev/2.0/migration/)
- Pydantic v2 and v1 are under the same package name, so both versions cannot be installed at the same time
## LangChain Pydantic Migration Plan
Langchain will carry out the migration to pydantic v2 in two steps:
1. 2023-08-17: LangChain will allow users to install either Pydantic V1 or V2.
As of `langchain>=0.0.267`, LangChain will allow users to install either Pydantic V1 or V2.
* Internally LangChain will continue to [use V1](https://docs.pydantic.dev/latest/migration/#continue-using-pydantic-v1-features).
* During this time, users can pin their pydantic version to v1 to avoid breaking changes, or start a partial
migration using pydantic v2 throughout their code, but avoiding mixing v1 and v2 code for LangChain (see below).
2. 2023-08-25: Langchain will migrate internally to using V2 code.
* Users will have to upgrade to V2 as well to use LangChain.
* Users should stop using the `pydantic.v1` namespace when using LangChain.
* See the [bump-pydantic package](https://github.com/pydantic/bump-pydantic) to help with the upgrade process.
## Between 2023-08-17 and 2023-08-25 releases
Users can either pin to pydantic v1 and upgrade their code in one go once LangChain has migrated to v2 internally, or they can start a partial migration to v2, but must avoid mixing v1 and v2 code for LangChain.
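In a partial migration, any model handed to LangChain keeps using the v1 API; with pydantic 2 installed, that is done through its compatibility namespace, as in this sketch:

```python
# With pydantic 2 installed, extend LangChain via the v1 compatibility
# namespace rather than the top-level `pydantic` package.
from pydantic.v1 import BaseModel  # not `from pydantic import BaseModel`


class CalculatorInput(BaseModel):
    question: str
```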
Below are two examples showing how to avoid mixing pydantic v1 and v2 code in
@ -112,10 +102,4 @@ Tool.from_function( # <-- tool uses v1 namespace
description="useful for when you need to answer questions about math",
args_schema=CalculatorInput
)
```
## After 2023-08-25 release
* Users must upgrade to v2
* Users should not pass `pydantic.v1` derived objects to LangChain or rely on `pydantic.v1` when extending functionality

View File

@ -0,0 +1,105 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Microsoft SharePoint\n",
"\n",
"> [Microsoft SharePoint](https://en.wikipedia.org/wiki/SharePoint) is a website-based collaboration system that uses workflow applications, “list” databases, and other web parts and security features to empower business teams to work together developed by Microsoft.\n",
"\n",
"This notebook covers how to load documents from the [SharePoint Document Library](https://support.microsoft.com/en-us/office/what-is-a-document-library-3b5976dd-65cf-4c9e-bf5a-713c10ca2872). Currently, only docx, doc, and pdf files are supported.\n",
"\n",
"## Prerequisites\n",
"1. Register an application with the [Microsoft identity platform](https://learn.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app) instructions.\n",
"2. When registration finishes, the Azure portal displays the app registration's Overview pane. You see the Application (client) ID. Also called the `client ID`, this value uniquely identifies your application in the Microsoft identity platform.\n",
"3. During the steps you will be following at **item 1**, you can set the redirect URI as `https://login.microsoftonline.com/common/oauth2/nativeclient`\n",
"4. During the steps you will be following at **item 1**, generate a new password (`client_secret`) under Application Secrets section.\n",
"5. Follow the instructions at this [document](https://learn.microsoft.com/en-us/azure/active-directory/develop/quickstart-configure-app-expose-web-apis#add-a-scope) to add the following `SCOPES` (`offline_access` and `Sites.Read.All`) to your application.\n",
"6. To retrieve files from your **Document Library**, you will need its ID. To obtain it, you will need values of `Tenant Name`, `Collection ID`, and `Subsite ID`.\n",
"7. To find your `Tenant Name` follow the instructions at this [document](https://learn.microsoft.com/en-us/azure/active-directory-b2c/tenant-management-read-tenant-name). Once you got this, just remove `.onmicrosoft.com` from the value and hold the rest as your `Tenant Name`.\n",
"8. To obtain your `Collection ID` and `Subsite ID`, you will need your **SharePoint** `site-name`. Your `SharePoint` site URL has the following format `https://<tenant-name>.sharepoint.com/sites/<site-name>`. The last part of this URL is the `site-name`.\n",
"9. To Get the Site `Collection ID`, hit this URL in the browser: `https://<tenant>.sharepoint.com/sites/<site-name>/_api/site/id` and copy the value of the `Edm.Guid` property.\n",
"10. To get the `Subsite ID` (or web ID) use: `https://<tenant>.sharepoint.com/<site-name>/_api/web/id` and copy the value of the `Edm.Guid` property.\n",
"11. The `SharePoint site ID` has the following format: `<tenant-name>.sharepoint.com,<Collection ID>,<subsite ID>`. You can hold that value to use in the next step.\n",
"12. Visit the [Graph Explorer Playground](https://developer.microsoft.com/en-us/graph/graph-explorer) to obtain your `Document Library ID`. The first step is to ensure you are logged in with the account associated with your **SharePoint** site. Then you need to make a request to `https://graph.microsoft.com/v1.0/sites/<SharePoint site ID>/drive` and the response will return a payload with a field `id` that holds the ID of your `Document Library ID`.\n",
"\n",
"## 🧑 Instructions for ingesting your documents from SharePoint Document Library\n",
"\n",
"### 🔑 Authentication\n",
"\n",
"By default, the `SharePointLoader` expects that the values of `CLIENT_ID` and `CLIENT_SECRET` must be stored as environment variables named `O365_CLIENT_ID` and `O365_CLIENT_SECRET` respectively. You could pass those environment variables through a `.env` file at the root of your application or using the following command in your script.\n",
"\n",
"```python\n",
"os.environ['O365_CLIENT_ID'] = \"YOUR CLIENT ID\"\n",
"os.environ['O365_CLIENT_SECRET'] = \"YOUR CLIENT SECRET\"\n",
"```\n",
"\n",
"This loader uses an authentication called [*on behalf of a user*](https://learn.microsoft.com/en-us/graph/auth-v2-user?context=graph%2Fapi%2F1.0&view=graph-rest-1.0). It is a 2 step authentication with user consent. When you instantiate the loader, it will call will print a url that the user must visit to give consent to the app on the required permissions. The user must then visit this url and give consent to the application. Then the user must copy the resulting page url and paste it back on the console. The method will then return True if the login attempt was succesful.\n",
"\n",
"```python\n",
"from langchain.document_loaders.sharepoint import SharePointLoader\n",
"\n",
"loader = SharePointLoader(document_library_id=\"YOUR DOCUMENT LIBRARY ID\")\n",
"```\n",
"\n",
"Once the authentication has been done, the loader will store a token (`o365_token.txt`) at `~/.credentials/` folder. This token could be used later to authenticate without the copy/paste steps explained earlier. To use this token for authentication, you need to change the `auth_with_token` parameter to True in the instantiation of the loader.\n",
"\n",
"```python\n",
"from langchain.document_loaders.sharepoint import SharePointLoader\n",
"\n",
"loader = SharePointLoader(document_library_id=\"YOUR DOCUMENT LIBRARY ID\", auth_with_token=True)\n",
"```\n",
"\n",
"### 🗂️ Documents loader\n",
"\n",
"#### 📑 Loading documents from a Document Library Directory\n",
"\n",
"`SharePointLoader` can load documents from a specific folder within your Document Library. For instance, you want to load all documents that are stored at `Documents/marketing` folder within your Document Library.\n",
"\n",
"```python\n",
"from langchain.document_loaders.sharepoint import SharePointLoader\n",
"\n",
"loader = SharePointLoader(document_library_id=\"YOUR DOCUMENT LIBRARY ID\", folder_path=\"Documents/marketing\", auth_with_token=True)\n",
"documents = loader.load()\n",
"```\n",
"\n",
"#### 📑 Loading documents from a list of Documents IDs\n",
"\n",
"Another possibility is to provide a list of `object_id` for each document you want to load. For that, you will need to query the [Microsoft Graph API](https://developer.microsoft.com/en-us/graph/graph-explorer) to find all the documents ID that you are interested in. This [link](https://learn.microsoft.com/en-us/graph/api/resources/onedrive?view=graph-rest-1.0#commonly-accessed-resources) provides a list of endpoints that will be helpful to retrieve the documents ID.\n",
"\n",
"For instance, to retrieve information about all objects that are stored at `data/finance/` folder, you need make a request to: `https://graph.microsoft.com/v1.0/drives/<document-library-id>/root:/data/finance:/children`. Once you have the list of IDs that you are interested in, then you can instantiate the loader with the following parameters.\n",
"\n",
"```python\n",
"from langchain.document_loaders.sharepoint import SharePointLoader\n",
"\n",
"loader = SharePointLoader(document_library_id=\"YOUR DOCUMENT LIBRARY ID\", object_ids=[\"ID_1\", \"ID_2\"], auth_with_token=True)\n",
"documents = loader.load()\n",
"```\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@ -0,0 +1,878 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1f3cebbe-079a-4bfe-b1a1-07bdac882ce2",
"metadata": {},
"source": [
"# Amazon Textract \n",
"\n",
"Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables. Today, many companies manually extract data from scanned documents such as PDFs, images, tables, and forms, or through simple OCR software that requires manual configuration (which often must be updated when the form changes). To overcome these manual and expensive processes, Textract uses ML to read and process any type of document, accurately extracting text, handwriting, tables, and other data with no manual effort. You can quickly automate document processing and act on the information extracted, whether youre automating loans processing or extracting information from invoices and receipts. Textract can extract the data in minutes instead of hours or days.\n",
"\n",
"This sample demonstrates the use of Amazon Textract in combination with LangChain as a DocumentLoader.\n",
"\n",
"Textract supports PDF, TIFF, PNG and JPEG format.\n",
"\n",
"Check https://docs.aws.amazon.com/textract/latest/dg/limits-document.html for supported document sizes, languages and characters."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c049beaf-f904-4ce6-91ca-805da62084c2",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.2\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.2.1\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpython -m pip install --upgrade pip\u001b[0m\n"
]
}
],
"source": [
"!pip install langchain boto3 openai tiktoken python-dotenv -q"
]
},
{
"cell_type": "markdown",
"id": "400b25c6-befa-4730-a201-39ff112c8858",
"metadata": {},
"source": [
"## Sample 1\n",
"\n",
"The first example uses a local file, which internally will be send to Amazon Textract sync API [DetectDocumentText](https://docs.aws.amazon.com/textract/latest/dg/API_DetectDocumentText.html). \n",
"\n",
"Local files or URL endpoints like HTTP:// are limited to one page documents for Textract.\n",
"Multi-page documents have to reside on S3. This sample file is a jpeg."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "1becee92-e82f-42d4-9b4e-b23d77cbe88d",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.document_loaders import AmazonTextractPDFLoader\n",
"loader = AmazonTextractPDFLoader(\"example_data/alejandro_rosalez_sample-small.jpeg\")\n",
"documents = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "d566dc56-c9a9-44ec-84fb-a81928f90d40",
"metadata": {},
"source": [
"Output from the file"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "1272ce8c-d298-4059-ac0a-780bf5f82302",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No ', metadata={'source': 'example_data/alejandro_rosalez_sample-small.jpeg', 'page': 1})]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"documents"
]
},
{
"cell_type": "markdown",
"id": "4cf7f19c-3635-453a-9c76-4baf98b8d7f4",
"metadata": {},
"source": [
"## Sample 2\n",
"The next sample loads a file from an HTTPS endpoint. \n",
"It has to be single page, as Amazon Textract requires all multi-page documents to be stored on S3."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "10374bfb-b325-451f-8bd0-c686710ab68c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.document_loaders import AmazonTextractPDFLoader\n",
"loader = AmazonTextractPDFLoader(\"https://amazon-textract-public-content.s3.us-east-2.amazonaws.com/langchain/alejandro_rosalez_sample_1.jpg\")\n",
"documents = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "16a2b6a3-7514-4c2c-a427-6847169af473",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No Patient Information First Name: ALEJANDRO Last Name: ROSALEZ Date of Birth: 10/10/1982 Sex: M Marital Status: MARRIED Email Address: Address: 123 ANY STREET City: ANYTOWN State: CA Zip Code: 12345 Phone: 646-555-0111 Emergency Contact 1: First Name: CARLOS Last Name: SALAZAR Phone: 212-555-0150 Relationship to Patient: BROTHER Emergency Contact 2: First Name: JANE Last Name: DOE Phone: 650-555-0123 Relationship FRIEND to Patient: Did you feel fever or feverish lately? Yes No Are you having shortness of breath? Yes No Do you have a cough? Yes No Did you experience loss of taste or smell? Yes No Where you in contact with any confirmed COVID-19 positive patients? Yes No Did you travel in the past 14 days to any regions affected by COVID-19? Yes No ', metadata={'source': 'example_data/alejandro_rosalez_sample-small.jpeg', 'page': 1})]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"documents"
]
},
{
"cell_type": "markdown",
"id": "3a9cd8ec-e663-4dc7-9db1-d2f575253141",
"metadata": {},
"source": [
"## Sample 3\n",
"\n",
"Processing a multi-page document requires the document to be on S3. The sample document resides in a bucket in us-east-2 and Textract needs to be called in that same region to be successful, so we set the region_name on the client and pass that in to the loader to ensure Textract is called from us-east-2. You could also to have your notebook running in us-east-2, setting the AWS_DEFAULT_REGION set to us-east-2 or when running in a different environment, pass in a boto3 Textract client with that region name like in the cell below."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "8185e3e6-9599-4a47-8969-d6dcef3e6404",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import boto3\n",
"textract_client = boto3.client('textract', region_name='us-east-2')\n",
"\n",
"file_path = \"s3://amazon-textract-public-content/langchain/layout-parser-paper.pdf\"\n",
"loader = AmazonTextractPDFLoader(file_path, client=textract_client)\n",
"documents = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "b8901eec-070d-4fd6-9d65-52211d332441",
"metadata": {},
"source": [
"Now getting the number of pages to validate the response (printing out the full response would be quite long...). We expect 16 pages."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "b23c01c8-cf69-4fe2-8141-4621edb7d79c",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"16"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(documents)"
]
},
{
"cell_type": "markdown",
"id": "b3e41b4d-b159-4274-89be-80d8159134ef",
"metadata": {},
"source": [
"## Using the AmazonTextractPDFLoader in an LangChain chain (e. g. OpenAI)\n",
"\n",
"The AmazonTextractPDFLoader can be used in a chain the same way the other loaders are used.\n",
"Textract itself does have a [Query feature](https://docs.aws.amazon.com/textract/latest/dg/API_Query.html), which offers similar functionality to the QA chain in this sample, which is worth checking out as well."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "53c47b24-cc06-4256-9e5b-a82fc80bc55d",
"metadata": {},
"outputs": [],
"source": [
"# You can store your OPENAI_API_KEY in a .env file as well\n",
"# import os \n",
"# from dotenv import load_dotenv\n",
"\n",
"# load_dotenv()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "a9ae004c-246c-4c7f-8458-191cd7424a9b",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Or set the OpenAI key in the environment directly\n",
"import os \n",
"os.environ[\"OPENAI_API_KEY\"] = \"your-OpenAI-API-key\""
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "d52b089c-10ca-45fb-8669-8a1c5fee10d5",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"' The authors are Zejiang Shen, Ruochen Zhang, Melissa Dell, Benjamin Charles Germain Lee, Jacob Carlson, Weining Li, Gardner, M., Grus, J., Neumann, M., Tafjord, O., Dasigi, P., Liu, N., Peters, M., Schmitz, M., Zettlemoyer, L., Lukasz Garncarek, Powalski, R., Stanislawek, T., Topolski, B., Halama, P., Gralinski, F., Graves, A., Fernández, S., Gomez, F., Schmidhuber, J., Harley, A.W., Ufkes, A., Derpanis, K.G., He, K., Gkioxari, G., Dollár, P., Girshick, R., He, K., Zhang, X., Ren, S., Sun, J., Kay, A., Lamiroy, B., Lopresti, D., Mears, J., Jakeway, E., Ferriter, M., Adams, C., Yarasavage, N., Thomas, D., Zwaard, K., Li, M., Cui, L., Huang,'"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.chains.question_answering import load_qa_chain\n",
"\n",
"chain = load_qa_chain(llm=OpenAI(), chain_type=\"map_reduce\")\n",
"query = [\"Who are the autors?\"]\n",
"\n",
"chain.run(input_documents=documents, question=query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1a09d18b-ab7b-468e-ae66-f92abf666b9b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"availableInstances": [
{
"_defaultOrder": 0,
"_isFastLaunch": true,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 4,
"name": "ml.t3.medium",
"vcpuNum": 2
},
{
"_defaultOrder": 1,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 8,
"name": "ml.t3.large",
"vcpuNum": 2
},
{
"_defaultOrder": 2,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.t3.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 3,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.t3.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 4,
"_isFastLaunch": true,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 8,
"name": "ml.m5.large",
"vcpuNum": 2
},
{
"_defaultOrder": 5,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.m5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 6,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.m5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 7,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.m5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 8,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.m5.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 9,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.m5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 10,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.m5.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 11,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 384,
"name": "ml.m5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 12,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 8,
"name": "ml.m5d.large",
"vcpuNum": 2
},
{
"_defaultOrder": 13,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.m5d.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 14,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.m5d.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 15,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.m5d.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 16,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.m5d.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 17,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.m5d.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 18,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.m5d.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 19,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 384,
"name": "ml.m5d.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 20,
"_isFastLaunch": false,
"category": "General purpose",
"gpuNum": 0,
"hideHardwareSpecs": true,
"memoryGiB": 0,
"name": "ml.geospatial.interactive",
"supportedImageNames": [
"sagemaker-geospatial-v1-0"
],
"vcpuNum": 0
},
{
"_defaultOrder": 21,
"_isFastLaunch": true,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 4,
"name": "ml.c5.large",
"vcpuNum": 2
},
{
"_defaultOrder": 22,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 8,
"name": "ml.c5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 23,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.c5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 24,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.c5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 25,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 72,
"name": "ml.c5.9xlarge",
"vcpuNum": 36
},
{
"_defaultOrder": 26,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 96,
"name": "ml.c5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 27,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 144,
"name": "ml.c5.18xlarge",
"vcpuNum": 72
},
{
"_defaultOrder": 28,
"_isFastLaunch": false,
"category": "Compute optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.c5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 29,
"_isFastLaunch": true,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.g4dn.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 30,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.g4dn.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 31,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.g4dn.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 32,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.g4dn.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 33,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.g4dn.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 34,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.g4dn.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 35,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 61,
"name": "ml.p3.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 36,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"hideHardwareSpecs": false,
"memoryGiB": 244,
"name": "ml.p3.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 37,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"hideHardwareSpecs": false,
"memoryGiB": 488,
"name": "ml.p3.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 38,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"hideHardwareSpecs": false,
"memoryGiB": 768,
"name": "ml.p3dn.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 39,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.r5.large",
"vcpuNum": 2
},
{
"_defaultOrder": 40,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.r5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 41,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.r5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 42,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.r5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 43,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.r5.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 44,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 384,
"name": "ml.r5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 45,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 512,
"name": "ml.r5.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 46,
"_isFastLaunch": false,
"category": "Memory Optimized",
"gpuNum": 0,
"hideHardwareSpecs": false,
"memoryGiB": 768,
"name": "ml.r5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 47,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 16,
"name": "ml.g5.xlarge",
"vcpuNum": 4
},
{
"_defaultOrder": 48,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 32,
"name": "ml.g5.2xlarge",
"vcpuNum": 8
},
{
"_defaultOrder": 49,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 64,
"name": "ml.g5.4xlarge",
"vcpuNum": 16
},
{
"_defaultOrder": 50,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 128,
"name": "ml.g5.8xlarge",
"vcpuNum": 32
},
{
"_defaultOrder": 51,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 1,
"hideHardwareSpecs": false,
"memoryGiB": 256,
"name": "ml.g5.16xlarge",
"vcpuNum": 64
},
{
"_defaultOrder": 52,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"hideHardwareSpecs": false,
"memoryGiB": 192,
"name": "ml.g5.12xlarge",
"vcpuNum": 48
},
{
"_defaultOrder": 53,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 4,
"hideHardwareSpecs": false,
"memoryGiB": 384,
"name": "ml.g5.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 54,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"hideHardwareSpecs": false,
"memoryGiB": 768,
"name": "ml.g5.48xlarge",
"vcpuNum": 192
},
{
"_defaultOrder": 55,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"hideHardwareSpecs": false,
"memoryGiB": 1152,
"name": "ml.p4d.24xlarge",
"vcpuNum": 96
},
{
"_defaultOrder": 56,
"_isFastLaunch": false,
"category": "Accelerated computing",
"gpuNum": 8,
"hideHardwareSpecs": false,
"memoryGiB": 1152,
"name": "ml.p4de.24xlarge",
"vcpuNum": 96
}
],
"instance_type": "ml.t3.medium",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@ -299,7 +299,7 @@
"id": "1cf27fc8",
"metadata": {},
"source": [
"If you need to post process the `unstructured` elements after extraction, you can pass in a list of `Element` -> `Element` functions to the `post_processors` kwarg when you instantiate the `UnstructuredFileLoader`. This applies to other Unstructured loaders as well. Below is an example. Post processors are only applied if you run the loader in `\"elements\"` mode."
"If you need to post process the `unstructured` elements after extraction, you can pass in a list of `str` -> `str` functions to the `post_processors` kwarg when you instantiate the `UnstructuredFileLoader`. This applies to other Unstructured loaders as well. Below is an example."
]
},
{
@ -495,7 +495,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
"version": "3.8.10"
}
},
"nbformat": 4,

View File

@ -10,7 +10,7 @@
"\n",
"This LLM showcases true potential of decentralized AI by giving you the best response(s) from the Bittensor protocol, which consist of various AI models such as OpenAI, LLaMA2 etc.\n",
"\n",
"Users can view their logs, requests, and API keys on the [Validator Endpoint Frontend](https://api.neuralinterent.ai/). However, changes to the configuration are currently prohibited; otherwise, the user's queries will be blocked.\n",
"Users can view their logs, requests, and API keys on the [Validator Endpoint Frontend](https://api.neuralinternet.ai/). However, changes to the configuration are currently prohibited; otherwise, the user's queries will be blocked.\n",
"\n",
"If you encounter any difficulties or have any questions, please feel free to reach out to our developer on [GitHub](https://github.com/Kunj-2206), [Discord](https://discordapp.com/users/683542109248159777) or join our discord server for latest update and queries [Neural Internet](https://discord.gg/neuralinternet).\n"
]

View File

@ -32,7 +32,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "d772b637-de00-4663-bd77-9bc96d798db2",
"metadata": {
"tags": []
@ -135,7 +135,7 @@
"id": "4c16fded-70d1-42af-8bfa-6ddda9f0bc63",
"metadata": {},
"source": [
"### Flan, by Google"
"### `Flan`, by `Google`"
]
},
{
@ -178,7 +178,7 @@
"id": "1a5c97af-89bc-4e59-95c1-223742a9160b",
"metadata": {},
"source": [
"### Dolly, by Databricks\n",
"### `Dolly`, by `Databricks`\n",
"\n",
"See [Databricks](https://huggingface.co/databricks) organization page for a list of available models."
]
@ -225,14 +225,14 @@
"id": "03f6ae52-b5f9-4de6-832c-551cb3fa11ae",
"metadata": {},
"source": [
"### Camel, by Writer\n",
"### `Camel`, by `Writer`\n",
"\n",
"See [Writer's](https://huggingface.co/Writer) organization page for a list of available models."
]
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 11,
"id": "257a091d-750b-4910-ac08-fe1c7b3fd98b",
"metadata": {
"tags": []
@ -261,7 +261,7 @@
"id": "2bf838eb-1083-402f-b099-b07c452418c8",
"metadata": {},
"source": [
"### XGen, by Salesforce\n",
"### `XGen`, by `Salesforce`\n",
"\n",
"See [more information](https://github.com/salesforce/xgen)."
]
@ -295,7 +295,7 @@
"id": "0aca9f9e-f333-449c-97b2-10d1dbf17e75",
"metadata": {},
"source": [
"### Falcon, by Technology Innovation Institute (TII)\n",
"### `Falcon`, by `Technology Innovation Institute (TII)`\n",
"\n",
"See [more information](https://huggingface.co/tiiuae/falcon-40b)."
]
@ -323,6 +323,86 @@
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
},
{
"cell_type": "markdown",
"id": "7e15849b-5561-4bb9-86ec-6412ca10196a",
"metadata": {},
"source": [
"### `InternLM-Chat`, by `Shanghai AI Laboratory`\n",
"\n",
"See [more information](https://huggingface.co/internlm/internlm-7b)."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "3b533461-59f8-406e-907b-000841fa60a7",
"metadata": {},
"outputs": [],
"source": [
"repo_id = \"internlm/internlm-chat-7b\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c71210b9-5895-41a2-889a-f430d22fa1aa",
"metadata": {},
"outputs": [],
"source": [
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"max_length\": 128, \"temperature\": 0.8}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
},
{
"cell_type": "markdown",
"id": "4f2e5132-1713-42d7-919a-8c313744ce95",
"metadata": {},
"source": [
"### `Qwen`, by `Alibaba Cloud`\n",
"\n",
">`Tongyi Qianwen-7B` (`Qwen-7B`) is a model with a scale of 7 billion parameters in the `Tongyi Qianwen` large model series developed by `Alibaba Cloud`. `Qwen-7B` is a large language model based on Transformer, which is trained on ultra-large-scale pre-training data.\n",
"\n",
"See [more information on HuggingFace](https://huggingface.co/Qwen/Qwen-7B) of on [GitHub](https://github.com/QwenLM/Qwen-7B).\n",
"\n",
"See here a [big example for LangChain integration and Qwen](https://github.com/QwenLM/Qwen-7B/blob/main/examples/langchain_tooluse.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "f598b1ca-77c7-40f1-a83f-c21ea9910c88",
"metadata": {},
"outputs": [],
"source": [
"repo_id = \"Qwen/Qwen-7B\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2c97f4e2-d401-44fb-9da7-b60b2e2cc663",
"metadata": {},
"outputs": [],
"source": [
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"max_length\": 128, \"temperature\": 0.5}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1dd67c1e-1efc-4def-bde4-2e5265725303",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@ -341,7 +421,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.10.12"
}
},
"nbformat": 4,

View File

@ -21,19 +21,19 @@
"tags": []
},
"source": [
"To use, you should have the ``transformers`` python [package installed](https://pypi.org/project/transformers/)."
"To use, you should have the ``transformers`` python [package installed](https://pypi.org/project/transformers/), as well as [pytorch](https://pytorch.org/get-started/locally/). You can also install `xformer` for a more memory-efficient attention implementation."
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "d772b637-de00-4663-bd77-9bc96d798db2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!pip install transformers > /dev/null"
"%pip install transformers --quiet"
]
},
{
@ -46,22 +46,14 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 6,
"id": "165ae236-962a-4763-8052-c4836d78a5d2",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:root:Failed to default session, using empty session: HTTPConnectionPool(host='localhost', port=8000): Max retries exceeded with url: /sessions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1117f9790>: Failed to establish a new connection: [Errno 61] Connection refused'))\n"
]
}
],
"outputs": [],
"source": [
"from langchain import HuggingFacePipeline\n",
"from langchain.llms import HuggingFacePipeline\n",
"\n",
"llm = HuggingFacePipeline.from_model_id(\n",
" model_id=\"bigscience/bloom-1b7\",\n",
@ -75,24 +67,18 @@
"id": "00104b27-0c15-4a97-b198-4512337ee211",
"metadata": {},
"source": [
"### Integrate the model in an LLMChain"
"### Create Chain\n",
"\n",
"With the model loaded into memory, you can compose it with a prompt to\n",
"form a chain."
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 7,
"id": "3acf0069",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/wfh/code/lc/lckg/.venv/lib/python3.11/site-packages/transformers/generation/utils.py:1288: UserWarning: Using `max_length`'s default (64) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.\n",
" warnings.warn(\n",
"WARNING:root:Failed to persist run: HTTPConnectionPool(host='localhost', port=8000): Max retries exceeded with url: /chain-runs (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x144d06910>: Failed to establish a new connection: [Errno 61] Connection refused'))\n"
]
},
{
"name": "stdout",
"output_type": "stream",
@ -102,27 +88,19 @@
}
],
"source": [
"from langchain import PromptTemplate, LLMChain\n",
"from langchain.prompts import PromptTemplate\n",
"\n",
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])\n",
"prompt = PromptTemplate.from_template(template)\n",
"\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"chain = prompt | llm\n",
"\n",
"question = \"What is electroencephalography?\"\n",
"\n",
"print(llm_chain.run(question))"
"print(chain.invoke({\"question\": question}))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "843a3837",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {

View File

@ -0,0 +1,214 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PromptGuard\n",
"\n",
"[PromptGuard](https://promptguard.readthedocs.io/en/latest/) is a service that enables applications to leverage the power of language models without compromising user privacy. Designed for composability and ease of integration into existing applications and services, PromptGuard is consumable via a simple Python library as well as through LangChain. Perhaps more importantly, PromptGuard leverages the power of [confidential computing](https://en.wikipedia.org/wiki/Confidential_computing) to ensure that even the PromptGuard service itself cannot access the data it is protecting.\n",
" \n",
"\n",
"This notebook goes over how to use LangChain to interact with `PromptGuard`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# install the promptguard and langchain packages\n",
"! pip install promptguard langchain"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Accessing the PromptGuard API requires an API key, which you can get by creating an account on [the PromptGuard website](https://promptguard.opaque.co/). Once you have an account, you can find your API key on [the API Keys page](https://promptguard.opaque.co/api-keys)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Set API keys\n",
"\n",
"os.environ['PROMPT_GUARD_API_KEY'] = \"<PROMPT_GUARD_API_KEY>\"\n",
"os.environ['OPENAI_API_KEY'] = \"<OPENAI_API_KEY>\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Use PromptGuardLLMWrapper\n",
"\n",
"Applying promptguard to your application could be as simple as wrapping your LLM using the PromptGuardLLMWrapper class by replace `llm=OpenAI()` with `llm=PromptGuardLLMWrapper(OpenAI())`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import langchain\n",
"from langchain import LLMChain, PromptTemplate\n",
"from langchain.callbacks.stdout import StdOutCallbackHandler\n",
"from langchain.llms import OpenAI\n",
"from langchain.memory import ConversationBufferWindowMemory\n",
"\n",
"from langchain.llms import PromptGuardLLMWrapper\n",
"\n",
"langchain.verbose = True\n",
"langchain.debug = True\n",
"\n",
"prompt_template = \"\"\"\n",
"As an AI assistant, you will answer questions according to given context.\n",
"\n",
"Sensitive personal information in the question is masked for privacy.\n",
"For instance, if the original text says \"Giana is good,\" it will be changed\n",
"to \"PERSON_998 is good.\" \n",
"\n",
"Here's how to handle these changes:\n",
"* Consider these masked phrases just as placeholders, but still refer to\n",
"them in a relevant way when answering.\n",
"* It's possible that different masked terms might mean the same thing.\n",
"Stick with the given term and don't modify it.\n",
"* All masked terms follow the \"TYPE_ID\" pattern.\n",
"* Please don't invent new masked terms. For instance, if you see \"PERSON_998,\"\n",
"don't come up with \"PERSON_997\" or \"PERSON_999\" unless they're already in the question.\n",
"\n",
"Conversation History: ```{history}```\n",
"Context : ```During our recent meeting on February 23, 2023, at 10:30 AM,\n",
"John Doe provided me with his personal details. His email is johndoe@example.com\n",
"and his contact number is 650-456-7890. He lives in New York City, USA, and\n",
"belongs to the American nationality with Christian beliefs and a leaning towards\n",
"the Democratic party. He mentioned that he recently made a transaction using his\n",
"credit card 4111 1111 1111 1111 and transferred bitcoins to the wallet address\n",
"1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa. While discussing his European travels, he noted\n",
"down his IBAN as GB29 NWBK 6016 1331 9268 19. Additionally, he provided his website\n",
"as https://johndoeportfolio.com. John also discussed some of his US-specific details.\n",
"He said his bank account number is 1234567890123456 and his drivers license is Y12345678.\n",
"His ITIN is 987-65-4321, and he recently renewed his passport, the number for which is\n",
"123456789. He emphasized not to share his SSN, which is 123-45-6789. Furthermore, he\n",
"mentioned that he accesses his work files remotely through the IP 192.168.1.1 and has\n",
"a medical license number MED-123456. ```\n",
"Question: ```{question}```\n",
"\n",
"\"\"\"\n",
"\n",
"chain = LLMChain(\n",
" prompt=PromptTemplate.from_template(prompt_template),\n",
" llm=PromptGuardLLMWrapper(llm=OpenAI()),\n",
" memory=ConversationBufferWindowMemory(k=2),\n",
" verbose=True,\n",
")\n",
"\n",
"\n",
"print(\n",
" chain.run(\n",
" {\"question\": \"\"\"Write a message to remind John to do password reset for his website to stay secure.\"\"\"},\n",
" callbacks=[StdOutCallbackHandler()],\n",
" )\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From the output, you can see the following context from user input has sensitive data.\n",
"\n",
"``` \n",
"# Context from user input\n",
"\n",
"During our recent meeting on February 23, 2023, at 10:30 AM, John Doe provided me with his personal details. His email is johndoe@example.com and his contact number is 650-456-7890. He lives in New York City, USA, and belongs to the American nationality with Christian beliefs and a leaning towards the Democratic party. He mentioned that he recently made a transaction using his credit card 4111 1111 1111 1111 and transferred bitcoins to the wallet address 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa. While discussing his European travels, he noted down his IBAN as GB29 NWBK 6016 1331 9268 19. Additionally, he provided his website as https://johndoeportfolio.com. John also discussed some of his US-specific details. He said his bank account number is 1234567890123456 and his drivers license is Y12345678. His ITIN is 987-65-4321, and he recently renewed his passport, the number for which is 123456789. He emphasized not to share his SSN, which is 669-45-6789. Furthermore, he mentioned that he accesses his work files remotely through the IP 192.168.1.1 and has a medical license number MED-123456.\n",
"```\n",
"\n",
"PromptGuard will automatically detect the sensitive data and replace it with a placeholder. \n",
"\n",
"```\n",
"# Context after PromptGuard\n",
"\n",
"During our recent meeting on DATE_TIME_3, at DATE_TIME_2, PERSON_3 provided me with his personal details. His email is EMAIL_ADDRESS_1 and his contact number is PHONE_NUMBER_1. He lives in LOCATION_3, LOCATION_2, and belongs to the NRP_3 nationality with NRP_2 beliefs and a leaning towards the Democratic party. He mentioned that he recently made a transaction using his credit card CREDIT_CARD_1 and transferred bitcoins to the wallet address CRYPTO_1. While discussing his NRP_1 travels, he noted down his IBAN as IBAN_CODE_1. Additionally, he provided his website as URL_1. PERSON_2 also discussed some of his LOCATION_1-specific details. He said his bank account number is US_BANK_NUMBER_1 and his drivers license is US_DRIVER_LICENSE_2. His ITIN is US_ITIN_1, and he recently renewed his passport, the number for which is DATE_TIME_1. He emphasized not to share his SSN, which is US_SSN_1. Furthermore, he mentioned that he accesses his work files remotely through the IP IP_ADDRESS_1 and has a medical license number MED-US_DRIVER_LICENSE_1.\n",
"```\n",
"\n",
"Placeholder is used in the LLM response.\n",
"\n",
"```\n",
"# response returned by LLM\n",
"\n",
"Hey PERSON_1, just wanted to remind you to do a password reset for your website URL_1 through your email EMAIL_ADDRESS_1. It's important to stay secure online, so don't forget to do it!\n",
"```\n",
"\n",
"Response is desanitized by replacing the placeholder with the original sensitive data.\n",
"\n",
"```\n",
"# desanitized LLM response from PromptGuard\n",
"\n",
"Hey John, just wanted to remind you to do a password reset for your website https://johndoeportfolio.com through your email johndoe@example.com. It's important to stay secure online, so don't forget to do it!\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Use PromptGuard in LangChain expression\n",
"\n",
"There are functions that can be used with LangChain expression as well if a drop-in replacement doesn't offer the flexibility you need. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import langchain.utilities.promptguard as pgf\n",
"from langchain.schema.runnable import RunnableMap\n",
"from langchain.schema.output_parser import StrOutputParser\n",
"\n",
"\n",
"prompt=PromptTemplate.from_template(prompt_template), \n",
"llm = OpenAI()\n",
"pg_chain = (\n",
" pgf.sanitize\n",
" | RunnableMap(\n",
" {\n",
" \"response\": (lambda x: x[\"sanitized_input\"])\n",
" | prompt\n",
" | llm\n",
" | StrOutputParser(),\n",
" \"secure_context\": lambda x: x[\"secure_context\"],\n",
" }\n",
" )\n",
" | (lambda x: pgf.desanitize(x[\"response\"], x[\"secure_context\"]))\n",
")\n",
"\n",
"pg_chain.invoke({\"question\": \"Write a text message to remind John to do password reset for his website through his email to stay secure.\", \"history\": \"\"})"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "langchain",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.10.10"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@ -26,7 +26,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {
"tags": []
},
@ -61,6 +61,71 @@
"\n",
"llm_chain.run(question)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Streaming Version"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You should install websocket-client to use this feature.\n",
"`pip install websocket-client`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_url = \"ws://localhost:5005\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import langchain\n",
"from langchain import PromptTemplate, LLMChain\n",
"from langchain.llms import TextGen\n",
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
"\n",
"langchain.debug = True\n",
"\n",
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])\n",
"llm = TextGen(model_url=model_url, streaming=True, callbacks=[StreamingStdOutCallbackHandler()])\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"question = \"What NFL team won the Super Bowl in the year Justin Bieber was born?\"\n",
"\n",
"llm_chain.run(question)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = TextGen(\n",
" model_url = model_url,\n",
" streaming=True\n",
")\n",
"for chunk in llm.stream(\"Ask 'Hi, how are you?' like a pirate:'\",\n",
" stop=[\"'\",\"\\n\"]):\n",
" print(chunk, end='', flush=True)"
]
}
],
"metadata": {
@ -79,7 +144,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.7"
"version": "3.10.4"
}
},
"nbformat": 4,

View File

@ -37,7 +37,7 @@ There is a Clarifai Embedding model in LangChain, which you can access with:
from langchain.embeddings import ClarifaiEmbeddings
embeddings = ClarifaiEmbeddings(pat=CLARIFAI_PAT, user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID)
```
For more details, the docs on the Clarifai Embeddings wrapper provide a [detailed walthrough](/docs/integrations/text_embedding/clarifai.html).
For more details, the docs on the Clarifai Embeddings wrapper provide a [detailed walkthrough](/docs/integrations/text_embedding/clarifai.html).
## Vectorstore
@ -49,4 +49,4 @@ You an also add data directly from LangChain as well, and the auto-indexing will
from langchain.vectorstores import Clarifai
clarifai_vector_db = Clarifai.from_texts(user_id=USER_ID, app_id=APP_ID, texts=texts, pat=CLARIFAI_PAT, number_of_docs=NUMBER_OF_DOCS, metadatas = metadatas)
```
For more details, the docs on the Clarifai vector store provide a [detailed walthrough](/docs/integrations/text_embedding/clarifai.html).
For more details, the docs on the Clarifai vector store provide a [detailed walkthrough](/docs/integrations/text_embedding/clarifai.html).

View File

@ -0,0 +1,23 @@
# Epsilla
This page covers how to use [Epsilla](https://github.com/epsilla-cloud/vectordb) within LangChain.
It is broken into two parts: installation and setup, and then references to specific Epsilla wrappers.
## Installation and Setup
- Install the Python SDK with `pip install pyepsilla` (or `pip3 install pyepsilla`)
## Wrappers
### VectorStore
There exists a wrapper around Epsilla vector databases, allowing you to use it as a vectorstore,
whether for semantic search or example selection.
To import this vectorstore:
```python
from langchain.vectorstores import Epsilla
```
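A minimal usage sketch (assuming a locally running Epsilla server, the `pyepsilla` client, and an OpenAI API key in the environment; the documents and query below are illustrative):
```python
from langchain.docstore.document import Document
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Epsilla
from pyepsilla import vectordb

# Connect to a locally running Epsilla server (default host and port).
client = vectordb.Client()

# A couple of illustrative documents to index.
documents = [
    Document(page_content="Epsilla is an open-source vector database."),
    Document(page_content="LangChain integrates with many vector stores."),
]

vector_store = Epsilla.from_documents(documents, OpenAIEmbeddings(), client)
docs = vector_store.similarity_search("What is Epsilla?")
print(docs[0].page_content)
```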
For a more detailed walkthrough of the Epsilla wrapper, see [this notebook](/docs/integrations/vectorstores/epsilla.html).

View File

@ -130,9 +130,9 @@
"metadata": {},
"outputs": [],
"source": [
"USER_ID = \"openai\"\n",
"APP_ID = \"embed\"\n",
"MODEL_ID = \"text-embedding-ada\"\n",
"USER_ID = \"salesforce\"\n",
"APP_ID = \"blip\"\n",
"MODEL_ID = \"multimodal-embedder-blip-2\"\n",
"\n",
"# You can provide a specific model version as the model_version_id arg.\n",
"# MODEL_VERSION_ID = \"MODEL_VERSION_ID\""

View File

@ -0,0 +1,60 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ERNIE Embedding-V1\n",
"\n",
"[ERNIE Embedding-V1](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/alj562vvu) is a text representation model based on Baidu Wenxin's large-scale model technology, \n",
"which converts text into a vector form represented by numerical values, and is used in text retrieval, information recommendation, knowledge mining and other scenarios."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import ErnieEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"embeddings = ErnieEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(\"foo\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"doc_results = embeddings.embed_documents([\"foo\"])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@ -53,7 +53,15 @@
"execution_count": 1,
"id": "c1e38361-c1fe-4ac6-86e9-c90ebaf7ae87",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"# Please login and get your API key from https://clarifai.com/settings/security\n",
"from getpass import getpass\n",
@ -61,18 +69,9 @@
"CLARIFAI_PAT = getpass()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "320af802-9271-46ee-948f-d2453933d44b",
"metadata": {},
"source": [
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
]
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 6,
"id": "aac9563e",
"metadata": {
"tags": []
@ -99,7 +98,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 2,
"id": "4d853395",
"metadata": {},
"outputs": [],
@ -134,7 +133,7 @@
" \"I love playing soccer with my friends\",\n",
"]\n",
"\n",
"metadatas = [{\"id\": i, \"text\": text} for i, text in enumerate(texts)]"
"metadatas = [{\"id\": i, \"text\": text, \"source\": \"book 1\", \"category\": [\"books\", \"modern\"]} for i, text in enumerate(texts)]"
]
},
{
@ -156,21 +155,17 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"id": "e755cdce",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='I really enjoy spending time with you', metadata={'text': 'I really enjoy spending time with you', 'id': 0.0}),\n",
" Document(page_content='I went to the movies yesterday', metadata={'text': 'I went to the movies yesterday', 'id': 3.0}),\n",
" Document(page_content='zab', metadata={'page': '2'}),\n",
" Document(page_content='zab', metadata={'page': '2'})]"
"[Document(page_content='I really enjoy spending time with you', metadata={'text': 'I really enjoy spending time with you', 'id': 0.0, 'source': 'book 1', 'category': ['books', 'modern']}),\n",
" Document(page_content='I went to the movies yesterday', metadata={'text': 'I went to the movies yesterday', 'id': 3.0, 'source': 'book 1', 'category': ['books', 'modern']})]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
@ -179,6 +174,21 @@
"docs"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "140103ec-0936-454a-9f4a-7d5beefc138f",
"metadata": {},
"outputs": [],
"source": [
"# There is lots powerful filtering you can do within an app by leveraging metadata filters. \n",
"# This one will limit the similarity query to only the texts that have key of \"source\" matching value of \"book 1\"\n",
"book1_similar_docs = clarifai_vector_db.similarity_search(\"I would love to see you\", filter={\"source\": \"book 1\"})\n",
"\n",
"# you can also use lists in the input's metadata and then select things that match an item in the list. This is useful for categories like below:\n",
"book_category_similar_docs = clarifai_vector_db.similarity_search(\"I would love to see you\", filter={\"category\": [\"books\"]})"
]
},
{
"attachments": {},
"cell_type": "markdown",
@ -249,7 +259,7 @@
" user_id=USER_ID,\n",
" app_id=APP_ID,\n",
" documents=docs,\n",
" pat=CLARIFAI_PAT_KEY,\n",
" pat=CLARIFAI_PAT,\n",
" number_of_docs=NUMBER_OF_DOCS,\n",
")"
]
@ -278,6 +288,55 @@
"docs = clarifai_vector_db.similarity_search(\"Texts related to criminals and violence\")\n",
"docs"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "7b332ca4-416b-4ea6-99da-b6949f399d72",
"metadata": {},
"source": [
"## From existing App\n",
"Within Clarifai we have great tools for adding data to applications (essentially projects) via API or UI. Most users will already have done that before interacting with LangChain so this example will use the data in an existing app to perform searches. Check out our [API docs](https://docs.clarifai.com/api-guide/data/create-get-update-delete) and [UI docs](https://docs.clarifai.com/portal-guide/data). The Clarifai Application can then be used for semantic search to find relevant documents."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "807c1141-591b-436d-abaa-f2c325e66d39",
"metadata": {},
"outputs": [],
"source": [
"USER_ID = \"USERNAME_ID\"\n",
"APP_ID = \"APPLICATION_ID\"\n",
"NUMBER_OF_DOCS = 4"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "762d74ef-f7df-43d6-b121-4980c4059fc0",
"metadata": {},
"outputs": [],
"source": [
"clarifai_vector_db = Clarifai(\n",
" user_id=USER_ID,\n",
" app_id=APP_ID,\n",
" documents=docs,\n",
" pat=CLARIFAI_PAT,\n",
" number_of_docs=NUMBER_OF_DOCS,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f7636b0f-68ab-4b8f-ba0f-3c27061e3631",
"metadata": {},
"outputs": [],
"source": [
"docs = clarifai_vector_db.similarity_search(\"Texts related to criminals and violence\")\n",
"docs"
]
}
],
"metadata": {

View File

@ -0,0 +1,160 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Epsilla\n",
"\n",
">[Epsilla](https://www.epsilla.com) is an open-source vector database that leverages the advanced parallel graph traversal techniques for vector indexing. Epsilla is licensed under GPL-3.0.\n",
"\n",
"This notebook shows how to use the functionalities related to the `Epsilla` vector database.\n",
"\n",
"As a prerequisite, you need to have a running Epsilla vector database (for example, through our docker image), and install the ``pyepsilla`` package. View full docs at [docs](https://epsilla-inc.gitbook.io/epsilladb/quick-start)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip/pip3 install pyepsilla"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"We want to use OpenAIEmbeddings so we have to get the OpenAI API Key. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import getpass\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"OpenAI API Key: ········"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.vectorstores import Epsilla"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"\n",
"loader = TextLoader(\"../../modules/state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"\n",
"documents = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0).split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Epsilla vectordb is running with default host \"localhost\" and port \"8888\". We have a custom db path, db name and collection name instead of the default ones."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pyepsilla import vectordb\n",
"\n",
"client = vectordb.Client()\n",
"vector_store = Epsilla.from_documents(\n",
" documents,\n",
" embeddings,\n",
" client,\n",
" db_path=\"/tmp/mypath\",\n",
" db_name=\"MyDB\",\n",
" collection_name=\"MyCollection\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = vector_store.similarity_search(query)\n",
"print(docs[0].page_content)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections.\n",
"\n",
"We cannot let this happen.\n",
"\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections.\n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.\n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.\n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "langchain",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.17"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@ -195,7 +195,7 @@
"\n",
"\n",
"\u001b[1m> Entering new MultiPromptChain chain...\u001b[0m\n",
"math: {'input': 'What is the first prime number greater than 40 such that one plus the prime number is divisible by 3'}\n",
"math: {'input': 'What is the first prime number greater than 40 such that one plus the prime number is divisible by 3?'}\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"?\n",
"\n",
@ -206,7 +206,7 @@
"source": [
"print(\n",
" chain.run(\n",
" \"What is the first prime number greater than 40 such that one plus the prime number is divisible by 3\"\n",
" \"What is the first prime number greater than 40 such that one plus the prime number is divisible by 3?\"\n",
" )\n",
")"
]
@ -342,7 +342,7 @@
"\n",
"\n",
"\u001b[1m> Entering new MultiPromptChain chain...\u001b[0m\n",
"math: {'input': 'What is the first prime number greater than 40 such that one plus the prime number is divisible by 3'}\n",
"math: {'input': 'What is the first prime number greater than 40 such that one plus the prime number is divisible by 3?'}\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"?\n",
"\n",
@ -353,7 +353,7 @@
"source": [
"print(\n",
" chain.run(\n",
" \"What is the first prime number greater than 40 such that one plus the prime number is divisible by 3\"\n",
" \"What is the first prime number greater than 40 such that one plus the prime number is divisible by 3?\"\n",
" )\n",
")"
]

View File

@ -72,7 +72,7 @@
"source": [
"## Retrieving Full Documents\n",
"\n",
"In this mode, we want to retrieve the full documents. Therefor, we only specify a child splitter."
"In this mode, we want to retrieve the full documents. Therefore, we only specify a child splitter."
]
},
{
@ -106,7 +106,7 @@
"metadata": {},
"outputs": [],
"source": [
"retriever.add_documents(docs)"
"retriever.add_documents(docs, ids=None)"
]
},
{
@ -144,7 +144,7 @@
"id": "f895d62b",
"metadata": {},
"source": [
"Let's now call the vectorstore search functionality - we should see that it returns small chunks (since we're storing the small chunks"
"Let's now call the vectorstore search functionality - we should see that it returns small chunks (since we're storing the small chunks)."
]
},
{
@ -432,7 +432,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.1"
"version": "3.10.5"
}
},
"nbformat": 4,

View File

@ -544,7 +544,7 @@
"metadata": {},
"outputs": [],
"source": [
"prompt_template = FeatureformPrompTemplate(input_variables=[\"user_id\"])"
"prompt_template = FeatureformPromptTemplate(input_variables=[\"user_id\"])"
]
},
{

View File

@ -145,7 +145,7 @@
"source": [
"## Functions \n",
"\n",
"We can unpack what is hapening when we use the funtions to calls external APIs.\n",
"We can unpack what is hapening when we use the functions to calls external APIs.\n",
"\n",
"Let's look at the [LangSmith trace](https://smith.langchain.com/public/76a58b85-193f-4eb7-ba40-747f0d5dd56e/r):\n",
"\n",

View File

@ -7,7 +7,7 @@
"source": [
"# Chatbots\n",
"\n",
"[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/chatbots/chatbots.ipynb)\n",
"[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/chatbots.ipynb)\n",
"\n",
"## Use case\n",
"\n",

View File

@ -130,7 +130,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Splittng\n",
"### Splitting\n",
"\n",
"Split the `Document` into chunks for embedding and vector storage.\n",
"\n",

View File

@ -0,0 +1,695 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "311b3061",
"metadata": {},
"source": [
"# Memgraph QA chain\n",
"This notebook shows how to use LLMs to provide a natural language interface to a [Memgraph](https://github.com/memgraph/memgraph) database. To complete this tutorial, you will need [Docker](https://www.docker.com/get-started/) and [Python 3.x](https://www.python.org/) installed.\n",
"\n",
"To follow along with this tutorial, ensure you have a running Memgraph instance. You can download and run it in a local Docker container by executing the following script:\n",
"```\n",
"docker run \\\n",
" -it \\\n",
" -p 7687:7687 \\\n",
" -p 7444:7444 \\\n",
" -p 3000:3000 \\\n",
" -e MEMGRAPH=\"--bolt-server-name-for-init=Neo4j/\" \\\n",
" -v mg_lib:/var/lib/memgraph memgraph/memgraph-platform\n",
"```\n",
"\n",
"You will need to wait a few seconds for the database to start. If the process completes successfully, you should see something like this:\n",
"```\n",
"mgconsole X.X\n",
"Connected to 'memgraph://127.0.0.1:7687'\n",
"Type :help for shell usage\n",
"Quit the shell by typing Ctrl-D(eof) or :quit\n",
"memgraph>\n",
"```\n",
"\n",
"Now you can start playing with Memgraph!"
]
},
{
"cell_type": "markdown",
"id": "45ee105e",
"metadata": {},
"source": [
"Begin by installing and importing all the necessary packages. We'll use the package manager called [pip](https://pip.pypa.io/en/stable/installation/), along with the `--user` flag, to ensure proper permissions. If you've installed Python 3.4 or a later version, pip is included by default. You can install all the required packages using the following command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fd6b9672",
"metadata": {},
"outputs": [],
"source": [
"pip install langchain openai neo4j gqlalchemy --user"
]
},
{
"cell_type": "markdown",
"id": "ec969a02",
"metadata": {},
"source": [
"You can either run the provided code blocks in this notebook or use a separate Python file to experiment with Memgraph and LangChain."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8206f90d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.chains import GraphCypherQAChain\n",
"from langchain.graphs import MemgraphGraph\n",
"from langchain import PromptTemplate\n",
"\n",
"from gqlalchemy import Memgraph\n",
"\n",
"import os"
]
},
{
"cell_type": "markdown",
"id": "95ba37a4",
"metadata": {},
"source": [
"We're utilizing the Python library [GQLAlchemy](https://github.com/memgraph/gqlalchemy) to establish a connection between our Memgraph database and Python script. To execute queries, we can set up a Memgraph instance as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b90c9cf8",
"metadata": {},
"outputs": [],
"source": [
"memgraph = Memgraph(host='127.0.0.1', port=7687)"
]
},
{
"cell_type": "markdown",
"id": "4c379d16",
"metadata": {},
"source": [
"## Populating the database\n",
"You can effortlessly populate your new, empty database using the Cypher query language. Don't worry if you don't grasp every line just yet, you can learn Cypher from the documentation [here](https://memgraph.com/docs/cypher-manual/). Running the following script will execute a seeding query on the database, giving us data about a video game, including details like the publisher, available platforms, and genres. This data will serve as a basis for our work."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11922bdf",
"metadata": {},
"outputs": [],
"source": [
"# Creating and executing the seeding query\n",
"query = \"\"\"\n",
" MERGE (g:Game {name: \"Baldur's Gate 3\"})\n",
" WITH g, [\"PlayStation 5\", \"Mac OS\", \"Windows\", \"Xbox Series X/S\"] AS platforms,\n",
" [\"Adventure\", \"Role-Playing Game\", \"Strategy\"] AS genres\n",
" FOREACH (platform IN platforms |\n",
" MERGE (p:Platform {name: platform})\n",
" MERGE (g)-[:AVAILABLE_ON]->(p)\n",
" )\n",
" FOREACH (genre IN genres |\n",
" MERGE (gn:Genre {name: genre})\n",
" MERGE (g)-[:HAS_GENRE]->(gn)\n",
" )\n",
" MERGE (p:Publisher {name: \"Larian Studios\"})\n",
" MERGE (g)-[:PUBLISHED_BY]->(p);\n",
"\"\"\"\n",
"\n",
"memgraph.execute(query)"
]
},
{
"cell_type": "markdown",
"id": "378db965",
"metadata": {},
"source": [
"## Refresh graph schema"
]
},
{
"cell_type": "markdown",
"id": "d6b37df3",
"metadata": {},
"source": [
"You're all set to instantiate the Memgraph-LangChain graph using the following script. This interface will allow us to query our database using LangChain, automatically creating the required graph schema for generating Cypher queries through LLM."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f38bbe83",
"metadata": {},
"outputs": [],
"source": [
"graph = MemgraphGraph(url=\"bolt://localhost:7687\", username=\"\", password=\"\")"
]
},
{
"cell_type": "markdown",
"id": "846c32a8",
"metadata": {},
"source": [
"If necessary, you can manually refresh the graph schema as follows."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b561026e",
"metadata": {},
"outputs": [],
"source": [
"graph.refresh_schema()"
]
},
{
"cell_type": "markdown",
"id": "c51b7948",
"metadata": {},
"source": [
"To familiarize yourself with the data and verify the updated graph schema, you can print it using the following statement."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f2e0ec3e",
"metadata": {},
"outputs": [],
"source": [
"print(graph.get_schema)"
]
},
{
"cell_type": "markdown",
"id": "a0c2a556",
"metadata": {},
"source": [
"```\n",
"Node properties are the following:\n",
"Node name: 'Game', Node properties: [{'property': 'name', 'type': 'str'}]\n",
"Node name: 'Platform', Node properties: [{'property': 'name', 'type': 'str'}]\n",
"Node name: 'Genre', Node properties: [{'property': 'name', 'type': 'str'}]\n",
"Node name: 'Publisher', Node properties: [{'property': 'name', 'type': 'str'}]\n",
"\n",
"Relationship properties are the following:\n",
"\n",
"The relationships are the following:\n",
"['(:Game)-[:AVAILABLE_ON]->(:Platform)']\n",
"['(:Game)-[:HAS_GENRE]->(:Genre)']\n",
"['(:Game)-[:PUBLISHED_BY]->(:Publisher)']\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "44d3a1da",
"metadata": {},
"source": [
"## Querying the database"
]
},
{
"cell_type": "markdown",
"id": "8aedfd63",
"metadata": {},
"source": [
"To interact with the OpenAI API, you must configure your API key as an environment variable using the Python [os](https://docs.python.org/3/library/os.html) package. This ensures proper authorization for your requests. You can find more information on obtaining your API key [here](https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8385c72",
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"OPENAI_API_KEY\"] = \"your-key-here\""
]
},
{
"cell_type": "markdown",
"id": "5a74565a",
"metadata": {},
"source": [
"You should create the graph chain using the following script, which will be utilized in the question-answering process based on your graph data. While it defaults to GPT-3.5-turbo, you might also consider experimenting with other models like [GPT-4](https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4) for notably improved Cypher queries and outcomes. We'll utilize the OpenAI chat, utilizing the key you previously configured. We'll set the temperature to zero, ensuring predictable and consistent answers. Additionally, we'll use our Memgraph-LangChain graph and set the verbose parameter, which defaults to False, to True to receive more detailed messages regarding query generation."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a3a5f2e",
"metadata": {},
"outputs": [],
"source": [
"chain = GraphCypherQAChain.from_llm(\n",
" ChatOpenAI(temperature=0), graph=graph, verbose=True, model_name='gpt-3.5-turbo'\n",
")"
]
},
{
"cell_type": "markdown",
"id": "949de4f3",
"metadata": {},
"source": [
"Now you can start asking questions!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7aea263",
"metadata": {},
"outputs": [],
"source": [
"response = chain.run(\"Which platforms is Baldur's Gate 3 available on?\")\n",
"print(response)"
]
},
{
"cell_type": "markdown",
"id": "a06a8164",
"metadata": {},
"source": [
"```\n",
"> Entering new GraphCypherQAChain chain...\n",
"Generated Cypher:\n",
"MATCH (g:Game {name: 'Baldur\\'s Gate 3'})-[:AVAILABLE_ON]->(p:Platform)\n",
"RETURN p.name\n",
"Full Context:\n",
"[{'p.name': 'PlayStation 5'}, {'p.name': 'Mac OS'}, {'p.name': 'Windows'}, {'p.name': 'Xbox Series X/S'}]\n",
"\n",
"> Finished chain.\n",
"Baldur's Gate 3 is available on PlayStation 5, Mac OS, Windows, and Xbox Series X/S.\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "59d298d5",
"metadata": {},
"outputs": [],
"source": [
"response = chain.run(\"Is Baldur's Gate 3 available on Windows?\")\n",
"print(response)"
]
},
{
"cell_type": "markdown",
"id": "99dd783c",
"metadata": {},
"source": [
"```\n",
"> Entering new GraphCypherQAChain chain...\n",
"Generated Cypher:\n",
"MATCH (:Game {name: 'Baldur\\'s Gate 3'})-[:AVAILABLE_ON]->(:Platform {name: 'Windows'})\n",
"RETURN true\n",
"Full Context:\n",
"[{'true': True}]\n",
"\n",
"> Finished chain.\n",
"Yes, Baldur's Gate 3 is available on Windows.\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "08620465",
"metadata": {},
"source": [
"## Chain modifiers"
]
},
{
"cell_type": "markdown",
"id": "6603e6c8",
"metadata": {},
"source": [
"To modify the behavior of your chain and obtain more context or additional information, you can modify the chain's parameters."
]
},
{
"cell_type": "markdown",
"id": "8d187a83",
"metadata": {},
"source": [
"#### Return direct query results\n",
"The `return_direct` modifier specifies whether to return the direct results of the executed Cypher query or the processed natural language response."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0533847d",
"metadata": {},
"outputs": [],
"source": [
"# Return the result of querying the graph directly\n",
"chain = GraphCypherQAChain.from_llm(\n",
" ChatOpenAI(temperature=0), graph=graph, verbose=True, return_direct=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "afbe96fb",
"metadata": {},
"outputs": [],
"source": [
"response = chain.run(\"Which studio published Baldur's Gate 3?\")\n",
"print(response)"
]
},
{
"cell_type": "markdown",
"id": "94b32b6e",
"metadata": {},
"source": [
"```\n",
"> Entering new GraphCypherQAChain chain...\n",
"Generated Cypher:\n",
"MATCH (:Game {name: 'Baldur\\'s Gate 3'})-[:PUBLISHED_BY]->(p:Publisher)\n",
"RETURN p.name\n",
"\n",
"> Finished chain.\n",
"[{'p.name': 'Larian Studios'}]\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "5c97ab3a",
"metadata": {},
"source": [
"#### Return query intermediate steps\n",
"The `return_intermediate_steps` chain modifier enhances the returned response by including the intermediate steps of the query in addition to the initial query result."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "82f673c8",
"metadata": {},
"outputs": [],
"source": [
"# Return all the intermediate steps of query execution\n",
"chain = GraphCypherQAChain.from_llm(\n",
" ChatOpenAI(temperature=0), graph=graph, verbose=True, return_intermediate_steps=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d87e0976",
"metadata": {},
"outputs": [],
"source": [
"response = chain(\"Is Baldur's Gate 3 an Adventure game?\")\n",
"print(f\"Intermediate steps: {response['intermediate_steps']}\")\n",
"print(f\"Final response: {response['result']}\")"
]
},
{
"cell_type": "markdown",
"id": "df12b3da",
"metadata": {},
"source": [
"```\n",
"> Entering new GraphCypherQAChain chain...\n",
"Generated Cypher:\n",
"MATCH (g:Game {name: 'Baldur\\'s Gate 3'})-[:HAS_GENRE]->(genre:Genre {name: 'Adventure'})\n",
"RETURN g, genre\n",
"Full Context:\n",
"[{'g': {'name': \"Baldur's Gate 3\"}, 'genre': {'name': 'Adventure'}}]\n",
"\n",
"> Finished chain.\n",
"Intermediate steps: [{'query': \"MATCH (g:Game {name: 'Baldur\\\\'s Gate 3'})-[:HAS_GENRE]->(genre:Genre {name: 'Adventure'})\\nRETURN g, genre\"}, {'context': [{'g': {'name': \"Baldur's Gate 3\"}, 'genre': {'name': 'Adventure'}}]}]\n",
"Final response: Yes, Baldur's Gate 3 is an Adventure game.\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "41124485",
"metadata": {},
"source": [
"#### Limit the number of query results\n",
"The `top_k` modifier can be used when you want to restrict the maximum number of query results."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7340fc87",
"metadata": {},
"outputs": [],
"source": [
"# Limit the maximum number of results returned by query\n",
"chain = GraphCypherQAChain.from_llm(\n",
" ChatOpenAI(temperature=0), graph=graph, verbose=True, top_k=2\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3a17cdc6",
"metadata": {},
"outputs": [],
"source": [
"response = chain.run(\"What genres are associated with Baldur's Gate 3?\")\n",
"print(response)"
]
},
{
"cell_type": "markdown",
"id": "dcff33ed",
"metadata": {},
"source": [
"```\n",
"> Entering new GraphCypherQAChain chain...\n",
"Generated Cypher:\n",
"MATCH (:Game {name: 'Baldur\\'s Gate 3'})-[:HAS_GENRE]->(g:Genre)\n",
"RETURN g.name\n",
"Full Context:\n",
"[{'g.name': 'Adventure'}, {'g.name': 'Role-Playing Game'}]\n",
"\n",
"> Finished chain.\n",
"Baldur's Gate 3 is associated with the genres Adventure and Role-Playing Game.\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "2eb524a1",
"metadata": {},
"source": [
"# Advanced querying"
]
},
{
"cell_type": "markdown",
"id": "113be997",
"metadata": {},
"source": [
"As the complexity of your solution grows, you might encounter different use-cases that require careful handling. Ensuring your application's scalability is essential to maintain a smooth user flow without any hitches."
]
},
{
"cell_type": "markdown",
"id": "e0b2db17",
"metadata": {},
"source": [
"Let's instantiate our chain once again and attempt to ask some questions that users might potentially ask."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc544d0b",
"metadata": {},
"outputs": [],
"source": [
"chain = GraphCypherQAChain.from_llm(\n",
" ChatOpenAI(temperature=0), graph=graph, verbose=True, model_name='gpt-3.5-turbo'\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e2abde2d",
"metadata": {},
"outputs": [],
"source": [
"response = chain.run(\"Is Baldur's Gate 3 available on PS5?\")\n",
"print(response)"
]
},
{
"cell_type": "markdown",
"id": "cf22dc48",
"metadata": {},
"source": [
"```\n",
"> Entering new GraphCypherQAChain chain...\n",
"Generated Cypher:\n",
"MATCH (g:Game {name: 'Baldur\\'s Gate 3'})-[:AVAILABLE_ON]->(p:Platform {name: 'PS5'})\n",
"RETURN g.name, p.name\n",
"Full Context:\n",
"[]\n",
"\n",
"> Finished chain.\n",
"I'm sorry, but I don't have the information to answer your question.\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "293aa1c9",
"metadata": {},
"source": [
"The generated Cypher query looks fine, but we didn't receive any information in response. This illustrates a common challenge when working with LLMs - the misalignment between how users phrase queries and how data is stored. In this case, the difference between user perception and the actual data storage can cause mismatches. Prompt refinement, the process of honing the model's prompts to better grasp these distinctions, is an efficient solution that tackles this issue. Through prompt refinement, the model gains increased proficiency in generating precise and pertinent queries, leading to the successful retrieval of the desired data."
]
},
{
"cell_type": "markdown",
"id": "a87b2f1b",
"metadata": {},
"source": [
"### Prompt refinement"
]
},
{
"cell_type": "markdown",
"id": "8edb9976",
"metadata": {},
"source": [
"To address this, we can adjust the initial Cypher prompt of the QA chain. This involves adding guidance to the LLM on how users can refer to specific platforms, such as PS5 in our case. We achieve this using the LangChain [PromptTemplate](https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/), creating a modified initial prompt. This modified prompt is then supplied as an argument to our refined Memgraph-LangChain instance."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "312dad05",
"metadata": {},
"outputs": [],
"source": [
"CYPHER_GENERATION_TEMPLATE = \"\"\"\n",
"Task:Generate Cypher statement to query a graph database.\n",
"Instructions:\n",
"Use only the provided relationship types and properties in the schema.\n",
"Do not use any other relationship types or properties that are not provided.\n",
"Schema:\n",
"{schema}\n",
"Note: Do not include any explanations or apologies in your responses.\n",
"Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.\n",
"Do not include any text except the generated Cypher statement.\n",
"If the user asks about PS5, Play Station 5 or PS 5, that is the platform called PlayStation 5.\n",
"\n",
"The question is:\n",
"{question}\n",
"\"\"\"\n",
"\n",
"CYPHER_GENERATION_PROMPT = PromptTemplate(\n",
" input_variables=[\"schema\", \"question\"], template=CYPHER_GENERATION_TEMPLATE\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2c297245",
"metadata": {},
"outputs": [],
"source": [
"chain = GraphCypherQAChain.from_llm(\n",
" ChatOpenAI(temperature=0), \n",
" cypher_prompt=CYPHER_GENERATION_PROMPT,\n",
" graph=graph, \n",
" verbose=True, \n",
" model_name='gpt-3.5-turbo'\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7efb11a0",
"metadata": {},
"outputs": [],
"source": [
"response = chain.run(\"Is Baldur's Gate 3 available on PS5?\")\n",
"print(response)"
]
},
{
"cell_type": "markdown",
"id": "289db07f",
"metadata": {},
"source": [
"```\n",
"> Entering new GraphCypherQAChain chain...\n",
"Generated Cypher:\n",
"MATCH (g:Game {name: 'Baldur\\'s Gate 3'})-[:AVAILABLE_ON]->(p:Platform {name: 'PlayStation 5'})\n",
"RETURN g.name, p.name\n",
"Full Context:\n",
"[{'g.name': \"Baldur's Gate 3\", 'p.name': 'PlayStation 5'}]\n",
"\n",
"> Finished chain.\n",
"Yes, Baldur's Gate 3 is available on PlayStation 5.\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "84b5f6af",
"metadata": {},
"source": [
"Now, with the revised initial Cypher prompt that includes guidance on platform naming, we are obtaining accurate and relevant results that align more closely with user queries. "
]
},
{
"cell_type": "markdown",
"id": "a21108ad",
"metadata": {},
"source": [
"This approach allows for further improvement of your QA chain. You can effortlessly integrate extra prompt refinement data into your chain, thereby enhancing the overall user experience of your app."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@ -40,7 +40,7 @@ pip install 'langchain[all]'
## From source
If you want to install from source, you can do so by cloning the repo and running:
If you want to install from source, clone the repo, make sure your working directory is `PATH/TO/REPO/langchain/libs/langchain`, and run:
```bash
pip install -e .

View File

@ -1,4 +1,4 @@
## Using PyPDF
# Using PyPDF
Load a PDF using `pypdf` into an array of documents, where each document contains the page content and metadata with the `page` number.
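For context, a minimal usage sketch, assuming `pypdf` is installed (the file path is a placeholder):
```python
from langchain.document_loaders import PyPDFLoader

# each returned Document holds one page's text plus {"page": n} metadata
loader = PyPDFLoader("example_data/layout-parser-paper.pdf")
pages = loader.load_and_split()
```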
@ -389,3 +389,17 @@ data[0]
```
</CodeOutputBlock>
## Using AmazonTextractPDFParser
The AmazonTextractPDFLoader calls the [Amazon Textract Service](https://aws.amazon.com/textract/) to convert PDFs into a Document structure. The loader does pure OCR at the moment, with more features like layout support planned, depending on demand. Single- and multi-page documents are supported, up to 3000 pages and 512 MB in size.
For the call to succeed, an AWS account is required, with requirements similar to those of the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html).
Beyond the AWS configuration, it works much like the other PDF loaders, while also supporting JPEG, PNG, TIFF, and non-native PDF formats.
```python
from langchain.document_loaders import AmazonTextractPDFLoader
loader = AmazonTextractPDFLoader("example_data/alejandro_rosalez_sample-small.jpeg")
documents = loader.load()
```

View File

@ -43,7 +43,7 @@ llm("Tell me a joke")
</CodeOutputBlock>
### `generate`: batch calls, richer outputs
`generate` lets you can call the model with a list of strings, getting back a more complete response than just the text. This complete response can include things like multiple top responses and other LLM provider-specific information:
`generate` lets you call the model with a list of strings, getting back a more complete response than just the text. This complete response can include things like multiple top responses and other LLM provider-specific information:
```python
llm_result = llm.generate(["Tell me a joke", "Tell me a poem"]*15)

View File

@ -5,7 +5,7 @@ description = "Building applications with LLMs through composability"
authors = []
license = "MIT"
readme = "README.md"
repository = "https://www.github.com/hwchase17/langchain"
repository = "https://github.com/langchain-ai/langchain"
[tool.poetry.dependencies]

View File

@ -76,9 +76,9 @@ lint format: PYTHON_FILES=.
lint_diff format_diff: PYTHON_FILES=$(shell git diff --relative=libs/langchain --name-only --diff-filter=d master | grep -E '\.py$$|\.ipynb$$')
lint lint_diff:
poetry run mypy $(PYTHON_FILES)
poetry run black $(PYTHON_FILES) --check
poetry run ruff .
poetry run black $(PYTHON_FILES) --check
poetry run mypy $(PYTHON_FILES)
format format_diff:
poetry run black $(PYTHON_FILES)

View File

@ -97,8 +97,9 @@ def load_agent(
Returns:
An agent executor.
"""
valid_suffixes = {"json", "yaml"}
if hub_result := try_load_from_hub(
path, _load_agent_from_file, "agents", {"json", "yaml"}
path, _load_agent_from_file, "agents", valid_suffixes
):
return hub_result
else:
@ -109,19 +110,20 @@ def _load_agent_from_file(
file: Union[str, Path], **kwargs: Any
) -> Union[BaseSingleActionAgent, BaseMultiActionAgent]:
"""Load agent from file."""
valid_suffixes = {"json", "yaml"}
# Convert file to Path object.
if isinstance(file, str):
file_path = Path(file)
else:
file_path = file
# Load from either json or yaml.
if file_path.suffix == ".json":
if file_path.suffix[1:] == "json":
with open(file_path) as f:
config = json.load(f)
elif file_path.suffix == ".yaml":
elif file_path.suffix[1:] == "yaml":
with open(file_path, "r") as f:
config = yaml.safe_load(f)
else:
raise ValueError("File type must be json or yaml")
raise ValueError(f"Unsupported file type, must be one of {valid_suffixes}.")
# Load the agent from the config now.
return load_agent_from_config(config, **kwargs)
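For illustration, a hedged usage sketch of the loader above (the file name is hypothetical; only `.json` and `.yaml` configs are accepted, anything else raises the `ValueError`):
```python
from langchain.agents import load_agent

# loads an agent config from a local .json or .yaml file
agent = load_agent("my_agent.yaml")
```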

View File

@ -190,6 +190,7 @@ def trace_as_chain_group(
*,
project_name: Optional[str] = None,
example_id: Optional[Union[str, UUID]] = None,
run_id: Optional[UUID] = None,
tags: Optional[List[str]] = None,
) -> Generator[CallbackManager, None, None]:
"""Get a callback manager for a chain group in a context manager.
@ -202,6 +203,7 @@ def trace_as_chain_group(
Defaults to None.
example_id (str or UUID, optional): The ID of the example.
Defaults to None.
run_id (UUID, optional): The ID of the run.
tags (List[str], optional): The inheritable tags to apply to all runs.
Defaults to None.
@ -229,7 +231,7 @@ def trace_as_chain_group(
inheritable_tags=tags,
)
run_manager = cm.on_chain_start({"name": group_name}, {})
run_manager = cm.on_chain_start({"name": group_name}, {}, run_id=run_id)
yield run_manager.get_child()
run_manager.on_chain_end({})
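The new `run_id` parameter lets callers pin the group's root run id up front. A minimal sketch, assuming a tracing backend is configured (the group name is illustrative):
```python
from uuid import uuid4

from langchain.callbacks.manager import trace_as_chain_group

group_run_id = uuid4()  # known ahead of time, e.g. for cross-referencing runs
with trace_as_chain_group("my_group", run_id=group_run_id) as manager:
    # pass `manager` as `callbacks` to any chains run inside this group
    pass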
@ -241,6 +243,7 @@ async def atrace_as_chain_group(
*,
project_name: Optional[str] = None,
example_id: Optional[Union[str, UUID]] = None,
run_id: Optional[UUID] = None,
tags: Optional[List[str]] = None,
) -> AsyncGenerator[AsyncCallbackManager, None]:
"""Get an async callback manager for a chain group in a context manager.
@ -253,6 +256,7 @@ async def atrace_as_chain_group(
Defaults to None.
example_id (str or UUID, optional): The ID of the example.
Defaults to None.
run_id (UUID, optional): The ID of the run.
tags (List[str], optional): The inheritable tags to apply to all runs.
Defaults to None.
Returns:
@ -276,7 +280,7 @@ async def atrace_as_chain_group(
)
cm = AsyncCallbackManager.configure(inheritable_callbacks=cb, inheritable_tags=tags)
run_manager = await cm.on_chain_start({"name": group_name}, {})
run_manager = await cm.on_chain_start({"name": group_name}, {}, run_id=run_id)
try:
yield run_manager.get_child()
finally:
@ -711,11 +715,11 @@ class AsyncCallbackManagerForLLMRun(AsyncRunManager, LLMManagerMixin):
class CallbackManagerForChainRun(ParentRunManager, ChainManagerMixin):
"""Callback manager for chain run."""
def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:
def on_chain_end(self, outputs: Union[Dict[str, Any], Any], **kwargs: Any) -> None:
"""Run when chain ends running.
Args:
outputs (Dict[str, Any]): The outputs of the chain.
outputs (Union[Dict[str, Any], Any]): The outputs of the chain.
"""
_handle_event(
self.handlers,
@ -793,11 +797,13 @@ class CallbackManagerForChainRun(ParentRunManager, ChainManagerMixin):
class AsyncCallbackManagerForChainRun(AsyncParentRunManager, ChainManagerMixin):
"""Async callback manager for chain run."""
async def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:
async def on_chain_end(
self, outputs: Union[Dict[str, Any], Any], **kwargs: Any
) -> None:
"""Run when chain ends running.
Args:
outputs (Dict[str, Any]): The outputs of the chain.
outputs (Union[Dict[str, Any], Any]): The outputs of the chain.
"""
await _ahandle_event(
self.handlers,
@ -1140,7 +1146,7 @@ class CallbackManager(BaseCallbackManager):
def on_chain_start(
self,
serialized: Dict[str, Any],
inputs: Dict[str, Any],
inputs: Union[Dict[str, Any], Any],
run_id: Optional[UUID] = None,
**kwargs: Any,
) -> CallbackManagerForChainRun:
@ -1148,7 +1154,7 @@ class CallbackManager(BaseCallbackManager):
Args:
serialized (Dict[str, Any]): The serialized chain.
inputs (Dict[str, Any]): The inputs to the chain.
inputs (Union[Dict[str, Any], Any]): The inputs to the chain.
run_id (UUID, optional): The ID of the run. Defaults to None.
Returns:
@ -1429,7 +1435,7 @@ class AsyncCallbackManager(BaseCallbackManager):
async def on_chain_start(
self,
serialized: Dict[str, Any],
inputs: Dict[str, Any],
inputs: Union[Dict[str, Any], Any],
run_id: Optional[UUID] = None,
**kwargs: Any,
) -> AsyncCallbackManagerForChainRun:
@ -1437,7 +1443,7 @@ class AsyncCallbackManager(BaseCallbackManager):
Args:
serialized (Dict[str, Any]): The serialized chain.
inputs (Dict[str, Any]): The inputs to the chain.
inputs (Union[Dict[str, Any], Any]): The inputs to the chain.
run_id (UUID, optional): The ID of the run. Defaults to None.
Returns:

View File

@ -231,6 +231,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
parent_run_id: Optional[UUID] = None,
metadata: Optional[Dict[str, Any]] = None,
run_type: Optional[str] = None,
name: Optional[str] = None,
**kwargs: Any,
) -> None:
"""Start a trace for a chain run."""
@ -243,7 +244,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
id=run_id,
parent_run_id=parent_run_id,
serialized=serialized,
inputs=inputs,
inputs=inputs if isinstance(inputs, dict) else {"input": inputs},
extra=kwargs,
events=[{"name": "start", "time": start_time}],
start_time=start_time,
@ -251,6 +252,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
child_execution_order=execution_order,
child_runs=[],
run_type=run_type or "chain",
name=name,
tags=tags or [],
)
self._start_trace(chain_run)
@ -271,11 +273,13 @@ class BaseTracer(BaseCallbackHandler, ABC):
if chain_run is None:
raise TracerException(f"No chain Run found to be traced for {run_id}")
chain_run.outputs = outputs
chain_run.outputs = (
outputs if isinstance(outputs, dict) else {"output": outputs}
)
chain_run.end_time = datetime.utcnow()
chain_run.events.append({"name": "end", "time": chain_run.end_time})
if inputs is not None:
chain_run.inputs = inputs
chain_run.inputs = inputs if isinstance(inputs, dict) else {"input": inputs}
self._end_trace(chain_run)
self._on_chain_end(chain_run)
@ -298,7 +302,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
chain_run.end_time = datetime.utcnow()
chain_run.events.append({"name": "error", "time": chain_run.end_time})
if inputs is not None:
chain_run.inputs = inputs
chain_run.inputs = inputs if isinstance(inputs, dict) else {"input": inputs}
self._end_trace(chain_run)
self._on_chain_error(chain_run)

View File

@ -190,11 +190,12 @@ class SimpleSequentialChain(Chain):
run_manager: Optional[AsyncCallbackManagerForChainRun] = None,
) -> Dict[str, Any]:
_run_manager = run_manager or AsyncCallbackManagerForChainRun.get_noop_manager()
callbacks = _run_manager.get_child()
_input = inputs[self.input_key]
color_mapping = get_color_mapping([str(i) for i in range(len(self.chains))])
for i, chain in enumerate(self.chains):
_input = await chain.arun(_input, callbacks=callbacks)
_input = await chain.arun(
_input, callbacks=_run_manager.get_child(f"step_{i+1}")
)
if self.strip_outputs:
_input = _input.strip()
await _run_manager.on_text(

View File

@ -381,10 +381,16 @@ class ChatOpenAI(BaseChatModel):
):
if len(chunk["choices"]) == 0:
continue
delta = chunk["choices"][0]["delta"]
chunk = _convert_delta_to_message_chunk(delta, default_chunk_class)
choice = chunk["choices"][0]
chunk = _convert_delta_to_message_chunk(
choice["delta"], default_chunk_class
)
finish_reason = choice.get("finish_reason")
generation_info = (
dict(finish_reason=finish_reason) if finish_reason is not None else None
)
default_chunk_class = chunk.__class__
yield ChatGenerationChunk(message=chunk)
yield ChatGenerationChunk(message=chunk, generation_info=generation_info)
if run_manager:
await run_manager.on_llm_new_token(chunk.content)

View File

@ -147,6 +147,7 @@ from langchain.document_loaders.rst import UnstructuredRSTLoader
from langchain.document_loaders.rtf import UnstructuredRTFLoader
from langchain.document_loaders.s3_directory import S3DirectoryLoader
from langchain.document_loaders.s3_file import S3FileLoader
from langchain.document_loaders.sharepoint import SharePointLoader
from langchain.document_loaders.sitemap import SitemapLoader
from langchain.document_loaders.slack_directory import SlackDirectoryLoader
from langchain.document_loaders.snowflake_loader import SnowflakeLoader
@ -316,6 +317,7 @@ __all__ = [
"S3FileLoader",
"SRTLoader",
"SeleniumURLLoader",
"SharePointLoader",
"SitemapLoader",
"SlackDirectoryLoader",
"SnowflakeLoader",

View File

@ -0,0 +1,182 @@
"""Base class for all loaders that uses O365 Package"""
from __future__ import annotations
import logging
import os
import tempfile
from abc import abstractmethod
from enum import Enum
from pathlib import Path
from typing import TYPE_CHECKING, Dict, Iterable, List, Sequence, Union
from langchain.document_loaders.base import BaseLoader
from langchain.document_loaders.blob_loaders.file_system import FileSystemBlobLoader
from langchain.document_loaders.blob_loaders.schema import Blob
from langchain.pydantic_v1 import BaseModel, BaseSettings, Field, FilePath, SecretStr
if TYPE_CHECKING:
from O365 import Account
from O365.drive import Drive, Folder
logger = logging.getLogger(__name__)
CHUNK_SIZE = 1024 * 1024 * 5
class _O365Settings(BaseSettings):
client_id: str = Field(..., env="O365_CLIENT_ID")
client_secret: SecretStr = Field(..., env="O365_CLIENT_SECRET")
class Config:
env_prefix = ""
case_sensitive = False
env_file = ".env"
class _O365TokenStorage(BaseSettings):
token_path: FilePath = Path.home() / ".credentials" / "o365_token.txt"
class _FileType(str, Enum):
DOC = "doc"
DOCX = "docx"
PDF = "pdf"
def fetch_mime_types(file_types: Sequence[_FileType]) -> Dict[str, str]:
mime_types_mapping = {}
for file_type in file_types:
if file_type.value == "doc":
mime_types_mapping[file_type.value] = "application/msword"
elif file_type.value == "docx":
mime_types_mapping[
file_type.value
] = "application/vnd.openxmlformats-officedocument.wordprocessingml.document" # noqa: E501
elif file_type.value == "pdf":
mime_types_mapping[file_type.value] = "application/pdf"
return mime_types_mapping
class O365BaseLoader(BaseLoader, BaseModel):
settings: _O365Settings = Field(default_factory=_O365Settings)
"""Settings for the Office365 API client."""
auth_with_token: bool = False
"""Whether to authenticate with a token or not. Defaults to False."""
chunk_size: Union[int, str] = CHUNK_SIZE
"""Number of bytes to retrieve from each api call to the server. int or 'auto'."""
@property
@abstractmethod
def _file_types(self) -> Sequence[_FileType]:
"""Return supported file types."""
@property
def _fetch_mime_types(self) -> Dict[str, str]:
"""Return a dict of supported file types to corresponding mime types."""
return fetch_mime_types(self._file_types)
@property
@abstractmethod
def _scopes(self) -> List[str]:
"""Return required scopes."""
def _load_from_folder(self, folder: Folder) -> Iterable[Blob]:
"""Lazily load all files from a specified folder of the configured MIME type.
Args:
folder: The Folder instance from which the files are to be loaded. This
Folder instance should represent a directory in a file system where the
files are stored.
Yields:
An iterator that yields Blob instances, which are binary representations of
the files loaded from the folder.
"""
file_mime_types = self._fetch_mime_types
items = folder.get_items()
with tempfile.TemporaryDirectory() as temp_dir:
os.makedirs(os.path.dirname(temp_dir), exist_ok=True)
for file in items:
if file.is_file:
if file.mime_type in list(file_mime_types.values()):
file.download(to_path=temp_dir, chunk_size=self.chunk_size)
loader = FileSystemBlobLoader(path=temp_dir)
yield from loader.yield_blobs()
def _load_from_object_ids(
self, drive: Drive, object_ids: List[str]
) -> Iterable[Blob]:
"""Lazily load files specified by their object_ids from a drive.
Load files into the system as binary large objects (Blobs) and return Iterable.
Args:
drive: The Drive instance from which the files are to be loaded. This Drive
instance should represent a cloud storage service or similar storage
system where the files are stored.
object_ids: A list of object_id strings. Each object_id represents a unique
identifier for a file in the drive.
Yields:
An iterator that yields Blob instances, which are binary representations of
the files loaded from the drive using the specified object_ids.
"""
file_mime_types = self._fetch_mime_types
with tempfile.TemporaryDirectory() as temp_dir:
for object_id in object_ids:
file = drive.get_item(object_id)
if not file:
logging.warning(
"There isn't a file with"
f"object_id {object_id} in drive {drive}."
)
continue
if file.is_file:
if file.mime_type in list(file_mime_types.values()):
file.download(to_path=temp_dir, chunk_size=self.chunk_size)
loader = FileSystemBlobLoader(path=temp_dir)
yield from loader.yield_blobs()
def _auth(self) -> Account:
"""Authenticates the OneDrive API client
Returns:
The authenticated Account object.
"""
try:
from O365 import Account, FileSystemTokenBackend
except ImportError:
raise ImportError(
"O365 package not found, please install it with `pip install o365`"
)
if self.auth_with_token:
token_storage = _O365TokenStorage()
token_path = token_storage.token_path
token_backend = FileSystemTokenBackend(
token_path=token_path.parent, token_filename=token_path.name
)
account = Account(
credentials=(
self.settings.client_id,
self.settings.client_secret.get_secret_value(),
),
scopes=self._scopes,
token_backend=token_backend,
**{"raise_http_errors": False},
)
else:
token_backend = FileSystemTokenBackend(
token_path=Path.home() / ".credentials"
)
account = Account(
credentials=(
self.settings.client_id,
self.settings.client_secret.get_secret_value(),
),
scopes=self._scopes,
token_backend=token_backend,
**{"raise_http_errors": False},
)
# make the auth
account.authenticate()
return account
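To make the abstract surface concrete, a minimal subclass sketch (illustrative only; real subclasses such as the OneDrive and SharePoint loaders below add their own fields):
```python
from typing import List, Sequence

class PdfOnlyO365Loader(O365BaseLoader):
    """Hypothetical loader that only fetches PDFs."""

    @property
    def _file_types(self) -> Sequence[_FileType]:
        return (_FileType.PDF,)

    @property
    def _scopes(self) -> List[str]:
        return ["offline_access", "Files.Read.All"]
```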

View File

@ -3,6 +3,7 @@ from enum import Enum
from io import BytesIO
from typing import Any, Callable, Dict, List, Optional, Union
import requests
from tenacity import (
before_sleep_log,
retry,
@ -68,6 +69,15 @@ class ConfluenceLoader(BaseLoader):
)
documents = loader.load(space_key="SPACE",limit=50)
# Server on perm
loader = ConfluenceLoader(
url="https://confluence.yoursite.com/",
username="me",
api_key="your_password",
cloud=False
)
documents = loader.load(space_key="SPACE",limit=50)
:param url: _description_
:type url: str
:param api_key: _description_, defaults to None
@ -97,6 +107,7 @@ class ConfluenceLoader(BaseLoader):
url: str,
api_key: Optional[str] = None,
username: Optional[str] = None,
session: Optional[requests.Session] = None,
oauth2: Optional[dict] = None,
token: Optional[str] = None,
cloud: Optional[bool] = True,
@ -107,16 +118,15 @@ class ConfluenceLoader(BaseLoader):
):
confluence_kwargs = confluence_kwargs or {}
errors = ConfluenceLoader.validate_init_args(
url, api_key, username, oauth2, token
url=url,
api_key=api_key,
username=username,
session=session,
oauth2=oauth2,
token=token,
)
if errors:
raise ValueError(f"Error(s) while validating input: {errors}")
self.base_url = url
self.number_of_retries = number_of_retries
self.min_retry_seconds = min_retry_seconds
self.max_retry_seconds = max_retry_seconds
try:
from atlassian import Confluence # noqa: F401
except ImportError:
@ -125,7 +135,14 @@ class ConfluenceLoader(BaseLoader):
"`pip install atlassian-python-api`"
)
if oauth2:
self.base_url = url
self.number_of_retries = number_of_retries
self.min_retry_seconds = min_retry_seconds
self.max_retry_seconds = max_retry_seconds
if session:
self.confluence = Confluence(url=url, session=session, **confluence_kwargs)
elif oauth2:
self.confluence = Confluence(
url=url, oauth2=oauth2, cloud=cloud, **confluence_kwargs
)
@ -147,6 +164,7 @@ class ConfluenceLoader(BaseLoader):
url: Optional[str] = None,
api_key: Optional[str] = None,
username: Optional[str] = None,
session: Optional[requests.Session] = None,
oauth2: Optional[dict] = None,
token: Optional[str] = None,
) -> Union[List, None]:
@ -162,33 +180,28 @@ class ConfluenceLoader(BaseLoader):
"the other must be as well."
)
if (api_key or username) and oauth2:
non_null_creds = list(
x is not None for x in ((api_key or username), session, oauth2, token)
)
if sum(non_null_creds) > 1:
all_names = ("(api_key, username)", "session", "oath2", "token")
provided = tuple(n for x, n in zip(non_null_creds, all_names) if x)
errors.append(
"Cannot provide a value for `api_key` and/or "
"`username` and provide a value for `oauth2`"
f"Cannot provide a value for more than one of: {all_names}. Received "
f"values for: {provided}"
)
if oauth2 and oauth2.keys() != [
if oauth2 and set(oauth2.keys()) != {
"access_token",
"access_token_secret",
"consumer_key",
"key_cert",
]:
}:
errors.append(
"You have either omitted require keys or added extra "
"keys to the oauth2 dictionary. key values should be "
"`['access_token', 'access_token_secret', 'consumer_key', 'key_cert']`"
)
if token and (api_key or username or oauth2):
errors.append(
"Cannot provide a value for `token` and a value for `api_key`, "
"`username` or `oauth2`"
)
if errors:
return errors
return None
return errors or None
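The new `session` parameter accepts a pre-configured `requests.Session`, mutually exclusive with the other credential options. A hedged usage sketch (the URL and header value are placeholders):
```python
import requests

from langchain.document_loaders import ConfluenceLoader

session = requests.Session()
session.headers.update({"Authorization": "Bearer <your-token>"})  # placeholder
loader = ConfluenceLoader(url="https://yoursite.atlassian.net/wiki", session=session)
documents = loader.load(space_key="SPACE", limit=50)
```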
def load(
self,
@ -205,6 +218,7 @@ class ConfluenceLoader(BaseLoader):
max_pages: Optional[int] = 1000,
ocr_languages: Optional[str] = None,
keep_markdown_format: bool = False,
keep_newlines: bool = False,
) -> List[Document]:
"""
:param space_key: Space key retrieved from a confluence URL, defaults to None
@ -237,6 +251,9 @@ class ConfluenceLoader(BaseLoader):
:param keep_markdown_format: Whether to keep the markdown format, defaults to
False
:type keep_markdown_format: bool
:param keep_newlines: Whether to keep the newlines format, defaults to
False
:type keep_newlines: bool
:raises ValueError: _description_
:raises ImportError: _description_
:return: _description_
@ -265,8 +282,9 @@ class ConfluenceLoader(BaseLoader):
include_attachments,
include_comments,
content_format,
ocr_languages,
keep_markdown_format,
ocr_languages=ocr_languages,
keep_markdown_format=keep_markdown_format,
keep_newlines=keep_newlines,
)
if label:
@ -404,6 +422,7 @@ class ConfluenceLoader(BaseLoader):
content_format: ContentFormat,
ocr_languages: Optional[str] = None,
keep_markdown_format: Optional[bool] = False,
keep_newlines: bool = False,
) -> List[Document]:
"""Process a list of pages into a list of documents."""
docs = []
@ -415,8 +434,9 @@ class ConfluenceLoader(BaseLoader):
include_attachments,
include_comments,
content_format,
ocr_languages,
keep_markdown_format,
ocr_languages=ocr_languages,
keep_markdown_format=keep_markdown_format,
keep_newlines=keep_newlines,
)
docs.append(doc)
@ -430,6 +450,7 @@ class ConfluenceLoader(BaseLoader):
content_format: ContentFormat,
ocr_languages: Optional[str] = None,
keep_markdown_format: Optional[bool] = False,
keep_newlines: bool = False,
) -> Document:
if keep_markdown_format:
try:
@ -439,7 +460,7 @@ class ConfluenceLoader(BaseLoader):
"`markdownify` package not found, please run "
"`pip install markdownify`"
)
else:
if include_comments or not keep_markdown_format:
try:
from bs4 import BeautifulSoup # type: ignore
except ImportError:
@ -447,7 +468,6 @@ class ConfluenceLoader(BaseLoader):
"`beautifulsoup4` package not found, please run "
"`pip install beautifulsoup4`"
)
if include_attachments:
attachment_texts = self.process_attachment(page["id"], ocr_languages)
else:
@ -461,9 +481,14 @@ class ConfluenceLoader(BaseLoader):
else:
content = content_format.get_content(page)
text = BeautifulSoup(content, "lxml").get_text(" ", strip=True) + "".join(
attachment_texts
)
if keep_newlines:
text = BeautifulSoup(
content.replace("</p>", "\n</p>").replace("<br />", "\n"), "lxml"
).get_text(" ") + "".join(attachment_texts)
else:
text = BeautifulSoup(content, "lxml").get_text(
" ", strip=True
) + "".join(attachment_texts)
if include_comments:
comments = self.confluence.get_page_comments(

View File

@ -29,19 +29,43 @@ class GeoDataFrameLoader(BaseLoader):
f"Expected data_frame to be a gpd.GeoDataFrame, got {type(data_frame)}"
)
if page_content_column not in data_frame.columns:
raise ValueError(
f"Expected data_frame to have a column named {page_content_column}"
)
if not isinstance(data_frame[page_content_column].iloc[0], gpd.GeoSeries):
raise ValueError(
f"Expected data_frame[{page_content_column}] to be a GeoSeries"
)
self.data_frame = data_frame
self.page_content_column = page_content_column
def lazy_load(self) -> Iterator[Document]:
"""Lazy load records from dataframe."""
# assumes all geometries in the GeoSeries share the same CRS and geometry type
crs_str = self.data_frame.crs.to_string() if self.data_frame.crs else None
geometry_type = self.data_frame.geometry.geom_type.iloc[0]
for _, row in self.data_frame.iterrows():
text = row[self.page_content_column]
geom = row[self.page_content_column]
xmin, ymin, xmax, ymax = geom.bounds
metadata = row.to_dict()
metadata["crs"] = crs_str
metadata["geometry_type"] = geometry_type
metadata["xmin"] = xmin
metadata["ymin"] = ymin
metadata["xmax"] = xmax
metadata["ymax"] = ymax
metadata.pop(self.page_content_column)
# Enforce str since shapely Point objects
# geometry type used in GeoPandas) are not strings
yield Document(page_content=str(text), metadata=metadata)
# using WKT instead of str() to help GIS system interoperability
yield Document(page_content=geom.wkt, metadata=metadata)
def load(self) -> List[Document]:
"""Load full dataframe."""

View File

@ -2,129 +2,49 @@
from __future__ import annotations
import logging
import os
import tempfile
from enum import Enum
from pathlib import Path
from typing import TYPE_CHECKING, Dict, List, Optional, Type, Union
from typing import TYPE_CHECKING, Iterator, List, Optional, Sequence, Union
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
from langchain.document_loaders.onedrive_file import OneDriveFileLoader
from langchain.pydantic_v1 import BaseModel, BaseSettings, Field, FilePath, SecretStr
from langchain.document_loaders.base_o365 import (
O365BaseLoader,
_FileType,
)
from langchain.document_loaders.parsers.registry import get_parser
from langchain.pydantic_v1 import Field
if TYPE_CHECKING:
from O365 import Account
from O365.drive import Drive, Folder
SCOPES = ["offline_access", "Files.Read.All"]
logger = logging.getLogger(__name__)
class _OneDriveSettings(BaseSettings):
client_id: str = Field(..., env="O365_CLIENT_ID")
client_secret: SecretStr = Field(..., env="O365_CLIENT_SECRET")
class Config:
env_prefix = ""
case_sentive = False
env_file = ".env"
class _OneDriveTokenStorage(BaseSettings):
token_path: FilePath = Field(Path.home() / ".credentials" / "o365_token.txt")
class _FileType(str, Enum):
DOC = "doc"
DOCX = "docx"
PDF = "pdf"
class _SupportedFileTypes(BaseModel):
file_types: List[_FileType]
def fetch_mime_types(self) -> Dict[str, str]:
mime_types_mapping = {}
for file_type in self.file_types:
if file_type.value == "doc":
mime_types_mapping[file_type.value] = "application/msword"
elif file_type.value == "docx":
mime_types_mapping[
file_type.value
] = "application/vnd.openxmlformats-officedocument.wordprocessingml.document" # noqa: E501
elif file_type.value == "pdf":
mime_types_mapping[file_type.value] = "application/pdf"
return mime_types_mapping
class OneDriveLoader(BaseLoader, BaseModel):
class OneDriveLoader(O365BaseLoader):
"""Load from `Microsoft OneDrive`."""
settings: _OneDriveSettings = Field(default_factory=_OneDriveSettings)
""" The settings for the OneDrive API client."""
drive_id: str = Field(...)
""" The ID of the OneDrive drive to load data from."""
folder_path: Optional[str] = None
""" The path to the folder to load data from."""
object_ids: Optional[List[str]] = None
""" The IDs of the objects to load data from."""
auth_with_token: bool = False
""" Whether to authenticate with a token or not. Defaults to False."""
def _auth(self) -> Type[Account]:
"""
Authenticates the OneDrive API client using the specified
authentication method and returns the Account object.
@property
def _file_types(self) -> Sequence[_FileType]:
"""Return supported file types."""
return _FileType.DOC, _FileType.DOCX, _FileType.PDF
Returns:
Type[Account]: The authenticated Account object.
"""
try:
from O365 import FileSystemTokenBackend
except ImportError:
raise ImportError(
"O365 package not found, please install it with `pip install o365`"
)
if self.auth_with_token:
token_storage = _OneDriveTokenStorage()
token_path = token_storage.token_path
token_backend = FileSystemTokenBackend(
token_path=token_path.parent, token_filename=token_path.name
)
account = Account(
credentials=(
self.settings.client_id,
self.settings.client_secret.get_secret_value(),
),
scopes=SCOPES,
token_backend=token_backend,
**{"raise_http_errors": False},
)
else:
token_backend = FileSystemTokenBackend(
token_path=Path.home() / ".credentials"
)
account = Account(
credentials=(
self.settings.client_id,
self.settings.client_secret.get_secret_value(),
),
scopes=SCOPES,
token_backend=token_backend,
**{"raise_http_errors": False},
)
# make the auth
account.authenticate()
return account
@property
def _scopes(self) -> List[str]:
"""Return required scopes."""
return ["offline_access", "Files.Read.All"]
def _get_folder_from_path(self, drive: Type[Drive]) -> Union[Folder, Drive]:
def _get_folder_from_path(self, drive: Drive) -> Union[Folder, Drive]:
"""
Returns the folder or drive object located at the
specified path relative to the given drive.
Args:
drive (Type[Drive]): The root drive from which the folder path is relative.
drive (Drive): The root drive from which the folder path is relative.
Returns:
Union[Folder, Drive]: The folder or drive object
@ -151,90 +71,26 @@ class OneDriveLoader(BaseLoader, BaseModel):
raise FileNotFoundError("Path {} not exist.".format(self.folder_path))
return subfolder_drive
def _load_from_folder(self, folder: Type[Folder]) -> List[Document]:
"""
Loads all supported document files from the specified folder
and returns a list of Document objects.
Args:
folder (Type[Folder]): The folder object to load the documents from.
Returns:
List[Document]: A list of Document objects representing
the loaded documents.
"""
docs = []
file_types = _SupportedFileTypes(file_types=["doc", "docx", "pdf"])
file_mime_types = file_types.fetch_mime_types()
items = folder.get_items()
with tempfile.TemporaryDirectory() as temp_dir:
file_path = f"{temp_dir}"
os.makedirs(os.path.dirname(file_path), exist_ok=True)
for file in items:
if file.is_file:
if file.mime_type in list(file_mime_types.values()):
loader = OneDriveFileLoader(file=file)
docs.extend(loader.load())
return docs
def _load_from_object_ids(self, drive: Type[Drive]) -> List[Document]:
"""
Loads all supported document files from the specified OneDrive
drive based on their object IDs and returns a list
of Document objects.
Args:
drive (Type[Drive]): The OneDrive drive object
to load the documents from.
Returns:
List[Document]: A list of Document objects representing
the loaded documents.
"""
docs = []
file_types = _SupportedFileTypes(file_types=["doc", "docx", "pdf"])
file_mime_types = file_types.fetch_mime_types()
with tempfile.TemporaryDirectory() as temp_dir:
file_path = f"{temp_dir}"
os.makedirs(os.path.dirname(file_path), exist_ok=True)
for object_id in self.object_ids if self.object_ids else [""]:
file = drive.get_item(object_id)
if not file:
logger.warning(
"There isn't a file with "
f"object_id {object_id} in drive {drive}."
)
continue
if file.is_file:
if file.mime_type in list(file_mime_types.values()):
loader = OneDriveFileLoader(file=file)
docs.extend(loader.load())
return docs
def lazy_load(self) -> Iterator[Document]:
"""Load documents lazily. Use this when working at a large scale."""
try:
from O365.drive import Drive
except ImportError:
raise ImportError(
"O365 package not found, please install it with `pip install o365`"
)
drive = self._auth().storage().get_drive(self.drive_id)
if not isinstance(drive, Drive):
raise ValueError(f"There isn't a Drive with id {self.drive_id}.")
blob_parser = get_parser("default")
if self.folder_path:
folder = self._get_folder_from_path(drive)
for blob in self._load_from_folder(folder):
yield from blob_parser.lazy_parse(blob)
if self.object_ids:
for blob in self._load_from_object_ids(drive, self.object_ids):
yield from blob_parser.lazy_parse(blob)
def load(self) -> List[Document]:
"""
Loads all supported document files from the specified OneDrive drive
and return a list of Document objects.
Returns:
List[Document]: A list of Document objects
representing the loaded documents.
Raises:
ValueError: If the specified drive ID
does not correspond to a drive in the OneDrive storage.
"""
account = self._auth()
storage = account.storage()
drive = storage.get_drive(self.drive_id)
docs: List[Document] = []
if not drive:
raise ValueError(f"There isn't a drive with id {self.drive_id}.")
if self.folder_path:
folder = self._get_folder_from_path(drive=drive)
docs.extend(self._load_from_folder(folder=folder))
elif self.object_ids:
docs.extend(self._load_from_object_ids(drive=drive))
return docs
"""Load all documents."""
return list(self.lazy_load())

View File

@ -0,0 +1,34 @@
from typing import Iterator
from langchain.document_loaders.base import BaseBlobParser
from langchain.document_loaders.blob_loaders import Blob
from langchain.schema import Document
class MsWordParser(BaseBlobParser):
def lazy_parse(self, blob: Blob) -> Iterator[Document]:
try:
from unstructured.partition.doc import partition_doc
from unstructured.partition.docx import partition_docx
except ImportError as e:
raise ImportError(
"Could not import unstructured, please install with `pip install "
"unstructured`."
) from e
mime_type_parser = {
"application/msword": partition_doc,
"application/vnd.openxmlformats-officedocument.wordprocessingml.document": (
partition_docx
),
}
if blob.mimetype not in (
"application/msword",
"application/vnd.openxmlformats-officedocument.wordprocessingml.document",
):
raise ValueError("This blob type is not supported for this parser.")
with blob.as_bytes_io() as word_document:
elements = mime_type_parser[blob.mimetype](file=word_document)
text = "\n\n".join([str(el) for el in elements])
metadata = {"source": blob.source}
yield Document(page_content=text, metadata=metadata)
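A usage sketch, assuming `unstructured` is installed (the file path is a placeholder; `Blob.from_path` guesses the MIME type from the extension):
```python
from langchain.document_loaders.blob_loaders import Blob
from langchain.document_loaders.parsers.msword import MsWordParser

parser = MsWordParser()
blob = Blob.from_path("example.docx")  # placeholder path
docs = list(parser.lazy_parse(blob))
```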

View File

@ -1,6 +1,7 @@
"""Module includes a registry of default parser configurations."""
from langchain.document_loaders.base import BaseBlobParser
from langchain.document_loaders.parsers.generic import MimeTypeBasedParser
from langchain.document_loaders.parsers.msword import MsWordParser
from langchain.document_loaders.parsers.pdf import PyMuPDFParser
from langchain.document_loaders.parsers.txt import TextParser
@ -11,6 +12,10 @@ def _get_default_parser() -> BaseBlobParser:
handlers={
"application/pdf": PyMuPDFParser(),
"text/plain": TextParser(),
"application/msword": MsWordParser(),
"application/vnd.openxmlformats-officedocument.wordprocessingml.document": (
MsWordParser()
),
},
fallback_parser=None,
)

View File

@ -0,0 +1,59 @@
"""Loader that loads data from Sharepoint Document Library"""
from __future__ import annotations
from typing import Iterator, List, Optional, Sequence
from langchain.docstore.document import Document
from langchain.document_loaders.base_o365 import (
O365BaseLoader,
_FileType,
)
from langchain.document_loaders.parsers.registry import get_parser
from langchain.pydantic_v1 import Field
class SharePointLoader(O365BaseLoader):
"""Load from `SharePoint`."""
document_library_id: str = Field(...)
""" The ID of the SharePoint document library to load data from."""
folder_path: Optional[str] = None
""" The path to the folder to load data from."""
object_ids: Optional[List[str]] = None
""" The IDs of the objects to load data from."""
@property
def _file_types(self) -> Sequence[_FileType]:
"""Return supported file types."""
return _FileType.DOC, _FileType.DOCX, _FileType.PDF
@property
def _scopes(self) -> List[str]:
"""Return required scopes."""
return ["sharepoint", "basic"]
def lazy_load(self) -> Iterator[Document]:
"""Load documents lazily. Use this when working at a large scale."""
try:
from O365.drive import Drive, Folder
except ImportError:
raise ImportError(
"O365 package not found, please install it with `pip install o365`"
)
drive = self._auth().storage().get_drive(self.document_library_id)
if not isinstance(drive, Drive):
raise ValueError(f"There isn't a Drive with id {self.document_library_id}.")
blob_parser = get_parser("default")
if self.folder_path:
target_folder = drive.get_item_by_path(self.folder_path)
if not isinstance(target_folder, Folder):
raise ValueError(f"There isn't a folder with path {self.folder_path}.")
for blob in self._load_from_folder(target_folder):
yield from blob_parser.lazy_parse(blob)
if self.object_ids:
for blob in self._load_from_object_ids(drive, self.object_ids):
yield from blob_parser.lazy_parse(blob)
def load(self) -> List[Document]:
"""Load all documents."""
return list(self.lazy_load())
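A usage sketch, assuming the `O365` package is installed and `O365_CLIENT_ID` / `O365_CLIENT_SECRET` are set in the environment (the library id is a placeholder):
```python
from langchain.document_loaders.sharepoint import SharePointLoader

loader = SharePointLoader(document_library_id="<your-library-id>")
documents = loader.load()
```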

View File

@ -74,7 +74,7 @@ class UnstructuredBaseLoader(BaseLoader, ABC):
def _post_process_elements(self, elements: list) -> list:
"""Applies post processing functions to extracted unstructured elements.
Post processing functions are Element -> Element callables are passed
Post processing functions are str -> str callables that are passed
in using the post_processors kwarg when the loader is instantiated."""
for element in elements:
for post_processor in self.post_processors:
@ -84,6 +84,7 @@ class UnstructuredBaseLoader(BaseLoader, ABC):
def load(self) -> List[Document]:
"""Load file."""
elements = self._get_elements()
self._post_process_elements(elements)
if self.mode == "elements":
docs: List[Document] = list()
for element in elements:

View File

@ -28,6 +28,7 @@ from langchain.embeddings.deepinfra import DeepInfraEmbeddings
from langchain.embeddings.edenai import EdenAiEmbeddings
from langchain.embeddings.elasticsearch import ElasticsearchEmbeddings
from langchain.embeddings.embaas import EmbaasEmbeddings
from langchain.embeddings.ernie import ErnieEmbeddings
from langchain.embeddings.fake import DeterministicFakeEmbedding, FakeEmbeddings
from langchain.embeddings.google_palm import GooglePalmEmbeddings
from langchain.embeddings.gpt4all import GPT4AllEmbeddings
@ -101,6 +102,7 @@ __all__ = [
"LocalAIEmbeddings",
"AwaEmbeddings",
"HuggingFaceBgeEmbeddings",
"ErnieEmbeddings",
]

View File

@ -103,37 +103,44 @@ class ClarifaiEmbeddings(BaseModel, Embeddings):
"Please install it with `pip install clarifai`."
)
post_model_outputs_request = service_pb2.PostModelOutputsRequest(
user_app_id=self.userDataObject,
model_id=self.model_id,
version_id=self.model_version_id,
inputs=[
resources_pb2.Input(
data=resources_pb2.Data(text=resources_pb2.Text(raw=t))
)
for t in texts
],
)
post_model_outputs_response = self.stub.PostModelOutputs(
post_model_outputs_request
)
batch_size = 32
embeddings = []
for i in range(0, len(texts), batch_size):
batch = texts[i : i + batch_size]
if post_model_outputs_response.status.code != status_code_pb2.SUCCESS:
logger.error(post_model_outputs_response.status)
first_output_failure = (
post_model_outputs_response.outputs[0].status
if len(post_model_outputs_response.outputs[0])
else None
post_model_outputs_request = service_pb2.PostModelOutputsRequest(
user_app_id=self.userDataObject,
model_id=self.model_id,
version_id=self.model_version_id,
inputs=[
resources_pb2.Input(
data=resources_pb2.Data(text=resources_pb2.Text(raw=t))
)
for t in batch
],
)
raise Exception(
f"Post model outputs failed, status: "
f"{post_model_outputs_response.status}, first output failure: "
f"{first_output_failure}"
post_model_outputs_response = self.stub.PostModelOutputs(
post_model_outputs_request
)
if post_model_outputs_response.status.code != status_code_pb2.SUCCESS:
logger.error(post_model_outputs_response.status)
first_output_failure = (
post_model_outputs_response.outputs[0].status
if len(post_model_outputs_response.outputs)
else None
)
raise Exception(
f"Post model outputs failed, status: "
f"{post_model_outputs_response.status}, first output failure: "
f"{first_output_failure}"
)
embeddings.extend(
[
list(o.data.embeddings[0].vector)
for o in post_model_outputs_response.outputs
]
)
embeddings = [
list(o.data.embeddings[0].vector)
for o in post_model_outputs_response.outputs
]
return embeddings
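The refactor above embeds texts in fixed-size batches instead of one request for all inputs. The slicing pattern in isolation (sizes are illustrative):
```python
batch_size = 32
texts = [f"document {i}" for i in range(100)]
batches = [texts[i : i + batch_size] for i in range(0, len(texts), batch_size)]
# 100 texts -> batches of 32, 32, 32, 4; every text is sent exactly once
assert sum(len(b) for b in batches) == len(texts)
```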
def embed_query(self, text: str) -> List[float]:

View File

@ -0,0 +1,102 @@
import logging
import threading
from typing import Dict, List, Optional
import requests
from langchain.embeddings.base import Embeddings
from langchain.pydantic_v1 import BaseModel, root_validator
from langchain.utils import get_from_dict_or_env
logger = logging.getLogger(__name__)
class ErnieEmbeddings(BaseModel, Embeddings):
"""`Ernie Embeddings V1` embedding models."""
ernie_client_id: Optional[str] = None
ernie_client_secret: Optional[str] = None
access_token: Optional[str] = None
chunk_size: int = 16
model_name = "ErnieBot-Embedding-V1"
_lock = threading.Lock()
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
values["ernie_client_id"] = get_from_dict_or_env(
values,
"ernie_client_id",
"ERNIE_CLIENT_ID",
)
values["ernie_client_secret"] = get_from_dict_or_env(
values,
"ernie_client_secret",
"ERNIE_CLIENT_SECRET",
)
return values
def _embedding(self, json: object) -> dict:
base_url = (
"https://aip.baidubce.com/rpc/2.0/ai_custom/v1/wenxinworkshop/embeddings"
)
resp = requests.post(
f"{base_url}/embedding-v1",
headers={
"Content-Type": "application/json",
},
params={"access_token": self.access_token},
json=json,
)
return resp.json()
def _refresh_access_token_with_lock(self) -> None:
with self._lock:
logger.debug("Refreshing access token")
base_url: str = "https://aip.baidubce.com/oauth/2.0/token"
resp = requests.post(
base_url,
headers={
"Content-Type": "application/json",
"Accept": "application/json",
},
params={
"grant_type": "client_credentials",
"client_id": self.ernie_client_id,
"client_secret": self.ernie_client_secret,
},
)
self.access_token = str(resp.json().get("access_token"))
def embed_documents(self, texts: List[str]) -> List[List[float]]:
if not self.access_token:
self._refresh_access_token_with_lock()
text_in_chunks = [
texts[i : i + self.chunk_size]
for i in range(0, len(texts), self.chunk_size)
]
lst = []
for chunk in text_in_chunks:
resp = self._embedding({"input": [text for text in chunk]})
if resp.get("error_code"):
if resp.get("error_code") == 111:
self._refresh_access_token_with_lock()
resp = self._embedding({"input": [text for text in chunk]})
else:
raise ValueError(f"Error from Ernie: {resp}")
lst.extend([i["embedding"] for i in resp["data"]])
return lst
def embed_query(self, text: str) -> List[float]:
if not self.access_token:
self._refresh_access_token_with_lock()
resp = self._embedding({"input": [text]})
if resp.get("error_code"):
if resp.get("error_code") == 111:
self._refresh_access_token_with_lock()
resp = self._embedding({"input": [text]})
else:
raise ValueError(f"Error from Ernie: {resp}")
return resp["data"][0]["embedding"]

View File

@ -69,6 +69,7 @@ from langchain.llms.petals import Petals
from langchain.llms.pipelineai import PipelineAI
from langchain.llms.predibase import Predibase
from langchain.llms.predictionguard import PredictionGuard
from langchain.llms.promptguard import PromptGuard
from langchain.llms.promptlayer_openai import PromptLayerOpenAI, PromptLayerOpenAIChat
from langchain.llms.replicate import Replicate
from langchain.llms.rwkv import RWKV
@ -141,6 +142,7 @@ __all__ = [
"PredictionGuard",
"PromptLayerOpenAI",
"PromptLayerOpenAIChat",
"PromptGuard",
"RWKV",
"Replicate",
"SagemakerEndpoint",
@ -205,6 +207,7 @@ type_to_cls_dict: Dict[str, Type[BaseLLM]] = {
"petals": Petals,
"pipelineai": PipelineAI,
"predibase": Predibase,
"promptguard": PromptGuard,
"replicate": Replicate,
"rwkv": RWKV,
"sagemaker_endpoint": SagemakerEndpoint,

View File

@ -5,6 +5,7 @@ from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
from langchain.llms.utils import enforce_stop_tokens
from langchain.pydantic_v1 import Extra, root_validator
from langchain.schema import Generation, LLMResult
from langchain.utils import get_from_dict_or_env
logger = logging.getLogger(__name__)
@ -163,7 +164,7 @@ class Clarifai(LLM):
logger.error(post_model_outputs_response.status)
first_model_failure = (
post_model_outputs_response.outputs[0].status
if len(post_model_outputs_response.outputs[0])
if len(post_model_outputs_response.outputs)
else None
)
raise Exception(
@ -178,3 +179,67 @@ class Clarifai(LLM):
if stop is not None:
text = enforce_stop_tokens(text, stop)
return text
def _generate(
self,
prompts: List[str],
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> LLMResult:
"""Run the LLM on the given prompt and input."""
try:
from clarifai_grpc.grpc.api import (
resources_pb2,
service_pb2,
)
from clarifai_grpc.grpc.api.status import status_code_pb2
except ImportError:
raise ImportError(
"Could not import clarifai python package. "
"Please install it with `pip install clarifai`."
)
# TODO: add caching here.
generations = []
batch_size = 32
for i in range(0, len(prompts), batch_size):
batch = prompts[i : i + batch_size]
post_model_outputs_request = service_pb2.PostModelOutputsRequest(
user_app_id=self.userDataObject,
model_id=self.model_id,
version_id=self.model_version_id,
inputs=[
resources_pb2.Input(
data=resources_pb2.Data(text=resources_pb2.Text(raw=prompt))
)
for prompt in batch
],
)
post_model_outputs_response = self.stub.PostModelOutputs(
post_model_outputs_request
)
if post_model_outputs_response.status.code != status_code_pb2.SUCCESS:
logger.error(post_model_outputs_response.status)
first_model_failure = (
post_model_outputs_response.outputs[0].status
if len(post_model_outputs_response.outputs)
else None
)
raise Exception(
f"Post model outputs failed, status: "
f"{post_model_outputs_response.status}, first output failure: "
f"{first_model_failure}"
)
for output in post_model_outputs_response.outputs:
if stop is not None:
text = enforce_stop_tokens(output.data.text.raw, stop)
else:
text = output.data.text.raw
generations.append([Generation(text=text)])
return LLMResult(generations=generations)

View File

@ -0,0 +1,116 @@
import logging
from typing import Any, Dict, List, Optional
from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
from langchain.pydantic_v1 import Extra, root_validator
from langchain.schema.language_model import BaseLanguageModel
from langchain.utils import get_from_dict_or_env
logger = logging.getLogger(__name__)
class PromptGuard(LLM):
"""An LLM wrapper that uses PromptGuard to sanitize prompts.
Wraps another LLM and sanitizes prompts before passing them to the LLM, then
de-sanitizes the response.
To use, you should have the ``promptguard`` python package installed,
and the environment variable ``PROMPTGUARD_API_KEY`` set with
your API key, or pass it as a named parameter to the constructor.
Example:
.. code-block:: python
from langchain.llms import PromptGuard
from langchain.chat_models import ChatOpenAI
prompt_guard_llm = PromptGuard(base_llm=ChatOpenAI())
"""
base_llm: BaseLanguageModel
"""The base LLM to use."""
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validates that the PromptGuard API key and the Python package exist."""
try:
import promptguard as pg
except ImportError:
raise ImportError(
"Could not import the `promptguard` Python package, "
"please install it with `pip install promptguard`."
)
if pg.__package__ is None:
raise ValueError(
"Could not properly import `promptguard`, "
"promptguard.__package__ is None."
)
api_key = get_from_dict_or_env(
values, "promptguard_api_key", "PROMPTGUARD_API_KEY", default=""
)
if not api_key:
raise ValueError(
"Could not find PROMPTGUARD_API_KEY in the environment. "
"Please set it to your PromptGuard API key."
"You can get it by creating an account on the PromptGuard website: "
"https://promptguard.opaque.co/ ."
)
return values
def _call(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> str:
"""Call base LLM with sanitization before and de-sanitization after.
Args:
prompt: The prompt to pass into the model.
Returns:
The string generated by the model.
Example:
.. code-block:: python
response = prompt_guard_llm("Tell me a joke.")
"""
import promptguard as pg
_run_manager = run_manager or CallbackManagerForLLMRun.get_noop_manager()
# sanitize the prompt by replacing the sensitive information with a placeholder
sanitize_response: pg.SanitizeResponse = pg.sanitize(prompt)
sanitized_prompt_value_str = sanitize_response.sanitized_text
# TODO: Add in callbacks once child runs for LLMs are supported by LangSmith.
# call the LLM with the sanitized prompt and get the response
llm_response = self.base_llm.predict(
sanitized_prompt_value_str,
stop=stop,
)
# desanitize the response by restoring the original sensitive information
desanitize_response: pg.DesanitizeResponse = pg.desanitize(
llm_response,
secure_context=sanitize_response.secure_context,
)
return desanitize_response.desanitized_text
@property
def _llm_type(self) -> str:
"""Return type of LLM.
This is an override of the base class method.
"""
return "promptguard"

View File

@ -1,11 +1,13 @@
import json
import logging
from typing import Any, Dict, List, Optional
from typing import Any, Dict, Iterator, List, Optional
import requests
from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
from langchain.pydantic_v1 import Field
from langchain.schema.output import GenerationChunk
logger = logging.getLogger(__name__)
@ -109,7 +111,7 @@ class TextGen(LLM):
"""A list of strings to stop generation when encountered."""
streaming: bool = False
"""Whether to stream the results, token by token (currently unimplemented)."""
"""Whether to stream the results, token by token."""
@property
def _default_params(self) -> Dict[str, Any]:
@ -198,19 +200,99 @@ class TextGen(LLM):
llm("Write a story about llamas.")
"""
if self.streaming:
raise ValueError("`streaming` option currently unsupported.")
combined_text_output = ""
for chunk in self._stream(
prompt=prompt, stop=stop, run_manager=run_manager, **kwargs
):
combined_text_output += chunk.text
print(prompt + combined_text_output)
result = combined_text_output
url = f"{self.model_url}/api/v1/generate"
params = self._get_parameters(stop)
request = params.copy()
request["prompt"] = prompt
response = requests.post(url, json=request)
if response.status_code == 200:
result = response.json()["results"][0]["text"]
print(prompt + result)
else:
print(f"ERROR: response: {response}")
result = ""
url = f"{self.model_url}/api/v1/generate"
params = self._get_parameters(stop)
request = params.copy()
request["prompt"] = prompt
response = requests.post(url, json=request)
if response.status_code == 200:
result = response.json()["results"][0]["text"]
print(prompt + result)
else:
print(f"ERROR: response: {response}")
result = ""
return result
def _stream(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> Iterator[GenerationChunk]:
"""Yields results objects as they are generated in real time.
It also calls the callback manager's on_llm_new_token event with
similar parameters to the OpenAI LLM class method of the same name.
Args:
prompt: The prompt to pass into the model.
stop: Optional list of stop words to use when generating.
Returns:
A generator representing the stream of tokens being generated.
Yields:
Dictionary-like objects containing a string token and metadata.
See the text-generation-webui docs and below for more.
Example:
.. code-block:: python
from langchain.llms import TextGen
llm = TextGen(
model_url = "ws://localhost:5005"
streaming=True
)
for chunk in llm.stream("Ask 'Hi, how are you?' like a pirate:'",
stop=["'","\n"]):
print(chunk, end='', flush=True)
"""
try:
import websocket
except ImportError:
raise ImportError(
"The `websocket-client` package is required for streaming."
)
params = {**self._get_parameters(stop), **kwargs}
url = f"{self.model_url}/api/v1/stream"
request = params.copy()
request["prompt"] = prompt
websocket_client = websocket.WebSocket()
websocket_client.connect(url)
websocket_client.send(json.dumps(request))
while True:
result = websocket_client.recv()
result = json.loads(result)
if result["event"] == "text_stream":
chunk = GenerationChunk(
text=result["text"],
generation_info=None,
)
yield chunk
elif result["event"] == "stream_end":
websocket_client.close()
return
if run_manager:
run_manager.on_llm_new_token(token=chunk.text)

View File

@ -37,17 +37,33 @@ class OutputFunctionsParser(BaseGenerationOutputParser[Any]):
class JsonOutputFunctionsParser(OutputFunctionsParser):
"""Parse an output as the Json object."""
strict: bool = False
"""Whether to allow non-JSON-compliant strings.
See: https://docs.python.org/3/library/json.html#encoders-and-decoders
Useful when the parsed output may include unicode characters or new lines.
"""
def parse_result(self, result: List[Generation]) -> Any:
function_call_info = super().parse_result(result)
if self.args_only:
try:
return json.loads(function_call_info)
return json.loads(function_call_info, strict=self.strict)
except (json.JSONDecodeError, TypeError) as exc:
raise OutputParserException(
f"Could not parse function call data: {exc}"
)
function_call_info["arguments"] = json.loads(function_call_info["arguments"])
return function_call_info
else:
try:
function_call_info["arguments"] = json.loads(
function_call_info["arguments"], strict=self.strict
)
except (json.JSONDecodeError, TypeError) as exc:
raise OutputParserException(
f"Could not parse function call data: {exc}"
)
return function_call_info
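The effect of `strict`, sketched with a hand-built generation (the message contents are illustrative): with the default `strict=False`, `json.loads` tolerates control characters such as raw newlines inside strings.
```python
from langchain.output_parsers.openai_functions import JsonOutputFunctionsParser
from langchain.schema import ChatGeneration
from langchain.schema.messages import AIMessage

arguments = '{"text": "line one\nline two"}'  # raw newline inside the JSON string
msg = AIMessage(
    content="",
    additional_kwargs={"function_call": {"name": "f", "arguments": arguments}},
)
parser = JsonOutputFunctionsParser()  # args_only=True, strict=False by default
print(parser.parse_result([ChatGeneration(message=msg)]))
```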
class JsonKeyOutputFunctionsParser(JsonOutputFunctionsParser):

View File

@ -6,6 +6,7 @@ from pathlib import Path
from typing import (
Any,
Callable,
Dict,
List,
Sequence,
Set,
@ -298,6 +299,15 @@ class ChatPromptValue(PromptValue):
class BaseChatPromptTemplate(BasePromptTemplate, ABC):
"""Base class for chat prompt templates."""
@property
def lc_attributes(self) -> Dict:
"""
Return a dict of attributes (mapped to their values) that should be included in the
serialized kwargs. These attributes must be accepted by the
constructor.
"""
return {"input_variables": self.input_variables}
def format(self, **kwargs: Any) -> str:
"""Format the chat template into a string.
@ -337,7 +347,7 @@ MessageLikeRepresentation = Union[
]
class ChatPromptTemplate(BaseChatPromptTemplate, ABC):
class ChatPromptTemplate(BaseChatPromptTemplate):
"""A prompt template for chat models.
Use to create flexible templated prompts for chat models.
@ -419,7 +429,7 @@ class ChatPromptTemplate(BaseChatPromptTemplate, ABC):
f"Got: {values['input_variables']}"
)
else:
values["input_variables"] = list(input_vars)
values["input_variables"] = sorted(input_vars)
return values
@classmethod

View File

@ -266,7 +266,7 @@ class Runnable(Generic[Input, Output], ABC):
callback_manager = get_callback_manager_for_config(config)
run_manager = callback_manager.on_chain_start(
dumpd(self),
input if isinstance(input, dict) else {"input": input},
input,
run_type=run_type,
)
try:
@ -284,12 +284,7 @@ class Runnable(Generic[Input, Output], ABC):
run_manager.on_chain_error(e)
raise
else:
output_for_tracer = dumpd(output)
run_manager.on_chain_end(
output_for_tracer
if isinstance(output_for_tracer, dict)
else {"output": output_for_tracer}
)
run_manager.on_chain_end(dumpd(output))
return output
async def _acall_with_config(
@ -312,7 +307,7 @@ class Runnable(Generic[Input, Output], ABC):
callback_manager = get_async_callback_manager_for_config(config)
run_manager = await callback_manager.on_chain_start(
dumpd(self),
input if isinstance(input, dict) else {"input": input},
input,
run_type=run_type,
)
try:
@ -333,12 +328,7 @@ class Runnable(Generic[Input, Output], ABC):
await run_manager.on_chain_error(e)
raise
else:
output_for_tracer = dumpd(output)
await run_manager.on_chain_end(
output_for_tracer
if isinstance(output_for_tracer, dict)
else {"output": output_for_tracer}
)
await run_manager.on_chain_end(dumpd(output))
return output
def _transform_stream_with_config(
@ -413,22 +403,10 @@ class Runnable(Generic[Input, Output], ABC):
final_input = None
final_input_supported = False
except Exception as e:
run_manager.on_chain_error(
e,
inputs=final_input
if isinstance(final_input, dict)
else {"input": final_input},
)
run_manager.on_chain_error(e, inputs=final_input)
raise
else:
run_manager.on_chain_end(
final_output
if isinstance(final_output, dict)
else {"output": final_output},
inputs=final_input
if isinstance(final_input, dict)
else {"input": final_input},
)
run_manager.on_chain_end(final_output, inputs=final_input)
async def _atransform_stream_with_config(
self,
@ -507,22 +485,10 @@ class Runnable(Generic[Input, Output], ABC):
final_input = None
final_input_supported = False
except Exception as e:
await run_manager.on_chain_error(
e,
inputs=final_input
if isinstance(final_input, dict)
else {"input": final_input},
)
await run_manager.on_chain_error(e, inputs=final_input)
raise
else:
await run_manager.on_chain_end(
final_output
if isinstance(final_output, dict)
else {"output": final_output},
inputs=final_input
if isinstance(final_input, dict)
else {"input": final_input},
)
await run_manager.on_chain_end(final_output, inputs=final_input)
class RunnableWithFallbacks(Serializable, Runnable[Input, Output]):
@ -555,9 +521,7 @@ class RunnableWithFallbacks(Serializable, Runnable[Input, Output]):
config = ensure_config(config)
callback_manager = get_callback_manager_for_config(config)
# start the root run
run_manager = callback_manager.on_chain_start(
dumpd(self), input if isinstance(input, dict) else {"input": input}
)
run_manager = callback_manager.on_chain_start(dumpd(self), input)
first_error = None
for runnable in self.runnables:
try:
@ -572,9 +536,7 @@ class RunnableWithFallbacks(Serializable, Runnable[Input, Output]):
run_manager.on_chain_error(e)
raise e
else:
run_manager.on_chain_end(
output if isinstance(output, dict) else {"output": output}
)
run_manager.on_chain_end(output)
return output
if first_error is None:
raise ValueError("No error stored at end of fallbacks.")
@ -591,9 +553,7 @@ class RunnableWithFallbacks(Serializable, Runnable[Input, Output]):
config = ensure_config(config)
callback_manager = get_async_callback_manager_for_config(config)
# start the root run
run_manager = await callback_manager.on_chain_start(
dumpd(self), input if isinstance(input, dict) else {"input": input}
)
run_manager = await callback_manager.on_chain_start(dumpd(self), input)
first_error = None
for runnable in self.runnables:
@ -609,9 +569,7 @@ class RunnableWithFallbacks(Serializable, Runnable[Input, Output]):
await run_manager.on_chain_error(e)
raise e
else:
await run_manager.on_chain_end(
output if isinstance(output, dict) else {"output": output}
)
await run_manager.on_chain_end(output)
return output
if first_error is None:
raise ValueError("No error stored at end of fallbacks.")
@ -671,9 +629,7 @@ class RunnableWithFallbacks(Serializable, Runnable[Input, Output]):
raise e
else:
for rm, output in zip(run_managers, outputs):
rm.on_chain_end(
output if isinstance(output, dict) else {"output": output}
)
rm.on_chain_end(output)
return outputs
if first_error is None:
raise ValueError("No error stored at end of fallbacks.")
@ -711,9 +667,7 @@ class RunnableWithFallbacks(Serializable, Runnable[Input, Output]):
# start the root runs, one per input
run_managers: List[AsyncCallbackManagerForChainRun] = await asyncio.gather(
*(
cm.on_chain_start(
dumpd(self), input if isinstance(input, dict) else {"input": input}
)
cm.on_chain_start(dumpd(self), input)
for cm, input in zip(callback_managers, inputs)
)
)
@ -738,9 +692,7 @@ class RunnableWithFallbacks(Serializable, Runnable[Input, Output]):
else:
await asyncio.gather(
*(
rm.on_chain_end(
output if isinstance(output, dict) else {"output": output}
)
rm.on_chain_end(output)
for rm, output in zip(run_managers, outputs)
)
)
@ -822,9 +774,7 @@ class RunnableSequence(Serializable, Runnable[Input, Output]):
config = ensure_config(config)
callback_manager = get_callback_manager_for_config(config)
# start the root run
run_manager = callback_manager.on_chain_start(
dumpd(self), input if isinstance(input, dict) else {"input": input}
)
run_manager = callback_manager.on_chain_start(dumpd(self), input)
# invoke all steps in sequence
try:
@ -839,9 +789,7 @@ class RunnableSequence(Serializable, Runnable[Input, Output]):
run_manager.on_chain_error(e)
raise
else:
run_manager.on_chain_end(
input if isinstance(input, dict) else {"output": input}
)
run_manager.on_chain_end(input)
return cast(Output, input)
async def ainvoke(
@ -854,9 +802,7 @@ class RunnableSequence(Serializable, Runnable[Input, Output]):
config = ensure_config(config)
callback_manager = get_async_callback_manager_for_config(config)
# start the root run
run_manager = await callback_manager.on_chain_start(
dumpd(self), input if isinstance(input, dict) else {"input": input}
)
run_manager = await callback_manager.on_chain_start(dumpd(self), input)
# invoke all steps in sequence
try:
@ -871,9 +817,7 @@ class RunnableSequence(Serializable, Runnable[Input, Output]):
await run_manager.on_chain_error(e)
raise
else:
await run_manager.on_chain_end(
input if isinstance(input, dict) else {"output": input}
)
await run_manager.on_chain_end(input)
return cast(Output, input)
def batch(
@ -902,9 +846,7 @@ class RunnableSequence(Serializable, Runnable[Input, Output]):
]
# start the root runs, one per input
run_managers = [
cm.on_chain_start(
dumpd(self), input if isinstance(input, dict) else {"input": input}
)
cm.on_chain_start(dumpd(self), input)
for cm, input in zip(callback_managers, inputs)
]
@ -927,7 +869,7 @@ class RunnableSequence(Serializable, Runnable[Input, Output]):
raise
else:
for rm, input in zip(run_managers, inputs):
rm.on_chain_end(input if isinstance(input, dict) else {"output": input})
rm.on_chain_end(input)
return cast(List[Output], inputs)
async def abatch(
@ -959,9 +901,7 @@ class RunnableSequence(Serializable, Runnable[Input, Output]):
# start the root runs, one per input
run_managers: List[AsyncCallbackManagerForChainRun] = await asyncio.gather(
*(
cm.on_chain_start(
dumpd(self), input if isinstance(input, dict) else {"input": input}
)
cm.on_chain_start(dumpd(self), input)
for cm, input in zip(callback_managers, inputs)
)
)
@ -985,12 +925,7 @@ class RunnableSequence(Serializable, Runnable[Input, Output]):
raise
else:
await asyncio.gather(
*(
rm.on_chain_end(
input if isinstance(input, dict) else {"output": input}
)
for rm, input in zip(run_managers, inputs)
)
*(rm.on_chain_end(input) for rm, input in zip(run_managers, inputs))
)
return cast(List[Output], inputs)
@ -1004,9 +939,7 @@ class RunnableSequence(Serializable, Runnable[Input, Output]):
config = ensure_config(config)
callback_manager = get_callback_manager_for_config(config)
# start the root run
run_manager = callback_manager.on_chain_start(
dumpd(self), input if isinstance(input, dict) else {"input": input}
)
run_manager = callback_manager.on_chain_start(dumpd(self), input)
steps = [self.first] + self.middle + [self.last]
streaming_start_index = 0
@ -1060,9 +993,7 @@ class RunnableSequence(Serializable, Runnable[Input, Output]):
run_manager.on_chain_error(e)
raise
else:
run_manager.on_chain_end(
final if isinstance(final, dict) else {"output": final}
)
run_manager.on_chain_end(final)
async def astream(
self,
@ -1074,9 +1005,7 @@ class RunnableSequence(Serializable, Runnable[Input, Output]):
config = ensure_config(config)
callback_manager = get_async_callback_manager_for_config(config)
# start the root run
run_manager = await callback_manager.on_chain_start(
dumpd(self), input if isinstance(input, dict) else {"input": input}
)
run_manager = await callback_manager.on_chain_start(dumpd(self), input)
steps = [self.first] + self.middle + [self.last]
streaming_start_index = len(steps) - 1
@ -1130,9 +1059,7 @@ class RunnableSequence(Serializable, Runnable[Input, Output]):
await run_manager.on_chain_error(e)
raise
else:
await run_manager.on_chain_end(
final if isinstance(final, dict) else {"output": final}
)
await run_manager.on_chain_end(final)
class RunnableMapChunk(Dict[str, Any]):
@ -1199,9 +1126,7 @@ class RunnableMap(Serializable, Runnable[Input, Dict[str, Any]]):
local_metadata=None,
)
# start the root run
run_manager = callback_manager.on_chain_start(
dumpd(self), input if isinstance(input, dict) else {"input": input}
)
run_manager = callback_manager.on_chain_start(dumpd(self), input)
# gather results from all steps
try:
@ -1236,9 +1161,7 @@ class RunnableMap(Serializable, Runnable[Input, Dict[str, Any]]):
config = ensure_config(config)
callback_manager = get_async_callback_manager_for_config(config)
# start the root run
run_manager = await callback_manager.on_chain_start(
dumpd(self), input if isinstance(input, dict) else {"input": input}
)
run_manager = await callback_manager.on_chain_start(dumpd(self), input)
# gather results from all steps
try:

View File

@ -8,6 +8,7 @@ import inspect
import itertools
import logging
import uuid
import warnings
from enum import Enum
from typing import (
Any,
@ -662,7 +663,6 @@ async def _arun_chain(
async def _arun_llm_or_chain(
example: Example,
llm_or_chain_factory: MCF,
n_repetitions: int,
*,
tags: Optional[List[str]] = None,
callbacks: Optional[List[BaseCallbackHandler]] = None,
@ -673,7 +673,6 @@ async def _arun_llm_or_chain(
Args:
example: The example to run.
llm_or_chain_factory: The Chain or language model constructor to run.
n_repetitions: The number of times to run the model on each example.
tags: Optional tags to add to the run.
callbacks: Optional callbacks to use during the run.
input_mapper: Optional function to map the input to the expected format.
@ -694,31 +693,28 @@ async def _arun_llm_or_chain(
chain_or_llm = (
"LLM" if isinstance(llm_or_chain_factory, BaseLanguageModel) else "Chain"
)
for _ in range(n_repetitions):
try:
if isinstance(llm_or_chain_factory, BaseLanguageModel):
output: Any = await _arun_llm(
llm_or_chain_factory,
example.inputs,
tags=tags,
callbacks=callbacks,
input_mapper=input_mapper,
)
else:
chain = llm_or_chain_factory()
output = await _arun_chain(
chain,
example.inputs,
tags=tags,
callbacks=callbacks,
input_mapper=input_mapper,
)
outputs.append(output)
except Exception as e:
logger.warning(
f"{chain_or_llm} failed for example {example.id}. Error: {e}"
try:
if isinstance(llm_or_chain_factory, BaseLanguageModel):
output: Any = await _arun_llm(
llm_or_chain_factory,
example.inputs,
tags=tags,
callbacks=callbacks,
input_mapper=input_mapper,
)
outputs.append({"Error": str(e)})
else:
chain = llm_or_chain_factory()
output = await _arun_chain(
chain,
example.inputs,
tags=tags,
callbacks=callbacks,
input_mapper=input_mapper,
)
outputs.append(output)
except Exception as e:
logger.warning(f"{chain_or_llm} failed for example {example.id}. Error: {e}")
outputs.append({"Error": str(e)})
if callbacks and previous_example_ids:
for example_id, tracer in zip(previous_example_ids, callbacks):
if hasattr(tracer, "example_id"):
@ -822,7 +818,6 @@ async def _arun_on_examples(
*,
evaluation: Optional[RunEvalConfig] = None,
concurrency_level: int = 5,
num_repetitions: int = 1,
project_name: Optional[str] = None,
verbose: bool = False,
tags: Optional[List[str]] = None,
@ -841,9 +836,6 @@ async def _arun_on_examples(
independent calls on each example without carrying over state.
evaluation: Optional evaluation configuration to use when evaluating
concurrency_level: The number of async tasks to run concurrently.
num_repetitions: Number of times to run the model on each example.
This is useful when testing success rates or generating confidence
intervals.
project_name: Project name to use when tracing runs.
Defaults to {dataset_name}-{chain class name}-{datetime}.
verbose: Whether to print progress.
@ -873,7 +865,6 @@ async def _arun_on_examples(
result = await _arun_llm_or_chain(
example,
wrapped_model,
num_repetitions,
tags=tags,
callbacks=callbacks,
input_mapper=input_mapper,
@ -983,7 +974,6 @@ def _run_chain(
def _run_llm_or_chain(
example: Example,
llm_or_chain_factory: MCF,
n_repetitions: int,
*,
tags: Optional[List[str]] = None,
callbacks: Optional[List[BaseCallbackHandler]] = None,
@ -995,7 +985,6 @@ def _run_llm_or_chain(
Args:
example: The example to run.
llm_or_chain_factory: The Chain or language model constructor to run.
n_repetitions: The number of times to run the model on each example.
tags: Optional tags to add to the run.
callbacks: Optional callbacks to use during the run.
@ -1016,32 +1005,31 @@ def _run_llm_or_chain(
chain_or_llm = (
"LLM" if isinstance(llm_or_chain_factory, BaseLanguageModel) else "Chain"
)
for _ in range(n_repetitions):
try:
if isinstance(llm_or_chain_factory, BaseLanguageModel):
output: Any = _run_llm(
llm_or_chain_factory,
example.inputs,
callbacks,
tags=tags,
input_mapper=input_mapper,
)
else:
chain = llm_or_chain_factory()
output = _run_chain(
chain,
example.inputs,
callbacks,
tags=tags,
input_mapper=input_mapper,
)
outputs.append(output)
except Exception as e:
logger.warning(
f"{chain_or_llm} failed for example {example.id} with inputs:"
f" {example.inputs}.\nError: {e}",
try:
if isinstance(llm_or_chain_factory, BaseLanguageModel):
output: Any = _run_llm(
llm_or_chain_factory,
example.inputs,
callbacks,
tags=tags,
input_mapper=input_mapper,
)
outputs.append({"Error": str(e)})
else:
chain = llm_or_chain_factory()
output = _run_chain(
chain,
example.inputs,
callbacks,
tags=tags,
input_mapper=input_mapper,
)
outputs.append(output)
except Exception as e:
logger.warning(
f"{chain_or_llm} failed for example {example.id} with inputs:"
f" {example.inputs}.\nError: {e}",
)
outputs.append({"Error": str(e)})
if callbacks and previous_example_ids:
for example_id, tracer in zip(previous_example_ids, callbacks):
if hasattr(tracer, "example_id"):
@ -1055,7 +1043,6 @@ def _run_on_examples(
llm_or_chain_factory: MODEL_OR_CHAIN_FACTORY,
*,
evaluation: Optional[RunEvalConfig] = None,
num_repetitions: int = 1,
project_name: Optional[str] = None,
verbose: bool = False,
tags: Optional[List[str]] = None,
@ -1073,9 +1060,6 @@ def _run_on_examples(
over the dataset. The Chain constructor is used to permit
independent calls on each example without carrying over state.
evaluation: Optional evaluation configuration to use when evaluating
num_repetitions: Number of times to run the model on each example.
This is useful when testing success rates or generating confidence
intervals.
project_name: Name of the project to store the traces in.
Defaults to {dataset_name}-{chain class name}-{datetime}.
verbose: Whether to print progress.
@ -1110,7 +1094,6 @@ def _run_on_examples(
result = _run_llm_or_chain(
example,
wrapped_model,
num_repetitions,
tags=tags,
callbacks=callbacks,
input_mapper=input_mapper,
@ -1158,11 +1141,11 @@ async def arun_on_dataset(
*,
evaluation: Optional[RunEvalConfig] = None,
concurrency_level: int = 5,
num_repetitions: int = 1,
project_name: Optional[str] = None,
verbose: bool = False,
tags: Optional[List[str]] = None,
input_mapper: Optional[Callable[[Dict], Any]] = None,
**kwargs: Any,
) -> Dict[str, Any]:
"""
Asynchronously run the Chain or language model on a dataset
@ -1177,9 +1160,6 @@ async def arun_on_dataset(
independent calls on each example without carrying over state.
evaluation: Optional evaluation configuration to use when evaluating
concurrency_level: The number of async tasks to run concurrently.
num_repetitions: Number of times to run the model on each example.
This is useful when testing success rates or generating confidence
intervals.
project_name: Name of the project to store the traces in.
Defaults to {dataset_name}-{chain class name}-{datetime}.
verbose: Whether to print progress.
@ -1274,6 +1254,13 @@ async def arun_on_dataset(
evaluation=evaluation_config,
)
""" # noqa: E501
if kwargs:
warnings.warn(
"The following arguments are deprecated and will "
"be removed in a future release: "
f"{kwargs.keys()}.",
DeprecationWarning,
)
wrapped_model, project_name, dataset, examples = _prepare_eval_run(
client, dataset_name, llm_or_chain_factory, project_name
)
@ -1282,7 +1269,6 @@ async def arun_on_dataset(
examples,
wrapped_model,
concurrency_level=concurrency_level,
num_repetitions=num_repetitions,
project_name=project_name,
verbose=verbose,
tags=tags,
@ -1323,12 +1309,12 @@ def run_on_dataset(
llm_or_chain_factory: MODEL_OR_CHAIN_FACTORY,
*,
evaluation: Optional[RunEvalConfig] = None,
num_repetitions: int = 1,
concurrency_level: int = 5,
project_name: Optional[str] = None,
verbose: bool = False,
tags: Optional[List[str]] = None,
input_mapper: Optional[Callable[[Dict], Any]] = None,
**kwargs: Any,
) -> Dict[str, Any]:
"""
Run the Chain or language model on a dataset and store traces
@ -1344,9 +1330,6 @@ def run_on_dataset(
evaluation: Configuration for evaluators to run on the
results of the chain
concurrency_level: The number of async tasks to run concurrently.
num_repetitions: Number of times to run the model on each example.
This is useful when testing success rates or generating confidence
intervals.
project_name: Name of the project to store the traces in.
Defaults to {dataset_name}-{chain class name}-{datetime}.
verbose: Whether to print progress.
@ -1441,6 +1424,13 @@ def run_on_dataset(
evaluation=evaluation_config,
)
""" # noqa: E501
if kwargs:
warnings.warn(
"The following arguments are deprecated and "
"will be removed in a future release: "
f"{kwargs.keys()}.",
DeprecationWarning,
)
wrapped_model, project_name, dataset, examples = _prepare_eval_run(
client, dataset_name, llm_or_chain_factory, project_name
)
@ -1449,7 +1439,6 @@ def run_on_dataset(
client,
examples,
wrapped_model,
num_repetitions=num_repetitions,
project_name=project_name,
verbose=verbose,
tags=tags,
@ -1464,7 +1453,6 @@ def run_on_dataset(
examples,
wrapped_model,
concurrency_level=concurrency_level,
num_repetitions=num_repetitions,
project_name=project_name,
verbose=verbose,
tags=tags,
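The `num_repetitions` parameter is removed throughout this file; unknown keyword arguments now land in `**kwargs` and trigger a `DeprecationWarning` instead of a hard failure. A minimal sketch of that deprecation pattern (the `run_eval` function is hypothetical):

import warnings
from typing import Any

def run_eval(dataset_name: str, **kwargs: Any) -> None:
    if kwargs:
        warnings.warn(
            "The following arguments are deprecated and will be removed "
            f"in a future release: {list(kwargs)}.",
            DeprecationWarning,
        )
    # ... proceed with the evaluation run ...

run_eval("my-dataset", num_repetitions=3)  # warns, but does not raise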

View File

@ -35,7 +35,7 @@ def try_load_from_hub(
if remote_path.parts[0] != valid_prefix:
return None
if remote_path.suffix[1:] not in valid_suffixes:
raise ValueError("Unsupported file type.")
raise ValueError(f"Unsupported file type, must be one of {valid_suffixes}.")
# Using Path with URLs is not recommended, because on Windows
# the backslash is used as the path separator, which can cause issues

View File

@ -0,0 +1,99 @@
import json
from typing import Dict, Union
def sanitize(
input: Union[str, Dict[str, str]]
) -> Dict[str, Union[str, Dict[str, str]]]:
"""
Sanitize input string or dict of strings by replacing sensitive data with
placeholders.
It returns the sanitized input string or dict of strings and the secure
context as a dict following the format:
{
"sanitized_input": <sanitized input string or dict of strings>,
"secure_context": <secure context>
}
The secure context is a bytes object that is needed to de-sanitize the response
from the LLM.
Args:
input: Input string or dict of strings.
Returns:
Sanitized input string or dict of strings and the secure context
as a dict following the format:
{
"sanitized_input": <sanitized input string or dict of strings>,
"secure_context": <secure context>
}
The `secure_context` needs to be passed to the `desanitize` function.
"""
try:
import promptguard as pg
except ImportError:
raise ImportError(
"Could not import the `promptguard` Python package, "
"please install it with `pip install promptguard`."
)
if isinstance(input, str):
# the input could be a string, so we sanitize the string
sanitize_response: pg.SanitizeResponse = pg.sanitize(input)
return {
"sanitized_input": sanitize_response.sanitized_text,
"secure_context": sanitize_response.secure_context,
}
if isinstance(input, dict):
# the input could be a dict[str, str], so we sanitize the values
values = list(input.values())
input_value_str = json.dumps(values)
# sanitize the values as a single JSON payload
sanitize_values_response: pg.SanitizeResponse = pg.sanitize(input_value_str)
# reconstruct the dict with the sanitized values, preserving key order
sanitized_input_values = json.loads(sanitize_values_response.sanitized_text)
sanitized_input = dict(zip(input.keys(), sanitized_input_values))
return {
"sanitized_input": sanitized_input,
"secure_context": sanitize_values_response.secure_context,
}
raise ValueError(f"Unexpected input type {type(input)}")
def desanitize(sanitized_text: str, secure_context: bytes) -> str:
"""
Restore the original sensitive data from the sanitized text.
Args:
sanitized_text: Sanitized text.
secure_context: Secure context returned by the `sanitize` function.
Returns:
De-sanitized text.
"""
try:
import promptguard as pg
except ImportError:
raise ImportError(
"Could not import the `promptguard` Python package, "
"please install it with `pip install promptguard`."
)
desanitize_response: pg.DesanitizeResponse = pg.desanitize(
sanitized_text, secure_context
)
return desanitize_response.desanitized_text
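A hedged round-trip sketch of the two helpers above, assuming the `promptguard` package is installed; `fake_llm` stands in for a real model call:

def fake_llm(prompt: str) -> str:
    # stand-in for an LLM; a real model would answer using the masked
    # placeholders (e.g. PERSON_998) without ever seeing the raw PII
    return prompt

result = sanitize("Contact John Doe at johndoe@example.com")
response = fake_llm(result["sanitized_input"])
restored = desanitize(response, result["secure_context"])
print(restored)  # the original sensitive values are restored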

View File

@ -42,6 +42,7 @@ from langchain.vectorstores.elastic_vector_search import (
ElasticVectorSearch,
)
from langchain.vectorstores.elasticsearch import ElasticsearchStore
from langchain.vectorstores.epsilla import Epsilla
from langchain.vectorstores.faiss import FAISS
from langchain.vectorstores.hologres import Hologres
from langchain.vectorstores.lancedb import LanceDB
@ -93,6 +94,7 @@ __all__ = [
"ElasticVectorSearch",
"ElasticKnnSearch",
"ElasticsearchStore",
"Epsilla",
"FAISS",
"PGEmbedding",
"Hologres",

View File

@ -3,6 +3,7 @@ from __future__ import annotations
import logging
import os
import traceback
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Iterable, List, Optional, Tuple
import requests
@ -84,7 +85,9 @@ class Clarifai(VectorStore):
self._userDataObject = self._auth.get_user_app_id_proto()
self._number_of_docs = number_of_docs
def _post_text_input(self, text: str, metadata: dict) -> str:
def _post_texts_as_inputs(
self, texts: List[str], metadatas: Optional[List[dict]] = None
) -> List[str]:
"""Post text to Clarifai and return the ID of the input.
Args:
@ -104,20 +107,29 @@ class Clarifai(VectorStore):
"Please install it with `pip install clarifai`."
) from e
input_metadata = Struct()
input_metadata.update(metadata)
if metadatas is not None:
assert len(list(texts)) == len(
metadatas
), "Number of texts and metadatas should be the same."
inputs = []
for idx, text in enumerate(texts):
if metadatas is not None:
input_metadata = Struct()
input_metadata.update(metadatas[idx])
inputs.append(
resources_pb2.Input(
data=resources_pb2.Data(
text=resources_pb2.Text(raw=text),
metadata=input_metadata,
)
)
)
post_inputs_response = self._stub.PostInputs(
service_pb2.PostInputsRequest(
user_app_id=self._userDataObject,
inputs=[
resources_pb2.Input(
data=resources_pb2.Data(
text=resources_pb2.Text(raw=text),
metadata=input_metadata,
)
)
],
inputs=inputs,
)
)
@ -127,9 +139,11 @@ class Clarifai(VectorStore):
"Post inputs failed, status: " + post_inputs_response.status.description
)
input_id = post_inputs_response.inputs[0].id
input_ids = []
for input in post_inputs_response.inputs:
input_ids.append(input.id)
return input_id
return input_ids
def add_texts(
self,
@ -140,7 +154,7 @@ class Clarifai(VectorStore):
) -> List[str]:
"""Add texts to the Clarifai vectorstore. This will push the text
to a Clarifai application.
Application use base workflow that create and store embedding for each text.
The application uses a base workflow that creates and stores an embedding for each text.
Make sure you are using a base workflow that is compatible with text
(such as Language Understanding).
@ -153,20 +167,26 @@ class Clarifai(VectorStore):
List[str]: List of IDs of the added texts.
"""
assert len(list(texts)) > 0, "No texts provided to add to the vectorstore."
ltexts = list(texts)
length = len(ltexts)
assert length > 0, "No texts provided to add to the vectorstore."
if metadatas is not None:
assert len(list(texts)) == len(
assert length == len(
metadatas
), "Number of texts and metadatas should be the same."
batch_size = 32
input_ids = []
for idx, text in enumerate(texts):
for idx in range(0, length, batch_size):
try:
metadata = metadatas[idx] if metadatas else {}
input_id = self._post_text_input(text, metadata)
input_ids.append(input_id)
logger.debug(f"Input {input_id} posted successfully.")
batch_texts = ltexts[idx : idx + batch_size]
batch_metadatas = (
metadatas[idx : idx + batch_size] if metadatas else None
)
result_ids = self._post_texts_as_inputs(batch_texts, batch_metadatas)
input_ids.extend(result_ids)
logger.debug(f"Input {result_ids} posted successfully.")
except Exception as error:
logger.warning(f"Post inputs failed: {error}")
traceback.print_exc()
@ -196,6 +216,7 @@ class Clarifai(VectorStore):
from clarifai_grpc.grpc.api import resources_pb2, service_pb2
from clarifai_grpc.grpc.api.status import status_code_pb2
from google.protobuf import json_format # type: ignore
from google.protobuf.struct_pb2 import Struct # type: ignore
except ImportError as e:
raise ImportError(
"Could not import clarifai python package. "
@ -206,28 +227,35 @@ class Clarifai(VectorStore):
if self._number_of_docs is not None:
k = self._number_of_docs
post_annotations_searches_response = self._stub.PostAnnotationsSearches(
service_pb2.PostAnnotationsSearchesRequest(
user_app_id=self._userDataObject,
searches=[
resources_pb2.Search(
query=resources_pb2.Query(
ranks=[
resources_pb2.Rank(
annotation=resources_pb2.Annotation(
data=resources_pb2.Data(
text=resources_pb2.Text(raw=query),
)
req = service_pb2.PostAnnotationsSearchesRequest(
user_app_id=self._userDataObject,
searches=[
resources_pb2.Search(
query=resources_pb2.Query(
ranks=[
resources_pb2.Rank(
annotation=resources_pb2.Annotation(
data=resources_pb2.Data(
text=resources_pb2.Text(raw=query),
)
)
]
)
)
]
)
],
pagination=service_pb2.Pagination(page=1, per_page=k),
)
)
],
pagination=service_pb2.Pagination(page=1, per_page=k),
)
# Add filter by metadata if provided.
if filter is not None:
search_metadata = Struct()
search_metadata.update(filter)
f = req.searches[0].query.filters.add()
f.annotation.data.metadata.update(search_metadata)
post_annotations_searches_response = self._stub.PostAnnotationsSearches(req)
# Check if search was successful
if post_annotations_searches_response.status.code != status_code_pb2.SUCCESS:
raise Exception(
@ -238,11 +266,12 @@ class Clarifai(VectorStore):
# Retrieve hits
hits = post_annotations_searches_response.hits
docs_and_scores = []
# Iterate over hits and retrieve metadata and text
for hit in hits:
executor = ThreadPoolExecutor(max_workers=10)
def hit_to_document(hit: resources_pb2.Hit) -> Tuple[Document, float]:
metadata = json_format.MessageToDict(hit.input.data.metadata)
request = requests.get(hit.input.data.text.url)
h = {"Authorization": f"Key {self._auth.pat}"}
request = requests.get(hit.input.data.text.url, headers=h)
# override encoding by real educated guess as provided by chardet
request.encoding = request.apparent_encoding
@ -252,10 +281,11 @@ class Clarifai(VectorStore):
f"\tScore {hit.score:.2f} for annotation: {hit.annotation.id}\
off input: {hit.input.id}, text: {requested_text[:125]}"
)
return (Document(page_content=requested_text, metadata=metadata), hit.score)
docs_and_scores.append(
(Document(page_content=requested_text, metadata=metadata), hit.score)
)
# Iterate over hits and retrieve metadata and text
futures = [executor.submit(hit_to_document, hit) for hit in hits]
docs_and_scores = [future.result() for future in futures]
return docs_and_scores
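Two performance changes are bundled in this file: inputs are now posted in batches of 32, and per-hit text downloads run on a thread pool. Because `executor.submit` returns futures in submission order, collecting `future.result()` in that same order keeps documents aligned with their scores. A dependency-free sketch of the idiom (the `with` block also shuts the pool down, which the code above leaves to garbage collection):

from concurrent.futures import ThreadPoolExecutor

def fetch(i: int) -> int:
    return i * i  # stand-in for the per-hit HTTP request

with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(fetch, i) for i in range(5)]
    results = [f.result() for f in futures]  # order matches submission order

assert results == [0, 1, 4, 9, 16]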

View File

@ -3,7 +3,6 @@ from __future__ import annotations
import uuid
import warnings
from abc import ABC
from typing import (
TYPE_CHECKING,
Any,
@ -53,7 +52,7 @@ def _default_script_query(query_vector: List[float], filter: Optional[dict]) ->
@deprecated("0.0.265", alternative="ElasticsearchStore class.", pending=True)
class ElasticVectorSearch(VectorStore, ABC):
class ElasticVectorSearch(VectorStore):
"""Wrapper around Elasticsearch as a vector database.
To connect to an Elasticsearch instance that does not require
@ -340,7 +339,7 @@ class ElasticVectorSearch(VectorStore, ABC):
self.client.delete(index=self.index_name, id=id)
class ElasticKnnSearch(VectorStore, ABC):
class ElasticKnnSearch(VectorStore):
"""
ElasticKnnSearch is a class for performing k-nearest neighbor
(k-NN) searches on text data using Elasticsearch.

View File

@ -0,0 +1,375 @@
"""Wrapper around Epsilla vector database."""
from __future__ import annotations
import logging
import uuid
from typing import TYPE_CHECKING, Any, Iterable, List, Optional, Type
from langchain.docstore.document import Document
from langchain.embeddings.base import Embeddings
from langchain.vectorstores.base import VectorStore
if TYPE_CHECKING:
from pyepsilla import vectordb
logger = logging.getLogger()
class Epsilla(VectorStore):
"""
Wrapper around Epsilla vector database.
As a prerequisite, you need to install the ``pyepsilla`` package
and have a running Epsilla vector database (for example, through our Docker image).
See the following documentation for how to run an Epsilla vector database:
https://epsilla-inc.gitbook.io/epsilladb/quick-start
Args:
client (Any): Epsilla client to connect to.
embeddings (Embeddings): Function used to embed the texts.
db_path (Optional[str]): The path where the database will be persisted.
Defaults to "/tmp/langchain-epsilla".
db_name (Optional[str]): Give a name to the loaded database.
Defaults to "langchain_store".
Example:
.. code-block:: python
from langchain.vectorstores import Epsilla
from pyepsilla import vectordb
client = vectordb.Client()
embeddings = OpenAIEmbeddings()
db_path = "/tmp/vectorstore"
db_name = "langchain_store"
epsilla = Epsilla(client, embeddings, db_path, db_name)
"""
_LANGCHAIN_DEFAULT_DB_NAME = "langchain_store"
_LANGCHAIN_DEFAULT_DB_PATH = "/tmp/langchain-epsilla"
_LANGCHAIN_DEFAULT_TABLE_NAME = "langchain_collection"
def __init__(
self,
client: Any,
embeddings: Embeddings,
db_path: Optional[str] = _LANGCHAIN_DEFAULT_DB_PATH,
db_name: Optional[str] = _LANGCHAIN_DEFAULT_DB_NAME,
):
"""Initialize with necessary components."""
try:
import pyepsilla
except ImportError as e:
raise ImportError(
"Could not import pyepsilla python package. "
"Please install pyepsilla package with `pip install pyepsilla`."
) from e
if not isinstance(client, pyepsilla.vectordb.Client):
raise TypeError(
f"client should be an instance of pyepsilla.vectordb.Client, "
f"got {type(client)}"
)
self._client: vectordb.Client = client
self._db_name = db_name
self._embeddings = embeddings
self._collection_name = Epsilla._LANGCHAIN_DEFAULT_TABLE_NAME
self._client.load_db(db_name=db_name, db_path=db_path)
self._client.use_db(db_name=db_name)
@property
def embeddings(self) -> Optional[Embeddings]:
return self._embeddings
def use_collection(self, collection_name: str) -> None:
"""
Set default collection to use.
Args:
collection_name (str): The name of the collection.
"""
self._collection_name = collection_name
def clear_data(self, collection_name: str = "") -> None:
"""
Clear data in a collection.
Args:
collection_name (Optional[str]): The name of the collection.
If not provided, the default collection will be used.
"""
if not collection_name:
collection_name = self._collection_name
self._client.drop_table(collection_name)
def get(
self, collection_name: str = "", response_fields: Optional[List[str]] = None
) -> List[dict]:
"""Get the collection.
Args:
collection_name (Optional[str]): The name of the collection
to retrieve data from.
If not provided, the default collection will be used.
response_fields (Optional[List[str]]): List of field names in the result.
If not specified, all available fields will be returned.
Returns:
A list of the retrieved data.
"""
if not collection_name:
collection_name = self._collection_name
status_code, response = self._client.get(
table_name=collection_name, response_fields=response_fields
)
if status_code != 200:
logger.error(f"Failed to get records: {response['message']}")
raise Exception("Error: {}.".format(response["message"]))
return response["result"]
def _create_collection(
self, table_name: str, embeddings: list, metadatas: Optional[list[dict]] = None
) -> None:
if not embeddings:
raise ValueError("Embeddings list is empty.")
dim = len(embeddings[0])
fields: List[dict] = [
{"name": "id", "dataType": "INT"},
{"name": "text", "dataType": "STRING"},
{"name": "embeddings", "dataType": "VECTOR_FLOAT", "dimensions": dim},
]
if metadatas is not None:
field_names = [field["name"] for field in fields]
for metadata in metadatas:
for key, value in metadata.items():
if key in field_names:
continue
d_type: str
if isinstance(value, str):
d_type = "STRING"
elif isinstance(value, int):
d_type = "INT"
elif isinstance(value, float):
d_type = "FLOAT"
elif isinstance(value, bool):
d_type = "BOOL"
else:
raise ValueError(f"Unsupported data type for {key}.")
fields.append({"name": key, "dataType": d_type})
field_names.append(key)
status_code, response = self._client.create_table(
table_name, table_fields=fields
)
if status_code != 200:
if status_code == 409:
logger.info(f"Continuing with the existing table {table_name}.")
else:
logger.error(
f"Failed to create collection {table_name}: {response['message']}"
)
raise Exception("Error: {}.".format(response["message"]))
def add_texts(
self,
texts: Iterable[str],
metadatas: Optional[List[dict]] = None,
collection_name: Optional[str] = "",
drop_old: Optional[bool] = False,
**kwargs: Any,
) -> List[str]:
"""
Embed texts and add them to the database.
Args:
texts (Iterable[str]): The texts to embed.
metadatas (Optional[List[dict]]): Metadata dicts
attached to each of the texts. Defaults to None.
collection_name (Optional[str]): Which collection to use.
Defaults to "langchain_collection".
If provided, default collection name will be set as well.
drop_old (Optional[bool]): Whether to drop the previous collection
and create a new one. Defaults to False.
Returns:
List of ids of the added texts.
"""
if not collection_name:
collection_name = self._collection_name
else:
self._collection_name = collection_name
if drop_old:
self._client.drop_table(collection_name)  # drop the collection, not the whole database
texts = list(texts)
try:
embeddings = self._embeddings.embed_documents(texts)
except NotImplementedError:
embeddings = [self._embeddings.embed_query(x) for x in texts]
if len(embeddings) == 0:
logger.debug("Nothing to insert, skipping.")
return []
self._create_collection(
table_name=collection_name, embeddings=embeddings, metadatas=metadatas
)
ids = [hash(uuid.uuid4()) for _ in texts]
records = []
for index, id in enumerate(ids):
record = {
"id": id,
"text": texts[index],
"embeddings": embeddings[index],
}
if metadatas is not None:
metadata = metadatas[index].items()
for key, value in metadata:
record[key] = value
records.append(record)
status_code, response = self._client.insert(
table_name=collection_name, records=records
)
if status_code != 200:
logger.error(
f"Failed to add records to {collection_name}: {response['message']}"
)
raise Exception("Error: {}.".format(response["message"]))
return [str(id) for id in ids]
def similarity_search(
self, query: str, k: int = 4, collection_name: str = "", **kwargs: Any
) -> List[Document]:
"""
Return the documents that are semantically most relevant to the query.
Args:
query (str): String to query the vectorstore with.
k (Optional[int]): Number of documents to return. Defaults to 4.
collection_name (Optional[str]): Collection to use.
Defaults to "langchain_store" or the one provided before.
Returns:
List of documents that are semantically most relevant to the query
"""
if not collection_name:
collection_name = self._collection_name
query_vector = self._embeddings.embed_query(query)
status_code, response = self._client.query(
table_name=collection_name,
query_field="embeddings",
query_vector=query_vector,
limit=k,
)
if status_code != 200:
logger.error(f"Search failed: {response['message']}.")
raise Exception("Error: {}.".format(response["message"]))
exclude_keys = ["id", "text", "embeddings"]
return list(
map(
lambda item: Document(
page_content=item["text"],
metadata={
key: item[key] for key in item if key not in exclude_keys
},
),
response["result"],
)
)
@classmethod
def from_texts(
cls: Type[Epsilla],
texts: List[str],
embedding: Embeddings,
metadatas: Optional[List[dict]] = None,
client: Any = None,
db_path: Optional[str] = _LANGCHAIN_DEFAULT_DB_PATH,
db_name: Optional[str] = _LANGCHAIN_DEFAULT_DB_NAME,
collection_name: Optional[str] = _LANGCHAIN_DEFAULT_TABLE_NAME,
drop_old: Optional[bool] = False,
**kwargs: Any,
) -> Epsilla:
"""Create an Epsilla vectorstore from raw documents.
Args:
texts (List[str]): List of text data to be inserted.
embedding (Embeddings): Embedding function.
client (pyepsilla.vectordb.Client): Epsilla client to connect to.
metadatas (Optional[List[dict]]): Metadata for each text.
Defaults to None.
db_path (Optional[str]): The path where the database will be persisted.
Defaults to "/tmp/langchain-epsilla".
db_name (Optional[str]): Give a name to the loaded database.
Defaults to "langchain_store".
collection_name (Optional[str]): Which collection to use.
Defaults to "langchain_collection".
If provided, default collection name will be set as well.
drop_old (Optional[bool]): Whether to drop the previous collection
and create a new one. Defaults to False.
Returns:
Epsilla: Epsilla vector store.
"""
instance = Epsilla(client, embedding, db_path=db_path, db_name=db_name)
instance.add_texts(
texts,
metadatas=metadatas,
collection_name=collection_name,
drop_old=drop_old,
**kwargs,
)
return instance
@classmethod
def from_documents(
cls: Type[Epsilla],
documents: List[Document],
embedding: Embeddings,
client: Any = None,
db_path: Optional[str] = _LANGCHAIN_DEFAULT_DB_PATH,
db_name: Optional[str] = _LANGCHAIN_DEFAULT_DB_NAME,
collection_name: Optional[str] = _LANGCHAIN_DEFAULT_TABLE_NAME,
drop_old: Optional[bool] = False,
**kwargs: Any,
) -> Epsilla:
"""Create an Epsilla vectorstore from a list of documents.
Args:
documents (List[Document]): List of documents to be inserted.
embedding (Embeddings): Embedding function.
client (pyepsilla.vectordb.Client): Epsilla client to connect to.
db_path (Optional[str]): The path where the database will be persisted.
Defaults to "/tmp/langchain-epsilla".
db_name (Optional[str]): Give a name to the loaded database.
Defaults to "langchain_store".
collection_name (Optional[str]): Which collection to use.
Defaults to "langchain_collection".
If provided, default collection name will be set as well.
drop_old (Optional[bool]): Whether to drop the previous collection
and create a new one. Defaults to False.
Returns:
Epsilla: Epsilla vector store.
"""
texts = [doc.page_content for doc in documents]
metadatas = [doc.metadata for doc in documents]
return cls.from_texts(
texts,
embedding,
metadatas=metadatas,
client=client,
db_path=db_path,
db_name=db_name,
collection_name=collection_name,
drop_old=drop_old,
**kwargs,
)
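A hedged end-to-end sketch of the new Epsilla integration; it assumes a local Epsilla server is reachable and that OpenAI credentials are configured for the embeddings:

from pyepsilla import vectordb

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Epsilla

client = vectordb.Client()  # connects to a locally running Epsilla instance
store = Epsilla.from_texts(
    ["grounded generation", "data privacy"],
    OpenAIEmbeddings(),
    client=client,
)
print(store.similarity_search("privacy", k=1)[0].page_content)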

View File

@ -78,7 +78,7 @@ class Marqo(VectorStore):
self._searchable_attributes = searchable_attributes
self.page_content_builder = page_content_builder
self._non_tensor_fields = ["metadata"]
self.tensor_fields = ["text"]
self._document_batch_size = 1024
@ -132,7 +132,7 @@ class Marqo(VectorStore):
for i in range(0, num_docs, self._document_batch_size):
response = self._client.index(self._index_name).add_documents(
documents[i : i + self._document_batch_size],
non_tensor_fields=self._non_tensor_fields,
tensor_fields=self.tensor_fields,
**self._add_documents_settings,
)
if response["errors"]:
@ -330,17 +330,15 @@ class Marqo(VectorStore):
Dict[str, Dict[List[Dict[str, Dict[str, Any]]]]]: A bulk search results
object
"""
bulk_results = self._client.bulk_search(
[
{
"index": self._index_name,
"q": query,
"searchableAttributes": self._searchable_attributes,
"limit": k,
}
bulk_results = {
"result": [
self._client.index(self._index_name).search(
q=query, searchable_attributes=self._searchable_attributes, limit=k
)
for query in queries
]
)
}
return bulk_results
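Both edits in this file track the Marqo 1.x client: `add_documents` takes `tensor_fields` instead of `non_tensor_fields`, and the client-side `bulk_search` endpoint is gone, so each query is issued individually and re-wrapped in the old `{"result": [...]}` shape. A stand-in sketch of that re-wrapping (no Marqo dependency):

from typing import Any, Dict, List

def search_one(query: str) -> Dict[str, Any]:
    return {"query": query, "hits": []}  # stand-in for index(...).search(q=query)

def bulk_search(queries: List[str]) -> Dict[str, List[Dict[str, Any]]]:
    # preserve the old response shape so downstream parsing is unchanged
    return {"result": [search_one(q) for q in queries]}

print(bulk_search(["q1", "q2"]))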
@classmethod

View File

@ -10,6 +10,7 @@ from operator import itemgetter
from typing import (
TYPE_CHECKING,
Any,
AsyncGenerator,
Callable,
Dict,
Generator,
@ -213,7 +214,7 @@ class Qdrant(VectorStore):
from qdrant_client.conversions.conversion import RestToGrpc
added_ids = []
for batch_ids, points in self._generate_rest_batches(
async for batch_ids, points in self._agenerate_rest_batches(
texts, metadatas, ids, batch_size
):
await self.client.async_grpc_points.Upsert(
@ -1264,7 +1265,7 @@ class Qdrant(VectorStore):
embeddings = OpenAIEmbeddings()
qdrant = await Qdrant.afrom_texts(texts, embeddings, "localhost")
"""
qdrant = cls._construct_instance(
qdrant = await cls._aconstruct_instance(
texts,
embedding,
location,
@ -1465,6 +1466,172 @@ class Qdrant(VectorStore):
)
return qdrant
@classmethod
async def _aconstruct_instance(
cls: Type[Qdrant],
texts: List[str],
embedding: Embeddings,
location: Optional[str] = None,
url: Optional[str] = None,
port: Optional[int] = 6333,
grpc_port: int = 6334,
prefer_grpc: bool = False,
https: Optional[bool] = None,
api_key: Optional[str] = None,
prefix: Optional[str] = None,
timeout: Optional[float] = None,
host: Optional[str] = None,
path: Optional[str] = None,
collection_name: Optional[str] = None,
distance_func: str = "Cosine",
content_payload_key: str = CONTENT_KEY,
metadata_payload_key: str = METADATA_KEY,
vector_name: Optional[str] = VECTOR_NAME,
shard_number: Optional[int] = None,
replication_factor: Optional[int] = None,
write_consistency_factor: Optional[int] = None,
on_disk_payload: Optional[bool] = None,
hnsw_config: Optional[common_types.HnswConfigDiff] = None,
optimizers_config: Optional[common_types.OptimizersConfigDiff] = None,
wal_config: Optional[common_types.WalConfigDiff] = None,
quantization_config: Optional[common_types.QuantizationConfig] = None,
init_from: Optional[common_types.InitFrom] = None,
on_disk: Optional[bool] = None,
force_recreate: bool = False,
**kwargs: Any,
) -> Qdrant:
try:
import qdrant_client
except ImportError:
raise ValueError(
"Could not import qdrant-client python package. "
"Please install it with `pip install qdrant-client`."
)
from grpc import RpcError
from qdrant_client.http import models as rest
from qdrant_client.http.exceptions import UnexpectedResponse
# Just do a single quick embedding to get vector size
partial_embeddings = await embedding.aembed_documents(texts[:1])
vector_size = len(partial_embeddings[0])
collection_name = collection_name or uuid.uuid4().hex
distance_func = distance_func.upper()
client = qdrant_client.QdrantClient(
location=location,
url=url,
port=port,
grpc_port=grpc_port,
prefer_grpc=prefer_grpc,
https=https,
api_key=api_key,
prefix=prefix,
timeout=timeout,
host=host,
path=path,
**kwargs,
)
try:
# Skip any validation in case of forced collection recreate.
if force_recreate:
raise ValueError
# Get the vector configuration of the existing collection and vector, if it
# was specified. If the old configuration does not match the current one,
# an exception is being thrown.
collection_info = client.get_collection(collection_name=collection_name)
current_vector_config = collection_info.config.params.vectors
if isinstance(current_vector_config, dict) and vector_name is not None:
if vector_name not in current_vector_config:
raise QdrantException(
f"Existing Qdrant collection {collection_name} does not "
f"contain vector named {vector_name}. Did you mean one of the "
f"existing vectors: {', '.join(current_vector_config.keys())}? "
f"If you want to recreate the collection, set `force_recreate` "
f"parameter to `True`."
)
current_vector_config = current_vector_config.get(
vector_name
) # type: ignore[assignment]
elif isinstance(current_vector_config, dict) and vector_name is None:
raise QdrantException(
f"Existing Qdrant collection {collection_name} uses named vectors. "
f"If you want to reuse it, please set `vector_name` to any of the "
f"existing named vectors: "
f"{', '.join(current_vector_config.keys())}." # noqa
f"If you want to recreate the collection, set `force_recreate` "
f"parameter to `True`."
)
elif (
not isinstance(current_vector_config, dict) and vector_name is not None
):
raise QdrantException(
f"Existing Qdrant collection {collection_name} doesn't use named "
f"vectors. If you want to reuse it, please set `vector_name` to "
f"`None`. If you want to recreate the collection, set "
f"`force_recreate` parameter to `True`."
)
# Check if the vector configuration has the same dimensionality.
if current_vector_config.size != vector_size: # type: ignore[union-attr]
raise QdrantException(
f"Existing Qdrant collection is configured for vectors with "
f"{current_vector_config.size} " # type: ignore[union-attr]
f"dimensions. Selected embeddings are {vector_size}-dimensional. "
f"If you want to recreate the collection, set `force_recreate` "
f"parameter to `True`."
)
current_distance_func = (
current_vector_config.distance.name.upper() # type: ignore[union-attr]
)
if current_distance_func != distance_func:
raise QdrantException(
f"Existing Qdrant collection is configured for "
f"{current_vector_config.distance} " # type: ignore[union-attr]
f"similarity. Please set `distance_func` parameter to "
f"`{distance_func}` if you want to reuse it. If you want to "
f"recreate the collection, set `force_recreate` parameter to "
f"`True`."
)
except (UnexpectedResponse, RpcError, ValueError):
vectors_config = rest.VectorParams(
size=vector_size,
distance=rest.Distance[distance_func],
on_disk=on_disk,
)
# If vector name was provided, we're going to use the named vectors feature
# with just a single vector.
if vector_name is not None:
vectors_config = { # type: ignore[assignment]
vector_name: vectors_config,
}
client.recreate_collection(
collection_name=collection_name,
vectors_config=vectors_config,
shard_number=shard_number,
replication_factor=replication_factor,
write_consistency_factor=write_consistency_factor,
on_disk_payload=on_disk_payload,
hnsw_config=hnsw_config,
optimizers_config=optimizers_config,
wal_config=wal_config,
quantization_config=quantization_config,
init_from=init_from,
timeout=timeout, # type: ignore[arg-type]
)
qdrant = cls(
client=client,
collection_name=collection_name,
embeddings=embedding,
content_payload_key=content_payload_key,
metadata_payload_key=metadata_payload_key,
distance_strategy=distance_func,
vector_name=vector_name,
)
return qdrant
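`_aconstruct_instance` mirrors the sync validation flow: reuse an existing collection only when its vector name, size, and distance function all match; raise a descriptive `QdrantException` otherwise; and fall through to `recreate_collection` on lookup failure or `force_recreate`. A condensed, hypothetical sketch of that decision:

from typing import Optional

def ensure_collection(
    existing: Optional[dict], size: int, distance: str, force_recreate: bool
) -> str:
    # reuse only a fully compatible collection; otherwise fail loudly
    # unless the caller explicitly asked to recreate it
    if existing is not None and not force_recreate:
        if existing["size"] != size or existing["distance"] != distance:
            raise ValueError("incompatible collection; pass force_recreate=True")
        return "reused"
    return "recreated"

assert ensure_collection({"size": 1536, "distance": "COSINE"}, 1536, "COSINE", False) == "reused"
assert ensure_collection({"size": 768, "distance": "COSINE"}, 1536, "COSINE", True) == "recreated"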
def _select_relevance_score_fn(self) -> Callable[[float], float]:
"""
The 'correct' relevance function
@ -1648,6 +1815,33 @@ class Qdrant(VectorStore):
return embeddings
async def _aembed_texts(self, texts: Iterable[str]) -> List[List[float]]:
"""Embed search texts.
Used to provide backward compatibility with `embedding_function` argument.
Args:
texts: Iterable of texts to embed.
Returns:
List of floats representing the texts embedding.
"""
if self.embeddings is not None:
embeddings = await self.embeddings.aembed_documents(list(texts))
if hasattr(embeddings, "tolist"):
embeddings = embeddings.tolist()
elif self._embeddings_function is not None:
embeddings = []
for text in texts:
embedding = self._embeddings_function(text)
if hasattr(embedding, "tolist"):
embedding = embedding.tolist()
embeddings.append(embedding)
else:
raise ValueError("Neither of embeddings or embedding_function is set")
return embeddings
def _generate_rest_batches(
self,
texts: Iterable[str],
@ -1689,3 +1883,45 @@ class Qdrant(VectorStore):
]
yield batch_ids, points
async def _agenerate_rest_batches(
self,
texts: Iterable[str],
metadatas: Optional[List[dict]] = None,
ids: Optional[Sequence[str]] = None,
batch_size: int = 64,
) -> AsyncGenerator[Tuple[List[str], List[rest.PointStruct]], None]:
from qdrant_client.http import models as rest
texts_iterator = iter(texts)
metadatas_iterator = iter(metadatas or [])
ids_iterator = iter(ids or [uuid.uuid4().hex for _ in iter(texts)])
while batch_texts := list(islice(texts_iterator, batch_size)):
# Take the corresponding metadata and id for each text in a batch
batch_metadatas = list(islice(metadatas_iterator, batch_size)) or None
batch_ids = list(islice(ids_iterator, batch_size))
# Generate the embeddings for all the texts in a batch
batch_embeddings = await self._aembed_texts(batch_texts)
points = [
rest.PointStruct(
id=point_id,
vector=vector
if self.vector_name is None
else {self.vector_name: vector},
payload=payload,
)
for point_id, vector, payload in zip(
batch_ids,
batch_embeddings,
self._build_payloads(
batch_texts,
batch_metadatas,
self.content_payload_key,
self.metadata_payload_key,
),
)
]
yield batch_ids, points
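The async batch generator reuses the same `islice` idiom as its sync counterpart: pull fixed-size chunks from an iterator until it is exhausted. In isolation the idiom looks like this (standalone sketch; Python 3.8+ for the walrus operator):

from itertools import islice
from typing import Iterable, Iterator, List

def batches(iterable: Iterable[int], size: int) -> Iterator[List[int]]:
    it = iter(iterable)
    # islice yields an empty list once the iterator is exhausted,
    # which terminates the loop
    while batch := list(islice(it, size)):
        yield batch

assert list(batches(range(7), 3)) == [[0, 1, 2], [3, 4, 5], [6]]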

View File

@ -202,12 +202,12 @@ class Vectara(VectorStore):
doc_metadata: optional metadata for the document
This function indexes all the input text strings in the Vectara corpus as a
single Vectara document, where each input text is considered a "part" and the
metadata are associated with each part.
single Vectara document, where each input text is considered a "section" and the
metadata are associated with each section.
If 'doc_metadata' is provided, it is associated with the Vectara document.
Returns:
List of ids from adding the texts into the vectorstore.
document ID of the document added
"""
doc_hash = md5()
@ -307,21 +307,27 @@ class Vectara(VectorStore):
result = response.json()
responses = result["responseSet"][0]["response"]
vectara_default_metadata = ["lang", "len", "offset"]
documents = result["responseSet"][0]["document"]
metadatas = []
for x in responses:
md = {m["name"]: m["value"] for m in x["metadata"]}
doc_num = x["documentIndex"]
doc_md = {m["name"]: m["value"] for m in documents[doc_num]["metadata"]}
md.update(doc_md)
metadatas.append(md)
docs = [
(
Document(
page_content=x["text"],
metadata={
m["name"]: m["value"]
for m in x["metadata"]
if m["name"] not in vectara_default_metadata
},
metadata=md,
),
x["score"],
)
for x in responses
for x, md in zip(responses, metadatas)
]
return docs
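Hit-level metadata is now merged with document-level metadata via `dict.update`, so document fields win on key collisions. A two-line sketch of that merge semantics:

part_md = {"lang": "en", "source": "part"}
doc_md = {"source": "doc", "test_num": "1"}
part_md.update(doc_md)  # document-level values override part-level ones
assert part_md == {"lang": "en", "source": "doc", "test_num": "1"}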
def similarity_search(

View File

@ -4407,19 +4407,21 @@ files = [
[[package]]
name = "marqo"
version = "0.11.0"
version = "1.2.4"
description = "Tensor search for humans"
category = "main"
optional = true
python-versions = ">=3"
files = [
{file = "marqo-0.11.0-py3-none-any.whl", hash = "sha256:e1a5409beeb02dcec725566cfbc5fd88a84ce65ca7bce08a1120f8082badeab4"},
{file = "marqo-0.11.0.tar.gz", hash = "sha256:808e691cf06f5f7d67d422dc7f5f6fcc53b9acc6a4bc000abbcae8a817fd765d"},
{file = "marqo-1.2.4-py3-none-any.whl", hash = "sha256:aaf59ca35214febaa893e102828a50ab9e53fe57201cd43714ab7c0515166068"},
{file = "marqo-1.2.4.tar.gz", hash = "sha256:3fe0eb8e1ed73883fd8e6001582d18dab6e149d79e41b92a1403b2ff52d18c43"},
]
[package.dependencies]
pydantic = "*"
packaging = "*"
pydantic = "<2.0.0"
requests = "*"
typing-extensions = ">=4.5.0"
urllib3 = "*"
[[package]]
@ -10487,4 +10489,4 @@ text-helpers = ["chardet"]
[metadata]
lock-version = "2.0"
python-versions = ">=3.8.1,<4.0"
content-hash = "a5e3458dd0cabcefd83caec6eb33b6fb593c2c347ca1d33c1f182341e852a9c8"
content-hash = "0247674f3f274fd2249ceb02c23a468f911a7c482796ea67252b203d1ab938ae"

View File

@ -1,11 +1,11 @@
[tool.poetry]
name = "langchain"
version = "0.0.267"
version = "0.0.270"
description = "Building applications with LLMs through composability"
authors = []
license = "MIT"
readme = "README.md"
repository = "https://www.github.com/hwchase17/langchain"
repository = "https://github.com/langchain-ai/langchain"
[tool.poetry.scripts]
langchain-server = "langchain.server:main"
@ -37,7 +37,7 @@ pinecone-text = {version = "^0.4.2", optional = true}
pymongo = {version = "^4.3.3", optional = true}
clickhouse-connect = {version="^0.5.14", optional=true}
weaviate-client = {version = "^3", optional = true}
marqo = {version = "^0.11.0", optional=true}
marqo = {version = "^1.2.4", optional=true}
google-api-python-client = {version = "2.70.0", optional = true}
google-auth = {version = "^2.18.1", optional = true}
wolframalpha = {version = "5.0.0", optional = true}

View File

@ -17,6 +17,7 @@ else:
def sample_gdf() -> GeoDataFrame:
import geopandas
# TODO: geopandas.datasets will be deprecated in 1.0
path_to_data = geopandas.datasets.get_path("nybb")
gdf = geopandas.read_file(path_to_data)
gdf["area"] = gdf.area

View File

@ -12,18 +12,20 @@ EXAMPLE_DOCS_DIRECTORY = str(Path(__file__).parent.parent / "examples/")
def test_unstructured_loader_with_post_processor() -> None:
from unstructured.cleaners.core import clean_extra_whitespace
def add_the_end(text: str) -> str:
return text + "THE END!"
file_path = os.path.join(EXAMPLE_DOCS_DIRECTORY, "layout-parser-paper.pdf")
loader = UnstructuredFileLoader(
file_path=file_path,
pos_processors=[clean_extra_whitespace],
post_processors=[add_the_end],
strategy="fast",
mode="elements",
)
docs = loader.load()
assert len(docs) > 1
assert docs[0].page_content.endswith("THE END!")
def test_unstructured_api_file_loader() -> None:

View File

@ -0,0 +1,41 @@
import pytest
from langchain.embeddings.ernie import ErnieEmbeddings
def test_embedding_documents_1() -> None:
documents = ["foo bar"]
embedding = ErnieEmbeddings()
output = embedding.embed_documents(documents)
assert len(output) == 1
assert len(output[0]) == 384
def test_embedding_documents_2() -> None:
documents = ["foo", "bar"]
embedding = ErnieEmbeddings()
output = embedding.embed_documents(documents)
assert len(output) == 2
assert len(output[0]) == 384
assert len(output[1]) == 384
def test_embedding_query() -> None:
query = "foo"
embedding = ErnieEmbeddings()
output = embedding.embed_query(query)
assert len(output) == 384
def test_max_chunks() -> None:
documents = [f"text-{i}" for i in range(20)]
embedding = ErnieEmbeddings()
output = embedding.embed_documents(documents)
assert len(output) == 20
def test_too_many_chunks() -> None:
documents = [f"text-{i}" for i in range(20)]
embedding = ErnieEmbeddings(chunk_size=20)
with pytest.raises(ValueError):
embedding.embed_documents(documents)

View File

@ -0,0 +1,84 @@
import langchain.utilities.promptguard as pgf
from langchain import LLMChain, PromptTemplate
from langchain.llms import OpenAI
from langchain.llms.promptguard import PromptGuard
from langchain.memory import ConversationBufferWindowMemory
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnableMap
prompt_template = """
As an AI assistant, you will answer questions according to given context.
Sensitive personal information in the question is masked for privacy.
For instance, if the original text says "Giana is good," it will be changed
to "PERSON_998 is good."
Here's how to handle these changes:
* Consider these masked phrases just as placeholders, but still refer to
them in a relevant way when answering.
* It's possible that different masked terms might mean the same thing.
Stick with the given term and don't modify it.
* All masked terms follow the "TYPE_ID" pattern.
* Please don't invent new masked terms. For instance, if you see "PERSON_998,"
don't come up with "PERSON_997" or "PERSON_999" unless they're already in the question.
Conversation History: ```{history}```
Context : ```During our recent meeting on February 23, 2023, at 10:30 AM,
John Doe provided me with his personal details. His email is johndoe@example.com
and his contact number is 650-456-7890. He lives in New York City, USA, and
belongs to the American nationality with Christian beliefs and a leaning towards
the Democratic party. He mentioned that he recently made a transaction using his
credit card 4111 1111 1111 1111 and transferred bitcoins to the wallet address
1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa. While discussing his European travels, he
noted down his IBAN as GB29 NWBK 6016 1331 9268 19. Additionally, he provided
his website as https://johndoeportfolio.com. John also discussed
some of his US-specific details. He said his bank account number is
1234567890123456 and his drivers license is Y12345678. His ITIN is 987-65-4321,
and he recently renewed his passport,
the number for which is 123456789. He emphasized not to share his SSN, which is
669-45-6789. Furthermore, he mentioned that he accesses his work files remotely
through the IP 192.168.1.1 and has a medical license number MED-123456. ```
Question: ```{question}```
"""
def test_promptguard() -> None:
chain = LLMChain(
prompt=PromptTemplate.from_template(prompt_template),
llm=PromptGuard(llm=OpenAI()),
memory=ConversationBufferWindowMemory(k=2),
)
output = chain.run(
{
"question": "Write a text message to remind John to do password reset \
for his website through his email to stay secure."
}
)
assert isinstance(output, str)
def test_promptguard_functions() -> None:
prompt = PromptTemplate.from_template(prompt_template)
llm = OpenAI()
pg_chain = (
pgf.sanitize
| RunnableMap(
{
"response": (lambda x: x["sanitized_input"]) # type: ignore
| prompt
| llm
| StrOutputParser(),
"secure_context": lambda x: x["secure_context"],
}
)
| (lambda x: pgf.desanitize(x["response"], x["secure_context"]))
)
pg_chain.invoke(
{
"question": "Write a text message to remind John to do password reset\
for his website through his email to stay secure.",
"history": "",
}
)

View File

@ -15,6 +15,9 @@ class FakeEmbeddings(Embeddings):
Embeddings encode each text as its index."""
return [[float(1.0)] * 9 + [float(i)] for i in range(len(texts))]
async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
return self.embed_documents(texts)
def embed_query(self, text: str) -> List[float]:
"""Return constant query embeddings.
Embeddings are identical to embed_documents(texts)[0].
@ -22,6 +25,9 @@ class FakeEmbeddings(Embeddings):
as it was passed to embed_documents."""
return [float(1.0)] * 9 + [float(0.0)]
async def aembed_query(self, text: str) -> List[float]:
return self.embed_query(text)
class ConsistentFakeEmbeddings(FakeEmbeddings):
"""Fake embeddings which remember all the texts seen so far to return consistent

View File

@ -0,0 +1,31 @@
"""Test Epsilla functionality."""
from pyepsilla import vectordb
from langchain.vectorstores import Epsilla
from tests.integration_tests.vectorstores.fake_embeddings import (
FakeEmbeddings,
fake_texts,
)
def _test_from_texts() -> Epsilla:
embeddings = FakeEmbeddings()
client = vectordb.Client()
return Epsilla.from_texts(fake_texts, embeddings, client)
def test_epsilla() -> None:
instance = _test_from_texts()
search = instance.similarity_search(query="bar", k=1)
result_texts = [doc.page_content for doc in search]
assert "bar" in result_texts
def test_epsilla_add_texts() -> None:
embeddings = FakeEmbeddings()
client = vectordb.Client()
db = Epsilla(client, embeddings)
db.add_texts(fake_texts)
search = db.similarity_search(query="foo", k=1)
result_texts = [doc.page_content for doc in search]
assert "foo" in result_texts

View File

@@ -158,6 +158,7 @@ def test_marqo_multimodal() -> None:
"mainline/examples/ImageSearchGuide/data/image2.jpg",
},
],
tensor_fields=["caption", "image"],
)
def get_content(res: Dict[str, str]) -> str:

View File

@@ -5,11 +5,14 @@ from langchain.docstore.document import Document
from langchain.vectorstores.vectara import Vectara
from tests.integration_tests.vectorstores.fake_embeddings import FakeEmbeddings
# For this test to run properly, please setup as follows
# 1. Create a corpus in Vectara, with a filter attribute called "test_num".
# 2. Create an API_KEY for this corpus with permissions for query and indexing
# 3. Setup environment variables:
#
# For this test to run properly, please set up as follows:
# 1. Create a Vectara account: sign up at https://console.vectara.com/signup
# 2. Create a corpus in your Vectara account, with a filter attribute called "test_num".
# 3. Create an API_KEY for this corpus with permissions for query and indexing
# 4. Set up environment variables:
# VECTARA_API_KEY, VECTARA_CORPUS_ID and VECTARA_CUSTOMER_ID (see the pre-flight sketch below)
#
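As a quick pre-flight check, the required variables can be verified before running the suite; a minimal sketch (a hypothetical helper, not part of the test file):

```python
import os

# Fail fast if the Vectara credentials described above are missing.
for var in ("VECTARA_API_KEY", "VECTARA_CORPUS_ID", "VECTARA_CUSTOMER_ID"):
    assert os.environ.get(var), f"{var} must be set before running these tests"
```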
def get_abbr(s: str) -> str:
@@ -21,37 +24,52 @@ def get_abbr(s: str) -> str:
def test_vectara_add_documents() -> None:
"""Test end to end construction and search."""
# start with some initial texts
texts = ["grounded generation", "retrieval augmented generation", "data privacy"]
docsearch: Vectara = Vectara.from_texts(
texts,
embedding=FakeEmbeddings(),
metadatas=[
{"abbr": "gg", "test_num": "1"},
{"abbr": "rag", "test_num": "1"},
{"abbr": "dp", "test_num": "1"},
],
# create a new Vectara instance
docsearch: Vectara = Vectara()
# start with some initial texts, added with add_texts
texts1 = ["grounded generation", "retrieval augmented generation", "data privacy"]
md = [{"abbr": get_abbr(t)} for t in texts1]
doc_id1 = docsearch.add_texts(
texts1,
metadatas=md,
doc_metadata={"test_num": "1"},
)
# then add some additional documents
new_texts = ["large language model", "information retrieval", "question answering"]
docsearch.add_documents(
[Document(page_content=t, metadata={"abbr": get_abbr(t)}) for t in new_texts],
doc_metadata={"test_num": "1"},
# then add some additional documents, now with add_documents
texts2 = ["large language model", "information retrieval", "question answering"]
doc_id2 = docsearch.add_documents(
[Document(page_content=t, metadata={"abbr": get_abbr(t)}) for t in texts2],
doc_metadata={"test_num": "2"},
)
doc_ids = doc_id1 + doc_id2
# finally do a similarity search to see if all works okay
output = docsearch.similarity_search(
# test without filter
output1 = docsearch.similarity_search(
"large language model",
k=2,
n_sentence_context=0,
)
assert len(output1) == 2
assert output1[0].page_content == "large language model"
assert output1[0].metadata["abbr"] == "llm"
assert output1[1].page_content == "information retrieval"
assert output1[1].metadata["abbr"] == "ir"
# test with metadata filter (doc level)
# since the query does not directly match any test_num=1 document, the closest match "RAG" is returned
output2 = docsearch.similarity_search(
"large language model",
k=1,
n_sentence_context=0,
filter="doc.test_num = 1",
)
assert output[0].page_content == "large language model"
assert output[0].metadata == {"abbr": "llm"}
assert output[1].page_content == "information retrieval"
assert output[1].metadata == {"abbr": "ir"}
assert len(output2) == 1
assert output2[0].page_content == "retrieval augmented generation"
assert output2[0].metadata["abbr"] == "rag"
for doc_id in doc_ids:
docsearch._delete_doc(doc_id)
def test_vectara_from_files() -> None:
@@ -73,8 +91,9 @@ def test_vectara_from_files() -> None:
urllib.request.urlretrieve(url, name)
files_list.append(name)
docsearch: Vectara = Vectara.from_files(
files=files_list,
docsearch: Vectara = Vectara()
doc_ids = docsearch.add_files(
files_list=files_list,
embedding=FakeEmbeddings(),
metadatas=[{"url": url, "test_num": "2"} for url in urls],
)
@@ -101,7 +120,6 @@ def test_vectara_from_files() -> None:
n_sentence_context=1,
filter="doc.test_num = 2",
)
print(output[0].page_content)
assert output[0].page_content == (
"""\
Note the use of hybrid in 3) above is different from that used sometimes in the literature, \
@@ -114,3 +132,6 @@ This classification scheme, however, misses a key insight gained in deep learnin
models can greatly improve the training of DNNs and other deep discriminative models via better regularization.\
""" # noqa: E501
)
for doc_id in doc_ids:
docsearch._delete_doc(doc_id)

View File

@@ -3,11 +3,15 @@ from typing import Dict, List, Optional
import pytest
from langchain.callbacks.manager import CallbackManagerForChainRun
from langchain.callbacks.manager import (
AsyncCallbackManagerForChainRun,
CallbackManagerForChainRun,
)
from langchain.chains.base import Chain
from langchain.chains.sequential import SequentialChain, SimpleSequentialChain
from langchain.memory import ConversationBufferMemory
from langchain.memory.simple import SimpleMemory
from tests.unit_tests.callbacks.fake_callback_handler import FakeCallbackHandler
class FakeChain(Chain):
@@ -37,6 +41,17 @@ class FakeChain(Chain):
outputs[var] = f"{' '.join(variables)}foo"
return outputs
async def _acall(
self,
inputs: Dict[str, str],
run_manager: Optional[AsyncCallbackManagerForChainRun] = None,
) -> Dict[str, str]:
outputs = {}
for var in self.output_variables:
variables = [inputs[k] for k in self.input_variables]
outputs[var] = f"{' '.join(variables)}foo"
return outputs
def test_sequential_usage_single_inputs() -> None:
"""Test sequential on single input chains."""
@@ -165,6 +180,36 @@ def test_simple_sequential_functionality() -> None:
assert output == expected_output
@pytest.mark.asyncio
@pytest.mark.parametrize("isAsync", [False, True])
async def test_simple_sequential_functionality_with_callbacks(isAsync: bool) -> None:
"""Test simple sequential functionality."""
handler_1 = FakeCallbackHandler()
handler_2 = FakeCallbackHandler()
handler_3 = FakeCallbackHandler()
chain_1 = FakeChain(
input_variables=["foo"], output_variables=["bar"], callbacks=[handler_1]
)
chain_2 = FakeChain(
input_variables=["bar"], output_variables=["baz"], callbacks=[handler_2]
)
chain_3 = FakeChain(
input_variables=["jack"], output_variables=["baf"], callbacks=[handler_3]
)
chain = SimpleSequentialChain(chains=[chain_1, chain_2, chain_3])
if isAsync:
output = await chain.ainvoke({"input": "123"})
else:
output = chain({"input": "123"})
expected_output = {"output": "123foofoofoo", "input": "123"}
assert output == expected_output
# Check that each of the callbacks was invoked exactly once over the entire run
for handler in [handler_1, handler_2, handler_3]:
assert handler.starts == 1
assert handler.ends == 1
assert handler.errors == 0
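The `starts`/`ends`/`errors` counters come from `FakeCallbackHandler`, which tallies chain lifecycle events. A minimal counting handler with the same shape might look like this (a hypothetical stand-in, not the actual FakeCallbackHandler):

```python
from typing import Any, Dict

from langchain.callbacks.base import BaseCallbackHandler


class CountingHandler(BaseCallbackHandler):
    """Hypothetical minimal stand-in for FakeCallbackHandler's counters."""

    def __init__(self) -> None:
        self.starts = 0
        self.ends = 0
        self.errors = 0

    def on_chain_start(
        self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
    ) -> None:
        self.starts += 1

    def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:
        self.ends += 1

    def on_chain_error(self, error: BaseException, **kwargs: Any) -> None:
        self.errors += 1
```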
def test_multi_input_errors() -> None:
"""Test simple sequential errors if multiple input variables are expected."""
chain_1 = FakeChain(input_variables=["foo"], output_variables=["bar"])

View File

@@ -3,6 +3,7 @@ from typing import Dict
from unittest.mock import MagicMock, patch
import pytest
import requests
from langchain.docstore.document import Document
from langchain.document_loaders.confluence import ConfluenceLoader
@@ -23,7 +24,7 @@ class TestConfluenceLoader:
def test_confluence_loader_initialization(self, mock_confluence: MagicMock) -> None:
ConfluenceLoader(
url=self.CONFLUENCE_URL,
self.CONFLUENCE_URL,
username=self.MOCK_USERNAME,
api_key=self.MOCK_API_TOKEN,
)
@@ -34,6 +35,36 @@ class TestConfluenceLoader:
cloud=True,
)
def test_confluence_loader_initialization_invalid(self) -> None:
with pytest.raises(ValueError):
ConfluenceLoader(
self.CONFLUENCE_URL,
username=self.MOCK_USERNAME,
api_key=self.MOCK_API_TOKEN,
token="foo",
)
with pytest.raises(ValueError):
ConfluenceLoader(
self.CONFLUENCE_URL,
username=self.MOCK_USERNAME,
api_key=self.MOCK_API_TOKEN,
oauth2={
"access_token": "bar",
"access_token_secret": "bar",
"consumer_key": "bar",
"key_cert": "bar",
},
)
with pytest.raises(ValueError):
ConfluenceLoader(
self.CONFLUENCE_URL,
username=self.MOCK_USERNAME,
api_key=self.MOCK_API_TOKEN,
session=requests.Session(),
)
def test_confluence_loader_initialization_from_env(
self, mock_confluence: MagicMock
) -> None:
@@ -51,7 +82,7 @@ class TestConfluenceLoader:
def test_confluence_loader_load_data_invalid_args(self) -> None:
confluence_loader = ConfluenceLoader(
url=self.CONFLUENCE_URL,
self.CONFLUENCE_URL,
username=self.MOCK_USERNAME,
api_key=self.MOCK_API_TOKEN,
)
@@ -125,7 +156,7 @@ class TestConfluenceLoader:
self, mock_confluence: MagicMock
) -> ConfluenceLoader:
confluence_loader = ConfluenceLoader(
url=self.CONFLUENCE_URL,
self.CONFLUENCE_URL,
username=self.MOCK_USERNAME,
api_key=self.MOCK_API_TOKEN,
)

View File

@@ -1,4 +1,4 @@
import json
from typing import Any, Dict
import pytest
@@ -9,42 +9,101 @@ from langchain.schema import BaseMessage, ChatGeneration, OutputParserException
from langchain.schema.messages import AIMessage, HumanMessage
@pytest.fixture
def ai_message() -> AIMessage:
"""Return a simple AIMessage."""
content = "This is a test message"
args = json.dumps(
{
"arg1": "value1",
}
def test_json_output_function_parser() -> None:
"""Test the JSON output function parser is configured with robust defaults."""
message = AIMessage(
content="This is a test message",
additional_kwargs={
"function_call": {
"name": "function_name",
"arguments": '{"arg1": "code\ncode"}',
}
},
)
function_call = {"name": "function_name", "arguments": args}
additional_kwargs = {"function_call": function_call}
return AIMessage(content=content, additional_kwargs=additional_kwargs)
def test_json_output_function_parser(ai_message: AIMessage) -> None:
"""Test that the JsonOutputFunctionsParser with full output."""
chat_generation = ChatGeneration(message=ai_message)
chat_generation = ChatGeneration(message=message)
# Full output
# Test that the parser's defaults are configured to parse in non-strict mode
parser = JsonOutputFunctionsParser(args_only=False)
result = parser.parse_result([chat_generation])
assert result == {"arguments": {"arg1": "value1"}, "name": "function_name"}
assert result == {"arguments": {"arg1": "code\ncode"}, "name": "function_name"}
# Args only
parser = JsonOutputFunctionsParser(args_only=True)
result = parser.parse_result([chat_generation])
assert result == {"arg1": "value1"}
assert result == {"arg1": "code\ncode"}
# Verify that the original message is not modified
assert ai_message.additional_kwargs == {
"function_call": {"name": "function_name", "arguments": '{"arg1": "value1"}'}
assert message.additional_kwargs == {
"function_call": {
"name": "function_name",
"arguments": '{"arg1": "code\ncode"}',
}
}
@pytest.mark.parametrize(
"config",
[
{
"args_only": False,
"strict": False,
"args": '{"arg1": "value1"}',
"result": {"arguments": {"arg1": "value1"}, "name": "function_name"},
"exception": None,
},
{
"args_only": True,
"strict": False,
"args": '{"arg1": "value1"}',
"result": {"arg1": "value1"},
"exception": None,
},
{
"args_only": True,
"strict": False,
"args": '{"code": "print(2+\n2)"}',
"result": {"code": "print(2+\n2)"},
"exception": None,
},
{
"args_only": True,
"strict": False,
"args": '{"code": "你好)"}',
"result": {"code": "你好)"},
"exception": None,
},
{
"args_only": True,
"strict": True,
"args": '{"code": "print(2+\n2)"}',
"exception": OutputParserException,
},
],
)
def test_json_output_function_parser_strictness(config: Dict[str, Any]) -> None:
"""Test parsing with JSON strictness on and off."""
args = config["args"]
message = AIMessage(
content="This is a test message",
additional_kwargs={
"function_call": {"name": "function_name", "arguments": args}
},
)
chat_generation = ChatGeneration(message=message)
# Full output
parser = JsonOutputFunctionsParser(
strict=config["strict"], args_only=config["args_only"]
)
if config["exception"] is not None:
with pytest.raises(config["exception"]):
parser.parse_result([chat_generation])
else:
assert parser.parse_result([chat_generation]) == config["result"]
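The `strict` flag mirrors the standard library's JSON behavior: by default, `json.loads` rejects literal control characters (such as the raw newline in the test cases above) inside string values, while `strict=False` accepts them. A standalone illustration using only the stdlib:

```python
import json

raw = '{"code": "print(2+\n2)"}'  # contains a literal newline inside the string value

# Default (strict) parsing rejects the unescaped control character.
try:
    json.loads(raw)
except json.JSONDecodeError as err:
    print("strict mode:", err.msg)  # e.g. "Invalid control character ..."

# Non-strict parsing allows control characters inside strings.
assert json.loads(raw, strict=False) == {"code": "print(2+\n2)"}
```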
@pytest.mark.parametrize(
"bad_message",
[

File diff suppressed because one or more lines are too long

View File

@@ -21,7 +21,12 @@ from langchain.prompts.chat import (
SystemMessagePromptTemplate,
)
from langchain.schema.document import Document
from langchain.schema.messages import AIMessage, HumanMessage, SystemMessage
from langchain.schema.messages import (
AIMessage,
AIMessageChunk,
HumanMessage,
SystemMessage,
)
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.retriever import BaseRetriever
from langchain.schema.runnable import (
@@ -809,7 +814,7 @@ def test_map_stream() -> None:
assert streamed_chunks[0] in [
{"passthrough": prompt.invoke({"question": "What is your name?"})},
{"llm": "i"},
{"chat": "i"},
{"chat": AIMessageChunk(content="i")},
]
assert len(streamed_chunks) == len(chat_res) + len(llm_res) + 1
assert all(len(c.keys()) == 1 for c in streamed_chunks)
@@ -856,7 +861,11 @@ def test_map_stream_iterator_input() -> None:
else:
final_value += chunk
assert streamed_chunks[0] in [{"passthrough": "i"}, {"llm": "i"}, {"chat": "i"}]
assert streamed_chunks[0] in [
{"passthrough": "i"},
{"llm": "i"},
{"chat": AIMessageChunk(content="i")},
]
assert len(streamed_chunks) == len(chat_res) + len(llm_res) + len(llm_res)
assert all(len(c.keys()) == 1 for c in streamed_chunks)
assert final_value is not None
@@ -900,7 +909,7 @@ async def test_map_astream() -> None:
assert streamed_chunks[0] in [
{"passthrough": prompt.invoke({"question": "What is your name?"})},
{"llm": "i"},
{"chat": "i"},
{"chat": AIMessageChunk(content="i")},
]
assert len(streamed_chunks) == len(chat_res) + len(llm_res) + 1
assert all(len(c.keys()) == 1 for c in streamed_chunks)
@@ -948,7 +957,11 @@ async def test_map_astream_iterator_input() -> None:
else:
final_value += chunk
assert streamed_chunks[0] in [{"passthrough": "i"}, {"llm": "i"}, {"chat": "i"}]
assert streamed_chunks[0] in [
{"passthrough": "i"},
{"llm": "i"},
{"chat": AIMessageChunk(content="i")},
]
assert len(streamed_chunks) == len(chat_res) + len(llm_res) + len(llm_res)
assert all(len(c.keys()) == 1 for c in streamed_chunks)
assert final_value is not None
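These expectations change because chat models now stream `AIMessageChunk` objects rather than bare strings. Message chunks are additive, so a consumer can still accumulate the full response; a minimal sketch (assuming chunk concatenation via `+` as implemented on the message chunk classes):

```python
from langchain.schema.messages import AIMessageChunk

# Streamed chat output arrives as message chunks, not plain strings, which is
# why the expected value above is AIMessageChunk(content="i") rather than "i".
combined = AIMessageChunk(content="i") + AIMessageChunk(content="'m")
assert combined.content == "i'm"
```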

View File

@@ -181,15 +181,12 @@ def test_run_llm_or_chain_with_input_mapper() -> None:
assert "the wrong input" in inputs
return {"the right input": inputs["the wrong input"]}
result = _run_llm_or_chain(
example, lambda: mock_chain, n_repetitions=1, input_mapper=input_mapper
)
result = _run_llm_or_chain(example, lambda: mock_chain, input_mapper=input_mapper)
assert len(result) == 1
assert result[0] == {"output": "2", "the right input": "1"}
bad_result = _run_llm_or_chain(
example,
lambda: mock_chain,
n_repetitions=1,
)
assert len(bad_result) == 1
assert "Error" in bad_result[0]
@@ -200,9 +197,7 @@ def test_run_llm_or_chain_with_input_mapper() -> None:
return "the right input"
mock_llm = FakeLLM(queries={"the right input": "somenumber"})
result = _run_llm_or_chain(
example, mock_llm, n_repetitions=1, input_mapper=llm_input_mapper
)
result = _run_llm_or_chain(example, mock_llm, input_mapper=llm_input_mapper)
assert len(result) == 1
llm_result = result[0]
assert isinstance(llm_result, str)
@@ -302,14 +297,11 @@ async def test_arun_on_dataset(monkeypatch: pytest.MonkeyPatch) -> None:
async def mock_arun_chain(
example: Example,
llm_or_chain: Union[BaseLanguageModel, Chain],
n_repetitions: int,
tags: Optional[List[str]] = None,
callbacks: Optional[Any] = None,
**kwargs: Any,
) -> List[Dict[str, Any]]:
return [
{"result": f"Result for example {example.id}"} for _ in range(n_repetitions)
]
return [{"result": f"Result for example {example.id}"}]
def mock_create_project(*args: Any, **kwargs: Any) -> Any:
proj = mock.MagicMock()
@@ -327,20 +319,17 @@ async def test_arun_on_dataset(monkeypatch: pytest.MonkeyPatch) -> None:
client = Client(api_url="http://localhost:1984", api_key="123")
chain = mock.MagicMock()
chain.input_keys = ["foothing"]
num_repetitions = 3
results = await arun_on_dataset(
dataset_name="test",
llm_or_chain_factory=lambda: chain,
concurrency_level=2,
project_name="test_project",
num_repetitions=num_repetitions,
client=client,
)
expected = {
uuid_: [
{"result": f"Result for example {uuid.UUID(uuid_)}"}
for _ in range(num_repetitions)
{"result": f"Result for example {uuid.UUID(uuid_)}"} for _ in range(1)
]
for uuid_ in uuids
}

View File

@@ -56,7 +56,9 @@ def test_incorrect_command_return_err_output() -> None:
"""Test optional returning of shell output on incorrect command."""
session = BashProcess(return_err_output=True)
output = session.run(["invalid_command"])
assert re.match(r"^/bin/sh:.*invalid_command.*not found.*$", output)
assert re.match(
r"^/bin/sh:.*invalid_command.*(?:not found|Permission denied).*$", output
)
@pytest.mark.skipif(

View File

@@ -48,7 +48,9 @@ def test_invalid_suffix() -> None:
loader = Mock()
valid_suffixes = {"json"}
with pytest.raises(ValueError, match="Unsupported file type."):
with pytest.raises(
ValueError, match=f"Unsupported file type, must be one of {valid_suffixes}."
):
try_load_from_hub(path, loader, "chains", valid_suffixes)
loader.assert_not_called()
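Note that `pytest.raises(..., match=...)` applies the pattern with `re.search` against the stringified exception, so the f-string above is evaluated as a regular expression; it works here only because `{'json'}` contains no valid quantifier and the braces are therefore matched literally. A minimal illustration:

```python
import re

# pytest.raises(..., match=pattern) is roughly re.search(pattern, str(exc)).
valid_suffixes = {"json"}
message = f"Unsupported file type, must be one of {valid_suffixes}."

# "{'json'}" is not a valid regex quantifier, so the braces match literally.
assert re.search(f"Unsupported file type, must be one of {valid_suffixes}.", message)
```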