Compare commits

..

1 Commit

Author: Erick Friis | SHA1: 13a9d1386a | Message: wip | Date: 2024-02-14 12:27:14 -08:00
272 changed files with 5404 additions and 70892 deletions

View File

@@ -3,4 +3,43 @@
Hi there! Thank you for even being interested in contributing to LangChain.
As an open-source project in a rapidly developing field, we are extremely open to contributions, whether they involve new features, improved infrastructure, better documentation, or bug fixes.
To learn how to contribute to LangChain, please follow the [contribution guide here](https://python.langchain.com/docs/contributing/).
To learn about how to contribute, please follow the [guides here](https://python.langchain.com/docs/contributing/)
## 🗺️ Guidelines
### 👩‍💻 Ways to contribute
There are many ways to contribute to LangChain. Here are some common ways people contribute:
- [**Documentation**](https://python.langchain.com/docs/contributing/documentation): Help improve our docs, including this one!
- [**Code**](https://python.langchain.com/docs/contributing/code): Help us write code, fix bugs, or improve our infrastructure.
- [**Integrations**](https://python.langchain.com/docs/contributing/integrations): Help us integrate with your favorite vendors and tools.
### 🚩 GitHub Issues
Our [issues](https://github.com/langchain-ai/langchain/issues) page is kept up to date with bugs, improvements, and feature requests.
There is a taxonomy of labels to help with sorting and discovery of issues of interest. Please use these to help organize issues.
If you start working on an issue, please assign it to yourself.
If you are adding an issue, please try to keep it focused on a single, modular bug/improvement/feature.
If two issues are related, or blocking, please link them rather than combining them.
We will try to keep these issues as up-to-date as possible, though
with the rapid rate of development in this field some may get out of date.
If you notice this happening, please let us know.
### 🙋 Getting Help
Our goal is to have the simplest developer setup possible. Should you experience any difficulty getting set up, please
contact a maintainer! Not only do we want to help get you unblocked, but we also want to make sure that the process is
smooth for future contributors.
In a similar vein, we do enforce certain linting, formatting, and documentation standards in the codebase.
If you are finding these difficult (or even just annoying) to work with, feel free to contact a maintainer for help -
we do not want these to get in the way of getting good code into the codebase.
### Contributor Documentation
To learn about how to contribute, please follow the [guides here](https://python.langchain.com/docs/contributing/)

View File

@@ -1,24 +1,19 @@
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes.
Checklist:
- [ ] PR title: Please title your PR "package: description", where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace with
- [ ] PR message: **Delete this entire template message** and replace it with the following bulleted list
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please include
- [ ] Pass lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified to check that you're passing lint and testing. See contribution guidelines for more information on how to write/run tests, lint, etc: https://python.langchain.com/docs/contributing/
- [ ] Add tests and docs: If you're adding a new integration, please include
1. a test for the integration, preferably unit tests that do not rely on network access,
2. an example notebook showing its use. It lives in the `docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests.
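The optional-dependency guideline above can be illustrated with a minimal sketch; `foobar` is a hypothetical package used only for illustration:

```python
# Minimal sketch of the optional-dependency guideline, assuming a hypothetical
# `foobar` package. The import lives inside the function, so merely importing
# this module never requires `foobar` to be installed.
def fetch_data(query: str) -> str:
    try:
        import foobar
    except ImportError:
        raise ImportError(
            "Could not import foobar. Please install it with `pip install foobar`."
        )
    return foobar.run(query)
```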

View File

@@ -1,7 +0,0 @@
FROM python:3.9
RUN pip install httpx PyGithub "pydantic==2.0.2" pydantic-settings "pyyaml>=5.3.1,<6.0.0"
COPY ./app /app
CMD ["python", "/app/main.py"]

View File

@@ -1,11 +0,0 @@
# Adapted from https://github.com/tiangolo/fastapi/blob/master/.github/actions/people/action.yml
name: "Generate LangChain People"
description: "Generate the data for the LangChain People page"
author: "Jacob Lee <jacob@langchain.dev>"
inputs:
token:
description: 'User token, to read the GitHub API. Can be passed in using {{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }}'
required: true
runs:
using: 'docker'
image: 'Dockerfile'

View File

@@ -1,641 +0,0 @@
# Adapted from https://github.com/tiangolo/fastapi/blob/master/.github/actions/people/app/main.py
import logging
import subprocess
import sys
from collections import Counter
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Any, Container, Dict, List, Set, Union
import httpx
import yaml
from github import Github
from pydantic import BaseModel, SecretStr
from pydantic_settings import BaseSettings
github_graphql_url = "https://api.github.com/graphql"
questions_category_id = "DIC_kwDOIPDwls4CS6Ve"
# discussions_query = """
# query Q($after: String, $category_id: ID) {
# repository(name: "langchain", owner: "langchain-ai") {
# discussions(first: 100, after: $after, categoryId: $category_id) {
# edges {
# cursor
# node {
# number
# author {
# login
# avatarUrl
# url
# }
# title
# createdAt
# comments(first: 100) {
# nodes {
# createdAt
# author {
# login
# avatarUrl
# url
# }
# isAnswer
# replies(first: 10) {
# nodes {
# createdAt
# author {
# login
# avatarUrl
# url
# }
# }
# }
# }
# }
# }
# }
# }
# }
# }
# """
# issues_query = """
# query Q($after: String) {
# repository(name: "langchain", owner: "langchain-ai") {
# issues(first: 100, after: $after) {
# edges {
# cursor
# node {
# number
# author {
# login
# avatarUrl
# url
# }
# title
# createdAt
# state
# comments(first: 100) {
# nodes {
# createdAt
# author {
# login
# avatarUrl
# url
# }
# }
# }
# }
# }
# }
# }
# }
# """
prs_query = """
query Q($after: String) {
repository(name: "langchain", owner: "langchain-ai") {
pullRequests(first: 100, after: $after, states: MERGED) {
edges {
cursor
node {
changedFiles
additions
deletions
number
labels(first: 100) {
nodes {
name
}
}
author {
login
avatarUrl
url
... on User {
twitterUsername
}
}
title
createdAt
state
reviews(first:100) {
nodes {
author {
login
avatarUrl
url
... on User {
twitterUsername
}
}
state
}
}
}
}
}
}
}
"""
class Author(BaseModel):
login: str
avatarUrl: str
url: str
twitterUsername: Union[str, None] = None
# Issues and Discussions
class CommentsNode(BaseModel):
createdAt: datetime
author: Union[Author, None] = None
class Replies(BaseModel):
nodes: List[CommentsNode]
class DiscussionsCommentsNode(CommentsNode):
replies: Replies
class Comments(BaseModel):
nodes: List[CommentsNode]
class DiscussionsComments(BaseModel):
nodes: List[DiscussionsCommentsNode]
class IssuesNode(BaseModel):
number: int
author: Union[Author, None] = None
title: str
createdAt: datetime
state: str
comments: Comments
class DiscussionsNode(BaseModel):
number: int
author: Union[Author, None] = None
title: str
createdAt: datetime
comments: DiscussionsComments
class IssuesEdge(BaseModel):
cursor: str
node: IssuesNode
class DiscussionsEdge(BaseModel):
cursor: str
node: DiscussionsNode
class Issues(BaseModel):
edges: List[IssuesEdge]
class Discussions(BaseModel):
edges: List[DiscussionsEdge]
class IssuesRepository(BaseModel):
issues: Issues
class DiscussionsRepository(BaseModel):
discussions: Discussions
class IssuesResponseData(BaseModel):
repository: IssuesRepository
class DiscussionsResponseData(BaseModel):
repository: DiscussionsRepository
class IssuesResponse(BaseModel):
data: IssuesResponseData
class DiscussionsResponse(BaseModel):
data: DiscussionsResponseData
# PRs
class LabelNode(BaseModel):
name: str
class Labels(BaseModel):
nodes: List[LabelNode]
class ReviewNode(BaseModel):
author: Union[Author, None] = None
state: str
class Reviews(BaseModel):
nodes: List[ReviewNode]
class PullRequestNode(BaseModel):
number: int
labels: Labels
author: Union[Author, None] = None
changedFiles: int
additions: int
deletions: int
title: str
createdAt: datetime
state: str
reviews: Reviews
# comments: Comments
class PullRequestEdge(BaseModel):
cursor: str
node: PullRequestNode
class PullRequests(BaseModel):
edges: List[PullRequestEdge]
class PRsRepository(BaseModel):
pullRequests: PullRequests
class PRsResponseData(BaseModel):
repository: PRsRepository
class PRsResponse(BaseModel):
data: PRsResponseData
class Settings(BaseSettings):
input_token: SecretStr
github_repository: str
httpx_timeout: int = 30
def get_graphql_response(
*,
settings: Settings,
query: str,
after: Union[str, None] = None,
category_id: Union[str, None] = None,
) -> Dict[str, Any]:
headers = {"Authorization": f"token {settings.input_token.get_secret_value()}"}
# category_id is only used by one query, but GraphQL allows unused variables, so
# keep it here for simplicity
variables = {"after": after, "category_id": category_id}
response = httpx.post(
github_graphql_url,
headers=headers,
timeout=settings.httpx_timeout,
json={"query": query, "variables": variables, "operationName": "Q"},
)
if response.status_code != 200:
logging.error(
f"Response was not 200, after: {after}, category_id: {category_id}"
)
logging.error(response.text)
raise RuntimeError(response.text)
data = response.json()
if "errors" in data:
logging.error(f"Errors in response, after: {after}, category_id: {category_id}")
logging.error(data["errors"])
logging.error(response.text)
raise RuntimeError(response.text)
return data
# def get_graphql_issue_edges(*, settings: Settings, after: Union[str, None] = None):
# data = get_graphql_response(settings=settings, query=issues_query, after=after)
# graphql_response = IssuesResponse.model_validate(data)
# return graphql_response.data.repository.issues.edges
# def get_graphql_question_discussion_edges(
# *,
# settings: Settings,
# after: Union[str, None] = None,
# ):
# data = get_graphql_response(
# settings=settings,
# query=discussions_query,
# after=after,
# category_id=questions_category_id,
# )
# graphql_response = DiscussionsResponse.model_validate(data)
# return graphql_response.data.repository.discussions.edges
def get_graphql_pr_edges(*, settings: Settings, after: Union[str, None] = None):
if after is None:
print("Querying PRs...")
else:
print(f"Querying PRs with cursor {after}...")
data = get_graphql_response(
settings=settings,
query=prs_query,
after=after
)
graphql_response = PRsResponse.model_validate(data)
return graphql_response.data.repository.pullRequests.edges
# def get_issues_experts(settings: Settings):
# issue_nodes: List[IssuesNode] = []
# issue_edges = get_graphql_issue_edges(settings=settings)
# while issue_edges:
# for edge in issue_edges:
# issue_nodes.append(edge.node)
# last_edge = issue_edges[-1]
# issue_edges = get_graphql_issue_edges(settings=settings, after=last_edge.cursor)
# commentors = Counter()
# last_month_commentors = Counter()
# authors: Dict[str, Author] = {}
# now = datetime.now(tz=timezone.utc)
# one_month_ago = now - timedelta(days=30)
# for issue in issue_nodes:
# issue_author_name = None
# if issue.author:
# authors[issue.author.login] = issue.author
# issue_author_name = issue.author.login
# issue_commentors = set()
# for comment in issue.comments.nodes:
# if comment.author:
# authors[comment.author.login] = comment.author
# if comment.author.login != issue_author_name:
# issue_commentors.add(comment.author.login)
# for author_name in issue_commentors:
# commentors[author_name] += 1
# if issue.createdAt > one_month_ago:
# last_month_commentors[author_name] += 1
# return commentors, last_month_commentors, authors
# def get_discussions_experts(settings: Settings):
# discussion_nodes: List[DiscussionsNode] = []
# discussion_edges = get_graphql_question_discussion_edges(settings=settings)
# while discussion_edges:
# for discussion_edge in discussion_edges:
# discussion_nodes.append(discussion_edge.node)
# last_edge = discussion_edges[-1]
# discussion_edges = get_graphql_question_discussion_edges(
# settings=settings, after=last_edge.cursor
# )
# commentors = Counter()
# last_month_commentors = Counter()
# authors: Dict[str, Author] = {}
# now = datetime.now(tz=timezone.utc)
# one_month_ago = now - timedelta(days=30)
# for discussion in discussion_nodes:
# discussion_author_name = None
# if discussion.author:
# authors[discussion.author.login] = discussion.author
# discussion_author_name = discussion.author.login
# discussion_commentors = set()
# for comment in discussion.comments.nodes:
# if comment.author:
# authors[comment.author.login] = comment.author
# if comment.author.login != discussion_author_name:
# discussion_commentors.add(comment.author.login)
# for reply in comment.replies.nodes:
# if reply.author:
# authors[reply.author.login] = reply.author
# if reply.author.login != discussion_author_name:
# discussion_commentors.add(reply.author.login)
# for author_name in discussion_commentors:
# commentors[author_name] += 1
# if discussion.createdAt > one_month_ago:
# last_month_commentors[author_name] += 1
# return commentors, last_month_commentors, authors
# def get_experts(settings: Settings):
# (
# discussions_commentors,
# discussions_last_month_commentors,
# discussions_authors,
# ) = get_discussions_experts(settings=settings)
# commentors = discussions_commentors
# last_month_commentors = discussions_last_month_commentors
# authors = {**discussions_authors}
# return commentors, last_month_commentors, authors
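# Saturating score helper: x / (x + k) grows from 0 toward 1 and equals 0.5
# when x == k, so k acts as the half-saturation point.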
def _logistic(x, k):
return x / (x + k)
def get_contributors(settings: Settings):
pr_nodes: List[PullRequestNode] = []
pr_edges = get_graphql_pr_edges(settings=settings)
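# Page through all merged PRs, following the cursor of the last edge
# until a query returns no edges.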
while pr_edges:
for edge in pr_edges:
pr_nodes.append(edge.node)
last_edge = pr_edges[-1]
pr_edges = get_graphql_pr_edges(settings=settings, after=last_edge.cursor)
contributors = Counter()
contributor_scores = Counter()
recent_contributor_scores = Counter()
reviewers = Counter()
authors: Dict[str, Author] = {}
for pr in pr_nodes:
pr_reviewers: Set[str] = set()
for review in pr.reviews.nodes:
if review.author:
authors[review.author.login] = review.author
pr_reviewers.add(review.author.login)
for reviewer in pr_reviewers:
reviewers[reviewer] += 1
if pr.author:
authors[pr.author.login] = pr.author
contributors[pr.author.login] += 1
files_changed = pr.changedFiles
lines_changed = pr.additions + pr.deletions
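# Each merged PR adds a bounded score: up to ~1.0 for files changed
# (half-max at 20 files) plus up to ~1.0 for lines changed (half-max at
# 100 lines), so very large PRs cannot dominate the ranking.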
score = _logistic(files_changed, 20) + _logistic(lines_changed, 100)
contributor_scores[pr.author.login] += score
three_months_ago = (datetime.now(timezone.utc) - timedelta(days=3*30))
if pr.createdAt > three_months_ago:
recent_contributor_scores[pr.author.login] += score
return contributors, contributor_scores, recent_contributor_scores, reviewers, authors
def get_top_users(
*,
counter: Counter,
min_count: int,
authors: Dict[str, Author],
skip_users: Container[str],
):
users = []
for commentor, count in counter.most_common():
if commentor in skip_users:
continue
if count >= min_count:
author = authors[commentor]
users.append(
{
"login": commentor,
"count": count,
"avatarUrl": author.avatarUrl,
"twitterUsername": author.twitterUsername,
"url": author.url,
}
)
return users
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO)
settings = Settings()
logging.info(f"Using config: {settings.model_dump_json()}")
g = Github(settings.input_token.get_secret_value())
repo = g.get_repo(settings.github_repository)
# question_commentors, question_last_month_commentors, question_authors = get_experts(
# settings=settings
# )
contributors, contributor_scores, recent_contributor_scores, reviewers, pr_authors = get_contributors(
settings=settings
)
# authors = {**question_authors, **pr_authors}
authors = {**pr_authors}
maintainers_logins = {
"hwchase17",
"agola11",
"baskaryan",
"hinthornw",
"nfcampos",
"efriis",
"eyurtsev",
"rlancemartin"
}
hidden_logins = {
"dev2049",
"vowelparrot",
"obi1kenobi",
"langchain-infra",
"jacoblee93",
"dqbd",
"bracesproul",
"akira",
}
bot_names = {"dosubot", "github-actions", "CodiumAI-Agent"}
maintainers = []
for login in maintainers_logins:
user = authors[login]
maintainers.append(
{
"login": login,
"count": contributors[login], #+ question_commentors[login],
"avatarUrl": user.avatarUrl,
"twitterUsername": user.twitterUsername,
"url": user.url,
}
)
# min_count_expert = 10
# min_count_last_month = 3
min_score_contributor = 1
min_count_reviewer = 5
skip_users = maintainers_logins | bot_names | hidden_logins
# experts = get_top_users(
# counter=question_commentors,
# min_count=min_count_expert,
# authors=authors,
# skip_users=skip_users,
# )
# last_month_active = get_top_users(
# counter=question_last_month_commentors,
# min_count=min_count_last_month,
# authors=authors,
# skip_users=skip_users,
# )
top_recent_contributors = get_top_users(
counter=recent_contributor_scores,
min_count=min_score_contributor,
authors=authors,
skip_users=skip_users,
)
top_contributors = get_top_users(
counter=contributor_scores,
min_count=min_score_contributor,
authors=authors,
skip_users=skip_users,
)
top_reviewers = get_top_users(
counter=reviewers,
min_count=min_count_reviewer,
authors=authors,
skip_users=skip_users,
)
people = {
"maintainers": maintainers,
# "experts": experts,
# "last_month_active": last_month_active,
"top_recent_contributors": top_recent_contributors,
"top_contributors": top_contributors,
"top_reviewers": top_reviewers,
}
people_path = Path("./docs/data/people.yml")
people_old_content = people_path.read_text(encoding="utf-8")
new_people_content = yaml.dump(
people, sort_keys=False, width=200, allow_unicode=True
)
if (
people_old_content == new_people_content
):
logging.info("The LangChain People data hasn't changed, finishing.")
sys.exit(0)
people_path.write_text(new_people_content, encoding="utf-8")
logging.info("Setting up GitHub Actions git user")
subprocess.run(["git", "config", "user.name", "github-actions"], check=True)
subprocess.run(
["git", "config", "user.email", "github-actions@github.com"], check=True
)
branch_name = "langchain/langchain-people"
logging.info(f"Creating a new branch {branch_name}")
subprocess.run(["git", "checkout", "-B", branch_name], check=True)
logging.info("Adding updated file")
subprocess.run(
["git", "add", str(people_path)], check=True
)
logging.info("Committing updated file")
message = "👥 Update LangChain people data"
result = subprocess.run(["git", "commit", "-m", message], check=True)
logging.info("Pushing branch")
subprocess.run(["git", "push", "origin", branch_name, "-f"], check=True)
logging.info("Creating PR")
pr = repo.create_pull(title=message, body=message, base="master", head=branch_name)
logging.info(f"Created PR: {pr.number}")
logging.info("Finished")

View File

@@ -52,7 +52,6 @@ jobs:
- name: Run integration tests
shell: bash
env:
AI21_API_KEY: ${{ secrets.AI21_API_KEY }}
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
@@ -67,9 +66,6 @@ jobs:
WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }}
PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
PINECONE_ENVIRONMENT: ${{ secrets.PINECONE_ENVIRONMENT }}
ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
run: |
make integration_tests

View File

@@ -166,7 +166,6 @@ jobs:
- name: Run integration tests
if: ${{ startsWith(inputs.working-directory, 'libs/partners/') }}
env:
AI21_API_KEY: ${{ secrets.AI21_API_KEY }}
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
@@ -187,9 +186,6 @@ jobs:
WATSONX_PROJECT_ID: ${{ secrets.WATSONX_PROJECT_ID }}
PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
PINECONE_ENVIRONMENT: ${{ secrets.PINECONE_ENVIRONMENT }}
ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
run: make integration_tests
working-directory: ${{ inputs.working-directory }}

View File

@@ -1,36 +0,0 @@
name: LangChain People
on:
schedule:
- cron: "0 14 1 * *"
push:
branches: [jacob/people]
workflow_dispatch:
inputs:
debug_enabled:
description: 'Run the build with tmate debugging enabled (https://github.com/marketplace/actions/debugging-with-tmate)'
required: false
default: 'false'
jobs:
langchain-people:
if: github.repository_owner == 'langchain-ai'
runs-on: ubuntu-latest
steps:
- name: Dump GitHub context
env:
GITHUB_CONTEXT: ${{ toJson(github) }}
run: echo "$GITHUB_CONTEXT"
- uses: actions/checkout@v4
# Ref: https://github.com/actions/runner/issues/2033
- name: Fix git safe.directory in container
run: mkdir -p /home/runner/work/_temp/_github_home && printf "[safe]\n\tdirectory = /github/workspace" > /home/runner/work/_temp/_github_home/.gitconfig
# Allow debugging with tmate
- name: Setup tmate session
uses: mxschmitt/action-tmate@v3
if: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.debug_enabled == 'true' }}
with:
limit-access-to-actor: true
- uses: ./.github/actions/people
with:
token: ${{ secrets.LANGCHAIN_PEOPLE_GITHUB_TOKEN }}

.gitignore (vendored): 4 changes
View File

@@ -177,6 +177,4 @@ docs/docs/build
docs/docs/node_modules
docs/docs/yarn.lock
_dist
docs/docs/templates
prof
docs/docs/templates

View File

@@ -15,12 +15,7 @@ docs_build:
docs/.local_build.sh
docs_clean:
@if [ -d _dist ]; then \
rm -r _dist; \
echo "Directory _dist has been cleaned."; \
else \
echo "Nothing to clean."; \
fi
rm -r _dist
docs_linkcheck:
poetry run linkchecker _dist/docs/ --ignore-url node_modules

View File

@@ -18,7 +18,7 @@ Looking for the JS/TS library? Check out [LangChain.js](https://github.com/langc
To help you ship LangChain apps to production faster, check out [LangSmith](https://smith.langchain.com).
[LangSmith](https://smith.langchain.com) is a unified developer platform for building, testing, and monitoring LLM applications.
Fill out [this form](https://www.langchain.com/contact-sales) to speak with our sales team.
Fill out [this form](https://airtable.com/appwQzlErAS2qiP0L/shrGtGaVBVAz7NcV2) to get off the waitlist or speak with our sales team.
## Quick Install

View File

@@ -1,284 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Amazon Personalize\n",
"\n",
"[Amazon Personalize](https://docs.aws.amazon.com/personalize/latest/dg/what-is-personalize.html) is a fully managed machine learning service that uses your data to generate item recommendations for your users. It can also generate user segments based on the users' affinity for certain items or item metadata.\n",
"\n",
"This notebook goes through how to use Amazon Personalize Chain. You need a Amazon Personalize campaign_arn or a recommender_arn before you get started with the below notebook.\n",
"\n",
"Following is a [tutorial](https://github.com/aws-samples/retail-demo-store/blob/master/workshop/1-Personalization/Lab-1-Introduction-and-data-preparation.ipynb) to setup a campaign_arn/recommender_arn on Amazon Personalize. Once the campaign_arn/recommender_arn is setup, you can use it in the langchain ecosystem. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Install Dependencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"!pip install boto3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Sample Use-cases"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.1 [Use-case-1] Setup Amazon Personalize Client and retrieve recommendations"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_experimental.recommenders import AmazonPersonalize\n",
"\n",
"recommender_arn = \"<insert_arn>\"\n",
"\n",
"client = AmazonPersonalize(\n",
" credentials_profile_name=\"default\",\n",
" region_name=\"us-west-2\",\n",
" recommender_arn=recommender_arn,\n",
")\n",
"client.get_recommendations(user_id=\"1\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### 2.2 [Use-case-2] Invoke Personalize Chain for summarizing results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"from langchain.llms.bedrock import Bedrock\n",
"from langchain_experimental.recommenders import AmazonPersonalizeChain\n",
"\n",
"bedrock_llm = Bedrock(model_id=\"anthropic.claude-v2\", region_name=\"us-west-2\")\n",
"\n",
"# Create personalize chain\n",
"# Use return_direct=True if you do not want summary\n",
"chain = AmazonPersonalizeChain.from_llm(\n",
" llm=bedrock_llm, client=client, return_direct=False\n",
")\n",
"response = chain({\"user_id\": \"1\"})\n",
"print(response)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.3 [Use-Case-3] Invoke Amazon Personalize Chain using your own prompt"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts.prompt import PromptTemplate\n",
"\n",
"RANDOM_PROMPT_QUERY = \"\"\"\n",
"You are a skilled publicist. Write a high-converting marketing email advertising several movies available in a video-on-demand streaming platform next week, \n",
" given the movie and user information below. Your email will leverage the power of storytelling and persuasive language. \n",
" The movies to recommend and their information is contained in the <movie> tag. \n",
" All movies in the <movie> tag must be recommended. Give a summary of the movies and why the human should watch them. \n",
" Put the email between <email> tags.\n",
"\n",
" <movie>\n",
" {result} \n",
" </movie>\n",
"\n",
" Assistant:\n",
" \"\"\"\n",
"\n",
"RANDOM_PROMPT = PromptTemplate(input_variables=[\"result\"], template=RANDOM_PROMPT_QUERY)\n",
"\n",
"chain = AmazonPersonalizeChain.from_llm(\n",
" llm=bedrock_llm, client=client, return_direct=False, prompt_template=RANDOM_PROMPT\n",
")\n",
"chain.run({\"user_id\": \"1\", \"item_id\": \"234\"})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.4 [Use-case-4] Invoke Amazon Personalize in a Sequential Chain "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import LLMChain, SequentialChain\n",
"\n",
"RANDOM_PROMPT_QUERY_2 = \"\"\"\n",
"You are a skilled publicist. Write a high-converting marketing email advertising several movies available in a video-on-demand streaming platform next week, \n",
" given the movie and user information below. Your email will leverage the power of storytelling and persuasive language. \n",
" You want the email to impress the user, so make it appealing to them.\n",
" The movies to recommend and their information is contained in the <movie> tag. \n",
" All movies in the <movie> tag must be recommended. Give a summary of the movies and why the human should watch them. \n",
" Put the email between <email> tags.\n",
"\n",
" <movie>\n",
" {result}\n",
" </movie>\n",
"\n",
" Assistant:\n",
" \"\"\"\n",
"\n",
"RANDOM_PROMPT_2 = PromptTemplate(\n",
" input_variables=[\"result\"], template=RANDOM_PROMPT_QUERY_2\n",
")\n",
"personalize_chain_instance = AmazonPersonalizeChain.from_llm(\n",
" llm=bedrock_llm, client=client, return_direct=True\n",
")\n",
"random_chain_instance = LLMChain(llm=bedrock_llm, prompt=RANDOM_PROMPT_2)\n",
"overall_chain = SequentialChain(\n",
" chains=[personalize_chain_instance, random_chain_instance],\n",
" input_variables=[\"user_id\"],\n",
" verbose=True,\n",
")\n",
"overall_chain.run({\"user_id\": \"1\", \"item_id\": \"234\"})"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### 2.5 [Use-case-5] Invoke Amazon Personalize and retrieve metadata "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"recommender_arn = \"<insert_arn>\"\n",
"metadata_column_names = [\n",
" \"<insert metadataColumnName-1>\",\n",
" \"<insert metadataColumnName-2>\",\n",
"]\n",
"metadataMap = {\"ITEMS\": metadata_column_names}\n",
"\n",
"client = AmazonPersonalize(\n",
" credentials_profile_name=\"default\",\n",
" region_name=\"us-west-2\",\n",
" recommender_arn=recommender_arn,\n",
")\n",
"client.get_recommendations(user_id=\"1\", metadataColumns=metadataMap)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### 2.6 [Use-Case 6] Invoke Personalize Chain with returned metadata for summarizing results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"bedrock_llm = Bedrock(model_id=\"anthropic.claude-v2\", region_name=\"us-west-2\")\n",
"\n",
"# Create personalize chain\n",
"# Use return_direct=True if you do not want summary\n",
"chain = AmazonPersonalizeChain.from_llm(\n",
" llm=bedrock_llm, client=client, return_direct=False\n",
")\n",
"response = chain({\"user_id\": \"1\", \"metadata_columns\": metadataMap})\n",
"print(response)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
},
"vscode": {
"interpreter": {
"hash": "15e58ce194949b77a891bd4339ce3d86a9bd138e905926019517993f97db9e6c"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -1,591 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "6195da33-34c3-4ca2-943a-050b6dcbacbc",
"metadata": {},
"source": [
"# Embedding Documents using Optimized and Quantized Embedders\n",
"\n",
"In this tutorial, we will demo how to build a RAG pipeline, with the embedding for all documents done using Quantized Embedders.\n",
"\n",
"We will use a pipeline that will:\n",
"\n",
"* Create a document collection.\n",
"* Embed all documents using Quantized Embedders.\n",
"* Fetch relevant documents for our question.\n",
"* Run an LLM answer the question.\n",
"\n",
"For more information about optimized models, we refer to [optimum-intel](https://github.com/huggingface/optimum-intel.git) and [IPEX](https://github.com/intel/intel-extension-for-pytorch).\n",
"\n",
"This tutorial is based on the [Langchain RAG tutorial here](https://towardsai.net/p/machine-learning/dense-x-retrieval-technique-in-langchain-and-llamaindex)."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "26db2da5-3733-4a90-909e-6c11508ea140",
"metadata": {},
"outputs": [],
"source": [
"import uuid\n",
"from pathlib import Path\n",
"\n",
"import langchain\n",
"import torch\n",
"from bs4 import BeautifulSoup as Soup\n",
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
"from langchain.storage import InMemoryByteStore, LocalFileStore\n",
"\n",
"# For our example, we'll load docs from the web\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter # noqa\n",
"from langchain_community.document_loaders.recursive_url_loader import (\n",
" RecursiveUrlLoader,\n",
")\n",
"\n",
"# noqa\n",
"from langchain_community.vectorstores import Chroma\n",
"\n",
"DOCSTORE_DIR = \".\"\n",
"DOCSTORE_ID_KEY = \"doc_id\""
]
},
{
"cell_type": "markdown",
"id": "f5ccda4e-7af5-4355-b9c4-25547edf33f9",
"metadata": {},
"source": [
"Lets first load up this paper, and split into text chunks of size 1000."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5f4d8888-53a6-49f5-a198-da5c92419ca4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loaded 1 documents\n",
"Split into 73 documents\n"
]
}
],
"source": [
"# Could add more parsing here, as it's very raw.\n",
"loader = RecursiveUrlLoader(\n",
" \"https://ar5iv.labs.arxiv.org/html/1706.03762\",\n",
" max_depth=2,\n",
" extractor=lambda x: Soup(x, \"html.parser\").text,\n",
")\n",
"data = loader.load()\n",
"print(f\"Loaded {len(data)} documents\")\n",
"\n",
"# Split\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"all_splits = text_splitter.split_documents(data)\n",
"print(f\"Split into {len(all_splits)} documents\")"
]
},
{
"cell_type": "markdown",
"id": "73e90632-2ac2-49eb-80da-ffe9ac4a278d",
"metadata": {},
"source": [
"In order to embed our documents, we can use the ```QuantizedBiEncoderEmbeddings```, for efficient and fast embedding. "
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "9a68a6f6-332d-481e-bbea-ad763155ea36",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "89af89b48c55409b9999b8e0387fab5b",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"config.json: 0%| | 0.00/747 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "01ad1b6278194b53bf6a5a286a311864",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"pytorch_model.bin: 0%| | 0.00/45.9M [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "cb3bd1b88f7743c3b0322da3f021325c",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"inc_config.json: 0%| | 0.00/287 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"loading configuration file inc_config.json from cache at \n",
"INCConfig {\n",
" \"distillation\": {},\n",
" \"neural_compressor_version\": \"2.4.1\",\n",
" \"optimum_version\": \"1.16.2\",\n",
" \"pruning\": {},\n",
" \"quantization\": {\n",
" \"dataset_num_samples\": 50,\n",
" \"is_static\": true\n",
" },\n",
" \"save_onnx_model\": false,\n",
" \"torch_version\": \"2.2.0\",\n",
" \"transformers_version\": \"4.37.2\"\n",
"}\n",
"\n",
"Using `INCModel` to load a TorchScript model will be deprecated in v1.15.0, to load your model please use `IPEXModel` instead.\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "7439315ebcb746f5be11fe30bc7693f6",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"tokenizer_config.json: 0%| | 0.00/1.24k [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "05265a3912254ce1ad43cc8086bcb0ca",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"vocab.txt: 0%| | 0.00/232k [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a48f4245c60744f28f37cd3a7a24d198",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"tokenizer.json: 0%| | 0.00/711k [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "584a63cace934033b4ab22d3a178582a",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"special_tokens_map.json: 0%| | 0.00/125 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from langchain_community.embeddings import QuantizedBiEncoderEmbeddings\n",
"from langchain_core.embeddings import Embeddings\n",
"\n",
"model_name = \"Intel/bge-small-en-v1.5-rag-int8-static\"\n",
"encode_kwargs = {\"normalize_embeddings\": True} # set True to compute cosine similarity\n",
"\n",
"model_inc = QuantizedBiEncoderEmbeddings(\n",
" model_name=model_name,\n",
" encode_kwargs=encode_kwargs,\n",
" query_instruction=\"Represent this sentence for searching relevant passages: \",\n",
")"
]
},
{
"cell_type": "markdown",
"id": "360b2837-8024-47e0-a4ba-592505a9a5c8",
"metadata": {},
"source": [
"With our embedder in place, lets define our retriever:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "18bc0a73-1a13-4b2f-96ac-05a5313343b7",
"metadata": {},
"outputs": [],
"source": [
"def get_multi_vector_retriever(\n",
" docstore_id_key: str, collection_name: str, embedding_function: Embeddings\n",
"):\n",
" \"\"\"Create the composed retriever object.\"\"\"\n",
" vectorstore = Chroma(\n",
" collection_name=collection_name,\n",
" embedding_function=embedding_function,\n",
" )\n",
" store = InMemoryByteStore()\n",
"\n",
" return MultiVectorRetriever(\n",
" vectorstore=vectorstore,\n",
" byte_store=store,\n",
" id_key=docstore_id_key,\n",
" )\n",
"\n",
"\n",
"retriever = get_multi_vector_retriever(DOCSTORE_ID_KEY, \"multi_vec_store\", model_inc)"
]
},
{
"cell_type": "markdown",
"id": "8484078e-1bf0-4080-a354-ef23823fd6dc",
"metadata": {},
"source": [
"Next, we divide each chunk into sub-docs:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "e12f48d4-6562-416b-8f28-342912e5756e",
"metadata": {},
"outputs": [],
"source": [
"child_text_splitter = RecursiveCharacterTextSplitter(chunk_size=400)\n",
"id_key = \"doc_id\"\n",
"doc_ids = [str(uuid.uuid4()) for _ in all_splits]"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "a268ef5f-91c2-4d8e-87f0-53db376e6a29",
"metadata": {},
"outputs": [],
"source": [
"sub_docs = []\n",
"for i, doc in enumerate(all_splits):\n",
" _id = doc_ids[i]\n",
" _sub_docs = child_text_splitter.split_documents([doc])\n",
" for _doc in _sub_docs:\n",
" _doc.metadata[id_key] = _id\n",
" sub_docs.extend(_sub_docs)"
]
},
{
"cell_type": "markdown",
"id": "d84ea8f4-a5de-4d76-b44d-85e56583f489",
"metadata": {},
"source": [
"Lets write our documents into our new store. This will use our embedder on each document."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "1af831ce-0eae-44bc-aca7-4d691063640b",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Batches: 100%|██████████| 8/8 [00:00<00:00, 9.05it/s]\n"
]
}
],
"source": [
"retriever.vectorstore.add_documents(sub_docs)\n",
"retriever.docstore.mset(list(zip(doc_ids, all_splits)))"
]
},
{
"cell_type": "markdown",
"id": "580bc212-8ecd-4d28-8656-b96fcd0d7eb6",
"metadata": {},
"source": [
"Great! Our retriever is good to go. Lets load up an LLM, that will reason over the retrieved documents:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "008c992f",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": []
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "cbe70583ad964ae19582b72dab396784",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import torch\n",
"from langchain.llms.huggingface_pipeline import HuggingFacePipeline\n",
"from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline\n",
"\n",
"model_id = \"Intel/neural-chat-7b-v3-3\"\n",
"tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
"model = AutoModelForCausalLM.from_pretrained(\n",
" model_id, device_map=\"auto\", torch_dtype=torch.bfloat16\n",
")\n",
"\n",
"pipe = pipeline(\"text-generation\", model=model, tokenizer=tokenizer, max_new_tokens=100)\n",
"\n",
"hf = HuggingFacePipeline(pipeline=pipe)"
]
},
{
"cell_type": "markdown",
"id": "6dd21fb2-0442-477d-aae2-9e7ee1d1d778",
"metadata": {},
"source": [
"Next, we will load up a prompt for answering questions using retrieved documents:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "5e582509-caaf-4920-932c-4ce16162c789",
"metadata": {},
"outputs": [],
"source": [
"from langchain import hub\n",
"\n",
"prompt = hub.pull(\"rlm/rag-prompt\")"
]
},
{
"cell_type": "markdown",
"id": "5cdfcba5-7ec7-4d0a-820e-4e200643a882",
"metadata": {},
"source": [
"We can now build our pipeline:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "b74d8dfb-72bb-46da-9df9-0dc47a3ac791",
"metadata": {},
"outputs": [],
"source": [
"from langchain.schema.runnable import RunnablePassthrough\n",
"\n",
"rag_chain = {\"context\": retriever, \"question\": RunnablePassthrough()} | prompt | hf"
]
},
{
"cell_type": "markdown",
"id": "3bc53602-86d6-420f-91b1-fc2effa7e986",
"metadata": {},
"source": [
"Excellent! lets ask it a question.\n",
"We will also use a verbose and debug, to check which documents were used by the model to produce the answer."
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "f0a92c07-53da-4e1f-b880-ee83a36ee17d",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[32;1m\u001b[1;3m[chain/start]\u001b[0m \u001b[1m[1:chain:RunnableSequence] Entering Chain run with input:\n",
"\u001b[0m{\n",
" \"input\": \"What is the first transduction model relying entirely on self-attention?\"\n",
"}\n",
"\u001b[32;1m\u001b[1;3m[chain/start]\u001b[0m \u001b[1m[1:chain:RunnableSequence > 2:chain:RunnableParallel<context,question>] Entering Chain run with input:\n",
"\u001b[0m{\n",
" \"input\": \"What is the first transduction model relying entirely on self-attention?\"\n",
"}\n",
"\u001b[32;1m\u001b[1;3m[chain/start]\u001b[0m \u001b[1m[1:chain:RunnableSequence > 2:chain:RunnableParallel<context,question> > 4:chain:RunnablePassthrough] Entering Chain run with input:\n",
"\u001b[0m{\n",
" \"input\": \"What is the first transduction model relying entirely on self-attention?\"\n",
"}\n",
"\u001b[36;1m\u001b[1;3m[chain/end]\u001b[0m \u001b[1m[1:chain:RunnableSequence > 2:chain:RunnableParallel<context,question> > 4:chain:RunnablePassthrough] [1ms] Exiting Chain run with output:\n",
"\u001b[0m{\n",
" \"output\": \"What is the first transduction model relying entirely on self-attention?\"\n",
"}\n",
"\u001b[36;1m\u001b[1;3m[chain/end]\u001b[0m \u001b[1m[1:chain:RunnableSequence > 2:chain:RunnableParallel<context,question>] [66ms] Exiting Chain run with output:\n",
"\u001b[0m[outputs]\n",
"\u001b[32;1m\u001b[1;3m[chain/start]\u001b[0m \u001b[1m[1:chain:RunnableSequence > 5:prompt:ChatPromptTemplate] Entering Prompt run with input:\n",
"\u001b[0m[inputs]\n",
"\u001b[36;1m\u001b[1;3m[chain/end]\u001b[0m \u001b[1m[1:chain:RunnableSequence > 5:prompt:ChatPromptTemplate] [1ms] Exiting Prompt run with output:\n",
"\u001b[0m{\n",
" \"lc\": 1,\n",
" \"type\": \"constructor\",\n",
" \"id\": [\n",
" \"langchain\",\n",
" \"prompts\",\n",
" \"chat\",\n",
" \"ChatPromptValue\"\n",
" ],\n",
" \"kwargs\": {\n",
" \"messages\": [\n",
" {\n",
" \"lc\": 1,\n",
" \"type\": \"constructor\",\n",
" \"id\": [\n",
" \"langchain\",\n",
" \"schema\",\n",
" \"messages\",\n",
" \"HumanMessage\"\n",
" ],\n",
" \"kwargs\": {\n",
" \"content\": \"You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\\nQuestion: What is the first transduction model relying entirely on self-attention? \\nContext: [Document(page_content='To the best of our knowledge, however, the Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence-aligned RNNs or convolution.\\\\nIn the following sections, we will describe the Transformer, motivate self-attention and discuss its advantages over models such as (neural_gpu, ; NalBytenet2017, ) and (JonasFaceNet2017, ).\\\\n\\\\n\\\\n\\\\n\\\\n3 Model Architecture\\\\n\\\\nFigure 1: The Transformer - model architecture.', metadata={'source': 'https://ar5iv.labs.arxiv.org/html/1706.03762', 'title': '[1706.03762] Attention Is All You Need', 'language': 'en'}), Document(page_content='In this work, we presented the Transformer, the first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention.\\\\n\\\\n\\\\nFor translation tasks, the Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers. On both WMT 2014 English-to-German and WMT 2014 English-to-French translation tasks, we achieve a new state of the art. In the former task our best model outperforms even all previously reported ensembles. \\\\n\\\\n\\\\nWe are excited about the future of attention-based models and plan to apply them to other tasks. We plan to extend the Transformer to problems involving input and output modalities other than text and to investigate local, restricted attention mechanisms to efficiently handle large inputs and outputs such as images, audio and video.\\\\nMaking generation less sequential is another research goals of ours.', metadata={'source': 'https://ar5iv.labs.arxiv.org/html/1706.03762', 'title': '[1706.03762] Attention Is All You Need', 'language': 'en'}), Document(page_content='Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing modeling of dependencies without regard to their distance in the input or output sequences (bahdanau2014neural, ; structuredAttentionNetworks, ). In all but a few cases (decomposableAttnModel, ), however, such attention mechanisms are used in conjunction with a recurrent network.\\\\n\\\\n\\\\nIn this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.\\\\n\\\\n\\\\n\\\\n\\\\n\\\\n2 Background', metadata={'source': 'https://ar5iv.labs.arxiv.org/html/1706.03762', 'title': '[1706.03762] Attention Is All You Need', 'language': 'en'}), Document(page_content='The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. 
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the', metadata={'source': 'https://ar5iv.labs.arxiv.org/html/1706.03762', 'title': '[1706.03762] Attention Is All You Need', 'language': 'en'})] \\nAnswer:\",\n",
" \"additional_kwargs\": {}\n",
" }\n",
" }\n",
" ]\n",
" }\n",
"}\n",
"\u001b[32;1m\u001b[1;3m[llm/start]\u001b[0m \u001b[1m[1:chain:RunnableSequence > 6:llm:HuggingFacePipeline] Entering LLM run with input:\n",
"\u001b[0m{\n",
" \"prompts\": [\n",
" \"Human: You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\\nQuestion: What is the first transduction model relying entirely on self-attention? \\nContext: [Document(page_content='To the best of our knowledge, however, the Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence-aligned RNNs or convolution.\\\\nIn the following sections, we will describe the Transformer, motivate self-attention and discuss its advantages over models such as (neural_gpu, ; NalBytenet2017, ) and (JonasFaceNet2017, ).\\\\n\\\\n\\\\n\\\\n\\\\n3 Model Architecture\\\\n\\\\nFigure 1: The Transformer - model architecture.', metadata={'source': 'https://ar5iv.labs.arxiv.org/html/1706.03762', 'title': '[1706.03762] Attention Is All You Need', 'language': 'en'}), Document(page_content='In this work, we presented the Transformer, the first sequence transduction model based entirely on attention, replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention.\\\\n\\\\n\\\\nFor translation tasks, the Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers. On both WMT 2014 English-to-German and WMT 2014 English-to-French translation tasks, we achieve a new state of the art. In the former task our best model outperforms even all previously reported ensembles. \\\\n\\\\n\\\\nWe are excited about the future of attention-based models and plan to apply them to other tasks. We plan to extend the Transformer to problems involving input and output modalities other than text and to investigate local, restricted attention mechanisms to efficiently handle large inputs and outputs such as images, audio and video.\\\\nMaking generation less sequential is another research goals of ours.', metadata={'source': 'https://ar5iv.labs.arxiv.org/html/1706.03762', 'title': '[1706.03762] Attention Is All You Need', 'language': 'en'}), Document(page_content='Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing modeling of dependencies without regard to their distance in the input or output sequences (bahdanau2014neural, ; structuredAttentionNetworks, ). In all but a few cases (decomposableAttnModel, ), however, such attention mechanisms are used in conjunction with a recurrent network.\\\\n\\\\n\\\\nIn this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.\\\\n\\\\n\\\\n\\\\n\\\\n\\\\n2 Background', metadata={'source': 'https://ar5iv.labs.arxiv.org/html/1706.03762', 'title': '[1706.03762] Attention Is All You Need', 'language': 'en'}), Document(page_content='The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. 
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the', metadata={'source': 'https://ar5iv.labs.arxiv.org/html/1706.03762', 'title': '[1706.03762] Attention Is All You Need', 'language': 'en'})] \\nAnswer:\"\n",
" ]\n",
"}\n",
"\u001b[36;1m\u001b[1;3m[llm/end]\u001b[0m \u001b[1m[1:chain:RunnableSequence > 6:llm:HuggingFacePipeline] [4.34s] Exiting LLM run with output:\n",
"\u001b[0m{\n",
" \"generations\": [\n",
" [\n",
" {\n",
" \"text\": \" The first transduction model relying entirely on self-attention is the Transformer.\",\n",
" \"generation_info\": null,\n",
" \"type\": \"Generation\"\n",
" }\n",
" ]\n",
" ],\n",
" \"llm_output\": null,\n",
" \"run\": null\n",
"}\n",
"\u001b[36;1m\u001b[1;3m[chain/end]\u001b[0m \u001b[1m[1:chain:RunnableSequence] [4.41s] Exiting Chain run with output:\n",
"\u001b[0m{\n",
" \"output\": \" The first transduction model relying entirely on self-attention is the Transformer.\"\n",
"}\n"
]
}
],
"source": [
"langchain.verbose = True\n",
"langchain.debug = True\n",
"\n",
"llm_res = rag_chain.invoke(\n",
" \"What is the first transduction model relying entirely on self-attention?\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "023404a1-401a-46e1-8ab5-cafbc8593b04",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"' The first transduction model relying entirely on self-attention is the Transformer.'"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm_res"
]
},
{
"cell_type": "markdown",
"id": "0eaefd01-254a-445d-a95f-37889c126e0e",
"metadata": {},
"source": [
"Based on the retrieved documents, the answer is indeed correct :)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
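A note on the `rag_chain` composition in the notebook above: in LCEL (LangChain Expression Language), a plain dict piped into a prompt is coerced into a `RunnableParallel`, so the retriever and the passthrough both receive the same question. A minimal, self-contained sketch with stand-in components (no model downloads; the lambdas are hypothetical stand-ins for the retriever and the HuggingFace pipeline):

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

# Hypothetical stand-ins for the retriever and the LLM.
retriever = RunnableLambda(lambda q: f"<documents retrieved for: {q}>")
llm = RunnableLambda(lambda prompt_value: "The Transformer.")

prompt = ChatPromptTemplate.from_template(
    "Answer using the context.\nContext: {context}\nQuestion: {question}"
)

# The dict is coerced to a RunnableParallel: both branches see the same input.
rag_chain = {"context": retriever, "question": RunnablePassthrough()} | prompt | llm

print(rag_chain.invoke("What is the first transduction model relying entirely on self-attention?"))
```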

View File

@@ -1,12 +0,0 @@
# Makefile
build_graphdb:
docker build --tag graphdb ./graphdb
start_graphdb:
docker-compose up -d graphdb
down:
docker-compose down -v --remove-orphans
.PHONY: build_graphdb start_graphdb down

View File

@@ -15,7 +15,3 @@ services:
- "6020:6379"
volumes:
- ./redis-volume:/data
graphdb:
image: graphdb
ports:
- "6021:7200"

View File

@@ -1,5 +0,0 @@
FROM ontotext/graphdb:10.5.1
RUN mkdir -p /opt/graphdb/dist/data/repositories/langchain
COPY config.ttl /opt/graphdb/dist/data/repositories/langchain/
COPY graphdb_create.sh /run.sh
ENTRYPOINT bash /run.sh

View File

@@ -1,46 +0,0 @@
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rep: <http://www.openrdf.org/config/repository#>.
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
@prefix sail: <http://www.openrdf.org/config/sail#>.
@prefix graphdb: <http://www.ontotext.com/config/graphdb#>.
[] a rep:Repository ;
rep:repositoryID "langchain" ;
rdfs:label "" ;
rep:repositoryImpl [
rep:repositoryType "graphdb:SailRepository" ;
sr:sailImpl [
sail:sailType "graphdb:Sail" ;
graphdb:read-only "false" ;
# Inference and Validation
graphdb:ruleset "empty" ;
graphdb:disable-sameAs "true" ;
graphdb:check-for-inconsistencies "false" ;
# Indexing
graphdb:entity-id-size "32" ;
graphdb:enable-context-index "false" ;
graphdb:enablePredicateList "true" ;
graphdb:enable-fts-index "false" ;
graphdb:fts-indexes ("default" "iri") ;
graphdb:fts-string-literals-index "default" ;
graphdb:fts-iris-index "none" ;
# Queries and Updates
graphdb:query-timeout "0" ;
graphdb:throw-QueryEvaluationException-on-timeout "false" ;
graphdb:query-limit-results "0" ;
# Settable in the file but otherwise hidden in the UI and in the RDF4J console
graphdb:base-URL "http://example.org/owlim#" ;
graphdb:defaultNS "" ;
graphdb:imports "" ;
graphdb:repository-type "file-repository" ;
graphdb:storage-folder "storage" ;
graphdb:entity-index-size "10000000" ;
graphdb:in-memory-literal-properties "true" ;
graphdb:enable-literal-index "true" ;
]
].

View File

@@ -1,28 +0,0 @@
#! /bin/bash
REPOSITORY_ID="langchain"
GRAPHDB_URI="http://localhost:7200/"
echo -e "\nUsing GraphDB: ${GRAPHDB_URI}"
function startGraphDB {
echo -e "\nStarting GraphDB..."
exec /opt/graphdb/dist/bin/graphdb
}
function waitGraphDBStart {
echo -e "\nWaiting GraphDB to start..."
for _ in $(seq 1 5); do
CHECK_RES=$(curl --silent --write-out '%{http_code}' --output /dev/null ${GRAPHDB_URI}/rest/repositories)
if [ "${CHECK_RES}" = '200' ]; then
echo -e "\nUp and running"
break
fi
sleep 30s
echo "CHECK_RES: ${CHECK_RES}"
done
}
startGraphDB &
waitGraphDBStart
wait
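
For readers who prefer Python, here is a rough equivalent of `waitGraphDBStart` above — a sketch using `requests`, which is not part of the original script: poll `/rest/repositories` until GraphDB answers with HTTP 200.

```python
import time

import requests

GRAPHDB_URI = "http://localhost:7200"

# Try up to 5 times, 30 seconds apart, mirroring the bash loop above.
for _ in range(5):
    try:
        status = requests.get(f"{GRAPHDB_URI}/rest/repositories", timeout=5).status_code
    except requests.ConnectionError:
        status = None
    if status == 200:
        print("Up and running")
        break
    print(f"CHECK_RES: {status}")
    time.sleep(30)
```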

View File

@@ -114,8 +114,8 @@ autodoc_pydantic_field_signature_prefix = "param"
autodoc_member_order = "groupwise"
autoclass_content = "both"
autodoc_typehints_format = "short"
autodoc_typehints = "both"
# autodoc_typehints = "description"
# Add any paths that contain templates here, relative to this directory.
templates_path = ["templates"]

View File

@@ -14,6 +14,7 @@ from pydantic import BaseModel
ROOT_DIR = Path(__file__).parents[2].absolute()
HERE = Path(__file__).parent
ClassKind = Literal["TypedDict", "Regular", "Pydantic", "enum"]
@@ -217,8 +218,8 @@ def _construct_doc(
for module in namespaces:
_members = members_by_namespace[module]
classes = [el for el in _members["classes_"] if el["is_public"]]
functions = [el for el in _members["functions"] if el["is_public"]]
classes = _members["classes_"]
functions = _members["functions"]
if not (classes or functions):
continue
section = f":mod:`{package_namespace}.{module}`"
@@ -244,6 +245,9 @@ Classes
"""
for class_ in sorted(classes, key=lambda c: c["qualified_name"]):
if not class_["is_public"]:
continue
if class_["kind"] == "TypedDict":
template = "typeddict.rst"
elif class_["kind"] == "enum":
@@ -261,7 +265,7 @@ Classes
"""
if functions:
_functions = [f["qualified_name"] for f in functions]
_functions = [f["qualified_name"] for f in functions if f["is_public"]]
fstring = "\n ".join(sorted(_functions))
full_doc += f"""\
Functions
@@ -319,52 +323,30 @@ def _package_dir(package_name: str = "langchain") -> Path:
def _get_package_version(package_dir: Path) -> str:
"""Return the version of the package."""
try:
with open(package_dir.parent / "pyproject.toml", "r") as f:
pyproject = toml.load(f)
except FileNotFoundError as e:
print(
f"pyproject.toml not found in {package_dir.parent}.\n"
"You are either attempting to build a directory which is not a package or "
"the package is missing a pyproject.toml file which should be added."
"Aborting the build."
)
exit(1)
with open(package_dir.parent / "pyproject.toml", "r") as f:
pyproject = toml.load(f)
return pyproject["tool"]["poetry"]["version"]
def _out_file_path(package_name: str) -> Path:
def _out_file_path(package_name: str = "langchain") -> Path:
"""Return the path to the file containing the documentation."""
return HERE / f"{package_name.replace('-', '_')}_api_reference.rst"
def _doc_first_line(package_name: str) -> str:
def _doc_first_line(package_name: str = "langchain") -> str:
"""Return the path to the file containing the documentation."""
return f".. {package_name.replace('-', '_')}_api_reference:\n\n"
def main() -> None:
"""Generate the api_reference.rst file for each package."""
print("Starting to build API reference files.")
for dir in os.listdir(ROOT_DIR / "libs"):
# Skip any hidden directories
# Some of these could be present by mistake in the code base
# e.g., .pytest_cache from running tests from the wrong location.
if dir.startswith("."):
print("Skipping dir:", dir)
continue
if dir in ("cli", "partners"):
continue
else:
print("Building package:", dir)
_build_rst_file(package_name=dir)
partner_packages = os.listdir(ROOT_DIR / "libs" / "partners")
print("Building partner packages:", partner_packages)
for dir in partner_packages:
for dir in os.listdir(ROOT_DIR / "libs" / "partners"):
_build_rst_file(package_name=dir)
print("API reference files built.")
if __name__ == "__main__":

View File

@@ -5,7 +5,7 @@
<script type="text/javascript" src="{{ pathto('_static/doctools.js', 1) }}"></script>
<script type="text/javascript" src="{{ pathto('_static/language_data.js', 1) }}"></script>
<script type="text/javascript" src="{{ pathto('_static/searchtools.js', 1) }}"></script>
<script type="text/javascript" src="{{ pathto('_static/sphinx_highlight.js', 1) }}"></script>
<!-- <script type="text/javascript" src="{{ pathto('_static/sphinx_highlight.js', 1) }}"></script> -->
<script type="text/javascript">
$(document).ready(function() {
if (!Search.out) {

File diff suppressed because it is too large.

View File

@@ -3,68 +3,24 @@ sidebar_position: 3
---
# Contribute Documentation
LangChain documentation consists of two components:
The docs directory contains Documentation and API Reference.
1. Main Documentation: Hosted at [python.langchain.com](https://python.langchain.com/),
this comprehensive resource serves as the primary user-facing documentation.
It covers a wide array of topics, including tutorials, use cases, integrations,
and more, offering extensive guidance on building with LangChain.
The content for this documentation lives in the `/docs` directory of the monorepo.
2. In-code Documentation: This is documentation of the codebase itself, which is also
used to generate the externally facing [API Reference](https://api.python.langchain.com/en/latest/langchain_api_reference.html).
The content for the API reference is autogenerated by scanning the docstrings in the codebase. For this reason we ask that
developers document their code well.
Documentation is built using [Quarto](https://quarto.org) and [Docusaurus 2](https://docusaurus.io/).
The main documentation is built using [Quarto](https://quarto.org) and [Docusaurus 2](https://docusaurus.io/).
The API Reference is largely autogenerated by [sphinx](https://www.sphinx-doc.org/en/master/) from the code and is hosted by [Read the Docs](https://readthedocs.org/).
For that reason, we ask that you add good documentation to all classes and methods.
The `API Reference` is largely autogenerated by [sphinx](https://www.sphinx-doc.org/en/master/)
from the code and is hosted by [Read the Docs](https://readthedocs.org/).
Similar to linting, we recognize documentation can be annoying. If you do not want to do it, please contact a project maintainer, and they can help you with it. We do not want this to be a blocker for good code getting contributed.
We appreciate all contributions to the documentation, whether it be fixing a typo,
adding a new tutorial or example and whether it be in the main documentation or the API Reference.
Similar to linting, we recognize documentation can be annoying. If you do not want
to do it, please contact a project maintainer, and they can help you with it. We do not want this to be a blocker for good code getting contributed.
## 📜 Main Documentation
The content for the main documentation is located in the `/docs` directory of the monorepo.
The documentation is written using a combination of ipython notebooks (`.ipynb` files)
and markdown (`.mdx` files). The notebooks are converted to markdown
using [Quarto](https://quarto.org) and then built using [Docusaurus 2](https://docusaurus.io/).
Feel free to make contributions to the main documentation! 🥰
After modifying the documentation:
1. Run the linting and formatting commands (see below) to ensure that the documentation is well-formatted and free of errors.
2. Optionally build the documentation locally to verify that the changes look good.
3. Make a pull request with the changes.
4. You can preview and verify that the changes are what you wanted by clicking the `View deployment` or `Visit Preview` buttons on the pull request `Conversation` page. This will take you to a preview of the documentation changes.
## ⚒️ Linting and Building Documentation Locally
After writing up the documentation, you may want to lint and build the documentation
locally to ensure that it looks good and is free of errors.
If you're unable to build it locally that's okay as well, as you will be able to
see a preview of the documentation on the pull request page.
## Build Documentation Locally
### Install dependencies
- [Quarto](https://quarto.org) - package that converts Jupyter notebooks (`.ipynb` files) into mdx files for serving in Docusaurus. [Download link](https://quarto.org/docs/download/).
From the **monorepo root**, run the following command to install the dependencies:
```bash
poetry install --with lint,docs --no-root
```
- [Quarto](https://quarto.org) - package that converts Jupyter notebooks (`.ipynb` files) into mdx files for serving in Docusaurus.
- `poetry install` from the monorepo root
### Building
The code that builds the documentation is located in the `/docs` directory of the monorepo.
In the following commands, the prefix `api_` indicates that those are operations for the API Reference.
Before building the documentation, it is always a good idea to clean the build directory:
@@ -90,9 +46,10 @@ make api_docs_linkcheck
### Linting and Formatting
The Main Documentation is linted from the **monorepo root**. To lint the main documentation, run the following from there:
The docs are linted from the monorepo root. To lint the docs, run the following from there:
```bash
poetry install --with lint,typing
make lint
```
@@ -100,73 +57,9 @@ If you have formatting-related errors, you can fix them automatically with:
```bash
make format
```
## ⌨️ In-code Documentation
The in-code documentation is largely autogenerated by [sphinx](https://www.sphinx-doc.org/en/master/) from the code and is hosted by [Read the Docs](https://readthedocs.org/).
For the API reference to be useful, the codebase must be well-documented. This means that all functions, classes, and methods should have a docstring that explains what they do, what the arguments are, and what the return value is. This is a good practice in general, but it is especially important for LangChain because the API reference is the primary resource for developers to understand how to use the codebase.
We generally follow the [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings) for docstrings.
Here is an example of a well-documented function:
```python
def my_function(arg1: int, arg2: str) -> float:
"""This is a short description of the function. (It should be a single sentence.)
This is a longer description of the function. It should explain what
the function does, what the arguments are, and what the return value is.
It should wrap at 88 characters.
Examples:
This is a section for examples of how to use the function.
.. code-block:: python
my_function(1, "hello")
Args:
arg1: This is a description of arg1. We do not need to specify the type since
it is already specified in the function signature.
arg2: This is a description of arg2.
Returns:
This is a description of the return value.
"""
return 3.14
```
### Linting and Formatting
The in-code documentation is linted from the directories belonging to the packages
being documented.
For example, if you're working on the `langchain-community` package, you would change
the working directory to the `langchain-community` directory:
```bash
cd [root]/libs/langchain-community
```
Set up a virtual environment for the package if you haven't done so already.
Install the dependencies for the package.
```bash
poetry install --with lint
```
Then you can run the following commands to lint and format the in-code documentation:
```bash
make format
make lint
```
## Verify Documentation Changes
After pushing documentation changes to the repository, you can preview and verify that the changes are
what you wanted by clicking the `View deployment` or `Visit Preview` buttons on the pull request `Conversation` page.

View File

@@ -15,9 +15,8 @@ There are many ways to contribute to LangChain. Here are some common ways people
- [**Documentation**](./documentation.mdx): Help improve our docs, including this one!
- [**Code**](./code.mdx): Help us write code, fix bugs, or improve our infrastructure.
- [**Integrations**](integrations.mdx): Help us integrate with your favorite vendors and tools.
- [**Discussions**](https://github.com/langchain-ai/langchain/discussions): Help answer usage questions and discuss issues with users.
### 🚩 GitHub Issues
Our [issues](https://github.com/langchain-ai/langchain/issues) page is kept up to date with bugs, improvements, and feature requests.
@@ -32,13 +31,7 @@ We will try to keep these issues as up-to-date as possible, though
with the rapid rate of development in this field some may get out of date.
If you notice this happening, please let us know.
### 💭 GitHub Discussions
We have a [discussions](https://github.com/langchain-ai/langchain/discussions) page where users can ask usage questions, discuss design decisions, and propose new features.
If you are able to help answer questions, please do so! This will allow the maintainers to spend more time focused on development and bug fixing.
### 🙋 Getting Help
Our goal is to have the simplest developer setup possible. Should you experience any difficulty getting setup, please
contact a maintainer! Not only do we want to help get you unblocked, but we also want to make sure that the process is

View File

@@ -1,54 +0,0 @@
---
sidebar_position: 0.5
---
# Repository Structure
If you plan on contributing to LangChain code or documentation, it can be useful
to understand the high-level structure of the repository.
LangChain is organized as a [monorepo](https://en.wikipedia.org/wiki/Monorepo) that contains multiple packages.
Here's the structure visualized as a tree:
```text
.
├── cookbook # Tutorials and examples
├── docs # Contains content for the documentation here: https://python.langchain.com/
├── libs
│ ├── langchain # Main package
│ │ ├── tests/unit_tests # Unit tests (present in each package; not shown for brevity)
│ │ ├── tests/integration_tests # Integration tests (present in each package; not shown for brevity)
│ ├── langchain-community # Third-party integrations
│ ├── langchain-core # Base interfaces for key abstractions
│ ├── langchain-experimental # Experimental components and chains
│ ├── partners
│ ├── langchain-partner-1
│ ├── langchain-partner-2
│ ├── ...
├── templates # A collection of easily deployable reference architectures for a wide variety of tasks.
```
The root directory also contains the following files:
* `pyproject.toml`: Dependencies for building and linting the docs and cookbook.
* `Makefile`: A file that contains shortcuts for building and linting the docs and cookbook.
There are other files in the root directory level, but their presence should be self-explanatory. Feel free to browse around!
## Documentation
The `/docs` directory contains the content for the documentation that is shown
at https://python.langchain.com/ and the associated API Reference https://api.python.langchain.com/en/latest/langchain_api_reference.html.
See the [documentation](./documentation) guidelines to learn how to contribute to the documentation.
## Code
The `/libs` directory contains the code for the LangChain packages.
To learn more about how to contribute code see the following guidelines:
- [Code](./code.mdx): Learn how to develop in the LangChain codebase.
- [Integrations](./integrations.mdx): Learn how to contribute third-party integrations to langchain-community or to start a new partner package.
- [Testing](./testing.mdx): Learn how to write tests for the packages.

View File

@@ -7,7 +7,7 @@
"source": [
"# Add message history (memory)\n",
"\n",
"The `RunnableWithMessageHistory` lets us add message history to certain types of chains. It wraps another Runnable and manages the chat message history for it.\n",
"The `RunnableWithMessageHistory` let us add message history to certain types of chains.\n",
"\n",
"Specifically, it can be used for any Runnable that takes as input one of\n",
"\n",
@@ -21,379 +21,7 @@
"* a sequence of `BaseMessage`\n",
"* a dict with a key that contains a sequence of `BaseMessage`\n",
"\n",
"Let's take a look at some examples to see how it works. First we construct a runnable (which here accepts a dict as input and returns a message as output):"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "2ed413b4-33a1-48ee-89b0-2d4917ec101a",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_openai.chat_models import ChatOpenAI\n",
"\n",
"model = ChatOpenAI()\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\n",
" \"system\",\n",
" \"You're an assistant who's good at {ability}. Respond in 20 words or fewer\",\n",
" ),\n",
" MessagesPlaceholder(variable_name=\"history\"),\n",
" (\"human\", \"{input}\"),\n",
" ]\n",
")\n",
"runnable = prompt | model"
]
},
{
"cell_type": "markdown",
"id": "9fd175e1-c7b8-4929-a57e-3331865fe7aa",
"metadata": {},
"source": [
"To manage the message history, we will need:\n",
"1. This runnable;\n",
"2. A callable that returns an instance of `BaseChatMessageHistory`.\n",
"\n",
"Check out the [memory integrations](https://integrations.langchain.com/memory) page for implementations of chat message histories using Redis and other providers. Here we demonstrate using an in-memory `ChatMessageHistory` as well as more persistent storage using `RedisChatMessageHistory`."
]
},
{
"cell_type": "markdown",
"id": "3d83adad-9672-496d-9f25-5747e7b8c8bb",
"metadata": {},
"source": [
"## In-memory\n",
"\n",
"Below we show a simple example in which the chat history lives in memory, in this case via a global Python dict.\n",
"\n",
"We construct a callable `get_session_history` that references this dict to return an instance of `ChatMessageHistory`. The arguments to the callable can be specified by passing a configuration to the `RunnableWithMessageHistory` at runtime. By default, the configuration parameter is expected to be a single string `session_id`. This can be adjusted via the `history_factory_config` kwarg.\n",
"\n",
"Using the single-parameter default:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "54348d02-d8ee-440c-bbf9-41bc0fbbc46c",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.chat_message_histories import ChatMessageHistory\n",
"from langchain_core.chat_history import BaseChatMessageHistory\n",
"from langchain_core.runnables.history import RunnableWithMessageHistory\n",
"\n",
"store = {}\n",
"\n",
"\n",
"def get_session_history(session_id: str) -> BaseChatMessageHistory:\n",
" if session_id not in store:\n",
" store[session_id] = ChatMessageHistory()\n",
" return store[session_id]\n",
"\n",
"\n",
"with_message_history = RunnableWithMessageHistory(\n",
" runnable,\n",
" get_session_history,\n",
" input_messages_key=\"input\",\n",
" history_messages_key=\"history\",\n",
")"
]
},
{
"cell_type": "markdown",
"id": "01acb505-3fd3-4ab4-9f04-5ea07e81542e",
"metadata": {},
"source": [
"Note that we've specified `input_messages_key` (the key to be treated as the latest input message) and `history_messages_key` (the key to add historical messages to).\n",
"\n",
"When invoking this new runnable, we specify the corresponding chat history via a configuration parameter:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "01384412-f08e-4634-9edb-3f46f475b582",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Cosine is a trigonometric function that calculates the ratio of the adjacent side to the hypotenuse of a right triangle.')"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with_message_history.invoke(\n",
" {\"ability\": \"math\", \"input\": \"What does cosine mean?\"},\n",
" config={\"configurable\": {\"session_id\": \"abc123\"}},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "954688a2-9a3f-47ee-a9e8-fa0c83e69477",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Cosine is a mathematical function used to calculate the length of a side in a right triangle.')"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Remembers\n",
"with_message_history.invoke(\n",
" {\"ability\": \"math\", \"input\": \"What?\"},\n",
" config={\"configurable\": {\"session_id\": \"abc123\"}},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "39350d7c-2641-4744-bc2a-fd6a57c4ea90",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='I can help with math problems. What do you need assistance with?')"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# New session_id --> does not remember.\n",
"with_message_history.invoke(\n",
" {\"ability\": \"math\", \"input\": \"What?\"},\n",
" config={\"configurable\": {\"session_id\": \"def234\"}},\n",
")"
]
},
{
"cell_type": "markdown",
"id": "d29497be-3366-408d-bbb9-d4a8bf4ef37c",
"metadata": {},
"source": [
"The configuration parameters by which we track message histories can be customized by passing in a list of ``ConfigurableFieldSpec`` objects to the ``history_factory_config`` parameter. Below, we use two parameters: a `user_id` and `conversation_id`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "1c89daee-deff-4fdf-86a3-178f7d8ef536",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.runnables import ConfigurableFieldSpec\n",
"\n",
"store = {}\n",
"\n",
"\n",
"def get_session_history(user_id: str, conversation_id: str) -> BaseChatMessageHistory:\n",
" if (user_id, conversation_id) not in store:\n",
" store[(user_id, conversation_id)] = ChatMessageHistory()\n",
" return store[(user_id, conversation_id)]\n",
"\n",
"\n",
"with_message_history = RunnableWithMessageHistory(\n",
" runnable,\n",
" get_session_history,\n",
" input_messages_key=\"input\",\n",
" history_messages_key=\"history\",\n",
" history_factory_config=[\n",
" ConfigurableFieldSpec(\n",
" id=\"user_id\",\n",
" annotation=str,\n",
" name=\"User ID\",\n",
" description=\"Unique identifier for the user.\",\n",
" default=\"\",\n",
" is_shared=True,\n",
" ),\n",
" ConfigurableFieldSpec(\n",
" id=\"conversation_id\",\n",
" annotation=str,\n",
" name=\"Conversation ID\",\n",
" description=\"Unique identifier for the conversation.\",\n",
" default=\"\",\n",
" is_shared=True,\n",
" ),\n",
" ],\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "65c5622e-09b8-4f2f-8c8a-2dab0fd040fa",
"metadata": {},
"outputs": [],
"source": [
"with_message_history.invoke(\n",
" {\"ability\": \"math\", \"input\": \"Hello\"},\n",
" config={\"configurable\": {\"user_id\": \"123\", \"conversation_id\": \"1\"}},\n",
")"
]
},
{
"cell_type": "markdown",
"id": "18f1a459-3f88-4ee6-8542-76a907070dd6",
"metadata": {},
"source": [
"### Examples with runnables of different signatures\n",
"\n",
"The above runnable takes a dict as input and returns a BaseMessage. Below we show some alternatives."
]
},
{
"cell_type": "markdown",
"id": "48eae1bf-b59d-4a61-8e62-b6dbf667e866",
"metadata": {},
"source": [
"#### Messages input, dict output"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "17733d4f-3a32-4055-9d44-5d58b9446a26",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'output_message': AIMessage(content=\"Simone de Beauvoir believed in the existence of free will. She argued that individuals have the ability to make choices and determine their own actions, even in the face of social and cultural constraints. She rejected the idea that individuals are purely products of their environment or predetermined by biology or destiny. Instead, she emphasized the importance of personal responsibility and the need for individuals to actively engage in creating their own lives and defining their own existence. De Beauvoir believed that freedom and agency come from recognizing one's own freedom and actively exercising it in the pursuit of personal and collective liberation.\")}"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.messages import HumanMessage\n",
"from langchain_core.runnables import RunnableParallel\n",
"\n",
"chain = RunnableParallel({\"output_message\": ChatOpenAI()})\n",
"\n",
"\n",
"def get_session_history(session_id: str) -> BaseChatMessageHistory:\n",
" if session_id not in store:\n",
" store[session_id] = ChatMessageHistory()\n",
" return store[session_id]\n",
"\n",
"\n",
"with_message_history = RunnableWithMessageHistory(\n",
" chain,\n",
" get_session_history,\n",
" output_messages_key=\"output_message\",\n",
")\n",
"\n",
"with_message_history.invoke(\n",
" [HumanMessage(content=\"What did Simone de Beauvoir believe about free will\")],\n",
" config={\"configurable\": {\"session_id\": \"baz\"}},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "efb57ef5-91f9-426b-84b9-b77f071a9dd7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'output_message': AIMessage(content='Simone de Beauvoir\\'s views on free will were closely aligned with those of her contemporary and partner Jean-Paul Sartre. Both de Beauvoir and Sartre were existentialist philosophers who emphasized the importance of individual freedom and the rejection of determinism. They believed that human beings have the capacity to transcend their circumstances and create their own meaning and values.\\n\\nSartre, in his famous work \"Being and Nothingness,\" argued that human beings are condemned to be free, meaning that we are burdened with the responsibility of making choices and defining ourselves in a world that lacks inherent meaning. Like de Beauvoir, Sartre believed that individuals have the ability to exercise their freedom and make choices in the face of external and internal constraints.\\n\\nWhile there may be some nuanced differences in their philosophical writings, overall, de Beauvoir and Sartre shared a similar belief in the existence of free will and the importance of individual agency in shaping one\\'s own life.')}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with_message_history.invoke(\n",
" [HumanMessage(content=\"How did this compare to Sartre\")],\n",
" config={\"configurable\": {\"session_id\": \"baz\"}},\n",
")"
]
},
{
"cell_type": "markdown",
"id": "a39eac5f-a9d8-4729-be06-5e7faf0c424d",
"metadata": {},
"source": [
"#### Messages input, messages output"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e45bcd95-e31f-4a9a-967a-78f96e8da881",
"metadata": {},
"outputs": [],
"source": [
"RunnableWithMessageHistory(\n",
" ChatOpenAI(),\n",
" get_session_history,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "04daa921-a2d1-40f9-8cd1-ae4e9a4163a7",
"metadata": {},
"source": [
"#### Dict with single key for all messages input, messages output"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "27157f15-9fb0-4167-9870-f4d7f234b3cb",
"metadata": {},
"outputs": [],
"source": [
"from operator import itemgetter\n",
"\n",
"RunnableWithMessageHistory(\n",
" itemgetter(\"input_messages\") | ChatOpenAI(),\n",
" get_session_history,\n",
" input_messages_key=\"input_messages\",\n",
")"
]
},
{
"cell_type": "markdown",
"id": "418ca7af-9ed9-478c-8bca-cba0de2ca61e",
"metadata": {},
"source": [
"## Persistent storage"
]
},
{
"cell_type": "markdown",
"id": "76799a13-d99a-4c4f-91f2-db699e40b8df",
"metadata": {},
"source": [
"In many cases it is preferable to persist conversation histories. `RunnableWithMessageHistory` is agnostic as to how the `get_session_history` callable retrieves its chat message histories. See [here](https://github.com/langchain-ai/langserve/blob/main/examples/chat_with_persistence_and_user/server.py) for an example using a local filesystem. Below we demonstrate how one could use Redis. Check out the [memory integrations](https://integrations.langchain.com/memory) page for implementations of chat message histories using other providers."
"Let's take a look at some examples to see how it works."
]
},
{
@@ -401,9 +29,9 @@
"id": "6bca45e5-35d9-4603-9ca9-6ac0ce0e35cd",
"metadata": {},
"source": [
"### Setup\n",
"## Setup\n",
"\n",
"We'll need to install Redis if it's not installed already:"
"We'll use Redis to store our chat message histories and Anthropic's claude-2 model so we'll need to install the following dependencies:"
]
},
{
@@ -413,7 +41,28 @@
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet redis"
"%pip install --upgrade --quiet langchain redis anthropic"
]
},
{
"cell_type": "markdown",
"id": "93776323-d6b8-4912-bb6a-867c5e655f46",
"metadata": {},
"source": [
"Set your [Anthropic API key](https://console.anthropic.com/):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c7f56f69-d2f1-4a21-990c-b5551eb012fa",
"metadata": {},
"outputs": [],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"ANTHROPIC_API_KEY\"] = getpass.getpass()"
]
},
{
@@ -429,7 +78,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 1,
"id": "cd6a250e-17fe-4368-a39d-1fe6b2cbde68",
"metadata": {},
"outputs": [],
@@ -463,30 +112,75 @@
},
{
"cell_type": "markdown",
"id": "f9d81796-ce61-484c-89e2-6c567d5e54ef",
"id": "1a5a632e-ba9e-4488-b586-640ad5494f62",
"metadata": {},
"source": [
"Updating the message history implementation just requires us to define a new callable, this time returning an instance of `RedisChatMessageHistory`:"
"## Example: Dict input, message output\n",
"\n",
"Let's create a simple chain that takes a dict as input and returns a BaseMessage.\n",
"\n",
"In this case the `\"question\"` key in the input represents our input message, and the `\"history\"` key is where our historical messages will be injected."
]
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 2,
"id": "2a150d6f-8878-4950-8634-a608c5faad56",
"metadata": {},
"outputs": [],
"source": [
"from typing import Optional\n",
"\n",
"from langchain_community.chat_message_histories import RedisChatMessageHistory\n",
"from langchain_community.chat_models import ChatAnthropic\n",
"from langchain_core.chat_history import BaseChatMessageHistory\n",
"from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
"from langchain_core.runnables.history import RunnableWithMessageHistory"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "3185edba-4eb6-4b32-80c6-577c0d19af97",
"metadata": {},
"outputs": [],
"source": [
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", \"You're an assistant who's good at {ability}\"),\n",
" MessagesPlaceholder(variable_name=\"history\"),\n",
" (\"human\", \"{question}\"),\n",
" ]\n",
")\n",
"\n",
"chain = prompt | ChatAnthropic(model=\"claude-2\")"
]
},
{
"cell_type": "markdown",
"id": "f9d81796-ce61-484c-89e2-6c567d5e54ef",
"metadata": {},
"source": [
"### Adding message history\n",
"\n",
"To add message history to our original chain we wrap it in the `RunnableWithMessageHistory` class.\n",
"\n",
"Crucially, we also need to define a method that takes a session_id string and based on it returns a `BaseChatMessageHistory`. Given the same input, this method should return an equivalent output.\n",
"\n",
"In this case we'll also want to specify `input_messages_key` (the key to be treated as the latest input message) and `history_messages_key` (the key to add historical messages to)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ca7c64d8-e138-4ef8-9734-f82076c47d80",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.chat_message_histories import RedisChatMessageHistory\n",
"\n",
"\n",
"def get_message_history(session_id: str) -> RedisChatMessageHistory:\n",
" return RedisChatMessageHistory(session_id, url=REDIS_URL)\n",
"\n",
"\n",
"with_message_history = RunnableWithMessageHistory(\n",
" runnable,\n",
" get_message_history,\n",
" input_messages_key=\"input\",\n",
"chain_with_history = RunnableWithMessageHistory(\n",
" chain,\n",
" lambda session_id: RedisChatMessageHistory(session_id, url=REDIS_URL),\n",
" input_messages_key=\"question\",\n",
" history_messages_key=\"history\",\n",
")"
]
@@ -496,53 +190,60 @@
"id": "37eefdec-9901-4650-b64c-d3c097ed5f4d",
"metadata": {},
"source": [
"We can invoke as before:"
"## Invoking with config\n",
"\n",
"Whenever we call our chain with message history, we need to include a config that contains the `session_id`\n",
"```python\n",
"config={\"configurable\": {\"session_id\": \"<SESSION_ID>\"}}\n",
"```\n",
"\n",
"Given the same configuration, our chain should be pulling from the same chat message history."
]
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 7,
"id": "a85bcc22-ca4c-4ad5-9440-f94be7318f3e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Cosine is a trigonometric function that represents the ratio of the adjacent side to the hypotenuse in a right triangle.')"
"AIMessage(content=' Cosine is one of the basic trigonometric functions in mathematics. It is defined as the ratio of the adjacent side to the hypotenuse in a right triangle.\\n\\nSome key properties and facts about cosine:\\n\\n- It is denoted by cos(θ), where θ is the angle in a right triangle. \\n\\n- The cosine of an acute angle is always positive. For angles greater than 90 degrees, cosine can be negative.\\n\\n- Cosine is one of the three main trig functions along with sine and tangent.\\n\\n- The cosine of 0 degrees is 1. As the angle increases towards 90 degrees, the cosine value decreases towards 0.\\n\\n- The range of values for cosine is -1 to 1.\\n\\n- The cosine function maps angles in a circle to the x-coordinate on the unit circle.\\n\\n- Cosine is used to find adjacent side lengths in right triangles, and has many other applications in mathematics, physics, engineering and more.\\n\\n- Key cosine identities include: cos(A+B) = cosAcosB sinAsinB and cos(2A) = cos^2(A) sin^2(A)\\n\\nSo in summary, cosine is a fundamental trig')"
]
},
"execution_count": 11,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with_message_history.invoke(\n",
" {\"ability\": \"math\", \"input\": \"What does cosine mean?\"},\n",
"chain_with_history.invoke(\n",
" {\"ability\": \"math\", \"question\": \"What does cosine mean?\"},\n",
" config={\"configurable\": {\"session_id\": \"foobar\"}},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 8,
"id": "ab29abd3-751f-41ce-a1b0-53f6b565e79d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='The inverse of cosine is the arccosine function, denoted as acos or cos^-1, which gives the angle corresponding to a given cosine value.')"
"AIMessage(content=' The inverse of the cosine function is called the arccosine or inverse cosine, often denoted as cos-1(x) or arccos(x).\\n\\nThe key properties and facts about arccosine:\\n\\n- It is defined as the angle θ between 0 and π radians whose cosine is x. So arccos(x) = θ such that cos(θ) = x.\\n\\n- The range of arccosine is 0 to π radians (0 to 180 degrees).\\n\\n- The domain of arccosine is -1 to 1. \\n\\n- arccos(cos(θ)) = θ for values of θ from 0 to π radians.\\n\\n- arccos(x) is the angle in a right triangle whose adjacent side is x and hypotenuse is 1.\\n\\n- arccos(0) = 90 degrees. As x increases from 0 to 1, arccos(x) decreases from 90 to 0 degrees.\\n\\n- arccos(1) = 0 degrees. arccos(-1) = 180 degrees.\\n\\n- The graph of y = arccos(x) is part of the unit circle, restricted to x')"
]
},
"execution_count": 12,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with_message_history.invoke(\n",
" {\"ability\": \"math\", \"input\": \"What's its inverse\"},\n",
"chain_with_history.invoke(\n",
" {\"ability\": \"math\", \"question\": \"What's its inverse\"},\n",
" config={\"configurable\": {\"session_id\": \"foobar\"}},\n",
")"
]
@@ -554,7 +255,7 @@
"source": [
":::tip\n",
"\n",
"[Langsmith trace](https://smith.langchain.com/public/bd73e122-6ec1-48b2-82df-e6483dc9cb63/r)\n",
"[Langsmith trace](https://smith.langchain.com/public/863a003b-7ca8-4b24-be9e-d63ec13c106e/r)\n",
"\n",
":::"
]
@@ -566,13 +267,124 @@
"source": [
"Looking at the Langsmith trace for the second call, we can see that when constructing the prompt, a \"history\" variable has been injected which is a list of two messages (our first input and first output)."
]
},
{
"cell_type": "markdown",
"id": "028cf151-6cd5-4533-b3cf-c8d735554647",
"metadata": {},
"source": [
"## Example: messages input, dict output"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "0bb446b5-6251-45fe-a92a-4c6171473c53",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'output_message': AIMessage(content=' Here is a summary of Simone de Beauvoir\\'s views on free will:\\n\\n- De Beauvoir was an existentialist philosopher and believed strongly in the concept of free will. She rejected the idea that human nature or instincts determine behavior.\\n\\n- Instead, de Beauvoir argued that human beings define their own essence or nature through their actions and choices. As she famously wrote, \"One is not born, but rather becomes, a woman.\"\\n\\n- De Beauvoir believed that while individuals are situated in certain cultural contexts and social conditions, they still have agency and the ability to transcend these situations. Freedom comes from choosing one\\'s attitude toward these constraints.\\n\\n- She emphasized the radical freedom and responsibility of the individual. We are \"condemned to be free\" because we cannot escape making choices and taking responsibility for our choices. \\n\\n- De Beauvoir felt that many people evade their freedom and responsibility by adopting rigid mindsets, ideologies, or conforming uncritically to social roles.\\n\\n- She advocated for the recognition of ambiguity in the human condition and warned against the quest for absolute rules that deny freedom and responsibility. Authentic living involves embracing ambiguity.\\n\\nIn summary, de Beauvoir promoted an existential ethics')}"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_core.messages import HumanMessage\n",
"from langchain_core.runnables import RunnableParallel\n",
"\n",
"chain = RunnableParallel({\"output_message\": ChatAnthropic(model=\"claude-2\")})\n",
"chain_with_history = RunnableWithMessageHistory(\n",
" chain,\n",
" lambda session_id: RedisChatMessageHistory(session_id, url=REDIS_URL),\n",
" output_messages_key=\"output_message\",\n",
")\n",
"\n",
"chain_with_history.invoke(\n",
" [HumanMessage(content=\"What did Simone de Beauvoir believe about free will\")],\n",
" config={\"configurable\": {\"session_id\": \"baz\"}},\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "601ce3ff-aea8-424d-8e54-fd614256af4f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'output_message': AIMessage(content=\" There are many similarities between Simone de Beauvoir's views on free will and those of Jean-Paul Sartre, though some key differences emerge as well:\\n\\nSimilarities with Sartre:\\n\\n- Both were existentialist thinkers who rejected determinism and emphasized human freedom and responsibility.\\n\\n- They agreed that existence precedes essence - there is no predefined human nature that determines who we are.\\n\\n- Individuals must define themselves through their choices and actions. This leads to anxiety but also freedom.\\n\\n- The human condition is characterized by ambiguity and uncertainty, rather than fixed meanings/values.\\n\\n- Both felt that most people evade their freedom through self-deception, conformity, or adopting collective identities/values uncritically.\\n\\nDifferences from Sartre: \\n\\n- Sartre placed more emphasis on the burden and anguish of radical freedom. De Beauvoir focused more on its positive potential.\\n\\n- De Beauvoir critiqued Sartre's premise that human relations are necessarily conflictual. She saw more potential for mutual recognition.\\n\\n- Sartre saw the Other's gaze as a threat to freedom. De Beauvoir put more stress on how the Other's gaze can confirm\")}"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain_with_history.invoke(\n",
" [HumanMessage(content=\"How did this compare to Sartre\")],\n",
" config={\"configurable\": {\"session_id\": \"baz\"}},\n",
")"
]
},
{
"cell_type": "markdown",
"id": "b898d1b1-11e6-4d30-a8dd-cc5e45533611",
"metadata": {},
"source": [
":::tip\n",
"\n",
"[LangSmith trace](https://smith.langchain.com/public/f6c3e1d1-a49d-4955-a9fa-c6519df74fa7/r)\n",
"\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "1724292c-01c6-44bb-83e8-9cdb6bf01483",
"metadata": {},
"source": [
"## More examples\n",
"\n",
"We could also do any of the below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fd89240b-5a25-48f8-9568-5c1127f9ffad",
"metadata": {},
"outputs": [],
"source": [
"from operator import itemgetter\n",
"\n",
"# messages in, messages out\n",
"RunnableWithMessageHistory(\n",
" ChatAnthropic(model=\"claude-2\"),\n",
" lambda session_id: RedisChatMessageHistory(session_id, url=REDIS_URL),\n",
")\n",
"\n",
"# dict with single key for all messages in, messages out\n",
"RunnableWithMessageHistory(\n",
" itemgetter(\"input_messages\") | ChatAnthropic(model=\"claude-2\"),\n",
" lambda session_id: RedisChatMessageHistory(session_id, url=REDIS_URL),\n",
" input_messages_key=\"input_messages\",\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "poetry-venv",
"language": "python",
"name": "python3"
"name": "poetry-venv"
},
"language_info": {
"codemirror_mode": {
@@ -584,7 +396,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
"version": "3.9.1"
}
},
"nbformat": 4,

View File

@@ -193,7 +193,7 @@ After that, we can import and use WebBaseLoader.
```python
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://docs.smith.langchain.com")
loader = WebBaseLoader("https://docs.smith.langchain.com/overview")
docs = loader.load()
```
@@ -374,7 +374,7 @@ The final thing we will create is an agent - where the LLM decides what steps to
**NOTE: for this example we will only show how to create an agent using OpenAI models, as local models are not reliable enough yet.**
One of the first things to do when building an agent is to decide what tools it should have access to.
For this example, we will give the agent access to two tools, sketched in code after the list:
1. The retriever we just created. This will let it easily answer questions about LangSmith.
2. A search tool. This will let it easily answer questions that require up-to-date information.
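
As a sketch of how those two tools might be wired up — assuming the `retriever` built earlier in the guide and Tavily as the search tool (which requires a `TAVILY_API_KEY`); this is an illustration, not the guide's exact code:

```python
from langchain.tools.retriever import create_retriever_tool
from langchain_community.tools.tavily_search import TavilySearchResults

# Wrap the LangSmith docs retriever from earlier as an agent tool.
retriever_tool = create_retriever_tool(
    retriever,
    "langsmith_search",
    "Search for information about LangSmith.",
)

# A web search tool for questions that need up-to-date information.
search = TavilySearchResults()

tools = [retriever_tool, search]
```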

View File

@@ -1,141 +0,0 @@
{
"cells": [
{
"cell_type": "raw",
"id": "4cebeec0",
"metadata": {},
"source": [
"---\n",
"sidebar_label: AI21 Labs\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "e49f1e0d",
"metadata": {},
"source": [
"# ChatAI21\n",
"\n",
"This notebook covers how to get started with AI21 chat models.\n",
"\n",
"## Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c3bef91",
"metadata": {
"ExecuteTime": {
"end_time": "2024-02-15T06:50:44.929635Z",
"start_time": "2024-02-15T06:50:41.209704Z"
}
},
"outputs": [],
"source": [
"!pip install -qU langchain-ai21"
]
},
{
"cell_type": "markdown",
"id": "2b4f3e15",
"metadata": {},
"source": [
"## Environment Setup\n",
"\n",
"We'll need to get a [AI21 API key](https://docs.ai21.com/) and set the `AI21_API_KEY` environment variable:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "62e0dbc3",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import os\n",
"from getpass import getpass\n",
"\n",
"os.environ[\"AI21_API_KEY\"] = getpass()"
]
},
{
"cell_type": "markdown",
"id": "4828829d3da430ce",
"metadata": {
"collapsed": false
},
"source": [
"## Usage"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "39353473fce5dd2e",
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Bonjour, comment vas-tu?')"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_ai21 import ChatAI21\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"\n",
"chat = ChatAI21(model=\"j2-ultra\")\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", \"You are a helpful assistant that translates English to French.\"),\n",
" (\"human\", \"Translate this sentence from English to French. {english_text}.\"),\n",
" ]\n",
")\n",
"\n",
"chain = prompt | chat\n",
"chain.invoke({\"english_text\": \"Hello, how are you?\"})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c159a79f",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -55,7 +55,7 @@
"source": [
"## 1. Select a dataset\n",
"\n",
"This notebook fine-tunes a model directly on selecting which runs to fine-tune on. You will often curate these from traced runs. You can learn more about LangSmith datasets in the docs [docs](https://docs.smith.langchain.com/evaluation/concepts#datasets).\n",
"This notebook fine-tunes a model directly on selecting which runs to fine-tune on. You will often curate these from traced runs. You can learn more about LangSmith datasets in the docs [docs](https://docs.smith.langchain.com/evaluation/datasets).\n",
"\n",
"For the sake of this tutorial, we will upload an existing dataset here that you can use."
]

View File

@@ -72,72 +72,57 @@
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Init from a cassandra driver Session\n",
"\n",
"You need to create a `cassandra.cluster.Session` object, as described in the [Cassandra driver documentation](https://docs.datastax.com/en/developer/python-driver/latest/api/cassandra/cluster/#module-cassandra.cluster). The details vary (e.g. with network settings and authentication), but this might be something like:"
]
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"from cassandra.cluster import Cluster\n",
"\n",
"cluster = Cluster()\n",
"session = cluster.connect()"
]
],
"metadata": {
"collapsed": false
},
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"You need to provide the name of an existing keyspace of the Cassandra instance:"
]
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"CASSANDRA_KEYSPACE = input(\"CASSANDRA_KEYSPACE = \")"
]
],
"metadata": {
"collapsed": false
},
"execution_count": null
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"Creating the document loader:"
]
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
@@ -159,21 +144,18 @@
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"ExecuteTime": {
"end_time": "2024-01-19T15:47:26.399472Z",
"start_time": "2024-01-19T15:47:26.389145Z"
},
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"docs = loader.load()"
]
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2024-01-19T15:47:26.399472Z",
"start_time": "2024-01-19T15:47:26.389145Z"
}
},
"execution_count": 17
},
{
"cell_type": "code",
@@ -187,9 +169,7 @@
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='Row(_id=\\'659bdffa16cbc4586b11a423\\', title=\\'Dangerous Men\\', reviewtext=\\'\"Dangerous Men,\" the picture\\\\\\'s production notes inform, took 26 years to reach the big screen. After having seen it, I wonder: What was the rush?\\')', metadata={'table': 'movie_reviews', 'keyspace': 'default_keyspace'})"
]
"text/plain": "Document(page_content='Row(_id=\\'659bdffa16cbc4586b11a423\\', title=\\'Dangerous Men\\', reviewtext=\\'\"Dangerous Men,\" the picture\\\\\\'s production notes inform, took 26 years to reach the big screen. After having seen it, I wonder: What was the rush?\\')', metadata={'table': 'movie_reviews', 'keyspace': 'default_keyspace'})"
},
"execution_count": 19,
"metadata": {},
@@ -202,27 +182,17 @@
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"### Init from cassio\n",
"\n",
"It's also possible to use cassio to configure the session and keyspace."
]
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"import cassio\n",
@@ -234,16 +204,11 @@
")\n",
"\n",
"docs = loader.load()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Attribution statement\n",
"\n",
"> Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the [Apache Software Foundation](http://www.apache.org/) in the United States and/or other countries."
]
],
"metadata": {
"collapsed": false
},
"execution_count": null
}
],
"metadata": {
@@ -268,7 +233,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.17"
"version": "3.9.18"
}
},
"nbformat": 4,

View File

@@ -1,114 +1,137 @@
{
"cells": [
{
"cell_type": "raw",
"id": "602a52a4",
"metadata": {},
"source": [
"---\n",
"sidebar_label: AI21 Labs\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "9597802c",
"metadata": {},
"source": [
"# AI21LLM\n",
"# AI21\n",
"\n",
"This example goes over how to use LangChain to interact with `AI21` models.\n",
"[AI21 Studio](https://docs.ai21.com/) provides API access to `Jurassic-2` large language models.\n",
"\n",
"## Installation"
"This example goes over how to use LangChain to interact with [AI21 models](https://docs.ai21.com/docs/jurassic-2-models)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "59c710c4",
"metadata": {},
"execution_count": 1,
"id": "02be122d-04e8-4ec6-84d1-f1d8961d6828",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[33mWARNING: There was an error checking the latest version of pip.\u001b[0m\u001b[33m\n",
"\u001b[0mNote: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"# install the package:\n",
"%pip install --upgrade --quiet ai21"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "4229227e-6ca2-41ad-a3c3-5f29e3559091",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!pip install -qU langchain-ai21"
]
},
{
"cell_type": "markdown",
"id": "560a2f9254963fd7",
"metadata": {
"collapsed": false
},
"source": [
"## Environment Setup\n",
"# get AI21_API_KEY. Use https://studio.ai21.com/account/account\n",
"\n",
"We'll need to get a [AI21 API key](https://docs.ai21.com/) and set the `AI21_API_KEY` environment variable:"
"from getpass import getpass\n",
"\n",
"AI21_API_KEY = getpass()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 7,
"id": "6fb585dd",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain_community.llms import AI21\n",
"from langchain_core.prompts import PromptTemplate"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "035dea0f",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import os\n",
"from getpass import getpass\n",
"template = \"\"\"Question: {question}\n",
"\n",
"os.environ[\"AI21_API_KEY\"] = getpass()"
]
},
{
"cell_type": "markdown",
"id": "1891df96eb076e1a",
"metadata": {
"collapsed": false
},
"source": [
"## Usage"
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate.from_template(template)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "98f70927a87e4745",
"execution_count": 9,
"id": "3f3458d9",
"metadata": {
"collapsed": false
"tags": []
},
"outputs": [],
"source": [
"llm = AI21(ai21_api_key=AI21_API_KEY)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "a641dbd9",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"llm_chain = prompt | llm"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "9f0b1960",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'\\nLangChain is a decentralized blockchain network that leverages AI and machine learning to provide language translation services.'"
"'\\nThe Super Bowl in the year Justin Beiber was born was in the year 1991.\\nThe Super Bowl in 1991 was won by the Washington Redskins.\\nFinal answer: Washington Redskins'"
]
},
"execution_count": 6,
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain_ai21 import AI21LLM\n",
"from langchain_core.prompts import PromptTemplate\n",
"question = \"What NFL team won the Super Bowl in the year Justin Beiber was born?\"\n",
"\n",
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate.from_template(template)\n",
"\n",
"model = AI21LLM(model=\"j2-ultra\")\n",
"\n",
"chain = prompt | model\n",
"\n",
"chain.invoke({\"question\": \"What is LangChain?\"})"
"llm_chain.invoke({\"question\": question})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a52f765c",
"id": "22bce013",
"metadata": {},
"outputs": [],
"source": []
@@ -116,7 +139,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.11.1 64-bit",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -130,12 +153,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
},
"vscode": {
"interpreter": {
"hash": "e971737741ff4ec9aff7dc6155a1060a59a8a6d52c757dbbe66bf8ee389494b1"
}
"version": "3.10.13"
}
},
"nbformat": 4,

View File

@@ -1,238 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Huggingface Endpoints\n",
"\n",
">The [Hugging Face Hub](https://huggingface.co/docs/hub/index) is a platform with over 120k models, 20k datasets, and 50k demo apps (Spaces), all open source and publicly available, in an online platform where people can easily collaborate and build ML together.\n",
"\n",
"The `Hugging Face Hub` also offers various endpoints to build ML applications.\n",
"This example showcases how to connect to the different Endpoints types.\n",
"\n",
"In particular, text generation inference is powered by [Text Generation Inference](https://github.com/huggingface/text-generation-inference): a custom-built Rust, Python and gRPC server for blazing-faset text generation inference."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.llms import HuggingFaceEndpoint"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Installation and Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To use, you should have the ``huggingface_hub`` python [package installed](https://huggingface.co/docs/huggingface_hub/installation)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet huggingface_hub"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# get a token: https://huggingface.co/docs/api-inference/quicktour#get-your-api-token\n",
"\n",
"from getpass import getpass\n",
"\n",
"HUGGINGFACEHUB_API_TOKEN = getpass()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"HUGGINGFACEHUB_API_TOKEN\"] = HUGGINGFACEHUB_API_TOKEN"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prepare Examples"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.llms import HuggingFaceEndpoint"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import LLMChain\n",
"from langchain.prompts import PromptTemplate"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"question = \"Who won the FIFA World Cup in the year 1994? \"\n",
"\n",
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate.from_template(template)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Examples\n",
"\n",
"Here is an example of how you can access `HuggingFaceEndpoint` integration of the free [Serverless Endpoints](https://huggingface.co/inference-endpoints/serverless) API."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"repo_id = \"mistralai/Mistral-7B-Instruct-v0.2\"\n",
"\n",
"llm = HuggingFaceEndpoint(\n",
" repo_id=repo_id, max_length=128, temperature=0.5, token=HUGGINGFACEHUB_API_TOKEN\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dedicated Endpoint\n",
"\n",
"\n",
"The free serverless API lets you implement solutions and iterate in no time, but it may be rate limited for heavy use cases, since the loads are shared with other requests.\n",
"\n",
"For enterprise workloads, the best is to use [Inference Endpoints - Dedicated](https://huggingface.co/inference-endpoints/dedicated).\n",
"This gives access to a fully managed infrastructure that offer more flexibility and speed. These resoucres come with continuous support and uptime guarantees, as well as options like AutoScaling\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set the url to your Inference Endpoint below\n",
"your_endpoint_url = \"https://fayjubiy2xqn36z0.us-east-1.aws.endpoints.huggingface.cloud\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = HuggingFaceEndpoint(\n",
" endpoint_url=f\"{your_endpoint_url}\",\n",
" max_new_tokens=512,\n",
" top_k=10,\n",
" top_p=0.95,\n",
" typical_p=0.95,\n",
" temperature=0.01,\n",
" repetition_penalty=1.03,\n",
")\n",
"llm(\"What did foo say about bar?\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Streaming"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
"from langchain_community.llms import HuggingFaceEndpoint\n",
"\n",
"llm = HuggingFaceEndpoint(\n",
" endpoint_url=f\"{your_endpoint_url}\",\n",
" max_new_tokens=512,\n",
" top_k=10,\n",
" top_p=0.95,\n",
" typical_p=0.95,\n",
" temperature=0.01,\n",
" repetition_penalty=1.03,\n",
" streaming=True,\n",
")\n",
"llm(\"What did foo say about bar?\", callbacks=[StreamingStdOutCallbackHandler()])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "agents",
"language": "python",
"name": "agents"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
},
"vscode": {
"interpreter": {
"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,466 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "959300d4",
"metadata": {},
"source": [
"# Hugging Face Hub\n",
"\n",
">The [Hugging Face Hub](https://huggingface.co/docs/hub/index) is an online platform with over 120k models, 20k datasets, and 50k demo apps (Spaces), all open source and publicly available, where people can easily collaborate and build ML together.\n",
"\n",
"This example showcases how to connect to the `Hugging Face Hub` and use different models."
]
},
{
"cell_type": "markdown",
"id": "1ddafc6d-7d7c-48fa-838f-0e7f50895ce3",
"metadata": {},
"source": [
"## Installation and Setup"
]
},
{
"cell_type": "markdown",
"id": "4c1b8450-5eaf-4d34-8341-2d785448a1ff",
"metadata": {
"tags": []
},
"source": [
"To use, you should have the ``huggingface_hub`` python [package installed](https://huggingface.co/docs/huggingface_hub/installation)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d772b637-de00-4663-bd77-9bc96d798db2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet huggingface_hub"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "d597a792-354c-4ca5-b483-5965eec5d63d",
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"# get a token: https://huggingface.co/docs/api-inference/quicktour#get-your-api-token\n",
"\n",
"from getpass import getpass\n",
"\n",
"HUGGINGFACEHUB_API_TOKEN = getpass()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "b8c5b88c-e4b8-4d0d-9a35-6e8f106452c2",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"HUGGINGFACEHUB_API_TOKEN\"] = HUGGINGFACEHUB_API_TOKEN"
]
},
{
"cell_type": "markdown",
"id": "84dd44c1-c428-41f3-a911-520281386c94",
"metadata": {},
"source": [
"## Prepare Examples"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3fe7d1d1-241d-426a-acff-e208f1088871",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.llms import HuggingFaceHub"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "6620f39b-3d32-4840-8931-ff7d2c3e47e8",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import LLMChain\n",
"from langchain.prompts import PromptTemplate"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "44adc1a0-9c0a-4f1e-af5a-fe04222e78d7",
"metadata": {},
"outputs": [],
"source": [
"question = \"Who won the FIFA World Cup in the year 1994? \"\n",
"\n",
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate.from_template(template)"
]
},
{
"cell_type": "markdown",
"id": "ddaa06cf-95ec-48ce-b0ab-d892a7909693",
"metadata": {},
"source": [
"## Examples\n",
"\n",
"Below are some examples of models you can access through the `Hugging Face Hub` integration."
]
},
{
"cell_type": "markdown",
"id": "4c16fded-70d1-42af-8bfa-6ddda9f0bc63",
"metadata": {},
"source": [
"### `Flan`, by `Google`"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "39c7eeac-01c4-486b-9480-e828a9e73e78",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"repo_id = \"google/flan-t5-xxl\" # See https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads for some other options"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "3acf0069",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The FIFA World Cup was held in the year 1994. West Germany won the FIFA World Cup in 1994\n"
]
}
],
"source": [
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"\n",
"print(llm_chain.run(question))"
]
},
{
"cell_type": "markdown",
"id": "1a5c97af-89bc-4e59-95c1-223742a9160b",
"metadata": {},
"source": [
"### `Dolly`, by `Databricks`\n",
"\n",
"See [Databricks](https://huggingface.co/databricks) organization page for a list of available models."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "521fcd2b-8e38-4920-b407-5c7d330411c9",
"metadata": {},
"outputs": [],
"source": [
"repo_id = \"databricks/dolly-v2-3b\""
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "9907ec3a-fe0c-4543-81c4-d42f9453f16c",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" First of all, the world cup was won by the Germany. Then the Argentina won the world cup in 2022. So, the Argentina won the world cup in 1994.\n",
"\n",
"\n",
"Question: Who\n"
]
}
],
"source": [
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
},
{
"cell_type": "markdown",
"id": "03f6ae52-b5f9-4de6-832c-551cb3fa11ae",
"metadata": {},
"source": [
"### `Camel`, by `Writer`\n",
"\n",
"See [Writer's](https://huggingface.co/Writer) organization page for a list of available models."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "257a091d-750b-4910-ac08-fe1c7b3fd98b",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"repo_id = \"Writer/camel-5b-hf\" # See https://huggingface.co/Writer for other options"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b06f6838-a11a-4d6a-88e3-91fa1747a2b3",
"metadata": {},
"outputs": [],
"source": [
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
},
{
"cell_type": "markdown",
"id": "2bf838eb-1083-402f-b099-b07c452418c8",
"metadata": {},
"source": [
"### `XGen`, by `Salesforce`\n",
"\n",
"See [more information](https://github.com/salesforce/xgen)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "18c78880-65d7-41d0-9722-18090efb60e9",
"metadata": {},
"outputs": [],
"source": [
"repo_id = \"Salesforce/xgen-7b-8k-base\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1b1150b4-ec30-4674-849e-6a41b085aa2b",
"metadata": {},
"outputs": [],
"source": [
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
},
{
"cell_type": "markdown",
"id": "0aca9f9e-f333-449c-97b2-10d1dbf17e75",
"metadata": {},
"source": [
"### `Falcon`, by `Technology Innovation Institute (TII)`\n",
"\n",
"See [more information](https://huggingface.co/tiiuae/falcon-40b)."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "496b35ac-5ee2-4b68-a6ce-232608f56c03",
"metadata": {},
"outputs": [],
"source": [
"repo_id = \"tiiuae/falcon-40b\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ff2541ad-e394-4179-93c2-7ae9c4ca2a25",
"metadata": {},
"outputs": [],
"source": [
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
},
{
"cell_type": "markdown",
"id": "7e15849b-5561-4bb9-86ec-6412ca10196a",
"metadata": {},
"source": [
"### `InternLM-Chat`, by `Shanghai AI Laboratory`\n",
"\n",
"See [more information](https://huggingface.co/internlm/internlm-7b)."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "3b533461-59f8-406e-907b-000841fa60a7",
"metadata": {},
"outputs": [],
"source": [
"repo_id = \"internlm/internlm-chat-7b\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c71210b9-5895-41a2-889a-f430d22fa1aa",
"metadata": {},
"outputs": [],
"source": [
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"max_length\": 128, \"temperature\": 0.8}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
},
{
"cell_type": "markdown",
"id": "4f2e5132-1713-42d7-919a-8c313744ce95",
"metadata": {},
"source": [
"### `Qwen`, by `Alibaba Cloud`\n",
"\n",
">`Tongyi Qianwen-7B` (`Qwen-7B`) is a model with a scale of 7 billion parameters in the `Tongyi Qianwen` large model series developed by `Alibaba Cloud`. `Qwen-7B` is a large language model based on Transformer, which is trained on ultra-large-scale pre-training data.\n",
"\n",
"See [more information on HuggingFace](https://huggingface.co/Qwen/Qwen-7B) of on [GitHub](https://github.com/QwenLM/Qwen-7B).\n",
"\n",
"See here a [big example for LangChain integration and Qwen](https://github.com/QwenLM/Qwen-7B/blob/main/examples/langchain_tooluse.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "f598b1ca-77c7-40f1-a83f-c21ea9910c88",
"metadata": {},
"outputs": [],
"source": [
"repo_id = \"Qwen/Qwen-7B\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2c97f4e2-d401-44fb-9da7-b60b2e2cc663",
"metadata": {},
"outputs": [],
"source": [
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"max_length\": 128, \"temperature\": 0.5}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
},
{
"cell_type": "markdown",
"id": "e3871376-ed0e-49a8-8d9b-7e60dbbd2b35",
"metadata": {},
"source": [
"### `Yi` series models, by `01.ai`\n",
"\n",
">The `Yi` series models are large language models trained from scratch by developers at [01.ai](https://01.ai/). The first public release contains two bilingual (English/Chinese) base models with parameter sizes of 6B (`Yi-6B`) and 34B (`Yi-34B`). Both are trained with a 4K sequence length, which can be extended to 32K at inference time. The `Yi-6B-200K` and `Yi-34B-200K` are base models with a 200K context length.\n",
"\n",
"Here we test the [Yi-34B](https://huggingface.co/01-ai/Yi-34B) model."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "1c9d3125-3f50-48b8-93b6-b50847207afa",
"metadata": {},
"outputs": [],
"source": [
"repo_id = \"01-ai/Yi-34B\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8b661069-8229-4850-9f13-c4ca28c0c96b",
"metadata": {},
"outputs": [],
"source": [
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"max_length\": 128, \"temperature\": 0.5}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dd6f3edc-9f97-47a6-ab2c-116756babbe6",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,108 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Huggingface TextGen Inference\n",
"\n",
"[Text Generation Inference](https://github.com/huggingface/text-generation-inference) is a Rust, Python and gRPC server for text generation inference. Used in production at [HuggingFace](https://huggingface.co/) to power LLMs api-inference widgets.\n",
"\n",
"This notebooks goes over how to use a self hosted LLM using `Text Generation Inference`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To use, you should have the `text_generation` python package installed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# !pip3 install text_generation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.llms import HuggingFaceTextGenInference\n",
"\n",
"llm = HuggingFaceTextGenInference(\n",
" inference_server_url=\"http://localhost:8010/\",\n",
" max_new_tokens=512,\n",
" top_k=10,\n",
" top_p=0.95,\n",
" typical_p=0.95,\n",
" temperature=0.01,\n",
" repetition_penalty=1.03,\n",
")\n",
"llm(\"What did foo say about bar?\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Streaming"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
"from langchain_community.llms import HuggingFaceTextGenInference\n",
"\n",
"llm = HuggingFaceTextGenInference(\n",
" inference_server_url=\"http://localhost:8010/\",\n",
" max_new_tokens=512,\n",
" top_k=10,\n",
" top_p=0.95,\n",
" typical_p=0.95,\n",
" temperature=0.01,\n",
" repetition_penalty=1.03,\n",
" streaming=True,\n",
")\n",
"llm(\"What did foo say about bar?\", callbacks=[StreamingStdOutCallbackHandler()])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
},
"vscode": {
"interpreter": {
"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -1131,16 +1131,6 @@
"print(llm(\"How come we always see one face of the moon?\"))"
]
},
{
"cell_type": "markdown",
"id": "55dc84b3-37cb-4f19-b175-40e18e06f83f",
"metadata": {},
"source": [
"#### Attribution statement\n",
"\n",
">Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the [Apache Software Foundation](http://www.apache.org/) in the United States and/or other countries."
]
},
{
"cell_type": "markdown",
"id": "8712f8fc-bb89-4164-beb9-c672778bbd91",
@@ -1598,7 +1588,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.17"
"version": "3.10.1"
}
},
"nbformat": 4,

View File

@@ -1,141 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# SparkLLM\n",
"[SparkLLM](https://xinghuo.xfyun.cn/spark) is a large-scale cognitive model independently developed by iFLYTEK.\n",
"It has cross-domain knowledge and language understanding ability by learning a large amount of texts, codes and images.\n",
"It can understand and perform tasks based on natural dialogue."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisite\n",
"- Get SparkLLM's app_id, api_key and api_secret from [iFlyTek SparkLLM API Console](https://console.xfyun.cn/services/bm3) (for more info, see [iFlyTek SparkLLM Intro](https://xinghuo.xfyun.cn/sparkapi) ), then set environment variables `IFLYTEK_SPARK_APP_ID`, `IFLYTEK_SPARK_API_KEY` and `IFLYTEK_SPARK_API_SECRET` or pass parameters when creating `ChatSparkLLM` as the demo above."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use SparkLLM"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"IFLYTEK_SPARK_APP_ID\"] = \"app_id\"\n",
"os.environ[\"IFLYTEK_SPARK_API_KEY\"] = \"api_key\"\n",
"os.environ[\"IFLYTEK_SPARK_API_SECRET\"] = \"api_secret\""
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/liugddx/code/langchain/libs/core/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The function `__call__` was deprecated in LangChain 0.1.7 and will be removed in 0.2.0. Use invoke instead.\n",
" warn_deprecated(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"My name is iFLYTEK Spark. How can I assist you today?\n"
]
}
],
"source": [
"from langchain_community.llms import SparkLLM\n",
"\n",
"# Load the model\n",
"llm = SparkLLM()\n",
"\n",
"res = llm(\"What's your name?\")\n",
"print(res)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2024-02-18T13:04:29.305856Z",
"start_time": "2024-02-18T13:04:28.085715Z"
}
},
"outputs": [
{
"data": {
"text/plain": "LLMResult(generations=[[Generation(text='Hello! How can I assist you today?')]], llm_output=None, run=[RunInfo(run_id=UUID('d8cdcd41-a698-4cbf-a28d-e74f9cd2037b'))])"
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"res = llm.generate(prompts=[\"hello!\"])\n",
"res"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2024-02-18T13:05:44.640035Z",
"start_time": "2024-02-18T13:05:43.244126Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hello! How can I assist you today?\n"
]
}
],
"source": [
"for res in llm.stream(\"foo:\"):\n",
" print(res)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -32,7 +32,7 @@
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet \"astrapy>=0.7.1\""
"%pip install --upgrade --quiet \"astrapy>=0.6.2\""
]
},
{

View File

@@ -145,24 +145,6 @@
"source": [
"message_history.messages"
]
},
{
"cell_type": "markdown",
"id": "59902d0f-e9ba-4e3d-a7e0-ce202b9d3c43",
"metadata": {},
"source": [
"#### Attribution statement\n",
"\n",
"> Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the [Apache Software Foundation](http://www.apache.org/) in the United States and/or other countries."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7efaa51c-e9ee-4dce-80a4-eb9280a0dbe5",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -181,7 +163,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.17"
"version": "3.10.12"
}
},
"nbformat": 4,

View File

@@ -1,21 +0,0 @@
# Apache Doris
>[Apache Doris](https://doris.apache.org/) is a modern data warehouse for real-time analytics.
It delivers lightning-fast analytics on real-time data at scale.
>Usually `Apache Doris` is categorized as an OLAP database, and it has shown excellent performance in [ClickBench — a Benchmark For Analytical DBMS](https://benchmark.clickhouse.com/). Since it has a super-fast vectorized execution engine, it can also be used as a fast vector database.
## Installation and Setup
```bash
pip install pymysql
```
## Vector Store
See a [usage example](/docs/integrations/vectorstores/apache_doris).
```python
from langchain_community.vectorstores import ApacheDoris
```
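
A minimal usage sketch, assuming a reachable Doris instance; `my_embedding` stands for any LangChain embeddings object and `docs` for a list of split documents, and the connection values are placeholders (the settings fields mirror the `ApacheDorisSettings` usage in the example notebook):

```python
from langchain_community.vectorstores.apache_doris import (
    ApacheDoris,
    ApacheDorisSettings,
)

# Placeholder connection details for illustration only.
settings = ApacheDorisSettings()
settings.host = "127.0.0.1"
settings.port = 9030
settings.username = "root"
settings.password = ""
settings.database = "langchain"

# Build the store from pre-split documents, then run a similarity search.
vector_store = ApacheDoris.from_documents(docs, my_embedding, config=settings)
results = vector_store.similarity_search("what is apache doris")
```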

View File

@@ -1,21 +1,25 @@
# Astra DB
> DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Apache Cassandra® and made conveniently available
> through an easy-to-use JSON API.
This page lists the integrations available with [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) and [Apache Cassandra®](https://cassandra.apache.org/).
### Setup
Install the following Python package:
```bash
pip install "astrapy>=0.7.1"
pip install "astrapy>=0.5.3"
```
## Vector Store
## Astra DB
> DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Cassandra and made conveniently available
> through an easy-to-use JSON API.
### Vector Store
```python
from langchain_astradb import AstraDBVectorStore
vector_store = AstraDBVectorStore(
from langchain_community.vectorstores import AstraDB
vector_store = AstraDB(
embedding=my_embedding,
collection_name="my_store",
api_endpoint="...",
@@ -25,22 +29,11 @@ vector_store = AstraDBVectorStore(
Learn more in the [example notebook](/docs/integrations/vectorstores/astradb).
## Chat message history
```python
from langchain_community.chat_message_histories import AstraDBChatMessageHistory
message_history = AstraDBChatMessageHistory(
session_id="test-session",
api_endpoint="...",
token="...",
)
```
## LLM Cache
### LLM Cache
```python
from langchain.globals import set_llm_cache
from langchain_community.cache import AstraDBCache
from langchain.cache import AstraDBCache
set_llm_cache(AstraDBCache(
api_endpoint="...",
token="...",
@@ -50,11 +43,11 @@ set_llm_cache(AstraDBCache(
Learn more in the [example notebook](/docs/integrations/llms/llm_caching#astra-db-caches) (scroll to the Astra DB section).
## Semantic LLM Cache
### Semantic LLM Cache
```python
from langchain.globals import set_llm_cache
from langchain_community.cache import AstraDBSemanticCache
from langchain.cache import AstraDBSemanticCache
set_llm_cache(AstraDBSemanticCache(
embedding=my_embedding,
api_endpoint="...",
@@ -64,9 +57,20 @@ set_llm_cache(AstraDBSemanticCache(
Learn more in the [example notebook](/docs/integrations/llms/llm_caching#astra-db-caches) (scroll to the appropriate section).
### Chat message history
```python
from langchain.memory import AstraDBChatMessageHistory
message_history = AstraDBChatMessageHistory(
session_id="test-session",
api_endpoint="...",
token="...",
)
```
Learn more in the [example notebook](/docs/integrations/memory/astradb_chat_message_history).
## Document loader
### Document loader
```python
from langchain_community.document_loaders import AstraDBLoader
@@ -79,13 +83,13 @@ loader = AstraDBLoader(
Learn more in the [example notebook](/docs/integrations/document_loaders/astradb).
## Self-querying retriever
### Self-querying retriever
```python
from langchain_astradb import AstraDBVectorStore
from langchain_community.vectorstores import AstraDB
from langchain.retrievers.self_query.base import SelfQueryRetriever
vector_store = AstraDBVectorStore(
vector_store = AstraDB(
embedding=my_embedding,
collection_name="my_store",
api_endpoint="...",
@@ -102,10 +106,10 @@ retriever = SelfQueryRetriever.from_llm(
Learn more in the [example notebook](/docs/integrations/retrievers/self_query/astradb).
## Store
### Store
```python
from langchain_astradb import AstraDBStore
from langchain_community.storage import AstraDBStore
store = AstraDBStore(
collection_name="my_kv_store",
api_endpoint="...",
@@ -115,10 +119,10 @@ store = AstraDBStore(
Learn more in the [example notebook](/docs/integrations/stores/astradb#astradbstore).
## Byte Store
### Byte Store
```python
from langchain_astradb import AstraDBByteStore
from langchain_community.storage import AstraDBByteStore
store = AstraDBByteStore(
collection_name="my_kv_store",
api_endpoint="...",
@@ -127,3 +131,57 @@ store = AstraDBByteStore(
```
Learn more in the [example notebook](/docs/integrations/stores/astradb#astradbbytestore).
## Apache Cassandra and Astra DB through CQL
> [Cassandra](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database.
> Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html).
> DataStax [Astra DB through CQL](https://docs.datastax.com/en/astra-serverless/docs/vector-search/quickstart.html) is a managed serverless database built on Cassandra, offering the same interface and strengths.
These databases use the CQL protocol (Cassandra Query Language).
Hence, a different set of connectors, outlined below, shall be used.
### Vector Store
```python
from langchain_community.vectorstores import Cassandra
vector_store = Cassandra(
embedding=my_embedding,
table_name="my_store",
)
```
Learn more in the [example notebook](/docs/integrations/vectorstores/astradb#apache-cassandra-and-astra-db-through-cql) (scroll down to the CQL-specific section).
### Memory
```python
from langchain.memory import CassandraChatMessageHistory
message_history = CassandraChatMessageHistory(session_id="my-session")
```
Learn more in the [example notebook](/docs/integrations/memory/cassandra_chat_message_history).
### LLM Cache
```python
import langchain
from langchain.cache import CassandraCache

langchain.llm_cache = CassandraCache()
```
Learn more in the [example notebook](/docs/integrations/llms/llm_caching#cassandra-caches) (scroll to the Cassandra section).
### Semantic LLM Cache
```python
from langchain.cache import CassandraSemanticCache
cassSemanticCache = CassandraSemanticCache(
embedding=my_embedding,
table_name="my_store",
)
```
Learn more in the [example notebook](/docs/integrations/llms/llm_caching#cassandra-caches) (scroll to the appropriate section).

View File

@@ -1,76 +0,0 @@
# Apache Cassandra
> [Apache Cassandra®](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database.
> Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html).
The integrations outlined on this page can be used with Cassandra as well as other CQL-compatible databases, i.e. those using the Cassandra Query Language protocol.
### Setup
Install the following Python package:
```bash
pip install "cassio>=0.1.4"
```
## Vector Store
```python
from langchain_community.vectorstores import Cassandra
vector_store = Cassandra(
embedding=my_embedding,
table_name="my_store",
)
```
Learn more in the [example notebook](/docs/integrations/vectorstores/cassandra).
## Chat message history
```python
from langchain_community.chat_message_histories import CassandraChatMessageHistory
message_history = CassandraChatMessageHistory(session_id="my-session")
```
Learn more in the [example notebook](/docs/integrations/memory/cassandra_chat_message_history).
## LLM Cache
```python
from langchain.globals import set_llm_cache
from langchain_community.cache import CassandraCache
set_llm_cache(CassandraCache())
```
Learn more in the [example notebook](/docs/integrations/llms/llm_caching#cassandra-caches) (scroll to the Cassandra section).
## Semantic LLM Cache
```python
from langchain.globals import set_llm_cache
from langchain_community.cache import CassandraSemanticCache
set_llm_cache(CassandraSemanticCache(
embedding=my_embedding,
table_name="my_store",
))
```
Learn more in the [example notebook](/docs/integrations/llms/llm_caching#cassandra-caches) (scroll to the appropriate section).
## Document loader
```python
from langchain_community.document_loaders import CassandraLoader
loader = CassandraLoader(table="my_table")
docs = loader.load()
```
Learn more in the [example notebook](/docs/integrations/document_loaders/cassandra).
#### Attribution statement
> Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the [Apache Software Foundation](http://www.apache.org/) in the United States and/or other countries.

View File

@@ -1,26 +0,0 @@
# Optimum-intel
All functionality related to [optimum-intel](https://github.com/huggingface/optimum-intel.git) and [IPEX](https://github.com/intel/intel-extension-for-pytorch).
## Installation
Install optimum-intel and IPEX using:
```bash
pip install optimum[neural-compressor]
pip install intel_extension_for_pytorch
```
Please follow the installation instructions as specified below:
* Install optimum-intel as shown [here](https://github.com/huggingface/optimum-intel).
* Install IPEX as shown [here](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=cpu&version=v2.2.0%2Bcpu).
## Embedding Models
See a [usage example](/docs/integrations/text_embedding/optimum_intel).
We also offer a full tutorial notebook, "rag_with_quantized_embeddings.ipynb", on using the embedder in a RAG pipeline; it can be found in the cookbook directory.
```python
from langchain_community.embeddings import QuantizedBiEncoderEmbeddings
```
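
A minimal instantiation sketch, using the quantized model and arguments from the tutorial notebook (treat this as an illustration rather than a full API reference):

```python
from langchain_community.embeddings import QuantizedBiEncoderEmbeddings

model = QuantizedBiEncoderEmbeddings(
    model_name="Intel/bge-small-en-v1.5-rag-int8-static",
    encode_kwargs={"normalize_embeddings": True},  # normalize to compare by cosine similarity
    query_instruction="Represent this sentence for searching relevant passages: ",
)

# Embed a document and a query; normalized vectors can be compared by dot product.
doc_vecs = model.embed_documents(["Berlin is well known for its museums."])
query_vec = model.embed_query("How many people live in Berlin?")
```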

View File

@@ -1,14 +0,0 @@
# SparkLLM
>[SparkLLM](https://xinghuo.xfyun.cn/spark) is a large-scale cognitive model independently developed by iFLYTEK.
It has cross-domain knowledge and language understanding capabilities, acquired by learning from large volumes of text, code, and images.
It can understand and perform tasks based on natural dialogue.
## SparkLLM LLM Model
An example is available at [example](/docs/integrations/llm/sparkllm).
## SparkLLM Chat Model
An example is available at [example](/docs/integrations/chat/sparkllm).
## SparkLLM Text Embedding Model
An example is available at [example](/docs/integrations/text_embedding/sparkllm).
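
For orientation, a short sketch combining the LLM and embedding integrations shown in the linked examples (it assumes the iFLYTEK credential environment variables described there are already set):

```python
from langchain_community.llms import SparkLLM
from langchain_community.embeddings import SparkLLMTextEmbeddings

# SparkLLM reads IFLYTEK_SPARK_APP_ID / IFLYTEK_SPARK_API_KEY / IFLYTEK_SPARK_API_SECRET.
llm = SparkLLM()
print(llm.invoke("What's your name?"))

# SparkLLMTextEmbeddings reads SPARK_APP_ID / SPARK_API_KEY / SPARK_API_SECRET.
embeddings = SparkLLMTextEmbeddings()
vector = embeddings.embed_query("hello")
```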

View File

@@ -1,138 +0,0 @@
{
"cells": [
{
"cell_type": "raw",
"id": "c2923bd1",
"metadata": {},
"source": [
"---\n",
"sidebar_label: AI21 Labs\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "cc3c6ef6bbd57ce9",
"metadata": {
"collapsed": false
},
"source": [
"# AI21Embeddings\n",
"\n",
"This notebook covers how to get started with AI21 embedding models.\n",
"\n",
"## Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c3bef91",
"metadata": {},
"outputs": [],
"source": [
"!pip install -qU langchain-ai21"
]
},
{
"cell_type": "markdown",
"id": "2b4f3e15",
"metadata": {},
"source": [
"## Environment Setup\n",
"\n",
"We'll need to get a [AI21 API key](https://docs.ai21.com/) and set the `AI21_API_KEY` environment variable:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "62e0dbc3",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import os\n",
"from getpass import getpass\n",
"\n",
"os.environ[\"AI21_API_KEY\"] = getpass()"
]
},
{
"cell_type": "markdown",
"id": "74ef9d8b40a1319e",
"metadata": {
"collapsed": false
},
"source": [
"## Usage"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "12fcfb4b",
"metadata": {},
"outputs": [],
"source": [
"from langchain_ai21 import AI21Embeddings\n",
"\n",
"embeddings = AI21Embeddings()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1f2e6104",
"metadata": {},
"outputs": [],
"source": [
"embeddings.embed_query(\"My query to look up\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a3465d7e63bfb3d1",
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"embeddings.embed_documents(\n",
" [\"This is a content of the document\", \"This is another document\"]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d60af6d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,121 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "abede47c-6a58-40c3-b7ef-10966a4fc085",
"metadata": {},
"source": [
"# NVIDIA NeMo embeddings"
]
},
{
"cell_type": "markdown",
"id": "38f3d4ce-b36a-48c6-88b0-5970c26bb146",
"metadata": {},
"source": [
"Connect to NVIDIA's embedding service using the `NeMoEmbeddings` class.\n",
"\n",
"The NeMo Retriever Embedding Microservice (NREM) brings the power of state-of-the-art text embedding to your applications, providing unmatched natural language processing and understanding capabilities. Whether you're developing semantic search, Retrieval Augmented Generation (RAG) pipelines—or any application that needs to use text embeddings—NREM has you covered. Built on the NVIDIA software platform incorporating CUDA, TensorRT, and Triton, NREM brings state of the art GPU accelerated Text Embedding model serving.\n",
"\n",
"NREM uses NVIDIA's TensorRT built on top of the Triton Inference Server for optimized inference of text embedding models."
]
},
{
"cell_type": "markdown",
"id": "f5ab6ea1-d074-4f36-ae45-50312a6a82b9",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "32deab16-530d-455c-b40c-914db048cb05",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.embeddings import NeMoEmbeddings"
]
},
{
"cell_type": "markdown",
"id": "de40023c-3391-474d-96cf-fbfb2311e9d7",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "37177018-47f4-48be-8575-83ce5c9a5447",
"metadata": {},
"outputs": [],
"source": [
"batch_size = 16\n",
"model = \"NV-Embed-QA-003\"\n",
"api_endpoint_url = \"http://localhost:8080/v1/embeddings\""
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "08161ed2-8ba3-4226-a387-15c348f8c343",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Checking if endpoint is live: http://localhost:8080/v1/embeddings\n"
]
}
],
"source": [
"embedding_model = NeMoEmbeddings(\n",
" batch_size=batch_size, model=model, api_endpoint_url=api_endpoint_url\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c69070c3-fe2d-4ff7-be4a-73304e2c4f3e",
"metadata": {},
"outputs": [],
"source": [
"embedding_model.embed_query(\"This is a test.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5d1d8852-5298-40b5-89c4-5a91ccfc95e5",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,201 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ae6f9d9d-fe44-489c-9661-dac69683dcd2",
"metadata": {},
"source": [
"# Embedding Documents using Optimized and Quantized Embedders\n",
"\n",
"Embedding all documents using Quantized Embedders.\n",
"\n",
"The embedders are based on optimized models, created by using [optimum-intel](https://github.com/huggingface/optimum-intel.git) and [IPEX](https://github.com/intel/intel-extension-for-pytorch).\n",
"\n",
"Example text is based on [SBERT](https://www.sbert.net/docs/pretrained_cross-encoders.html)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "b9d1a3bb-83b1-4029-ad8d-411db1fba034",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"loading configuration file inc_config.json from cache at \n",
"INCConfig {\n",
" \"distillation\": {},\n",
" \"neural_compressor_version\": \"2.4.1\",\n",
" \"optimum_version\": \"1.16.2\",\n",
" \"pruning\": {},\n",
" \"quantization\": {\n",
" \"dataset_num_samples\": 50,\n",
" \"is_static\": true\n",
" },\n",
" \"save_onnx_model\": false,\n",
" \"torch_version\": \"2.2.0\",\n",
" \"transformers_version\": \"4.37.2\"\n",
"}\n",
"\n",
"Using `INCModel` to load a TorchScript model will be deprecated in v1.15.0, to load your model please use `IPEXModel` instead.\n"
]
}
],
"source": [
"from langchain_community.embeddings import QuantizedBiEncoderEmbeddings\n",
"\n",
"model_name = \"Intel/bge-small-en-v1.5-rag-int8-static\"\n",
"encode_kwargs = {\"normalize_embeddings\": True} # set True to compute cosine similarity\n",
"\n",
"model = QuantizedBiEncoderEmbeddings(\n",
" model_name=model_name,\n",
" encode_kwargs=encode_kwargs,\n",
" query_instruction=\"Represent this sentence for searching relevant passages: \",\n",
")"
]
},
{
"cell_type": "markdown",
"id": "34318164-7a6f-47b6-8690-3b1d71e1fcfc",
"metadata": {},
"source": [
"Lets ask a question, and compare to 2 documents. The first contains the answer to the question, and the second one does not. \n",
"\n",
"We can check better suits our query."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "55ff07ca-fb44-4dcf-b2d3-dde021a53983",
"metadata": {},
"outputs": [],
"source": [
"question = \"How many people live in Berlin?\""
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "aebef832-5534-440c-a4a8-4bf56ccd8ad4",
"metadata": {},
"outputs": [],
"source": [
"documents = [\n",
" \"Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.\",\n",
" \"Berlin is well known for its museums.\",\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "4eec7eda-0d9b-4488-a0e8-3eedd28ab0b1",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Batches: 100%|██████████| 1/1 [00:00<00:00, 4.18it/s]\n"
]
}
],
"source": [
"doc_vecs = model.embed_documents(documents)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "8e6dac72-5a0b-4421-9454-aa0a49b20c66",
"metadata": {},
"outputs": [],
"source": [
"query_vec = model.embed_query(question)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "ec26eb7a-a259-4bb9-b9d8-9ff345a8c798",
"metadata": {},
"outputs": [],
"source": [
"import torch"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "9ca1ee83-2a6a-4f65-bc2f-3942a0c068c6",
"metadata": {},
"outputs": [],
"source": [
"doc_vecs_torch = torch.tensor(doc_vecs)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "4f6a1986-339e-443a-a2f6-ae3f3ad4266c",
"metadata": {},
"outputs": [],
"source": [
"query_vec_torch = torch.tensor(query_vec)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "2b49446e-1336-46b3-b9ef-af56b4870876",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([0.7980, 0.6529])"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query_vec_torch @ doc_vecs_torch.T"
]
},
{
"cell_type": "markdown",
"id": "6cc1ac2a-9641-408e-a373-736d121fc3c7",
"metadata": {},
"source": [
"We can see that indeed the first one ranks higher."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,90 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# SparkLLM Text Embeddings"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Official Website: https://www.xfyun.cn/doc/spark/Embedding_new_api.html\n",
"\n",
"An API key is required to use this embedding model. You can get one by registering at https://platform.SparkLLM-ai.com/docs/text-Embedding."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"SparkLLMTextEmbeddings support 2K token window and preduces vectors with 2560 dimensions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.embeddings import SparkLLMTextEmbeddings\n",
"\n",
"embeddings = SparkLLMTextEmbeddings(\n",
" spark_app_id=\"sk-*\", spark_api_key=\"\", spark_api_secret=\"\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, you can set API key this way:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"SPARK_APP_ID\"] = \"YOUR_APP_ID\"\n",
"os.environ[\"SPARK_API_KEY\"] = \"YOUR_API_KEY\"\n",
"os.environ[\"SPARK_API_SECRET\"] = \"YOUR_API_SECRET\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"text_1 = \"iFLYTEK is a well-known intelligent speech and artificial intelligence publicly listed company in the Asia-Pacific Region. Since its establishment, the company is devoted to cornerstone technological research in speech and languages, natural language understanding, machine learning, machine reasoning, adaptive learning, and has maintained the world-leading position in those domains. The company actively promotes the development of A.I. products and their sector-based applications, with visions of enabling machines to listen and speak, understand and think, creating a better world with artificial intelligence.\"\n",
"text_2 = \"iFLYTEK Open Platform was launched in 2010 by iFLYTEK as Chinas first Artificial Intelligence open platform for Mobile Internet and intelligent hardware developers.\"\n",
"\n",
"query_result = embeddings.embed_query(text_2)\n",
"query_result"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"doc_result = embeddings.embed_documents([text_1, text_2])\n",
"doc_result"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,326 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "19062701",
"metadata": {},
"source": [
"## Cogniswitch Tools\n",
"\n",
"**Use CogniSwitch to build production ready applications that can consume, organize and retrieve knowledge flawlessly. Using the framework of your choice, in this case Langchain CogniSwitch helps alleviate the stress of decision making when it comes to, choosing the right storage and retrieval formats. It also eradicates reliability issues and hallucinations when it comes to responses that are generated. Get started by interacting with your knowledge in just two simple steps.**\n",
"\n",
"visit [https://www.cogniswitch.ai/developer to register](https://www.cogniswitch.ai/developer?utm_source=langchain&utm_medium=langchainbuild&utm_id=dev).\n\n",
"**Registration:** \n\n",
"- Signup with your email and verify your registration \n\n",
"- You will get a mail with a platform token and oauth token for using the services.\n\n\n",
"\n",
"**step 1: Instantiate the toolkit and get the tools:**\n\n",
"- Instantiate the cogniswitch toolkit with the cogniswitch token, openAI API key and OAuth token and get the tools. \n",
"\n",
"**step 2: Instantiate the agent with the tools and llm:**\n",
"- Instantiate the agent with the list of cogniswitch tools and the llm, into the agent executor.\n",
"\n",
"**step 3: CogniSwitch Store Tool:** \n",
"\n",
"***CogniSwitch knowledge source file tool***\n",
"- Use the agent to upload a file by giving the file path.(formats that are currently supported are .pdf, .docx, .doc, .txt, .html) \n",
"- The content from the file will be processed by the cogniswitch and stored in your knowledge store. \n",
"\n",
"***CogniSwitch knowledge source url tool***\n",
"- Use the agent to upload a URL. \n",
"- The content from the url will be processed by the cogniswitch and stored in your knowledge store. \n",
"\n",
"**step 4: CogniSwitch Status Tool:**\n",
"- Use the agent to know the status of the document uploaded with a document name.\n",
"- You can also check the status of document processing in cogniswitch console. \n",
"\n",
"**step 5: CogniSwitch Answer Tool:**\n",
"- Use the agent to ask your question.\n",
"- You will get the answer from your knowledge as the response. \n"
]
},
{
"cell_type": "markdown",
"id": "1435b193",
"metadata": {},
"source": [
"### Import necessary libraries"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8d86323b",
"metadata": {},
"outputs": [],
"source": [
"import warnings\n",
"\n",
"warnings.filterwarnings(\"ignore\")\n",
"\n",
"import os\n",
"\n",
"from langchain.agents.agent_toolkits import create_conversational_retrieval_agent\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain_community.agent_toolkits import CogniswitchToolkit"
]
},
{
"cell_type": "markdown",
"id": "6e6acf0e",
"metadata": {},
"source": [
"### Cogniswitch platform token, OAuth token and OpenAI API key"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "3d2dfc9f",
"metadata": {},
"outputs": [],
"source": [
"cs_token = \"Your CogniSwitch token\"\n",
"OAI_token = \"Your OpenAI API token\"\n",
"oauth_token = \"Your CogniSwitch authentication token\"\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = OAI_token"
]
},
{
"cell_type": "markdown",
"id": "320e02fc",
"metadata": {},
"source": [
"### Instantiate the cogniswitch toolkit with the credentials"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "89f58167",
"metadata": {},
"outputs": [],
"source": [
"cogniswitch_toolkit = CogniswitchToolkit(\n",
" cs_token=cs_token, OAI_token=OAI_token, apiKey=oauth_token\n",
")"
]
},
{
"cell_type": "markdown",
"id": "16901682",
"metadata": {},
"source": [
"### Get the list of cogniswitch tools"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "288d07f6",
"metadata": {},
"outputs": [],
"source": [
"tool_lst = cogniswitch_toolkit.get_tools()"
]
},
{
"cell_type": "markdown",
"id": "4aae43a3",
"metadata": {},
"source": [
"### Instantiate the llm"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "4d67e5bb",
"metadata": {},
"outputs": [],
"source": [
"llm = ChatOpenAI(\n",
" temperature=0,\n",
" openai_api_key=OAI_token,\n",
" max_tokens=1500,\n",
" model_name=\"gpt-3.5-turbo-0613\",\n",
")"
]
},
{
"cell_type": "markdown",
"id": "04179282",
"metadata": {},
"source": [
"### Create a agent executor"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "2153e758",
"metadata": {},
"outputs": [],
"source": [
"agent_executor = create_conversational_retrieval_agent(llm, tool_lst, verbose=False)"
]
},
{
"cell_type": "markdown",
"id": "42c9890e",
"metadata": {},
"source": [
"### Invoke the agent to upload a URL"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "794b4fba",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The URL https://cogniswitch.ai/developer has been uploaded successfully. The status of the document is currently being processed. You will receive an email notification once the processing is complete.\n"
]
}
],
"source": [
"response = agent_executor.invoke(\"upload this url https://cogniswitch.ai/developer\")\n",
"\n",
"print(response[\"output\"])"
]
},
{
"cell_type": "markdown",
"id": "544fe8f9",
"metadata": {},
"source": [
"### Invoke the agent to upload a File"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "fd0addfc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The file example_file.txt has been uploaded successfully. The status of the document is currently being processed. You will receive an email notification once the processing is complete.\n"
]
}
],
"source": [
"response = agent_executor.invoke(\"upload this file example_file.txt\")\n",
"\n",
"print(response[\"output\"])"
]
},
{
"cell_type": "markdown",
"id": "02827e1b",
"metadata": {},
"source": [
"### Invoke the agent to get the status of a document"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "f424e6c5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The status of the document example_file.txt is as follows:\n",
"\n",
"- Created On: 2024-01-22T19:07:42.000+00:00\n",
"- Modified On: 2024-01-22T19:07:42.000+00:00\n",
"- Document Entry ID: 153\n",
"- Status: 0 (Processing)\n",
"- Original File Name: example_file.txt\n",
"- Saved File Name: 1705950460069example_file29393011.txt\n",
"\n",
"The document is currently being processed.\n"
]
}
],
"source": [
"response = agent_executor.invoke(\"Tell me the status of this document example_file.txt\")\n",
"\n",
"print(response[\"output\"])"
]
},
{
"cell_type": "markdown",
"id": "0ba9aca9",
"metadata": {},
"source": [
"### Invoke the agent with query and get the answer"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "e73e963f",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CogniSwitch can help develop GenAI applications in several ways:\n",
"\n",
"1. Knowledge Extraction: CogniSwitch can extract knowledge from various sources such as documents, websites, and databases. It can analyze and store data from these sources, making it easier to access and utilize the information for GenAI applications.\n",
"\n",
"2. Natural Language Processing: CogniSwitch has advanced natural language processing capabilities. It can understand and interpret human language, allowing GenAI applications to interact with users in a more conversational and intuitive manner.\n",
"\n",
"3. Sentiment Analysis: CogniSwitch can analyze the sentiment of text data, such as customer reviews or social media posts. This can be useful in developing GenAI applications that can understand and respond to the emotions and opinions of users.\n",
"\n",
"4. Knowledge Base Integration: CogniSwitch can integrate with existing knowledge bases or create new ones. This allows GenAI applications to access a vast amount of information and provide accurate and relevant responses to user queries.\n",
"\n",
"5. Document Analysis: CogniSwitch can analyze documents and extract key information such as entities, relationships, and concepts. This can be valuable in developing GenAI applications that can understand and process large amounts of textual data.\n",
"\n",
"Overall, CogniSwitch provides a range of AI-powered capabilities that can enhance the development of GenAI applications by enabling knowledge extraction, natural language processing, sentiment analysis, knowledge base integration, and document analysis.\n"
]
}
],
"source": [
"response = agent_executor.invoke(\"How can cogniswitch help develop GenAI applications?\")\n",
"\n",
"print(response[\"output\"])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "langchain_repo",
"language": "python",
"name": "langchain_repo"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.17"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -58,7 +58,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.utilities.sql_database import SQLDatabase\n",
"from langchain.sql_database import SQLDatabase\n",
"\n",
"db = SQLDatabase.from_uri(\"sqlite:///Chinook.db\")"
]

File diff suppressed because one or more lines are too long

View File

@@ -1,322 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "84180ad0-66cd-43e5-b0b8-2067a29e16ba",
"metadata": {
"collapsed": false
},
"source": [
"# Apache Doris\n",
"\n",
">[Apache Doris](https://doris.apache.org/) is a modern data warehouse for real-time analytics.\n",
"It delivers lightning-fast analytics on real-time data at scale.\n",
"\n",
Usually">
">Usually `Apache Doris` is categorized as an OLAP database, and it has shown excellent performance in [ClickBench — a Benchmark For Analytical DBMS](https://benchmark.clickhouse.com/). Since it has a super-fast vectorized execution engine, it can also be used as a fast vector database.\n",
"\n",
"Here we'll show how to use the Apache Doris Vector Store."
]
},
{
"cell_type": "markdown",
"id": "1685854f",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "311d44bb-4aca-4f3b-8f97-5e1f29238e40",
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet pymysql"
]
},
{
"cell_type": "markdown",
"id": "2c891bba",
"metadata": {},
"source": [
"Set `update_vectordb = False` at the beginning. If there is no docs updated, then we don't need to rebuild the embeddings of docs"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f4e6ca20-79dd-482a-8f68-af9d7dd59c7c",
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"!pip install sqlalchemy\n",
"!pip install langchain"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "96f7c7a2-4811-4fdf-87f5-c60772f51fe1",
"metadata": {
"ExecuteTime": {
"end_time": "2024-02-14T12:54:01.392500Z",
"start_time": "2024-02-14T12:53:58.866615Z"
},
"collapsed": false
},
"outputs": [],
"source": [
"from langchain.chains import RetrievalQA\n",
"from langchain.text_splitter import TokenTextSplitter\n",
"from langchain_community.document_loaders import (\n",
" DirectoryLoader,\n",
" UnstructuredMarkdownLoader,\n",
")\n",
"from langchain_community.vectorstores.apache_doris import (\n",
" ApacheDoris,\n",
" ApacheDorisSettings,\n",
")\n",
"from langchain_openai import OpenAI, OpenAIEmbeddings\n",
"\n",
"update_vectordb = False"
]
},
{
"cell_type": "markdown",
"id": "ee821c00",
"metadata": {},
"source": [
"## Load docs and split them into tokens"
]
},
{
"cell_type": "markdown",
"id": "34ba0cfd",
"metadata": {},
"source": [
"Load all markdown files under the `docs` directory\n",
"\n",
"for Apache Doris documents, you can clone repo from https://github.com/apache/doris, and there is `docs` directory in it."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "799edf20-bcf4-4a65-bff7-b907f6bdba20",
"metadata": {
"ExecuteTime": {
"end_time": "2024-02-14T12:55:24.128917Z",
"start_time": "2024-02-14T12:55:19.463831Z"
},
"collapsed": false
},
"outputs": [],
"source": [
"loader = DirectoryLoader(\n",
" \"./docs\", glob=\"**/*.md\", loader_cls=UnstructuredMarkdownLoader\n",
")\n",
"documents = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "b415fe2a",
"metadata": {},
"source": [
"Split docs into tokens, and set `update_vectordb = True` because there are new docs/tokens."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "0dc5ba83-62ef-4f61-a443-e872f251e7da",
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# load text splitter and split docs into snippets of text\n",
"text_splitter = TokenTextSplitter(chunk_size=400, chunk_overlap=50)\n",
"split_docs = text_splitter.split_documents(documents)\n",
"\n",
"# tell vectordb to update text embeddings\n",
"update_vectordb = True"
]
},
{
"cell_type": "markdown",
"id": "46966e25-9449-4a36-87d1-c0b25dce2994",
"metadata": {
"collapsed": false
},
"source": [
"split_docs[-20]"
]
},
{
"cell_type": "markdown",
"id": "99422e95-b407-43eb-aa68-9a62363fc82f",
"metadata": {
"collapsed": false
},
"source": [
"print(\"# docs = %d, # splits = %d\" % (len(documents), len(split_docs)))"
]
},
{
"cell_type": "markdown",
"id": "e780d77f-3f96-4690-a10f-f87566f7ccc6",
"metadata": {
"collapsed": false
},
"source": [
"## Create vectordb instance"
]
},
{
"cell_type": "markdown",
"id": "15702d9c",
"metadata": {},
"source": [
"### Use Apache Doris as vectordb"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ced7dbe1",
"metadata": {
"ExecuteTime": {
"end_time": "2024-02-14T12:55:39.508287Z",
"start_time": "2024-02-14T12:55:39.500370Z"
}
},
"outputs": [],
"source": [
"def gen_apache_doris(update_vectordb, embeddings, settings):\n",
" if update_vectordb:\n",
" docsearch = ApacheDoris.from_documents(split_docs, embeddings, config=settings)\n",
" else:\n",
" docsearch = ApacheDoris(embeddings, settings)\n",
" return docsearch"
]
},
{
"cell_type": "markdown",
"id": "15d86fda",
"metadata": {},
"source": [
"## Convert tokens into embeddings and put them into vectordb"
]
},
{
"cell_type": "markdown",
"id": "ff1322ea",
"metadata": {},
"source": [
"Here we use Apache Doris as vectordb, you can configure Apache Doris instance via `ApacheDorisSettings`.\n",
"\n",
"Configuring Apache Doris instance is pretty much like configuring mysql instance. You need to specify:\n",
"1. host/port\n",
"2. username(default: 'root')\n",
"3. password(default: '')\n",
"4. database(default: 'default')\n",
"5. table(default: 'langchain')"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "b34f8c31-c173-4902-8168-2e838ddfb9e9",
"metadata": {
"ExecuteTime": {
"end_time": "2024-02-14T12:56:02.671291Z",
"start_time": "2024-02-14T12:55:48.350294Z"
},
"collapsed": false
},
"outputs": [],
"source": [
"import os\n",
"from getpass import getpass\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c53ab3f2-9e34-4424-8b07-6292bde67e14",
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"update_vectordb = True\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"\n",
"# configure Apache Doris settings(host/port/user/pw/db)\n",
"settings = ApacheDorisSettings()\n",
"settings.port = 9030\n",
"settings.host = \"172.30.34.130\"\n",
"settings.username = \"root\"\n",
"settings.password = \"\"\n",
"settings.database = \"langchain\"\n",
"docsearch = gen_apache_doris(update_vectordb, embeddings, settings)\n",
"\n",
"print(docsearch)\n",
"\n",
"update_vectordb = False"
]
},
{
"cell_type": "markdown",
"id": "bde66626",
"metadata": {},
"source": [
"## Build QA and ask question to it"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "84921814",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI()\n",
"qa = RetrievalQA.from_chain_type(\n",
" llm=llm, chain_type=\"stuff\", retriever=docsearch.as_retriever()\n",
")\n",
"query = \"what is apache doris\"\n",
"resp = qa.run(query)\n",
"print(resp)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,28 +1,14 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "66d0270a-b74f-4110-901e-7960b00297af",
"metadata": {},
"source": [
"# Astra DB\n",
"\n",
"This page provides a quickstart for using [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) as a Vector Store."
]
},
{
"cell_type": "markdown",
"id": "ab8cd64f-3bb2-4f16-a0a9-12d7b1789bf6",
"metadata": {},
"source": [
"> DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Apache Cassandra® and made conveniently available through an easy-to-use JSON API."
]
},
{
"cell_type": "markdown",
"id": "d2d6ca14-fb7e-4172-9aa0-a3119a064b96",
"metadata": {},
"source": [
"# Astra DB\n",
"\n",
"This page provides a quickstart for using [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) and [Apache Cassandra®](https://cassandra.apache.org/) as a Vector Store.\n",
"\n",
"_Note: in addition to access to the database, an OpenAI API Key is required to run the full example._"
]
},
@@ -31,7 +17,7 @@
"id": "bb9be7ce-8c70-4d46-9f11-71c42a36e928",
"metadata": {},
"source": [
"## Setup and general dependencies"
"### Setup and general dependencies"
]
},
{
@@ -39,7 +25,7 @@
"id": "dbe7c156-0413-47e3-9237-4769c4248869",
"metadata": {},
"source": [
"Use of the integration requires the corresponding Python package:"
"Use of the integration requires the following Python package."
]
},
{
@@ -49,7 +35,7 @@
"metadata": {},
"outputs": [],
"source": [
"pip install --upgrade langchain-astradb"
"%pip install --upgrade --quiet \"astrapy>=0.5.3\""
]
},
{
@@ -57,25 +43,8 @@
"id": "2453d83a-bc8f-41e1-a692-befe4dd90156",
"metadata": {},
"source": [
"_**Note.** the following are all packages required to run the full demo on this page. Depending on your LangChain setup, some of them may need to be installed:_"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "56c1f86e-5921-4976-ac8f-1d62e5a512b0",
"metadata": {},
"outputs": [],
"source": [
"pip install langchain langchain-openai datasets pypdf"
]
},
{
"cell_type": "markdown",
"id": "c2910035-e61f-48d9-a110-d68c401b62aa",
"metadata": {},
"source": [
"### Import dependencies"
"_Note: depending on your LangChain setup, you may need to install/upgrade other dependencies needed for this demo_\n",
"_(specifically, recent versions of `datasets`, `openai`, `pypdf` and `tiktoken` are required)._"
]
},
{
@@ -120,12 +89,28 @@
"embe = OpenAIEmbeddings()"
]
},
{
"cell_type": "markdown",
"id": "dd8caa76-bc41-429e-a93b-989ba13aff01",
"metadata": {},
"source": [
"_Keep reading to connect with Astra DB. For usage with Apache Cassandra and Astra DB through CQL, scroll to the section below._"
]
},
{
"cell_type": "markdown",
"id": "22866f09-e10d-4f05-a24b-b9420129462e",
"metadata": {},
"source": [
"## Import the Vector Store"
"## Astra DB"
]
},
{
"cell_type": "markdown",
"id": "5fba47cc-3533-42fc-84b7-9dc14cd68b2b",
"metadata": {},
"source": [
"DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Cassandra and made conveniently available through an easy-to-use JSON API."
]
},
{
@@ -135,7 +120,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain_astradb import AstraDBVectorStore"
"from langchain_community.vectorstores import AstraDB"
]
},
{
@@ -143,13 +128,10 @@
"id": "68f61b01-3e09-47c1-9d67-5d6915c86626",
"metadata": {},
"source": [
"## Connection parameters\n",
"\n",
"These are found on your Astra DB dashboard:\n",
"### Astra DB connection parameters\n",
"\n",
"- the API Endpoint looks like `https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com`\n",
"- the Token looks like `AstraCS:6gBhNmsk135....`\n",
"- you may optionally provide a _Namespace_ such as `my_namespace`"
"- the Token looks like `AstraCS:6gBhNmsk135....`"
]
},
{
@@ -160,21 +142,7 @@
"outputs": [],
"source": [
"ASTRA_DB_API_ENDPOINT = input(\"ASTRA_DB_API_ENDPOINT = \")\n",
"ASTRA_DB_APPLICATION_TOKEN = getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")\n",
"\n",
"desired_namespace = input(\"(optional) Namespace = \")\n",
"if desired_namespace:\n",
" ASTRA_DB_KEYSPACE = desired_namespace\n",
"else:\n",
" ASTRA_DB_KEYSPACE = None"
]
},
{
"cell_type": "markdown",
"id": "196268bd-a950-41c3-bede-f5b55f6a0804",
"metadata": {},
"source": [
"Now you can create the vector store:"
"ASTRA_DB_APPLICATION_TOKEN = getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")"
]
},
{
@@ -184,12 +152,11 @@
"metadata": {},
"outputs": [],
"source": [
"vstore = AstraDBVectorStore(\n",
"vstore = AstraDB(\n",
" embedding=embe,\n",
" collection_name=\"astra_vector_demo\",\n",
" api_endpoint=ASTRA_DB_API_ENDPOINT,\n",
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
" namespace=ASTRA_DB_KEYSPACE,\n",
")"
]
},
@@ -198,7 +165,7 @@
"id": "9a348678-b2f6-46ca-9a0d-2eb4cc6b66b1",
"metadata": {},
"source": [
"## Load a dataset"
"### Load a dataset"
]
},
{
@@ -276,7 +243,7 @@
"id": "c031760a-1fc5-4855-adf2-02ed52fe2181",
"metadata": {},
"source": [
"## Run searches"
"### Run simple searches"
]
},
{
@@ -351,22 +318,12 @@
" print(f\"* {res.page_content} [{res.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "60fda5df-14e4-4fb0-bd17-65a393fab8a9",
"metadata": {},
"source": [
"### Async\n",
"\n",
"Note that the Astra DB vector store supports all fully async methods (`asimilarity_search`, `afrom_texts`, `adelete` and so on) natively, i.e. without thread wrapping involved."
]
},
{
"cell_type": "markdown",
"id": "1cc86edd-692b-4495-906c-ccfd13b03c23",
"metadata": {},
"source": [
"## Deleting stored documents"
"### Deleting stored documents"
]
},
{
@@ -396,7 +353,7 @@
"id": "847181ba-77d1-4a17-b7f9-9e2c3d8efd13",
"metadata": {},
"source": [
"## A minimal RAG chain"
"### A minimal RAG chain"
]
},
{
@@ -495,7 +452,7 @@
"id": "177610c7-50d0-4b7b-8634-b03338054c8e",
"metadata": {},
"source": [
"## Cleanup"
"### Cleanup"
]
},
{
@@ -517,6 +474,290 @@
"source": [
"vstore.delete_collection()"
]
},
{
"cell_type": "markdown",
"id": "94ebaab1-7cbf-4144-a147-7b0e32c43069",
"metadata": {},
"source": [
"## Apache Cassandra and Astra DB through CQL"
]
},
{
"cell_type": "markdown",
"id": "bc3931b4-211d-4f84-bcc0-51c127e3027c",
"metadata": {},
"source": [
"[Cassandra](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database.Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html).\n",
"\n",
"DataStax [Astra DB through CQL](https://docs.datastax.com/en/astra-serverless/docs/vector-search/quickstart.html) is a managed serverless database built on Cassandra, offering the same interface and strengths."
]
},
{
"cell_type": "markdown",
"id": "a0055fbf-448d-4e46-9c40-28d43df25ca3",
"metadata": {},
"source": [
"#### What sets this case apart from \"Astra DB\" above?\n",
"\n",
"Thanks to LangChain having a standardized `VectorStore` interface, most of the \"Astra DB\" section above applies to this case as well. However, this time the database uses the CQL protocol, which means you'll use a _different_ class this time and instantiate it in another way.\n",
"\n",
"The cells below show how you should get your `vstore` object in this case and how you can clean up the database resources at the end: for the rest, i.e. the actual usage of the vector store, you will be able to run the very code that was shown above.\n",
"\n",
"In other words, running this demo in full with Cassandra or Astra DB through CQL means:\n",
"\n",
"- **initialization as shown below**\n",
"- \"Load a dataset\", _see above section_\n",
"- \"Run simple searches\", _see above section_\n",
"- \"MMR search\", _see above section_\n",
"- \"Deleting stored documents\", _see above section_\n",
"- \"A minimal RAG chain\", _see above section_\n",
"- **cleanup as shown below**"
]
},
{
"cell_type": "markdown",
"id": "23d12be2-745f-4e72-a82c-334a887bc7cd",
"metadata": {},
"source": [
"### Initialization"
]
},
{
"cell_type": "markdown",
"id": "e3212542-79be-423e-8e1f-b8d725e3cda8",
"metadata": {},
"source": [
"The class to use is the following:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "941af73e-a090-4fba-b23c-595757d470eb",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.vectorstores import Cassandra"
]
},
{
"cell_type": "markdown",
"id": "414d1e72-f7c9-4b6d-bf6f-16075712c7e3",
"metadata": {},
"source": [
"Now, depending on whether you connect to a Cassandra cluster or to Astra DB through CQL, you will provide different parameters when creating the vector store object."
]
},
{
"cell_type": "markdown",
"id": "48ecca56-71a4-4a91-b198-29384c44ce27",
"metadata": {},
"source": [
"#### Initialization (Cassandra cluster)"
]
},
{
"cell_type": "markdown",
"id": "55ebe958-5654-43e0-9aed-d607ffd3fa48",
"metadata": {},
"source": [
"In this case, you first need to create a `cassandra.cluster.Session` object, as described in the [Cassandra driver documentation](https://docs.datastax.com/en/developer/python-driver/latest/api/cassandra/cluster/#module-cassandra.cluster). The details vary (e.g. with network settings and authentication), but this might be something like:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4642dafb-a065-4063-b58c-3d276f5ad07e",
"metadata": {},
"outputs": [],
"source": [
"from cassandra.cluster import Cluster\n",
"\n",
"cluster = Cluster([\"127.0.0.1\"])\n",
"session = cluster.connect()"
]
},
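{
"cell_type": "markdown",
"id": "cassandra-auth-sketch-leadin",
"metadata": {},
"source": [
"_For instance, a minimal sketch of an authenticated variant (the credentials shown are placeholders to adapt to your cluster):_"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cassandra-auth-sketch-code",
"metadata": {},
"outputs": [],
"source": [
"from cassandra.auth import PlainTextAuthProvider\n",
"from cassandra.cluster import Cluster\n",
"\n",
"# Placeholder credentials; replace with your cluster's actual username/password.\n",
"auth_provider = PlainTextAuthProvider(username=\"cassandra\", password=\"cassandra\")\n",
"cluster = Cluster([\"127.0.0.1\"], auth_provider=auth_provider)\n",
"session = cluster.connect()"
]
},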
{
"cell_type": "markdown",
"id": "624c93bf-fb46-4350-bcfa-09ca09dc068f",
"metadata": {},
"source": [
"You can now set the session, along with your desired keyspace name, as a global CassIO parameter:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "92a4ab28-1c4f-4dad-9671-d47e0b1dde7b",
"metadata": {},
"outputs": [],
"source": [
"import cassio\n",
"\n",
"CASSANDRA_KEYSPACE = input(\"CASSANDRA_KEYSPACE = \")\n",
"\n",
"cassio.init(session=session, keyspace=CASSANDRA_KEYSPACE)"
]
},
{
"cell_type": "markdown",
"id": "3b87a824-36f1-45b4-b54c-efec2a2de216",
"metadata": {},
"source": [
"Now you can create the vector store:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "853a2a88-a565-4e24-8789-d78c213954a6",
"metadata": {},
"outputs": [],
"source": [
"vstore = Cassandra(\n",
" embedding=embe,\n",
" table_name=\"cassandra_vector_demo\",\n",
" # session=None, keyspace=None # Uncomment on older versions of LangChain\n",
")"
]
},
{
"cell_type": "markdown",
"id": "768ddf7a-0c3e-4134-ad38-25ac53c3da7a",
"metadata": {},
"source": [
"#### Initialization (Astra DB through CQL)"
]
},
{
"cell_type": "markdown",
"id": "4ed4269a-b7e7-4503-9e66-5a11335c7681",
"metadata": {},
"source": [
"In this case you initialize CassIO with the following connection parameters:\n",
"\n",
"- the Database ID, e.g. `01234567-89ab-cdef-0123-456789abcdef`\n",
"- the Token, e.g. `AstraCS:6gBhNmsk135....` (it must be a \"Database Administrator\" token)\n",
"- Optionally a Keyspace name (if omitted, the default one for the database will be used)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5fa6bd74-d4b2-45c5-9757-96dddc6242fb",
"metadata": {},
"outputs": [],
"source": [
"ASTRA_DB_ID = input(\"ASTRA_DB_ID = \")\n",
"ASTRA_DB_APPLICATION_TOKEN = getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")\n",
"\n",
"desired_keyspace = input(\"ASTRA_DB_KEYSPACE (optional, can be left empty) = \")\n",
"if desired_keyspace:\n",
" ASTRA_DB_KEYSPACE = desired_keyspace\n",
"else:\n",
" ASTRA_DB_KEYSPACE = None"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "add6e585-17ff-452e-8ef6-7e485ead0b06",
"metadata": {},
"outputs": [],
"source": [
"import cassio\n",
"\n",
"cassio.init(\n",
" database_id=ASTRA_DB_ID,\n",
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
" keyspace=ASTRA_DB_KEYSPACE,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "b305823c-bc98-4f3d-aabb-d7eb663ea421",
"metadata": {},
"source": [
"Now you can create the vector store:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f45f3038-9d59-41cc-8b43-774c6aa80295",
"metadata": {},
"outputs": [],
"source": [
"vstore = Cassandra(\n",
" embedding=embe,\n",
" table_name=\"cassandra_vector_demo\",\n",
" # session=None, keyspace=None # Uncomment on older versions of LangChain\n",
")"
]
},
{
"cell_type": "markdown",
"id": "39284918-cf8a-49bb-a2d3-aef285bb2ffa",
"metadata": {},
"source": [
"### Usage of the vector store"
]
},
{
"cell_type": "markdown",
"id": "3cc1aead-d6ec-48a3-affe-1d0cffa955a9",
"metadata": {},
"source": [
"_See the sections \"Load a dataset\" through \"A minimal RAG chain\" above._\n",
"\n",
"Speaking of the latter, you can check out a full RAG template for Astra DB through CQL [here](https://github.com/langchain-ai/langchain/tree/master/templates/cassandra-entomology-rag)."
]
},
{
"cell_type": "markdown",
"id": "096397d8-6622-4685-9f9d-7e238beca467",
"metadata": {},
"source": [
"### Cleanup"
]
},
{
"cell_type": "markdown",
"id": "cc1e74f9-5500-41aa-836f-235b1ed5f20c",
"metadata": {},
"source": [
"the following essentially retrieves the `Session` object from CassIO and runs a CQL `DROP TABLE` statement with it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b5b82c33-0e77-4a37-852c-8d50edbdd991",
"metadata": {},
"outputs": [],
"source": [
"cassio.config.resolve_session().execute(\n",
" f\"DROP TABLE {cassio.config.resolve_keyspace()}.cassandra_vector_demo;\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "c10ece4d-ae06-42ab-baf4-4d0ac2051743",
"metadata": {},
"source": [
"### Learn more"
]
},
{
"cell_type": "markdown",
"id": "51ea8b69-7e15-458f-85aa-9fa199f95f9c",
"metadata": {},
"source": [
"For more information, extended quickstarts and additional usage examples, please visit the [CassIO documentation](https://cassio.org/frameworks/langchain/about/) for more on using the LangChain `Cassandra` vector store."
]
}
],
"metadata": {
@@ -535,7 +776,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
"version": "3.10.12"
}
},
"nbformat": 4,

View File

@@ -1,651 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d2d6ca14-fb7e-4172-9aa0-a3119a064b96",
"metadata": {},
"source": [
"# Apache Cassandra\n",
"\n",
"This page provides a quickstart for using [Apache Cassandra®](https://cassandra.apache.org/) as a Vector Store."
]
},
{
"cell_type": "markdown",
"id": "6a1a562e-3d1a-4693-b55d-08bf90943a9a",
"metadata": {},
"source": [
"> [Cassandra](https://cassandra.apache.org/) is a NoSQL, row-oriented, highly scalable and highly available database.Starting with version 5.0, the database ships with [vector search capabilities](https://cassandra.apache.org/doc/trunk/cassandra/vector-search/overview.html)."
]
},
{
"cell_type": "markdown",
"id": "9cf37d7f-c18e-4e63-adea-138e5e981475",
"metadata": {},
"source": [
"_Note: in addition to access to the database, an OpenAI API Key is required to run the full example._"
]
},
{
"cell_type": "markdown",
"id": "bb9be7ce-8c70-4d46-9f11-71c42a36e928",
"metadata": {},
"source": [
"### Setup and general dependencies"
]
},
{
"cell_type": "markdown",
"id": "dbe7c156-0413-47e3-9237-4769c4248869",
"metadata": {},
"source": [
"Use of the integration requires the following Python package."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d00fcf4-9798-4289-9214-d9734690adfc",
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet \"cassio>=0.1.4\""
]
},
{
"cell_type": "markdown",
"id": "2453d83a-bc8f-41e1-a692-befe4dd90156",
"metadata": {},
"source": [
"_Note: depending on your LangChain setup, you may need to install/upgrade other dependencies needed for this demo_\n",
"_(specifically, recent versions of `datasets`, `openai`, `pypdf` and `tiktoken` are required, along with `langchain-community`)._"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b06619af-fea2-4863-8149-7f239a8c9c82",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from getpass import getpass\n",
"\n",
"from datasets import (\n",
" load_dataset,\n",
")\n",
"from langchain.schema import Document\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain_community.document_loaders import PyPDFLoader\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1983f1da-0ae7-4a9b-bf4c-4ade328f7a3a",
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"OPENAI_API_KEY\"] = getpass(\"OPENAI_API_KEY = \")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c656df06-e938-4bc5-b570-440b8b7a0189",
"metadata": {},
"outputs": [],
"source": [
"embe = OpenAIEmbeddings()"
]
},
{
"cell_type": "markdown",
"id": "22866f09-e10d-4f05-a24b-b9420129462e",
"metadata": {},
"source": [
"## Import the Vector Store"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0b32730d-176e-414c-9d91-fd3644c54211",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.vectorstores import Cassandra"
]
},
{
"cell_type": "markdown",
"id": "68f61b01-3e09-47c1-9d67-5d6915c86626",
"metadata": {},
"source": [
"## Connection parameters\n",
"\n",
"The Vector Store integration shown in this page can be used with Cassandra as well as other derived databases, such as Astra DB, which use the CQL (Cassandra Query Language) protocol.\n",
"\n",
"> DataStax [Astra DB](https://docs.datastax.com/en/astra-serverless/docs/vector-search/quickstart.html) is a managed serverless database built on Cassandra, offering the same interface and strengths.\n",
"\n",
"Depending on whether you connect to a Cassandra cluster or to Astra DB through CQL, you will provide different parameters when creating the vector store object."
]
},
{
"cell_type": "markdown",
"id": "36bbb3d9-4d07-4f63-b23d-c52be03f8938",
"metadata": {},
"source": [
"### Connecting to a Cassandra cluster\n",
"\n",
"You first need to create a `cassandra.cluster.Session` object, as described in the [Cassandra driver documentation](https://docs.datastax.com/en/developer/python-driver/latest/api/cassandra/cluster/#module-cassandra.cluster). The details vary (e.g. with network settings and authentication), but this might be something like:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d95bb1d4-d8a6-4e66-89bc-776f9c6f962b",
"metadata": {},
"outputs": [],
"source": [
"from cassandra.cluster import Cluster\n",
"\n",
"cluster = Cluster([\"127.0.0.1\"])\n",
"session = cluster.connect()"
]
},
{
"cell_type": "markdown",
"id": "8279aa78-96d6-43ad-aa21-79fd798d895d",
"metadata": {},
"source": [
"You can now set the session, along with your desired keyspace name, as a global CassIO parameter:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "29ececc4-e50b-4428-967f-4b6bbde12a14",
"metadata": {},
"outputs": [],
"source": [
"import cassio\n",
"\n",
"CASSANDRA_KEYSPACE = input(\"CASSANDRA_KEYSPACE = \")\n",
"\n",
"cassio.init(session=session, keyspace=CASSANDRA_KEYSPACE)"
]
},
{
"cell_type": "markdown",
"id": "0bd035a2-f0af-418f-94e5-0fbb4d51ac3c",
"metadata": {},
"source": [
"Now you can create the vector store:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eeb62cde-89fc-44d7-ba76-91e19cbc5898",
"metadata": {},
"outputs": [],
"source": [
"vstore = Cassandra(\n",
" embedding=embe,\n",
" table_name=\"cassandra_vector_demo\",\n",
" # session=None, keyspace=None # Uncomment on older versions of LangChain\n",
")"
]
},
{
"cell_type": "markdown",
"id": "ce240555-e5fc-431d-ac0f-bcf2f6e6a5fb",
"metadata": {},
"source": [
"_Note: you can also pass your session and keyspace directly as parameters when creating the vector store. Using the global `cassio.init` setting, however, comes handy if your applications uses Cassandra in several ways (for instance, for vector store, chat memory and LLM response caching), as it allows to centralize credential and DB connection management in one place._"
]
},
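{
"cell_type": "markdown",
"id": "direct-session-params-leadin",
"metadata": {},
"source": [
"_A minimal sketch of that direct-parameters variant (the `vstore_direct` name is illustrative; it reuses the `session` and `CASSANDRA_KEYSPACE` values created above):_"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "direct-session-params-code",
"metadata": {},
"outputs": [],
"source": [
"vstore_direct = Cassandra(\n",
"    embedding=embe,\n",
"    table_name=\"cassandra_vector_demo\",\n",
"    session=session,  # pass the driver session explicitly...\n",
"    keyspace=CASSANDRA_KEYSPACE,  # ...along with the keyspace name\n",
")"
]
},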
{
"cell_type": "markdown",
"id": "b598e5fa-eb62-4939-9734-091628e84db4",
"metadata": {},
"source": [
"### Connecting to Astra DB through CQL"
]
},
{
"cell_type": "markdown",
"id": "2feec7c3-7092-4252-9a3f-05eda4babe74",
"metadata": {},
"source": [
"In this case you initialize CassIO with the following connection parameters:\n",
"\n",
"- the Database ID, e.g. `01234567-89ab-cdef-0123-456789abcdef`\n",
"- the Token, e.g. `AstraCS:6gBhNmsk135....` (it must be a \"Database Administrator\" token)\n",
"- Optionally a Keyspace name (if omitted, the default one for the database will be used)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2f96147d-6d76-4101-bbb0-4a7f215c3d2d",
"metadata": {},
"outputs": [],
"source": [
"ASTRA_DB_ID = input(\"ASTRA_DB_ID = \")\n",
"ASTRA_DB_APPLICATION_TOKEN = getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")\n",
"\n",
"desired_keyspace = input(\"ASTRA_DB_KEYSPACE (optional, can be left empty) = \")\n",
"if desired_keyspace:\n",
" ASTRA_DB_KEYSPACE = desired_keyspace\n",
"else:\n",
" ASTRA_DB_KEYSPACE = None"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d653df1d-9dad-4980-ba52-76a47b4c5c1a",
"metadata": {},
"outputs": [],
"source": [
"import cassio\n",
"\n",
"cassio.init(\n",
" database_id=ASTRA_DB_ID,\n",
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
" keyspace=ASTRA_DB_KEYSPACE,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "e606b58b-d390-4fed-a2fc-65036c44860f",
"metadata": {},
"source": [
"Now you can create the vector store:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9cb552d1-e888-4550-a350-6df06b1f5aae",
"metadata": {},
"outputs": [],
"source": [
"vstore = Cassandra(\n",
" embedding=embe,\n",
" table_name=\"cassandra_vector_demo\",\n",
" # session=None, keyspace=None # Uncomment on older versions of LangChain\n",
")"
]
},
{
"cell_type": "markdown",
"id": "9a348678-b2f6-46ca-9a0d-2eb4cc6b66b1",
"metadata": {},
"source": [
"## Load a dataset"
]
},
{
"cell_type": "markdown",
"id": "552e56b0-301a-4b06-99c7-57ba6faa966f",
"metadata": {},
"source": [
"Convert each entry in the source dataset into a `Document`, then write them into the vector store:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3a1f532f-ad63-4256-9730-a183841bd8e9",
"metadata": {},
"outputs": [],
"source": [
"philo_dataset = load_dataset(\"datastax/philosopher-quotes\")[\"train\"]\n",
"\n",
"docs = []\n",
"for entry in philo_dataset:\n",
" metadata = {\"author\": entry[\"author\"]}\n",
" doc = Document(page_content=entry[\"quote\"], metadata=metadata)\n",
" docs.append(doc)\n",
"\n",
"inserted_ids = vstore.add_documents(docs)\n",
"print(f\"\\nInserted {len(inserted_ids)} documents.\")"
]
},
{
"cell_type": "markdown",
"id": "79d4f436-ef04-4288-8f79-97c9abb983ed",
"metadata": {},
"source": [
"In the above, `metadata` dictionaries are created from the source data and are part of the `Document`."
]
},
{
"cell_type": "markdown",
"id": "084d8802-ab39-4262-9a87-42eafb746f92",
"metadata": {},
"source": [
"Add some more entries, this time with `add_texts`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6b157f5-eb31-4907-a78e-2e2b06893936",
"metadata": {},
"outputs": [],
"source": [
"texts = [\"I think, therefore I am.\", \"To the things themselves!\"]\n",
"metadatas = [{\"author\": \"descartes\"}, {\"author\": \"husserl\"}]\n",
"ids = [\"desc_01\", \"huss_xy\"]\n",
"\n",
"inserted_ids_2 = vstore.add_texts(texts=texts, metadatas=metadatas, ids=ids)\n",
"print(f\"\\nInserted {len(inserted_ids_2)} documents.\")"
]
},
{
"cell_type": "markdown",
"id": "63840eb3-8b29-4017-bc2f-301bf5001f28",
"metadata": {},
"source": [
"_Note: you may want to speed up the execution of `add_texts` and `add_documents` by increasing the concurrency level for_\n",
"_these bulk operations - check out the methods' `batch_size` parameter_\n",
"_for more details. Depending on the network and the client machine specifications, your best-performing choice of parameters may vary._"
]
},
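{
"cell_type": "markdown",
"id": "batch-size-sketch-leadin",
"metadata": {},
"source": [
"_For illustration, the same kind of `add_texts` call with an explicit `batch_size` (the texts and the value `50` are arbitrary placeholders to tune for your setup):_"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "batch-size-sketch-code",
"metadata": {},
"outputs": [],
"source": [
"more_texts = [\"The unexamined life is not worth living.\"]\n",
"more_metadatas = [{\"author\": \"socrates\"}]\n",
"\n",
"# A larger batch_size means more concurrent writes during the bulk insertion.\n",
"inserted_ids_3 = vstore.add_texts(\n",
"    texts=more_texts, metadatas=more_metadatas, batch_size=50\n",
")\n",
"print(f\"\\nInserted {len(inserted_ids_3)} documents.\")"
]
},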
{
"cell_type": "markdown",
"id": "c031760a-1fc5-4855-adf2-02ed52fe2181",
"metadata": {},
"source": [
"## Run searches"
]
},
{
"cell_type": "markdown",
"id": "02a77d8e-1aae-4054-8805-01c77947c49f",
"metadata": {},
"source": [
"This section demonstrates metadata filtering and getting the similarity scores back:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1761806a-1afd-4491-867c-25a80d92b9fe",
"metadata": {},
"outputs": [],
"source": [
"results = vstore.similarity_search(\"Our life is what we make of it\", k=3)\n",
"for res in results:\n",
" print(f\"* {res.page_content} [{res.metadata}]\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eebc4f7c-f61a-438e-b3c8-17e6888d8a0b",
"metadata": {},
"outputs": [],
"source": [
"results_filtered = vstore.similarity_search(\n",
" \"Our life is what we make of it\",\n",
" k=3,\n",
" filter={\"author\": \"plato\"},\n",
")\n",
"for res in results_filtered:\n",
" print(f\"* {res.page_content} [{res.metadata}]\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11bbfe64-c0cd-40c6-866a-a5786538450e",
"metadata": {},
"outputs": [],
"source": [
"results = vstore.similarity_search_with_score(\"Our life is what we make of it\", k=3)\n",
"for res, score in results:\n",
" print(f\"* [SIM={score:3f}] {res.page_content} [{res.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "b14ea558-bfbe-41ce-807e-d70670060ada",
"metadata": {},
"source": [
"### MMR (Maximal-marginal-relevance) search"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "76381ce8-780a-4e3b-97b1-056d6782d7d5",
"metadata": {},
"outputs": [],
"source": [
"results = vstore.max_marginal_relevance_search(\n",
" \"Our life is what we make of it\",\n",
" k=3,\n",
" filter={\"author\": \"aristotle\"},\n",
")\n",
"for res in results:\n",
" print(f\"* {res.page_content} [{res.metadata}]\")"
]
},
{
"cell_type": "markdown",
"id": "1cc86edd-692b-4495-906c-ccfd13b03c23",
"metadata": {},
"source": [
"## Deleting stored documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "38a70ec4-b522-4d32-9ead-c642864fca37",
"metadata": {},
"outputs": [],
"source": [
"delete_1 = vstore.delete(inserted_ids[:3])\n",
"print(f\"all_succeed={delete_1}\") # True, all documents deleted"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d4cf49ed-9d29-4ed9-bdab-51a308c41b8e",
"metadata": {},
"outputs": [],
"source": [
"delete_2 = vstore.delete(inserted_ids[2:5])\n",
"print(f\"some_succeeds={delete_2}\") # True, though some IDs were gone already"
]
},
{
"cell_type": "markdown",
"id": "847181ba-77d1-4a17-b7f9-9e2c3d8efd13",
"metadata": {},
"source": [
"## A minimal RAG chain"
]
},
{
"cell_type": "markdown",
"id": "cd64b844-846f-43c5-a7dd-c26b9ed417d0",
"metadata": {},
"source": [
"The next cells will implement a simple RAG pipeline:\n",
"- download a sample PDF file and load it onto the store;\n",
"- create a RAG chain with LCEL (LangChain Expression Language), with the vector store at its heart;\n",
"- run the question-answering chain."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5cbc4dba-0d5e-4038-8fc5-de6cadd1c2a9",
"metadata": {},
"outputs": [],
"source": [
"!curl -L \\\n",
" \"https://github.com/awesome-astra/datasets/blob/main/demo-resources/what-is-philosophy/what-is-philosophy.pdf?raw=true\" \\\n",
" -o \"what-is-philosophy.pdf\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "459385be-5e9c-47ff-ba53-2b7ae6166b09",
"metadata": {},
"outputs": [],
"source": [
"pdf_loader = PyPDFLoader(\"what-is-philosophy.pdf\")\n",
"splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)\n",
"docs_from_pdf = pdf_loader.load_and_split(text_splitter=splitter)\n",
"\n",
"print(f\"Documents from PDF: {len(docs_from_pdf)}.\")\n",
"inserted_ids_from_pdf = vstore.add_documents(docs_from_pdf)\n",
"print(f\"Inserted {len(inserted_ids_from_pdf)} documents.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5010a66c-4298-4e32-82b5-2da0d36a5c70",
"metadata": {},
"outputs": [],
"source": [
"retriever = vstore.as_retriever(search_kwargs={\"k\": 3})\n",
"\n",
"philo_template = \"\"\"\n",
"You are a philosopher that draws inspiration from great thinkers of the past\n",
"to craft well-thought answers to user questions. Use the provided context as the basis\n",
"for your answers and do not make up new reasoning paths - just mix-and-match what you are given.\n",
"Your answers must be concise and to the point, and refrain from answering about other topics than philosophy.\n",
"\n",
"CONTEXT:\n",
"{context}\n",
"\n",
"QUESTION: {question}\n",
"\n",
"YOUR ANSWER:\"\"\"\n",
"\n",
"philo_prompt = ChatPromptTemplate.from_template(philo_template)\n",
"\n",
"llm = ChatOpenAI()\n",
"\n",
"chain = (\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
" | philo_prompt\n",
" | llm\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fcbc1296-6c7c-478b-b55b-533ba4e54ddb",
"metadata": {},
"outputs": [],
"source": [
"chain.invoke(\"How does Russel elaborate on Peirce's idea of the security blanket?\")"
]
},
{
"cell_type": "markdown",
"id": "869ab448-a029-4692-aefc-26b85513314d",
"metadata": {},
"source": [
"For more, check out a complete RAG template using Astra DB through CQL [here](https://github.com/langchain-ai/langchain/tree/master/templates/cassandra-entomology-rag)."
]
},
{
"cell_type": "markdown",
"id": "177610c7-50d0-4b7b-8634-b03338054c8e",
"metadata": {},
"source": [
"## Cleanup"
]
},
{
"cell_type": "markdown",
"id": "0da4d19f-9878-4d3d-82c9-09cafca20322",
"metadata": {},
"source": [
"the following essentially retrieves the `Session` object from CassIO and runs a CQL `DROP TABLE` statement with it:\n",
"\n",
"_(You will lose the data you stored in it.)_"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fd405a13-6f71-46fa-87e6-167238e9c25e",
"metadata": {},
"outputs": [],
"source": [
"cassio.config.resolve_session().execute(\n",
" f\"DROP TABLE {cassio.config.resolve_keyspace()}.cassandra_vector_demo;\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "c10ece4d-ae06-42ab-baf4-4d0ac2051743",
"metadata": {},
"source": [
"### Learn more"
]
},
{
"cell_type": "markdown",
"id": "51ea8b69-7e15-458f-85aa-9fa199f95f9c",
"metadata": {},
"source": [
"For more information, extended quickstarts and additional usage examples, please visit the [CassIO documentation](https://cassio.org/frameworks/langchain/about/) for more on using the LangChain `Cassandra` vector store."
]
},
{
"cell_type": "markdown",
"id": "3b8ee30c-2c84-42f3-9cff-e80dbc590490",
"metadata": {},
"source": [
"#### Attribution statement\n",
"\n",
"> Apache Cassandra, Cassandra and Apache are either registered trademarks or trademarks of the [Apache Software Foundation](http://www.apache.org/) in the United States and/or other countries.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.17"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

File diff suppressed because one or more lines are too long

View File

@@ -115,56 +115,12 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "86efff90",
"metadata": {},
"source": [
"## Multi-modal Example: Leveraging CLIP and OpenClip Embeddings\n",
"\n",
"In the realm of multi-modal data analysis, the integration of diverse information types like images and text has become increasingly crucial. One powerful tool facilitating such integration is [CLIP](https://openai.com/research/clip), a cutting-edge model capable of embedding both images and text into a shared semantic space. By doing so, CLIP enables the retrieval of relevant content across different modalities through similarity search.\n",
"\n",
"To illustrate, let's consider an application scenario where we aim to effectively analyze multi-modal data. In this example, we harness the capabilities of [OpenClip multimodal embeddings](https://python.langchain.com/docs/integrations/text_embedding/open_clip), which leverage CLIP's framework. With OpenClip, we can seamlessly embed textual descriptions alongside corresponding images, enabling comprehensive analysis and retrieval tasks. Whether it's identifying visually similar images based on textual queries or finding relevant text passages associated with specific visual content, OpenClip empowers users to explore and extract insights from multi-modal data with remarkable efficiency and accuracy."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c0bce88",
"metadata": {},
"outputs": [],
"source": [
"%pip install -U langchain openai singlestoredb langchain-experimental # (newest versions required for multi-modal)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "21a8c25c",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain_community.vectorstores import SingleStoreDB\n",
"from langchain_experimental.open_clip import OpenCLIPEmbeddings\n",
"\n",
"os.environ[\"SINGLESTOREDB_URL\"] = \"root:pass@localhost:3306/db\"\n",
"\n",
"TEST_IMAGES_DIR = \"../../modules/images\"\n",
"\n",
"docsearch = SingleStoreDB(OpenCLIPEmbeddings())\n",
"\n",
"image_uris = sorted(\n",
" [\n",
" os.path.join(TEST_IMAGES_DIR, image_name)\n",
" for image_name in os.listdir(TEST_IMAGES_DIR)\n",
" if image_name.endswith(\".jpg\")\n",
" ]\n",
")\n",
"\n",
"# Add images\n",
"docsearch.add_images(uris=image_uris)"
]
"source": []
}
],
"metadata": {

View File

@@ -6,7 +6,7 @@
"source": [
"# Tigris\n",
"\n",
"> [Tigris](https://tigrisdata.com) is an open-source Serverless NoSQL Database and Search Platform designed to simplify building high-performance vector search applications.\n",
"> [Tigris](htttps://tigrisdata.com) is an open-source Serverless NoSQL Database and Search Platform designed to simplify building high-performance vector search applications.\n",
"> `Tigris` eliminates the infrastructure complexity of managing, operating, and synchronizing multiple tools, allowing you to focus on building great applications instead."
]
},

View File

@@ -7,7 +7,7 @@
"source": [
"# Running Agent as an Iterator\n",
"\n",
"It can be useful to run the agent as an iterator, to add human-in-the-loop checks as needed.\n",
"It can be useful to run the agent as an interator, to add human-in-the-loop checks as needed.\n",
"\n",
"To demonstrate the `AgentExecutorIterator` functionality, we will set up a problem where an Agent must:\n",
"\n",

View File

@@ -131,7 +131,7 @@ table = db.create_table(
raw_documents = TextLoader('../../../state_of_the_union.txt').load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
db = LanceDB.from_documents(documents, OpenAIEmbeddings())
db = LanceDB.from_documents(documents, OpenAIEmbeddings(), connection=table)
```
</TabItem>

View File

@@ -1,644 +0,0 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "e3da9a3f-f583-4ba6-994e-0e8c1158f5eb",
"metadata": {},
"source": [
"# Custom Chat Model\n",
"\n",
"In this guide, we'll learn how to create a custom chat model using LangChain abstractions.\n",
"\n",
"Wrapping your LLM with the standard `ChatModel` interface allow you to use your LLM in existing LangChain programs with minimal code modifications!\n",
"\n",
"As an bonus, your LLM will automatically become a LangChain `Runnable` and will benefit from some optimizations out of the box (e.g., batch via a threadpool), async support, the `astream_events` API, etc.\n",
"\n",
"## Inputs and outputs\n",
"\n",
"First, we need to talk about messages which are the inputs and outputs of chat models.\n",
"\n",
"### Messages\n",
"\n",
"Chat models take messages as inputs and return a message as output. \n",
"\n",
"LangChain has a few built-in message types:\n",
"\n",
"- `SystemMessage`: Used for priming AI behavior, usually passed in as the first of a sequence of input messages.\n",
"- `HumanMessage`: Represents a message from a person interacting with the chat model.\n",
"- `AIMessage`: Represents a message from the chat model. This can be either text or a request to invoke a tool.\n",
"- `FunctionMessage` / `ToolMessage`: Message for passing the results of tool invocation back to the model.\n",
"\n",
"::: {.callout-note}\n",
"`ToolMessage` and `FunctionMessage` closely follow OpenAIs `function` and `tool` arguments.\n",
"\n",
"This is a rapidly developing field and as more models add function calling capabilities, expect that there will be additions to this schema.\n",
":::"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c5046e6a-8b09-4a99-b6e6-7a605aac5738",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.messages import (\n",
" AIMessage,\n",
" BaseMessage,\n",
" FunctionMessage,\n",
" HumanMessage,\n",
" SystemMessage,\n",
" ToolMessage,\n",
")"
]
},
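{
"cell_type": "markdown",
"id": "message-usage-sketch-leadin",
"metadata": {},
"source": [
"For instance, a short conversation expressed as a list of these messages (a minimal sketch; the content strings are just illustrative):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "message-usage-sketch-code",
"metadata": {},
"outputs": [],
"source": [
"# A system message primes the behavior; human and AI messages alternate after it.\n",
"conversation = [\n",
"    SystemMessage(content=\"You are a helpful assistant.\"),\n",
"    HumanMessage(content=\"What is the capital of France?\"),\n",
"    AIMessage(content=\"The capital of France is Paris.\"),\n",
"]"
]
},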
{
"cell_type": "markdown",
"id": "53033447-8260-4f53-bd6f-b2f744e04e75",
"metadata": {},
"source": [
"### Streaming Variant\n",
"\n",
"All the chat messages have a streaming variant that contains `Chunk` in the name."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d4656e9d-bfa1-4703-8f79-762fe6421294",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.messages import (\n",
" AIMessageChunk,\n",
" FunctionMessageChunk,\n",
" HumanMessageChunk,\n",
" SystemMessageChunk,\n",
" ToolMessageChunk,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "81ebf3f4-c760-4898-b921-fdb469453d4a",
"metadata": {},
"source": [
"These chunks are used when streaming output from chat models, and they all define an additive property!"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "9c15c299-6f8a-49cf-a072-09924fd44396",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessageChunk(content='Hello World!')"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"AIMessageChunk(content=\"Hello\") + AIMessageChunk(content=\" World!\")"
]
},
{
"cell_type": "markdown",
"id": "8e952d64-6d38-4a2b-b996-8812c204a12c",
"metadata": {},
"source": [
"## Simple Chat Model\n",
"\n",
"Inherting from `SimpleChatModel` is great for prototyping!\n",
"\n",
"It won't allow you to implement all features that you might want out of a chat model, but it's quick to implement, and if you need more you can transition to `BaseChatModel` shown below.\n",
"\n",
"Let's implement a chat model that echoes back the last `n` characters of the prompt!\n",
"\n",
"You need to implement the following:\n",
"\n",
"* The method `_call` - Use to generate a chat result from a prompt.\n",
"\n",
"In addition, you have the option to specify the following:\n",
"\n",
"* The property `_identifying_params` - Represent model parameterization for logging purposes.\n",
"\n",
"Optional:\n",
"\n",
"* `_stream` - Use to implement streaming.\n"
]
},
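{
"cell_type": "markdown",
"id": "simple-chat-model-sketch-leadin",
"metadata": {},
"source": [
"Since this variant is not implemented elsewhere in this guide, here is a minimal sketch of `_call` (the class name `CustomChatModelSimple` is illustrative, not a LangChain class):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "simple-chat-model-sketch-code",
"metadata": {},
"outputs": [],
"source": [
"from typing import Any, List, Optional\n",
"\n",
"from langchain_core.callbacks import CallbackManagerForLLMRun\n",
"from langchain_core.language_models import SimpleChatModel\n",
"from langchain_core.messages import BaseMessage\n",
"\n",
"\n",
"class CustomChatModelSimple(SimpleChatModel):\n",
"    \"\"\"A sketch of a chat model that echoes the last `n` characters of the prompt.\"\"\"\n",
"\n",
"    n: int\n",
"    \"\"\"Number of characters (from the end of the last message) to echo back.\"\"\"\n",
"\n",
"    @property\n",
"    def _llm_type(self) -> str:\n",
"        \"\"\"Uniquely identify the model type, for logging purposes.\"\"\"\n",
"        return \"echoing-chat-model-simple\"\n",
"\n",
"    def _call(\n",
"        self,\n",
"        messages: List[BaseMessage],\n",
"        stop: Optional[List[str]] = None,\n",
"        run_manager: Optional[CallbackManagerForLLMRun] = None,\n",
"        **kwargs: Any,\n",
"    ) -> str:\n",
"        # SimpleChatModel wraps the returned string into an AIMessage for us.\n",
"        return messages[-1].content[-self.n :]\n",
"\n",
"\n",
"CustomChatModelSimple(n=3).invoke(\"Meow!\")  # -> AIMessage(content='ow!')"
]
},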
{
"cell_type": "markdown",
"id": "bbfebea1",
"metadata": {},
"source": [
"## Base Chat Model\n",
"\n",
"Let's implement a chat model that echoes back the first `n` characetrs of the last message in the prompt!\n",
"\n",
"To do so, we will inherit from `BaseChatModel` and we'll need to implement the following methods/properties:\n",
"\n",
"In addition, you have the option to specify the following:\n",
"\n",
"To do so inherit from `BaseChatModel` which is a lower level class and implement the methods:\n",
"\n",
"* `_generate` - Use to generate a chat result from a prompt\n",
"* The property `_llm_type` - Used to uniquely identify the type of the model. Used for logging.\n",
"\n",
"Optional:\n",
"\n",
"* `_stream` - Use to implement streaming.\n",
"* `_agenerate` - Use to implement a native async method.\n",
"* `_astream` - Use to implement async version of `_stream`.\n",
"* The property `_identifying_params` - Represent model parameterization for logging purposes.\n",
"\n",
"\n",
":::{.callout-caution}\n",
"\n",
"Currently, to get async streaming to work (via `astream`), you must provide an implementation of `_astream`.\n",
"\n",
"By default if `_astream` is not provided, then async streaming falls back on `_agenerate` which does not support\n",
"token by token streaming.\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "8e7047bd-c235-46f6-85e1-d6d7e0868eb1",
"metadata": {},
"source": [
"### Implementation"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "25ba32e5-5a6d-49f4-bb68-911827b84d61",
"metadata": {},
"outputs": [],
"source": [
"from typing import Any, AsyncIterator, Dict, Iterator, List, Optional\n",
"\n",
"from langchain_core.callbacks import (\n",
" AsyncCallbackManagerForLLMRun,\n",
" CallbackManagerForLLMRun,\n",
")\n",
"from langchain_core.language_models import BaseChatModel, SimpleChatModel\n",
"from langchain_core.messages import AIMessageChunk, BaseMessage, HumanMessage\n",
"from langchain_core.outputs import ChatGeneration, ChatGenerationChunk, ChatResult\n",
"from langchain_core.runnables import run_in_executor\n",
"\n",
"\n",
"class CustomChatModelAdvanced(BaseChatModel):\n",
" \"\"\"A custom chat model that echoes the first `n` characters of the input.\n",
"\n",
" When contributing an implementation to LangChain, carefully document\n",
" the model including the initialization parameters, include\n",
" an example of how to initialize the model and include any relevant\n",
" links to the underlying models documentation or API.\n",
"\n",
" Example:\n",
"\n",
" .. code-block:: python\n",
"\n",
" model = CustomChatModel(n=2)\n",
" result = model.invoke([HumanMessage(content=\"hello\")])\n",
" result = model.batch([[HumanMessage(content=\"hello\")],\n",
" [HumanMessage(content=\"world\")]])\n",
" \"\"\"\n",
"\n",
" n: int\n",
" \"\"\"The number of characters from the last message of the prompt to be echoed.\"\"\"\n",
"\n",
" def _generate(\n",
" self,\n",
" messages: List[BaseMessage],\n",
" stop: Optional[List[str]] = None,\n",
" run_manager: Optional[CallbackManagerForLLMRun] = None,\n",
" **kwargs: Any,\n",
" ) -> ChatResult:\n",
" \"\"\"Override the _generate method to implement the chat model logic.\n",
"\n",
" This can be a call to an API, a call to a local model, or any other\n",
" implementation that generates a response to the input prompt.\n",
"\n",
" Args:\n",
" messages: the prompt composed of a list of messages.\n",
" stop: a list of strings on which the model should stop generating.\n",
" If generation stops due to a stop token, the stop token itself\n",
" SHOULD BE INCLUDED as part of the output. This is not enforced\n",
" across models right now, but it's a good practice to follow since\n",
" it makes it much easier to parse the output of the model\n",
" downstream and understand why generation stopped.\n",
" run_manager: A run manager with callbacks for the LLM.\n",
" \"\"\"\n",
" last_message = messages[-1]\n",
" tokens = last_message.content[: self.n]\n",
" message = AIMessage(content=tokens)\n",
" generation = ChatGeneration(message=message)\n",
" return ChatResult(generations=[generation])\n",
"\n",
" def _stream(\n",
" self,\n",
" messages: List[BaseMessage],\n",
" stop: Optional[List[str]] = None,\n",
" run_manager: Optional[CallbackManagerForLLMRun] = None,\n",
" **kwargs: Any,\n",
" ) -> Iterator[ChatGenerationChunk]:\n",
" \"\"\"Stream the output of the model.\n",
"\n",
" This method should be implemented if the model can generate output\n",
" in a streaming fashion. If the model does not support streaming,\n",
" do not implement it. In that case streaming requests will be automatically\n",
" handled by the _generate method.\n",
"\n",
" Args:\n",
" messages: the prompt composed of a list of messages.\n",
" stop: a list of strings on which the model should stop generating.\n",
" If generation stops due to a stop token, the stop token itself\n",
" SHOULD BE INCLUDED as part of the output. This is not enforced\n",
" across models right now, but it's a good practice to follow since\n",
" it makes it much easier to parse the output of the model\n",
" downstream and understand why generation stopped.\n",
" run_manager: A run manager with callbacks for the LLM.\n",
" \"\"\"\n",
" last_message = messages[-1]\n",
" tokens = last_message.content[: self.n]\n",
"\n",
" for token in tokens:\n",
" chunk = ChatGenerationChunk(message=AIMessageChunk(content=token))\n",
"\n",
" if run_manager:\n",
" run_manager.on_llm_new_token(token, chunk=chunk)\n",
"\n",
" yield chunk\n",
"\n",
" async def _astream(\n",
" self,\n",
" messages: List[BaseMessage],\n",
" stop: Optional[List[str]] = None,\n",
" run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,\n",
" **kwargs: Any,\n",
" ) -> AsyncIterator[ChatGenerationChunk]:\n",
" \"\"\"An async variant of astream.\n",
"\n",
" If not provided, the default behavior is to delegate to the _generate method.\n",
"\n",
" The implementation below instead will delegate to `_stream` and will\n",
" kick it off in a separate thread.\n",
"\n",
" If you're able to natively support async, then by all means do so!\n",
" \"\"\"\n",
" result = await run_in_executor(\n",
" None,\n",
" self._stream,\n",
" messages,\n",
" stop=stop,\n",
" run_manager=run_manager.get_sync() if run_manager else None,\n",
" **kwargs,\n",
" )\n",
" for chunk in result:\n",
" yield chunk\n",
"\n",
" @property\n",
" def _llm_type(self) -> str:\n",
" \"\"\"Get the type of language model used by this chat model.\"\"\"\n",
" return \"echoing-chat-model-advanced\"\n",
"\n",
" @property\n",
" def _identifying_params(self) -> Dict[str, Any]:\n",
" \"\"\"Return a dictionary of identifying parameters.\"\"\"\n",
" return {\"n\": self.n}"
]
},
{
"cell_type": "markdown",
"id": "b3c3d030-8d8b-4891-962d-a2d39b331883",
"metadata": {},
"source": [
":::{.callout-tip}\n",
"The `_astream` implementation uses `run_in_executor` to launch the sync `_stream` in a separate thread.\n",
"\n",
"You can use this trick if you want to reuse the `_stream` implementation, but if you're able to implement code\n",
"that's natively async that's a better solution since that code will run with less overhead.\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "1e9af284-f2d3-44e2-ac6a-09b73d89ada3",
"metadata": {},
"source": [
"### Let's test it 🧪\n",
"\n",
"The chat model will implement the standard `Runnable` interface of LangChain which many of the LangChain abstractions support!"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "34bf2d48-556a-48be-aee7-496fb02332f3",
"metadata": {},
"outputs": [],
"source": [
"model = CustomChatModelAdvanced(n=3)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "27689f30-dcd2-466b-ba9d-f60b7d434110",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='Meo')"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.invoke(\n",
" [\n",
" HumanMessage(content=\"hello!\"),\n",
" AIMessage(content=\"Hi there human!\"),\n",
" HumanMessage(content=\"Meow!\"),\n",
" ]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "406436df-31bf-466b-9c3d-39db9d6b6407",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='hel')"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.invoke(\"hello\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "a72ffa46-6004-41ef-bbe4-56fa17a029e2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[AIMessage(content='hel'), AIMessage(content='goo')]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.batch([\"hello\", \"goodbye\"])"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "3633be2c-2ea0-42f9-a72f-3b5240690b55",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"c|a|t|"
]
}
],
"source": [
"for chunk in model.stream(\"cat\"):\n",
" print(chunk.content, end=\"|\")"
]
},
{
"cell_type": "markdown",
"id": "3f8a7c42-aec4-4116-adf3-93133d409827",
"metadata": {},
"source": [
"Please see the implementation of `_astream` in the model! If you do not implement it, then no output will stream.!"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "b7d73995-eeab-48c6-a7d8-32c98ba29fc2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"c|a|t|"
]
}
],
"source": [
"async for chunk in model.astream(\"cat\"):\n",
" print(chunk.content, end=\"|\")"
]
},
{
"cell_type": "markdown",
"id": "f80dc55b-d159-4527-9191-407a7c6d6042",
"metadata": {},
"source": [
"Let's try to use the astream events API which will also help double check that all the callbacks were implemented!"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "17840eba-8ff4-4e73-8e4f-85f16eb1c9d0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'event': 'on_chat_model_start', 'run_id': 'e03c0b21-521f-4cb4-a837-02fed65cf1cf', 'name': 'CustomChatModelAdvanced', 'tags': [], 'metadata': {}, 'data': {'input': 'cat'}}\n",
"{'event': 'on_chat_model_stream', 'run_id': 'e03c0b21-521f-4cb4-a837-02fed65cf1cf', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIMessageChunk(content='c')}}\n",
"{'event': 'on_chat_model_stream', 'run_id': 'e03c0b21-521f-4cb4-a837-02fed65cf1cf', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIMessageChunk(content='a')}}\n",
"{'event': 'on_chat_model_stream', 'run_id': 'e03c0b21-521f-4cb4-a837-02fed65cf1cf', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIMessageChunk(content='t')}}\n",
"{'event': 'on_chat_model_end', 'name': 'CustomChatModelAdvanced', 'run_id': 'e03c0b21-521f-4cb4-a837-02fed65cf1cf', 'tags': [], 'metadata': {}, 'data': {'output': AIMessageChunk(content='cat')}}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/eugene/src/langchain/libs/core/langchain_core/_api/beta_decorator.py:86: LangChainBetaWarning: This API is in beta and may change in the future.\n",
" warn_beta(\n"
]
}
],
"source": [
"async for event in model.astream_events(\"cat\", version=\"v1\"):\n",
" print(event)"
]
},
{
"cell_type": "markdown",
"id": "42f9553f-7d8c-4277-aeb4-d80d77839d90",
"metadata": {},
"source": [
"## Identifying Params\n",
"\n",
"LangChain has a callback system which allows implementing loggers to monitor the behavior of LLM applications.\n",
"\n",
"Remember the `_identifying_params` property from earlier? \n",
"\n",
"It's passed to the callback system and is accessible for user specified loggers.\n",
"\n",
"Below we'll implement a handler with just a single `on_chat_model_start` event to see where `_identifying_params` appears."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "cc7e6b5f-711b-48aa-9ebe-92a13e230c37",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"---\n",
"On chat model start.\n",
"{'invocation_params': {'n': 3, '_type': 'echoing-chat-model-advanced', 'stop': ['woof']}, 'options': {'stop': ['woof']}, 'name': None, 'batch_size': 1}\n"
]
},
{
"data": {
"text/plain": [
"AIMessage(content='meo')"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from typing import Union\n",
"from uuid import UUID\n",
"\n",
"from langchain_core.callbacks import AsyncCallbackHandler\n",
"from langchain_core.outputs import (\n",
" ChatGenerationChunk,\n",
" ChatResult,\n",
" GenerationChunk,\n",
" LLMResult,\n",
")\n",
"\n",
"\n",
"class SampleCallbackHandler(AsyncCallbackHandler):\n",
" \"\"\"Async callback handler that handles callbacks from LangChain.\"\"\"\n",
"\n",
" async def on_chat_model_start(\n",
" self,\n",
" serialized: Dict[str, Any],\n",
" messages: List[List[BaseMessage]],\n",
" *,\n",
" run_id: UUID,\n",
" parent_run_id: Optional[UUID] = None,\n",
" tags: Optional[List[str]] = None,\n",
" metadata: Optional[Dict[str, Any]] = None,\n",
" **kwargs: Any,\n",
" ) -> Any:\n",
" \"\"\"Run when a chat model starts running.\"\"\"\n",
" print(\"---\")\n",
" print(\"On chat model start.\")\n",
" print(kwargs)\n",
"\n",
"\n",
"model.invoke(\"meow\", stop=[\"woof\"], config={\"callbacks\": [SampleCallbackHandler()]})"
]
},
{
"cell_type": "markdown",
"id": "44ee559b-b1da-4851-8c97-420ab394aff9",
"metadata": {},
"source": [
"## Contributing\n",
"\n",
"We appreciate all chat model integration contributions. \n",
"\n",
"Here's a checklist to help make sure your contribution gets added to LangChain:\n",
"\n",
"Documentation:\n",
"\n",
"* The model contains doc-strings for all initialization arguments, as these will be surfaced in the [APIReference](https://api.python.langchain.com/en/stable/langchain_api_reference.html).\n",
"* The class doc-string for the model contains a link to the model API if the model is powered by a service.\n",
"\n",
"Tests:\n",
"\n",
"* [ ] Add unit or integration tests to the overridden methods. Verify that `invoke`, `ainvoke`, `batch`, `stream` work if you've over-ridden the corresponding code.\n",
"\n",
"Streaming (if you're implementing it):\n",
"\n",
"* [ ] Provided an async implementation via `_astream`\n",
"* [ ] Make sure to invoke the `on_llm_new_token` callback\n",
"* [ ] `on_llm_new_token` is invoked BEFORE yielding the chunk\n",
"\n",
"Stop Token Behavior:\n",
"\n",
"* [ ] Stop token should be respected\n",
"* [ ] Stop token should be INCLUDED as part of the response\n",
"\n",
"Secret API Keys:\n",
"\n",
"* [ ] If your model connects to an API it will likely accept API keys as part of its initialization. Use Pydantic's `SecretStr` type for secrets, so they don't get accidentally printed out when folks print the model."
]
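},
{
"cell_type": "markdown",
"id": "stop-token-sketch-leadin",
"metadata": {},
"source": [
"Two hedged sketches for the checklist items above; every name here (`enforce_stop`, `FakeModelConfig`) is illustrative rather than a LangChain API. First, stop-token handling: truncate at the earliest stop token while keeping the token itself in the output:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "stop-token-sketch-code",
"metadata": {},
"outputs": [],
"source": [
"from typing import List, Optional\n",
"\n",
"\n",
"def enforce_stop(text: str, stop: Optional[List[str]] = None) -> str:\n",
"    \"\"\"Truncate at the earliest stop token, keeping the token in the output.\"\"\"\n",
"    if not stop:\n",
"        return text\n",
"    cut = len(text)\n",
"    for token in stop:\n",
"        idx = text.find(token)\n",
"        if idx != -1:\n",
"            cut = min(cut, idx + len(token))  # include the stop token itself\n",
"    return text[:cut]\n",
"\n",
"\n",
"enforce_stop(\"meow woof meow\", stop=[\"woof\"])  # -> 'meow woof'"
]
},
{
"cell_type": "markdown",
"id": "secret-str-sketch-leadin",
"metadata": {},
"source": [
"Second, `SecretStr` masking, shown standalone with a hypothetical `FakeModelConfig` standing in for an API-backed model's fields:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "secret-str-sketch-code",
"metadata": {},
"outputs": [],
"source": [
"from pydantic import BaseModel, SecretStr\n",
"\n",
"\n",
"class FakeModelConfig(BaseModel):\n",
"    \"\"\"Illustrative stand-in for an API-backed model's initialization fields.\"\"\"\n",
"\n",
"    api_key: SecretStr\n",
"\n",
"\n",
"config = FakeModelConfig(api_key=\"top-secret-value\")\n",
"print(config)  # the key prints masked, e.g. api_key=SecretStr('**********')\n",
"print(config.api_key.get_secret_value())  # real value, only on explicit request"
]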
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -4,13 +4,11 @@ sidebar_position: 3
# Chat Models
Chat Models are a core component of LangChain.
ChatModels are a core component of LangChain.
LangChain does not serve its own ChatModels, but rather provides a standard interface for interacting with many different models. To be specific, this interface is one that takes as input a list of messages and returns a message.
A chat model is a language model that uses chat messages as inputs and returns chat messages as outputs (as opposed to using plain text).
LangChain has integrations with many model providers (OpenAI, Cohere, Hugging Face, etc.) and exposes a standard interface to interact with all of these models.
LangChain allows you to use models in sync, async, batching, and streaming modes, and provides other features (e.g., caching).
There are lots of model providers (OpenAI, Cohere, Hugging Face, etc) - the `ChatModel` class is designed to provide a standard interface for all of them.
## [Quick Start](./quick_start)
@@ -29,4 +27,3 @@ This includes:
- [How to use ChatModels that support function calling](./function_calling)
- [How to stream responses from a ChatModel](./streaming)
- [How to track token usage in a ChatModel call](./token_usage_tracking)
- [How to create a custom ChatModel](./custom_chat_model)

View File

@@ -1,46 +0,0 @@
---
hide_table_of_contents: true
---
import People from "@theme/People";
# People
There are some incredible humans from all over the world who have been instrumental in helping the LangChain community flourish 🌐!
This page highlights a few of those folks who have dedicated their time to the open-source repo in the form of direct contributions and reviews.
## Top reviewers
As LangChain has grown, the amount of surface area that maintainers cover has grown as well.
Thank you to the following folks who have gone above and beyond in reviewing incoming PRs 🙏!
<People type="top_reviewers"></People>
## Top recent contributors
The list below contains contributors who have had the most PRs merged in the last three months, weighted (imperfectly) by impact.
Thank you all so much for your time and efforts in making LangChain better ❤️!
<People type="top_recent_contributors" count="20"></People>
## Core maintainers
Hello there 👋!
We're LangChain's core maintainers. If you've spent time in the community, you've probably crossed paths
with at least one of us already.
<People type="maintainers"></People>
## Top all-time contributors
And finally, this is an all-time list of all-stars who have made significant contributions to the framework 🌟:
<People type="top_contributors"></People>
We're so thankful for your support!
And one more thank you to [@tiangolo](https://github.com/tiangolo) for inspiration via FastAPI's [excellent people page](https://fastapi.tiangolo.com/fastapi-people).

View File

@@ -58,10 +58,6 @@ const config = {
fullySpecified: false,
},
},
{
test: /\.ya?ml$/,
use: 'yaml-loader'
},
{
test: /\.ipynb$/,
loader: "raw-loader",
@@ -181,10 +177,6 @@ const config = {
label: "More",
position: "left",
items: [
{
to: "/docs/people/",
label: "People",
},
{
to: "/docs/packages",
label: "Versioning",
@@ -196,7 +188,7 @@ const config = {
},
{
to: "/docs/contributing",
label: "Contributing",
label: "Developer's guide",
},
{
type: "docSidebar",

View File

@@ -45,8 +45,7 @@
"eslint-plugin-react-hooks": "^4.6.0",
"prettier": "^2.7.1",
"typedoc": "^0.24.4",
"typedoc-plugin-markdown": "next",
"yaml-loader": "^0.8.0"
"typedoc-plugin-markdown": "next"
},
"browserslist": {
"production": [

View File

@@ -1,28 +0,0 @@
import React from "react";
import PeopleData from "../../data/people.yml"
function renderPerson({ login, avatarUrl, url }) {
return (
<div key={`person:${login}`} style={{ display: "flex", flexDirection: "column", alignItems: "center", padding: "18px" }}>
<a href={url} target="_blank">
<img src={avatarUrl} style={{ borderRadius: "50%", width: "128px", height: "128px" }} />
</a>
<a href={url} target="_blank" style={{ fontSize: "18px", fontWeight: "700" }}>@{login}</a>
</div>
);
}
export default function People({ type, count }) {
let people = PeopleData[type] ?? [];
if (count !== undefined) {
people = people.slice(0, parseInt(count, 10));
}
const html = people.map((person) => {
return renderPerson(person);
});
return (
<div style={{ display: "flex", flexWrap: "wrap", padding: "10px", justifyContent: "space-around" }}>
{html}
</div>
);
}

File diff suppressed because it is too large

View File

@@ -18,7 +18,6 @@ from langchain_community.agent_toolkits.amadeus.toolkit import AmadeusToolkit
from langchain_community.agent_toolkits.azure_cognitive_services import (
AzureCognitiveServicesToolkit,
)
from langchain_community.agent_toolkits.cogniswitch.toolkit import CogniswitchToolkit
from langchain_community.agent_toolkits.connery import ConneryToolkit
from langchain_community.agent_toolkits.file_management.toolkit import (
FileManagementToolkit,
@@ -52,7 +51,6 @@ __all__ = [
"AINetworkToolkit",
"AmadeusToolkit",
"AzureCognitiveServicesToolkit",
"CogniswitchToolkit",
"ConneryToolkit",
"FileManagementToolkit",
"GmailToolkit",

View File

@@ -1 +0,0 @@
"""CogniSwitch Toolkit"""

View File

@@ -1,40 +0,0 @@
from typing import List
from langchain_community.agent_toolkits.base import BaseToolkit
from langchain_community.tools import BaseTool
from langchain_community.tools.cogniswitch.tool import (
CogniswitchKnowledgeRequest,
CogniswitchKnowledgeSourceFile,
CogniswitchKnowledgeSourceURL,
CogniswitchKnowledgeStatus,
)
class CogniswitchToolkit(BaseToolkit):
"""
Toolkit for CogniSwitch.
Use the toolkit to get all the tools present in CogniSwitch and
use them to interact with your knowledge base.
"""
cs_token: str # cogniswitch token
OAI_token: str # OpenAI API token
apiKey: str # Cogniswitch OAuth token
def get_tools(self) -> List[BaseTool]:
"""Get the tools in the toolkit."""
return [
CogniswitchKnowledgeStatus(
cs_token=self.cs_token, OAI_token=self.OAI_token, apiKey=self.apiKey
),
CogniswitchKnowledgeRequest(
cs_token=self.cs_token, OAI_token=self.OAI_token, apiKey=self.apiKey
),
CogniswitchKnowledgeSourceFile(
cs_token=self.cs_token, OAI_token=self.OAI_token, apiKey=self.apiKey
),
CogniswitchKnowledgeSourceURL(
cs_token=self.cs_token, OAI_token=self.OAI_token, apiKey=self.apiKey
),
]

View File

@@ -2,7 +2,7 @@ from typing import List
from langchain_community.agent_toolkits.base import BaseToolkit
from langchain_community.tools import BaseTool
from langchain_community.tools.polygon import PolygonLastQuote, PolygonTickerNews
from langchain_community.tools.polygon import PolygonLastQuote
from langchain_community.utilities.polygon import PolygonAPIWrapper
@@ -18,10 +18,7 @@ class PolygonToolkit(BaseToolkit):
tools = [
PolygonLastQuote(
api_wrapper=polygon_api_wrapper,
),
PolygonTickerNews(
api_wrapper=polygon_api_wrapper,
),
)
]
return cls(tools=tools)

View File

@@ -29,14 +29,12 @@ import uuid
import warnings
from abc import ABC
from datetime import timedelta
from functools import lru_cache, wraps
from functools import lru_cache
from typing import (
TYPE_CHECKING,
Any,
Awaitable,
Callable,
Dict,
Generator,
List,
Optional,
Sequence,
@@ -58,23 +56,20 @@ except ImportError:
from langchain_core.caches import RETURN_VAL_TYPE, BaseCache
from langchain_core.embeddings import Embeddings
from langchain_core.language_models.llms import LLM, aget_prompts, get_prompts
from langchain_core.language_models.llms import LLM, get_prompts
from langchain_core.load.dump import dumps
from langchain_core.load.load import loads
from langchain_core.outputs import ChatGeneration, Generation
from langchain_core.utils import get_from_env
from langchain_community.utilities.astradb import (
SetupMode,
_AstraDBCollectionEnvironment,
)
from langchain_community.utilities.astradb import AstraDBEnvironment
from langchain_community.vectorstores.redis import Redis as RedisVectorstore
logger = logging.getLogger(__file__)
if TYPE_CHECKING:
import momento
from astrapy.db import AstraDB, AsyncAstraDB
from astrapy.db import AstraDB
from cassandra.cluster import Session as CassandraSession
@@ -461,31 +456,22 @@ class RedisCache(_RedisCacheBase):
def lookup(self, prompt: str, llm_string: str) -> Optional[RETURN_VAL_TYPE]:
"""Look up based on prompt and llm_string."""
# Read from a Redis HASH
try:
results = self.redis.hgetall(self._key(prompt, llm_string))
return self._get_generations(results) # type: ignore[arg-type]
except Exception as e:
logger.error(f"Redis lookup failed: {e}")
return None
results = self.redis.hgetall(self._key(prompt, llm_string))
return self._get_generations(results) # type: ignore[arg-type]
def update(self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE) -> None:
"""Update cache based on prompt and llm_string."""
self._ensure_generation_type(return_val)
key = self._key(prompt, llm_string)
try:
with self.redis.pipeline() as pipe:
self._configure_pipeline_for_update(key, pipe, return_val, self.ttl)
pipe.execute()
except Exception as e:
logger.error(f"Redis update failed: {e}")
with self.redis.pipeline() as pipe:
self._configure_pipeline_for_update(key, pipe, return_val, self.ttl)
pipe.execute()
def clear(self, **kwargs: Any) -> None:
"""Clear cache. If `asynchronous` is True, flush asynchronously."""
try:
asynchronous = kwargs.get("asynchronous", False)
self.redis.flushdb(asynchronous=asynchronous, **kwargs)
except Exception as e:
logger.error(f"Redis clear failed: {e}")
asynchronous = kwargs.get("asynchronous", False)
self.redis.flushdb(asynchronous=asynchronous, **kwargs)
class AsyncRedisCache(_RedisCacheBase):
@@ -534,12 +520,8 @@ class AsyncRedisCache(_RedisCacheBase):
async def alookup(self, prompt: str, llm_string: str) -> Optional[RETURN_VAL_TYPE]:
"""Look up based on prompt and llm_string. Async version."""
try:
results = await self.redis.hgetall(self._key(prompt, llm_string))
return self._get_generations(results) # type: ignore[arg-type]
except Exception as e:
logger.error(f"Redis async lookup failed: {e}")
return None
results = await self.redis.hgetall(self._key(prompt, llm_string))
return self._get_generations(results) # type: ignore[arg-type]
def update(self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE) -> None:
"""Update cache based on prompt and llm_string."""
@@ -554,12 +536,10 @@ class AsyncRedisCache(_RedisCacheBase):
"""Update cache based on prompt and llm_string. Async version."""
self._ensure_generation_type(return_val)
key = self._key(prompt, llm_string)
try:
async with self.redis.pipeline() as pipe:
self._configure_pipeline_for_update(key, pipe, return_val, self.ttl)
await pipe.execute() # type: ignore[attr-defined]
except Exception as e:
logger.error(f"Redis async update failed: {e}")
async with self.redis.pipeline() as pipe:
self._configure_pipeline_for_update(key, pipe, return_val, self.ttl)
await pipe.execute() # type: ignore[attr-defined]
def clear(self, **kwargs: Any) -> None:
"""Clear cache. If `asynchronous` is True, flush asynchronously."""
@@ -573,11 +553,8 @@ class AsyncRedisCache(_RedisCacheBase):
Clear cache. If `asynchronous` is True, flush asynchronously.
Async version.
"""
try:
asynchronous = kwargs.get("asynchronous", False)
await self.redis.flushdb(asynchronous=asynchronous, **kwargs)
except Exception as e:
logger.error(f"Redis async clear failed: {e}")
asynchronous = kwargs.get("asynchronous", False)
await self.redis.flushdb(asynchronous=asynchronous, **kwargs)
class RedisSemanticCache(BaseCache):
@@ -1384,9 +1361,15 @@ ASTRA_DB_CACHE_DEFAULT_COLLECTION_NAME = "langchain_astradb_cache"
class AstraDBCache(BaseCache):
@staticmethod
def _make_id(prompt: str, llm_string: str) -> str:
return f"{_hash(prompt)}#{_hash(llm_string)}"
"""
Cache that uses Astra DB as a backend.
It uses a single collection as a kv store
The lookup keys, combined in the _id of the documents, are:
- prompt, a string
- llm_string, a deterministic str representation of the model parameters.
(needed to prevent same-prompt-different-model collisions)
"""
def __init__(
self,
@@ -1395,52 +1378,39 @@ class AstraDBCache(BaseCache):
token: Optional[str] = None,
api_endpoint: Optional[str] = None,
astra_db_client: Optional[AstraDB] = None,
async_astra_db_client: Optional[AsyncAstraDB] = None,
namespace: Optional[str] = None,
pre_delete_collection: bool = False,
setup_mode: SetupMode = SetupMode.SYNC,
):
"""
Cache that uses Astra DB as a backend.
Create an AstraDB cache using a collection for storage.
It uses a single collection as a kv store
The lookup keys, combined in the _id of the documents, are:
- prompt, a string
- llm_string, a deterministic str representation of the model parameters.
(needed to prevent same-prompt-different-model collisions)
Args:
collection_name: name of the Astra DB collection to create/use.
token: API token for Astra DB usage.
api_endpoint: full URL to the API endpoint,
such as `https://<DB-ID>-us-east1.apps.astra.datastax.com`.
astra_db_client: *alternative to token+api_endpoint*,
Args (only keyword-arguments accepted):
collection_name (str): name of the Astra DB collection to create/use.
token (Optional[str]): API token for Astra DB usage.
api_endpoint (Optional[str]): full URL to the API endpoint,
such as "https://<DB-ID>-us-east1.apps.astra.datastax.com".
astra_db_client (Optional[Any]): *alternative to token+api_endpoint*,
you can pass an already-created 'astrapy.db.AstraDB' instance.
async_astra_db_client: *alternative to token+api_endpoint*,
you can pass an already-created 'astrapy.db.AsyncAstraDB' instance.
namespace: namespace (aka keyspace) where the
namespace (Optional[str]): namespace (aka keyspace) where the
collection is created. Defaults to the database's "default namespace".
setup_mode: mode used to create the Astra DB collection (SYNC, ASYNC or
OFF).
pre_delete_collection: whether to delete the collection
before creating it. If False and the collection already exists,
the collection will be used as is.
"""
self.astra_env = _AstraDBCollectionEnvironment(
collection_name=collection_name,
astra_env = AstraDBEnvironment(
token=token,
api_endpoint=api_endpoint,
astra_db_client=astra_db_client,
async_astra_db_client=async_astra_db_client,
namespace=namespace,
setup_mode=setup_mode,
pre_delete_collection=pre_delete_collection,
)
self.collection = self.astra_env.collection
self.async_collection = self.astra_env.async_collection
self.astra_db = astra_env.astra_db
self.collection = self.astra_db.create_collection(
collection_name=collection_name,
)
self.collection_name = collection_name
@staticmethod
def _make_id(prompt: str, llm_string: str) -> str:
return f"{_hash(prompt)}#{_hash(llm_string)}"
def lookup(self, prompt: str, llm_string: str) -> Optional[RETURN_VAL_TYPE]:
self.astra_env.ensure_db_setup()
"""Look up based on prompt and llm_string."""
doc_id = self._make_id(prompt, llm_string)
item = self.collection.find_one(
filter={
@@ -1450,25 +1420,18 @@ class AstraDBCache(BaseCache):
"body_blob": 1,
},
)["data"]["document"]
return _loads_generations(item["body_blob"]) if item is not None else None
async def alookup(self, prompt: str, llm_string: str) -> Optional[RETURN_VAL_TYPE]:
await self.astra_env.aensure_db_setup()
doc_id = self._make_id(prompt, llm_string)
item = (
await self.async_collection.find_one(
filter={
"_id": doc_id,
},
projection={
"body_blob": 1,
},
)
)["data"]["document"]
return _loads_generations(item["body_blob"]) if item is not None else None
if item is not None:
generations = _loads_generations(item["body_blob"])
# this protects against malformed cached items:
if generations is not None:
return generations
else:
return None
else:
return None
def update(self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE) -> None:
self.astra_env.ensure_db_setup()
"""Update cache based on prompt and llm_string."""
doc_id = self._make_id(prompt, llm_string)
blob = _dumps_generations(return_val)
self.collection.upsert(
@@ -1478,19 +1441,6 @@ class AstraDBCache(BaseCache):
},
)
async def aupdate(
self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE
) -> None:
await self.astra_env.aensure_db_setup()
doc_id = self._make_id(prompt, llm_string)
blob = _dumps_generations(return_val)
await self.async_collection.upsert(
{
"_id": doc_id,
"body_blob": blob,
},
)
def delete_through_llm(
self, prompt: str, llm: LLM, stop: Optional[List[str]] = None
) -> None:
@@ -1504,40 +1454,14 @@ class AstraDBCache(BaseCache):
)[1]
return self.delete(prompt, llm_string=llm_string)
async def adelete_through_llm(
self, prompt: str, llm: LLM, stop: Optional[List[str]] = None
) -> None:
"""
A wrapper around `adelete` with the LLM being passed.
In case the llm(prompt) calls have a `stop` param, you should pass it here
"""
llm_string = (
await aget_prompts(
{**llm.dict(), **{"stop": stop}},
[],
)
)[1]
return await self.adelete(prompt, llm_string=llm_string)
def delete(self, prompt: str, llm_string: str) -> None:
"""Evict from cache if there's an entry."""
self.astra_env.ensure_db_setup()
doc_id = self._make_id(prompt, llm_string)
self.collection.delete_one(doc_id)
async def adelete(self, prompt: str, llm_string: str) -> None:
"""Evict from cache if there's an entry."""
await self.astra_env.aensure_db_setup()
doc_id = self._make_id(prompt, llm_string)
await self.async_collection.delete_one(doc_id)
def clear(self, **kwargs: Any) -> None:
self.astra_env.ensure_db_setup()
self.collection.clear()
async def aclear(self, **kwargs: Any) -> None:
await self.astra_env.aensure_db_setup()
await self.async_collection.clear()
"""Clear cache. This is for all LLMs at once."""
self.astra_db.truncate_collection(self.collection_name)
ASTRA_DB_SEMANTIC_CACHE_DEFAULT_THRESHOLD = 0.85
@@ -1545,43 +1469,19 @@ ASTRA_DB_CACHE_DEFAULT_COLLECTION_NAME = "langchain_astradb_semantic_cache"
ASTRA_DB_SEMANTIC_CACHE_EMBEDDING_CACHE_SIZE = 16
_unset = ["unset"]
class _CachedAwaitable:
"""Caches the result of an awaitable so it can be awaited multiple times"""
def __init__(self, awaitable: Awaitable[Any]):
self.awaitable = awaitable
self.result = _unset
def __await__(self) -> Generator:
if self.result is _unset:
self.result = yield from self.awaitable.__await__()
return self.result
def _reawaitable(func: Callable) -> Callable:
"""Makes an async function result awaitable multiple times"""
@wraps(func)
def wrapper(*args: Any, **kwargs: Any) -> _CachedAwaitable:
return _CachedAwaitable(func(*args, **kwargs))
return wrapper
def _async_lru_cache(maxsize: int = 128, typed: bool = False) -> Callable:
"""Least-recently-used async cache decorator.
Equivalent to functools.lru_cache for async functions"""
def decorating_function(user_function: Callable) -> Callable:
return lru_cache(maxsize, typed)(_reawaitable(user_function))
return decorating_function
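# Illustrative sketch (not part of this module): the decorator above gives an
# async function lru_cache semantics, and _CachedAwaitable lets the cached
# awaitable be awaited more than once. Assuming a hypothetical coroutine:
#
#     @_async_lru_cache(maxsize=4)
#     async def embed(text: str) -> List[float]:
#         ...
#
#     aw = embed("hello")  # same args -> same cached _CachedAwaitable
#     v1 = await aw
#     v2 = await aw        # second await returns the stored result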
class AstraDBSemanticCache(BaseCache):
"""
Cache that uses Astra DB as a vector-store backend for semantic
(i.e. similarity-based) lookup.
It uses a single (vector) collection and can store
cached values from several LLMs, so the LLM's 'llm_string' is stored
in the document metadata.
You can choose the preferred similarity (or use the API default) --
remember the threshold might require metric-dependent tuning.
"""
def __init__(
self,
*,
@@ -1589,52 +1489,44 @@ class AstraDBSemanticCache(BaseCache):
token: Optional[str] = None,
api_endpoint: Optional[str] = None,
astra_db_client: Optional[AstraDB] = None,
async_astra_db_client: Optional[AsyncAstraDB] = None,
namespace: Optional[str] = None,
setup_mode: SetupMode = SetupMode.SYNC,
pre_delete_collection: bool = False,
embedding: Embeddings,
metric: Optional[str] = None,
similarity_threshold: float = ASTRA_DB_SEMANTIC_CACHE_DEFAULT_THRESHOLD,
):
"""
Cache that uses Astra DB as a vector-store backend for semantic
(i.e. similarity-based) lookup.
It uses a single (vector) collection and can store
cached values from several LLMs, so the LLM's 'llm_string' is stored
in the document metadata.
You can choose the preferred similarity (or use the API default).
The default score threshold is tuned to the default metric.
Tune it carefully yourself if switching to another distance metric.
Initialize the cache with all relevant parameters.
Args:
collection_name: name of the Astra DB collection to create/use.
token: API token for Astra DB usage.
api_endpoint: full URL to the API endpoint,
such as `https://<DB-ID>-us-east1.apps.astra.datastax.com`.
astra_db_client: *alternative to token+api_endpoint*,
collection_name (str): name of the Astra DB collection to create/use.
token (Optional[str]): API token for Astra DB usage.
api_endpoint (Optional[str]): full URL to the API endpoint,
such as "https://<DB-ID>-us-east1.apps.astra.datastax.com".
astra_db_client (Optional[Any]): *alternative to token+api_endpoint*,
you can pass an already-created 'astrapy.db.AstraDB' instance.
async_astra_db_client: *alternative to token+api_endpoint*,
you can pass an already-created 'astrapy.db.AsyncAstraDB' instance.
namespace: namespace (aka keyspace) where the
namespace (Optional[str]): namespace (aka keyspace) where the
collection is created. Defaults to the database's "default namespace".
setup_mode: mode used to create the Astra DB collection (SYNC, ASYNC or
OFF).
pre_delete_collection: whether to delete the collection
before creating it. If False and the collection already exists,
the collection will be used as is.
embedding: Embedding provider for semantic encoding and search.
embedding (Embedding): Embedding provider for semantic
encoding and search.
metric: the function to use for evaluating similarity of text embeddings.
Defaults to 'cosine' (alternatives: 'euclidean', 'dot_product')
similarity_threshold: the minimum similarity for accepting a
(semantic-search) match.
similarity_threshold (float, optional): the minimum similarity
for accepting a (semantic-search) match.
The default score threshold is tuned to the default metric.
Tune it carefully yourself if switching to another distance metric.
"""
astra_env = AstraDBEnvironment(
token=token,
api_endpoint=api_endpoint,
astra_db_client=astra_db_client,
namespace=namespace,
)
self.astra_db = astra_env.astra_db
self.embedding = embedding
self.metric = metric
self.similarity_threshold = similarity_threshold
self.collection_name = collection_name
# The contract for this class has separate lookup and update:
# in order to spare some embedding calculations we cache them between
@@ -1646,46 +1538,25 @@ class AstraDBSemanticCache(BaseCache):
return self.embedding.embed_query(text=text)
self._get_embedding = _cache_embedding
self.embedding_dimension = self._get_embedding_dimension()
@_async_lru_cache(maxsize=ASTRA_DB_SEMANTIC_CACHE_EMBEDDING_CACHE_SIZE)
async def _acache_embedding(text: str) -> List[float]:
return await self.embedding.aembed_query(text=text)
self.collection_name = collection_name
self._aget_embedding = _acache_embedding
embedding_dimension: Union[int, Awaitable[int], None] = None
if setup_mode == SetupMode.ASYNC:
embedding_dimension = self._aget_embedding_dimension()
elif setup_mode == SetupMode.SYNC:
embedding_dimension = self._get_embedding_dimension()
self.astra_env = _AstraDBCollectionEnvironment(
collection_name=collection_name,
token=token,
api_endpoint=api_endpoint,
astra_db_client=astra_db_client,
async_astra_db_client=async_astra_db_client,
namespace=namespace,
setup_mode=setup_mode,
pre_delete_collection=pre_delete_collection,
embedding_dimension=embedding_dimension,
metric=metric,
self.collection = self.astra_db.create_collection(
collection_name=self.collection_name,
dimension=self.embedding_dimension,
metric=self.metric,
)
self.collection = self.astra_env.collection
self.async_collection = self.astra_env.async_collection
def _get_embedding_dimension(self) -> int:
return len(self._get_embedding(text="This is a sample sentence."))
async def _aget_embedding_dimension(self) -> int:
return len(await self._aget_embedding(text="This is a sample sentence."))
@staticmethod
def _make_id(prompt: str, llm_string: str) -> str:
return f"{_hash(prompt)}#{_hash(llm_string)}"
def update(self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE) -> None:
self.astra_env.ensure_db_setup()
"""Update cache based on prompt and llm_string."""
doc_id = self._make_id(prompt, llm_string)
llm_string_hash = _hash(llm_string)
embedding_vector = self._get_embedding(text=prompt)
@@ -1700,38 +1571,14 @@ class AstraDBSemanticCache(BaseCache):
}
)
async def aupdate(
self, prompt: str, llm_string: str, return_val: RETURN_VAL_TYPE
) -> None:
await self.astra_env.aensure_db_setup()
doc_id = self._make_id(prompt, llm_string)
llm_string_hash = _hash(llm_string)
embedding_vector = await self._aget_embedding(text=prompt)
body = _dumps_generations(return_val)
#
await self.async_collection.upsert(
{
"_id": doc_id,
"body_blob": body,
"llm_string_hash": llm_string_hash,
"$vector": embedding_vector,
}
)
def lookup(self, prompt: str, llm_string: str) -> Optional[RETURN_VAL_TYPE]:
"""Look up based on prompt and llm_string."""
hit_with_id = self.lookup_with_id(prompt, llm_string)
if hit_with_id is not None:
return hit_with_id[1]
else:
return None
async def alookup(self, prompt: str, llm_string: str) -> Optional[RETURN_VAL_TYPE]:
hit_with_id = await self.alookup_with_id(prompt, llm_string)
if hit_with_id is not None:
return hit_with_id[1]
else:
return None
def lookup_with_id(
self, prompt: str, llm_string: str
) -> Optional[Tuple[str, RETURN_VAL_TYPE]]:
@@ -1739,7 +1586,6 @@ class AstraDBSemanticCache(BaseCache):
Look up based on prompt and llm_string.
If there are hits, return (document_id, cached_entry) for the top hit
"""
self.astra_env.ensure_db_setup()
prompt_embedding: List[float] = self._get_embedding(text=prompt)
llm_string_hash = _hash(llm_string)
@@ -1758,37 +1604,7 @@ class AstraDBSemanticCache(BaseCache):
generations = _loads_generations(hit["body_blob"])
if generations is not None:
# this protects against malformed cached items:
return hit["_id"], generations
else:
return None
async def alookup_with_id(
self, prompt: str, llm_string: str
) -> Optional[Tuple[str, RETURN_VAL_TYPE]]:
"""
Look up based on prompt and llm_string.
If there are hits, return (document_id, cached_entry) for the top hit
"""
await self.astra_env.aensure_db_setup()
prompt_embedding: List[float] = await self._aget_embedding(text=prompt)
llm_string_hash = _hash(llm_string)
hit = await self.async_collection.vector_find_one(
vector=prompt_embedding,
filter={
"llm_string_hash": llm_string_hash,
},
fields=["body_blob", "_id"],
include_similarity=True,
)
if hit is None or hit["$similarity"] < self.similarity_threshold:
return None
else:
generations = _loads_generations(hit["body_blob"])
if generations is not None:
# this protects against malformed cached items:
return hit["_id"], generations
return (hit["_id"], generations)
else:
return None
@@ -1801,39 +1617,14 @@ class AstraDBSemanticCache(BaseCache):
)[1]
return self.lookup_with_id(prompt, llm_string=llm_string)
async def alookup_with_id_through_llm(
self, prompt: str, llm: LLM, stop: Optional[List[str]] = None
) -> Optional[Tuple[str, RETURN_VAL_TYPE]]:
llm_string = (
await aget_prompts(
{**llm.dict(), **{"stop": stop}},
[],
)
)[1]
return await self.alookup_with_id(prompt, llm_string=llm_string)
def delete_by_document_id(self, document_id: str) -> None:
"""
Given this is a "similarity search" cache, an invalidation pattern
that makes sense is first a lookup to get an ID, and then deleting
with that ID. This is for the second step.
"""
self.astra_env.ensure_db_setup()
self.collection.delete_one(document_id)
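# Illustrative sketch of the two-step invalidation pattern described above,
# assuming a hypothetical `semantic_cache` instance:
#
#     hit = semantic_cache.lookup_with_id(prompt, llm_string)
#     if hit is not None:
#         doc_id, _generations = hit
#         semantic_cache.delete_by_document_id(doc_id)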
async def adelete_by_document_id(self, document_id: str) -> None:
"""
Given this is a "similarity search" cache, an invalidation pattern
that makes sense is first a lookup to get an ID, and then deleting
with that ID. This is for the second step.
"""
await self.astra_env.aensure_db_setup()
await self.async_collection.delete_one(document_id)
def clear(self, **kwargs: Any) -> None:
self.astra_env.ensure_db_setup()
self.collection.clear()
async def aclear(self, **kwargs: Any) -> None:
await self.astra_env.aensure_db_setup()
await self.async_collection.clear()
"""Clear the *whole* semantic cache."""
self.astra_db.truncate_collection(self.collection_name)
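
For orientation, a minimal usage sketch of the Astra DB cache documented above; the credentials are placeholders and the collection name is this module's default:

from langchain.globals import set_llm_cache

from langchain_community.cache import AstraDBCache

# Route all LLM prompt/response caching through Astra DB.
set_llm_cache(
    AstraDBCache(
        collection_name="langchain_astradb_cache",
        token="AstraCS:...",  # placeholder Astra DB application token
        api_endpoint="https://<DB-ID>-us-east1.apps.astra.datastax.com",
    )
)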

View File

@@ -3,12 +3,9 @@ from __future__ import annotations
import json
import time
from typing import TYPE_CHECKING, List, Optional, Sequence
from typing import TYPE_CHECKING, List, Optional
from langchain_community.utilities.astradb import (
SetupMode,
_AstraDBCollectionEnvironment,
)
from langchain_community.utilities.astradb import AstraDBEnvironment
if TYPE_CHECKING:
from astrapy.db import AstraDB
@@ -26,16 +23,16 @@ DEFAULT_COLLECTION_NAME = "langchain_message_store"
class AstraDBChatMessageHistory(BaseChatMessageHistory):
"""Chat message history that stores history in Astra DB.
Args:
Args (only keyword-arguments accepted):
session_id: arbitrary key that is used to store the messages
of a single chat session.
collection_name: name of the Astra DB collection to create/use.
token: API token for Astra DB usage.
api_endpoint: full URL to the API endpoint,
collection_name (str): name of the Astra DB collection to create/use.
token (Optional[str]): API token for Astra DB usage.
api_endpoint (Optional[str]): full URL to the API endpoint,
such as "https://<DB-ID>-us-east1.apps.astra.datastax.com".
astra_db_client: *alternative to token+api_endpoint*,
astra_db_client (Optional[Any]): *alternative to token+api_endpoint*,
you can pass an already-created 'astrapy.db.AstraDB' instance.
namespace: namespace (aka keyspace) where the
namespace (Optional[str]): namespace (aka keyspace) where the
collection is created. Defaults to the database's "default namespace".
"""
@@ -48,29 +45,24 @@ class AstraDBChatMessageHistory(BaseChatMessageHistory):
api_endpoint: Optional[str] = None,
astra_db_client: Optional[AstraDB] = None,
namespace: Optional[str] = None,
setup_mode: SetupMode = SetupMode.SYNC,
pre_delete_collection: bool = False,
) -> None:
self.astra_env = _AstraDBCollectionEnvironment(
collection_name=collection_name,
"""Create an Astra DB chat message history."""
astra_env = AstraDBEnvironment(
token=token,
api_endpoint=api_endpoint,
astra_db_client=astra_db_client,
namespace=namespace,
setup_mode=setup_mode,
pre_delete_collection=pre_delete_collection,
)
self.astra_db = astra_env.astra_db
self.collection = self.astra_env.collection
self.async_collection = self.astra_env.async_collection
self.collection = self.astra_db.create_collection(collection_name)
self.session_id = session_id
self.collection_name = collection_name
@property
def messages(self) -> List[BaseMessage]:
def messages(self) -> List[BaseMessage]: # type: ignore
"""Retrieve all session messages from DB"""
self.astra_env.ensure_db_setup()
message_blobs = [
doc["body_blob"]
for doc in sorted(
@@ -90,58 +82,16 @@ class AstraDBChatMessageHistory(BaseChatMessageHistory):
messages = messages_from_dict(items)
return messages
@messages.setter
def messages(self, messages: List[BaseMessage]) -> None:
raise NotImplementedError("Use add_messages instead")
async def aget_messages(self) -> List[BaseMessage]:
await self.astra_env.aensure_db_setup()
docs = self.async_collection.paginated_find(
filter={
"session_id": self.session_id,
},
projection={
"timestamp": 1,
"body_blob": 1,
},
)
sorted_docs = sorted(
[doc async for doc in docs],
key=lambda _doc: _doc["timestamp"],
)
message_blobs = [doc["body_blob"] for doc in sorted_docs]
items = [json.loads(message_blob) for message_blob in message_blobs]
messages = messages_from_dict(items)
return messages
def add_messages(self, messages: Sequence[BaseMessage]) -> None:
self.astra_env.ensure_db_setup()
docs = [
def add_message(self, message: BaseMessage) -> None:
"""Write a message to the table"""
self.collection.insert_one(
{
"timestamp": time.time(),
"session_id": self.session_id,
"body_blob": json.dumps(message_to_dict(message)),
}
for message in messages
]
self.collection.chunked_insert_many(docs)
async def aadd_messages(self, messages: Sequence[BaseMessage]) -> None:
await self.astra_env.aensure_db_setup()
docs = [
{
"timestamp": time.time(),
"session_id": self.session_id,
"body_blob": json.dumps(message_to_dict(message)),
}
for message in messages
]
await self.async_collection.chunked_insert_many(docs)
)
def clear(self) -> None:
self.astra_env.ensure_db_setup()
"""Clear session memory from DB"""
self.collection.delete_many(filter={"session_id": self.session_id})
async def aclear(self) -> None:
await self.astra_env.aensure_db_setup()
await self.async_collection.delete_many(filter={"session_id": self.session_id})
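
A minimal usage sketch of the chat message history above, with placeholder credentials and session key:

from langchain_core.messages import AIMessage, HumanMessage

from langchain_community.chat_message_histories import AstraDBChatMessageHistory

history = AstraDBChatMessageHistory(
    session_id="session-42",  # placeholder session key
    collection_name="langchain_message_store",
    token="AstraCS:...",  # placeholder Astra DB application token
    api_endpoint="https://<DB-ID>-us-east1.apps.astra.datastax.com",
)
history.add_message(HumanMessage(content="Hello!"))
history.add_message(AIMessage(content="Hi there."))
print(history.messages)  # this session's messages, oldest first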

View File

@@ -218,10 +218,9 @@ class ChatBaichuan(BaseChatModel):
m.get("delta"), default_chunk_class
)
default_chunk_class = chunk.__class__
cg_chunk = ChatGenerationChunk(message=chunk)
yield cg_chunk
yield ChatGenerationChunk(message=chunk)
if run_manager:
run_manager.on_llm_new_token(chunk.content, chunk=cg_chunk)
run_manager.on_llm_new_token(chunk.content)
def _chat(self, messages: List[BaseMessage], **kwargs: Any) -> requests.Response:
parameters = {**self._default_params, **kwargs}

View File

@@ -147,10 +147,9 @@ class ChatCohere(BaseChatModel, BaseCohere):
for data in stream:
if data.event_type == "text-generation":
delta = data.text
chunk = ChatGenerationChunk(message=AIMessageChunk(content=delta))
yield chunk
yield ChatGenerationChunk(message=AIMessageChunk(content=delta))
if run_manager:
run_manager.on_llm_new_token(delta, chunk=chunk)
run_manager.on_llm_new_token(delta)
async def _astream(
self,
@@ -165,10 +164,9 @@ class ChatCohere(BaseChatModel, BaseCohere):
async for data in stream:
if data.event_type == "text-generation":
delta = data.text
chunk = ChatGenerationChunk(message=AIMessageChunk(content=delta))
yield chunk
yield ChatGenerationChunk(message=AIMessageChunk(content=delta))
if run_manager:
await run_manager.on_llm_new_token(delta, chunk=chunk)
await run_manager.on_llm_new_token(delta)
def _get_generation_info(self, response: Any) -> Dict[str, Any]:
"""Get the generation info from cohere API response."""

View File

@@ -328,10 +328,9 @@ class ChatDeepInfra(BaseChatModel):
for line in _parse_stream(response.iter_lines()):
chunk = _handle_sse_line(line)
if chunk:
cg_chunk = ChatGenerationChunk(message=chunk, generation_info=None)
yield cg_chunk
yield ChatGenerationChunk(message=chunk, generation_info=None)
if run_manager:
run_manager.on_llm_new_token(str(chunk.content), chunk=cg_chunk)
run_manager.on_llm_new_token(str(chunk.content))
async def _astream(
self,
@@ -351,12 +350,9 @@ class ChatDeepInfra(BaseChatModel):
async for line in _parse_stream_async(response.content):
chunk = _handle_sse_line(line)
if chunk:
cg_chunk = ChatGenerationChunk(message=chunk, generation_info=None)
yield cg_chunk
yield ChatGenerationChunk(message=chunk, generation_info=None)
if run_manager:
await run_manager.on_llm_new_token(
str(chunk.content), chunk=cg_chunk
)
await run_manager.on_llm_new_token(str(chunk.content))
async def _agenerate(
self,

View File

@@ -154,10 +154,9 @@ class GigaChat(_BaseGigaChat, BaseChatModel):
for chunk in self._client.stream(payload):
if chunk.choices:
content = chunk.choices[0].delta.content
cg_chunk = ChatGenerationChunk(message=AIMessageChunk(content=content))
yield cg_chunk
yield ChatGenerationChunk(message=AIMessageChunk(content=content))
if run_manager:
run_manager.on_llm_new_token(content, chunk=cg_chunk)
run_manager.on_llm_new_token(content)
async def _astream(
self,
@@ -171,10 +170,9 @@ class GigaChat(_BaseGigaChat, BaseChatModel):
async for chunk in self._client.astream(payload):
if chunk.choices:
content = chunk.choices[0].delta.content
cg_chunk = ChatGenerationChunk(message=AIMessageChunk(content=content))
yield cg_chunk
yield ChatGenerationChunk(message=AIMessageChunk(content=content))
if run_manager:
await run_manager.on_llm_new_token(content, chunk=cg_chunk)
await run_manager.on_llm_new_token(content)
def get_num_tokens(self, text: str) -> int:
"""Count approximate number of tokens"""

View File

@@ -1,5 +1,4 @@
"""Hugging Face Chat Wrapper."""
from typing import Any, List, Optional, Union
from langchain_core.callbacks.manager import (
@@ -53,7 +52,6 @@ class ChatHuggingFace(BaseChatModel):
from transformers import AutoTokenizer
self._resolve_model_id()
self.tokenizer = (
AutoTokenizer.from_pretrained(self.model_id)
if self.tokenizer is None
@@ -92,10 +90,10 @@ class ChatHuggingFace(BaseChatModel):
) -> str:
"""Convert a list of messages into a prompt format expected by wrapped LLM."""
if not messages:
raise ValueError("At least one HumanMessage must be provided!")
raise ValueError("at least one HumanMessage must be provided")
if not isinstance(messages[-1], HumanMessage):
raise ValueError("Last message must be a HumanMessage!")
raise ValueError("last message must be a HumanMessage")
messages_dicts = [self._to_chatml_format(m) for m in messages]
@@ -137,15 +135,20 @@ class ChatHuggingFace(BaseChatModel):
from huggingface_hub import list_inference_endpoints
available_endpoints = list_inference_endpoints("*")
if isinstance(self.llm, HuggingFaceHub) or (
hasattr(self.llm, "repo_id") and self.llm.repo_id
):
if isinstance(self.llm, HuggingFaceTextGenInference):
endpoint_url = self.llm.inference_server_url
elif isinstance(self.llm, HuggingFaceEndpoint):
endpoint_url = self.llm.endpoint_url
elif isinstance(self.llm, HuggingFaceHub):
# no need to look up model_id for HuggingFaceHub LLM
self.model_id = self.llm.repo_id
return
elif isinstance(self.llm, HuggingFaceTextGenInference):
endpoint_url: Optional[str] = self.llm.inference_server_url
else:
endpoint_url = self.llm.endpoint_url
raise ValueError(f"Unknown LLM type: {type(self.llm)}")
for endpoint in available_endpoints:
if endpoint.url == endpoint_url:
@@ -153,8 +156,8 @@ class ChatHuggingFace(BaseChatModel):
if not self.model_id:
raise ValueError(
"Failed to resolve model_id:"
f"Could not find model id for inference server: {endpoint_url}"
"Failed to resolve model_id"
f"Could not find model id for inference server provided: {endpoint_url}"
"Make sure that your Hugging Face token has access to the endpoint."
)

View File

@@ -275,10 +275,9 @@ class ChatHunyuan(BaseChatModel):
choice["delta"], default_chunk_class
)
default_chunk_class = chunk.__class__
cg_chunk = ChatGenerationChunk(message=chunk)
yield cg_chunk
yield ChatGenerationChunk(message=chunk)
if run_manager:
run_manager.on_llm_new_token(chunk.content, chunk=cg_chunk)
run_manager.on_llm_new_token(chunk.content)
def _chat(self, messages: List[BaseMessage], **kwargs: Any) -> requests.Response:
if self.hunyuan_secret_key is None:

View File

@@ -312,10 +312,9 @@ class JinaChat(BaseChatModel):
delta = chunk["choices"][0]["delta"]
chunk = _convert_delta_to_message_chunk(delta, default_chunk_class)
default_chunk_class = chunk.__class__
cg_chunk = ChatGenerationChunk(message=chunk)
yield cg_chunk
yield ChatGenerationChunk(message=chunk)
if run_manager:
run_manager.on_llm_new_token(chunk.content, chunk=cg_chunk)
run_manager.on_llm_new_token(chunk.content)
def _generate(
self,
@@ -372,10 +371,9 @@ class JinaChat(BaseChatModel):
delta = chunk["choices"][0]["delta"]
chunk = _convert_delta_to_message_chunk(delta, default_chunk_class)
default_chunk_class = chunk.__class__
cg_chunk = ChatGenerationChunk(message=chunk)
yield cg_chunk
yield ChatGenerationChunk(message=chunk)
if run_manager:
await run_manager.on_llm_new_token(chunk.content, chunk=cg_chunk)
await run_manager.on_llm_new_token(chunk.content)
async def _agenerate(
self,

View File

@@ -355,10 +355,9 @@ class ChatLiteLLM(BaseChatModel):
delta = chunk["choices"][0]["delta"]
chunk = _convert_delta_to_message_chunk(delta, default_chunk_class)
default_chunk_class = chunk.__class__
cg_chunk = ChatGenerationChunk(message=chunk)
yield cg_chunk
yield ChatGenerationChunk(message=chunk)
if run_manager:
run_manager.on_llm_new_token(chunk.content, chunk=cg_chunk)
run_manager.on_llm_new_token(chunk.content)
async def _astream(
self,
@@ -379,10 +378,9 @@ class ChatLiteLLM(BaseChatModel):
delta = chunk["choices"][0]["delta"]
chunk = _convert_delta_to_message_chunk(delta, default_chunk_class)
default_chunk_class = chunk.__class__
cg_chunk = ChatGenerationChunk(message=chunk)
yield cg_chunk
yield ChatGenerationChunk(message=chunk)
if run_manager:
await run_manager.on_llm_new_token(chunk.content, chunk=cg_chunk)
await run_manager.on_llm_new_token(chunk.content)
async def _agenerate(
self,

View File

@@ -123,10 +123,9 @@ class ChatLiteLLMRouter(ChatLiteLLM):
delta = chunk["choices"][0]["delta"]
chunk = _convert_delta_to_message_chunk(delta, default_chunk_class)
default_chunk_class = chunk.__class__
cg_chunk = ChatGenerationChunk(message=chunk)
yield cg_chunk
yield ChatGenerationChunk(message=chunk)
if run_manager:
run_manager.on_llm_new_token(chunk.content, chunk=cg_chunk, **params)
run_manager.on_llm_new_token(chunk.content, **params)
async def _astream(
self,
@@ -149,12 +148,9 @@ class ChatLiteLLMRouter(ChatLiteLLM):
delta = chunk["choices"][0]["delta"]
chunk = _convert_delta_to_message_chunk(delta, default_chunk_class)
default_chunk_class = chunk.__class__
cg_chunk = ChatGenerationChunk(message=chunk)
yield cg_chunk
yield ChatGenerationChunk(message=chunk)
if run_manager:
await run_manager.on_llm_new_token(
chunk.content, chunk=cg_chunk, **params
)
await run_manager.on_llm_new_token(chunk.content, **params)
async def _agenerate(
self,

View File

@@ -195,7 +195,6 @@ class ChatOllama(BaseChatModel, _OllamaCommon):
if run_manager:
run_manager.on_llm_new_token(
chunk.text,
chunk=chunk,
verbose=verbose,
)
if final_chunk is None:
@@ -222,7 +221,6 @@ class ChatOllama(BaseChatModel, _OllamaCommon):
if run_manager:
await run_manager.on_llm_new_token(
chunk.text,
chunk=chunk,
verbose=verbose,
)
if final_chunk is None:

View File

@@ -291,12 +291,9 @@ class PaiEasChatEndpoint(BaseChatModel):
# yield text, if any
if text:
cg_chunk = ChatGenerationChunk(message=content)
if run_manager:
await run_manager.on_llm_new_token(
cast(str, content.content), chunk=cg_chunk
)
yield cg_chunk
await run_manager.on_llm_new_token(cast(str, content.content))
yield ChatGenerationChunk(message=content)
# break if stop sequence found
if stop_seq_found:

View File

@@ -224,10 +224,9 @@ class ChatSparkLLM(BaseChatModel):
continue
delta = content["data"]
chunk = _convert_delta_to_message_chunk(delta, default_chunk_class)
cg_chunk = ChatGenerationChunk(message=chunk)
yield cg_chunk
yield ChatGenerationChunk(message=chunk)
if run_manager:
run_manager.on_llm_new_token(str(chunk.content), chunk=cg_chunk)
run_manager.on_llm_new_token(str(chunk.content))
def _generate(
self,

View File

@@ -376,10 +376,9 @@ class ChatVertexAI(_VertexAICommon, BaseChatModel):
chat = self._start_chat(history, **params)
responses = chat.send_message_streaming(question.content, **params)
for response in responses:
chunk = ChatGenerationChunk(message=AIMessageChunk(content=response.text))
if run_manager:
run_manager.on_llm_new_token(response.text, chunk=chunk)
yield chunk
run_manager.on_llm_new_token(response.text)
yield ChatGenerationChunk(message=AIMessageChunk(content=response.text))
def _start_chat(
self, history: _ChatHistory, **kwargs: Any

View File

@@ -116,10 +116,9 @@ class VolcEngineMaasChat(BaseChatModel, VolcEngineMaasBase):
for res in self.client.stream_chat(params):
if res:
msg = convert_dict_to_message(res)
chunk = ChatGenerationChunk(message=AIMessageChunk(content=msg.content))
yield chunk
yield ChatGenerationChunk(message=AIMessageChunk(content=msg.content))
if run_manager:
run_manager.on_llm_new_token(cast(str, msg.content), chunk=chunk)
run_manager.on_llm_new_token(cast(str, msg.content))
def _generate(
self,

View File

@@ -269,13 +269,12 @@ class ChatYuan2(BaseChatModel):
dict(finish_reason=finish_reason) if finish_reason is not None else None
)
default_chunk_class = chunk.__class__
cg_chunk = ChatGenerationChunk(
yield ChatGenerationChunk(
message=chunk,
generation_info=generation_info,
)
yield cg_chunk
if run_manager:
run_manager.on_llm_new_token(chunk.content, chunk=cg_chunk)
run_manager.on_llm_new_token(chunk.content)
def _generate(
self,
@@ -352,13 +351,12 @@ class ChatYuan2(BaseChatModel):
dict(finish_reason=finish_reason) if finish_reason is not None else None
)
default_chunk_class = chunk.__class__
cg_chunk = ChatGenerationChunk(
yield ChatGenerationChunk(
message=chunk,
generation_info=generation_info,
)
yield cg_chunk
if run_manager:
await run_manager.on_llm_new_token(chunk.content, chunk=cg_chunk)
await run_manager.on_llm_new_token(chunk.content)
async def _agenerate(
self,

View File

@@ -327,10 +327,9 @@ class ChatZhipuAI(BaseChatModel):
for r in response.events():
if r.event == "add":
delta = r.data
chunk = ChatGenerationChunk(message=AIMessageChunk(content=delta))
yield chunk
yield ChatGenerationChunk(message=AIMessageChunk(content=delta))
if run_manager:
run_manager.on_llm_new_token(delta, chunk=chunk)
run_manager.on_llm_new_token(delta)
elif r.event == "error":
raise ValueError(f"Error from ZhipuAI API response: {r.data}")

View File

@@ -2,6 +2,8 @@ from __future__ import annotations
import json
import logging
import threading
from queue import Queue
from typing import (
TYPE_CHECKING,
Any,
@@ -14,9 +16,10 @@ from typing import (
)
from langchain_core.documents import Document
from langchain_core.runnables import run_in_executor
from langchain_community.document_loaders.base import BaseLoader
from langchain_community.utilities.astradb import _AstraDBEnvironment
from langchain_community.utilities.astradb import AstraDBEnvironment
if TYPE_CHECKING:
from astrapy.db import AstraDB, AsyncAstraDB
@@ -25,10 +28,11 @@ logger = logging.getLogger(__name__)
class AstraDBLoader(BaseLoader):
"""Load DataStax Astra DB documents."""
def __init__(
self,
collection_name: str,
*,
token: Optional[str] = None,
api_endpoint: Optional[str] = None,
astra_db_client: Optional[AstraDB] = None,
@@ -40,27 +44,7 @@ class AstraDBLoader(BaseLoader):
nb_prefetched: int = 1000,
extraction_function: Callable[[Dict], str] = json.dumps,
) -> None:
"""Load DataStax Astra DB documents.
Args:
collection_name: name of the Astra DB collection to use.
token: API token for Astra DB usage.
api_endpoint: full URL to the API endpoint,
such as `https://<DB-ID>-us-east1.apps.astra.datastax.com`.
astra_db_client: *alternative to token+api_endpoint*,
you can pass an already-created 'astrapy.db.AstraDB' instance.
async_astra_db_client: *alternative to token+api_endpoint*,
you can pass an already-created 'astrapy.db.AsyncAstraDB' instance.
namespace: namespace (aka keyspace) where the
collection is. Defaults to the database's "default namespace".
filter_criteria: Criteria to filter documents.
projection: Specifies the fields to return.
find_options: Additional options for the query.
nb_prefetched: Max number of documents to pre-fetch. Defaults to 1000.
extraction_function: Function applied to collection documents to create
the `page_content` of the LangChain Document. Defaults to `json.dumps`.
"""
astra_env = _AstraDBEnvironment(
astra_env = AstraDBEnvironment(
token=token,
api_endpoint=api_endpoint,
astra_db_client=astra_db_client,
@@ -77,30 +61,42 @@ class AstraDBLoader(BaseLoader):
self.extraction_function = extraction_function
def load(self) -> List[Document]:
"""Eagerly load the content."""
return list(self.lazy_load())
def lazy_load(self) -> Iterator[Document]:
for doc in self.collection.paginated_find(
filter=self.filter,
options=self.find_options,
projection=self.projection,
sort=None,
prefetched=self.nb_prefetched,
):
yield Document(
page_content=self.extraction_function(doc),
metadata={
"namespace": self.collection.astra_db.namespace,
"api_endpoint": self.collection.astra_db.base_url,
"collection": self.collection_name,
},
)
queue = Queue(self.nb_prefetched) # type: ignore
t = threading.Thread(target=self.fetch_results, args=(queue,))
t.start()
while True:
doc = queue.get()
if doc is None:
break
yield doc
t.join()
async def aload(self) -> List[Document]:
"""Load data into Document objects."""
return [doc async for doc in self.alazy_load()]
async def alazy_load(self) -> AsyncIterator[Document]:
if not self.astra_env.async_astra_db:
iterator = run_in_executor(
None,
self.collection.paginated_find,
filter=self.filter,
options=self.find_options,
projection=self.projection,
sort=None,
prefetched=True,
)
done = object()
while True:
item = await run_in_executor(None, lambda it: next(it, done), iterator)
if item is done:
break
yield item # type: ignore[misc]
return
async_collection = await self.astra_env.async_astra_db.collection(
self.collection_name
)
@@ -109,7 +105,7 @@ class AstraDBLoader(BaseLoader):
options=self.find_options,
projection=self.projection,
sort=None,
prefetched=self.nb_prefetched,
prefetched=True,
):
yield Document(
page_content=self.extraction_function(doc),
@@ -119,3 +115,29 @@ class AstraDBLoader(BaseLoader):
"collection": self.collection_name,
},
)
def fetch_results(self, queue: Queue): # type: ignore[no-untyped-def]
self.fetch_page_result(queue)
while self.find_options.get("pageState"):
self.fetch_page_result(queue)
queue.put(None)
def fetch_page_result(self, queue: Queue): # type: ignore[no-untyped-def]
res = self.collection.find(
filter=self.filter,
options=self.find_options,
projection=self.projection,
sort=None,
)
self.find_options["pageState"] = res["data"].get("nextPageState")
for doc in res["data"]["documents"]:
queue.put(
Document(
page_content=self.extraction_function(doc),
metadata={
"namespace": self.collection.astra_db.namespace,
"api_endpoint": self.collection.astra_db.base_url,
"collection": self.collection.collection_name,
},
)
)
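
A minimal usage sketch of the loader documented above, with placeholder credentials and a hypothetical collection name:

from langchain_community.document_loaders import AstraDBLoader

loader = AstraDBLoader(
    "my_collection",  # hypothetical collection
    token="AstraCS:...",  # placeholder Astra DB application token
    api_endpoint="https://<DB-ID>-us-east1.apps.astra.datastax.com",
    find_options={"limit": 10},  # cap the number of fetched documents
)
docs = loader.load()  # page_content defaults to json.dumps(document)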

View File

@@ -29,18 +29,18 @@ class CassandraLoader(BaseLoader):
table: Optional[str] = None,
session: Optional[Session] = None,
keyspace: Optional[str] = None,
query: Union[str, Statement, None] = None,
query: Optional[Union[str, Statement]] = None,
page_content_mapper: Callable[[Any], str] = str,
metadata_mapper: Callable[[Any], dict] = lambda _: {},
*,
query_parameters: Union[dict, Sequence, None] = None,
query_parameters: Union[dict, Sequence] = None, # type: ignore[assignment]
query_timeout: Optional[float] = _NOT_SET, # type: ignore[assignment]
query_trace: bool = False,
query_custom_payload: Optional[dict] = None,
query_custom_payload: dict = None, # type: ignore[assignment]
query_execution_profile: Any = _NOT_SET,
query_paging_state: Any = None,
query_host: Optional[Host] = None,
query_execute_as: Optional[str] = None,
query_host: Host = None,
query_execute_as: str = None, # type: ignore[assignment]
) -> None:
"""
Document Loader for Apache Cassandra.

View File

@@ -2,7 +2,7 @@ import concurrent
import logging
import random
from pathlib import Path
from typing import Any, List, Optional, Sequence, Type, Union
from typing import Any, List, Optional, Type, Union
from langchain_core.documents import Document
@@ -41,7 +41,6 @@ class DirectoryLoader(BaseLoader):
use_multithreading: bool = False,
max_concurrency: int = 4,
*,
exclude: Union[Sequence[str], str] = (),
sample_size: int = 0,
randomize_sample: bool = False,
sample_seed: Union[int, None] = None,
@@ -52,8 +51,6 @@ class DirectoryLoader(BaseLoader):
path: Path to directory.
glob: Glob pattern to use to find files. Defaults to "**/[!.]*"
(all files except hidden).
exclude: A pattern or list of patterns to exclude from results.
Use glob syntax.
silent_errors: Whether to silently ignore errors. Defaults to False.
load_hidden: Whether to load hidden files. Defaults to False.
loader_cls: Loader class to use for loading files.
@@ -67,38 +64,11 @@ class DirectoryLoader(BaseLoader):
directory.
randomize_sample: Shuffle the files to get a random sample.
sample_seed: set the seed of the random shuffle for reproducibility.
Examples:
.. code-block:: python
from langchain_community.document_loaders import DirectoryLoader
# Load all non-hidden files in a directory.
loader = DirectoryLoader("/path/to/directory")
# Load all text files in a directory without recursion.
loader = DirectoryLoader("/path/to/directory", glob="*.txt")
# Recursively load all text files in a directory.
loader = DirectoryLoader(
"/path/to/directory", glob="*.txt", recursive=True
)
# Load all files in a directory, except for py files.
loader = DirectoryLoader("/path/to/directory", exclude="*.py")
# Load all files in a directory, except for py or pyc files.
loader = DirectoryLoader(
"/path/to/directory", exclude=["*.py", "*.pyc"]
)
"""
if loader_kwargs is None:
loader_kwargs = {}
if isinstance(exclude, str):
exclude = (exclude,)
self.path = path
self.glob = glob
self.exclude = exclude
self.load_hidden = load_hidden
self.loader_cls = loader_cls
self.loader_kwargs = loader_kwargs
@@ -148,13 +118,7 @@ class DirectoryLoader(BaseLoader):
raise ValueError(f"Expected directory, got file: '{self.path}'")
docs: List[Document] = []
paths = p.rglob(self.glob) if self.recursive else p.glob(self.glob)
items = [
path
for path in paths
if not (self.exclude and any(path.match(glob) for glob in self.exclude))
]
items = list(p.rglob(self.glob) if self.recursive else p.glob(self.glob))
if self.sample_size > 0:
if self.randomize_sample:

View File

@@ -2,6 +2,7 @@
import logging
import os
import pwd
import uuid
from http import HTTPStatus
from typing import Any, Dict, Iterator, List
@@ -259,8 +260,6 @@ class PebbloSafeLoader(BaseLoader):
str: Name of owner.
"""
try:
import pwd
file_owner_uid = os.stat(file_path).st_uid
file_owner_name = pwd.getpwuid(file_owner_uid).pw_name
except Exception:

View File

@@ -65,13 +65,11 @@ from langchain_community.embeddings.mlflow import (
from langchain_community.embeddings.mlflow_gateway import MlflowAIGatewayEmbeddings
from langchain_community.embeddings.modelscope_hub import ModelScopeEmbeddings
from langchain_community.embeddings.mosaicml import MosaicMLInstructorEmbeddings
from langchain_community.embeddings.nemo import NeMoEmbeddings
from langchain_community.embeddings.nlpcloud import NLPCloudEmbeddings
from langchain_community.embeddings.oci_generative_ai import OCIGenAIEmbeddings
from langchain_community.embeddings.octoai_embeddings import OctoAIEmbeddings
from langchain_community.embeddings.ollama import OllamaEmbeddings
from langchain_community.embeddings.openai import OpenAIEmbeddings
from langchain_community.embeddings.optimum_intel import QuantizedBiEncoderEmbeddings
from langchain_community.embeddings.sagemaker_endpoint import (
SagemakerEndpointEmbeddings,
)
@@ -84,7 +82,6 @@ from langchain_community.embeddings.sentence_transformer import (
SentenceTransformerEmbeddings,
)
from langchain_community.embeddings.spacy_embeddings import SpacyEmbeddings
from langchain_community.embeddings.sparkllm import SparkLLMTextEmbeddings
from langchain_community.embeddings.tensorflow_hub import TensorflowHubEmbeddings
from langchain_community.embeddings.vertexai import VertexAIEmbeddings
from langchain_community.embeddings.volcengine import VolcanoEmbeddings
@@ -151,9 +148,6 @@ __all__ = [
"BookendEmbeddings",
"VolcanoEmbeddings",
"OCIGenAIEmbeddings",
"QuantizedBiEncoderEmbeddings",
"NeMoEmbeddings",
"SparkLLMTextEmbeddings",
]

View File

@@ -1,169 +0,0 @@
from __future__ import annotations
import asyncio
import json
from typing import Any, Dict, List, Optional
import aiohttp
import requests
from langchain_core.embeddings import Embeddings
from langchain_core.pydantic_v1 import BaseModel, root_validator
def is_endpoint_live(url: str, headers: Optional[dict], payload: Any) -> bool:
"""
Check if an endpoint is live by sending a POST request to the specified URL.
Args:
url (str): The URL of the endpoint to check.
Returns:
bool: True if the endpoint is live (status code 200), False otherwise.
Raises:
Exception: If the endpoint returns a non-successful status code or if there is
an error querying the endpoint.
"""
try:
response = requests.request("POST", url, headers=headers, data=payload)
# Check if the status code is 200 (OK)
if response.status_code == 200:
return True
else:
# Raise an exception if the status code is not 200
raise Exception(
f"Endpoint returned a non-successful status code: "
f"{response.status_code}"
)
except requests.exceptions.RequestException as e:
# Handle any exceptions (e.g., connection errors)
raise Exception(f"Error querying the endpoint: {e}")
class NeMoEmbeddings(BaseModel, Embeddings):
batch_size: int = 16
model: str = "NV-Embed-QA-003"
api_endpoint_url: str = "http://localhost:8088/v1/embeddings"
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that the end point is alive using the values that are provided."""
url = values["api_endpoint_url"]
model = values["model"]
# Optional: A minimal test payload and headers required by the endpoint
headers = {"Content-Type": "application/json"}
payload = json.dumps(
{"input": "Hello World", "model": model, "input_type": "query"}
)
is_endpoint_live(url, headers, payload)
return values
async def _aembedding_func(
self, session: Any, text: str, input_type: str
) -> List[float]:
"""Async call out to embedding endpoint.
Args:
text: The text to embed.
Returns:
Embeddings for the text.
"""
headers = {"Content-Type": "application/json"}
async with session.post(
self.api_endpoint_url,
json={"input": text, "model": self.model, "input_type": input_type},
headers=headers,
) as response:
response.raise_for_status()
answer = await response.text()
answer = json.loads(answer)
return answer["data"][0]["embedding"]
def _embedding_func(self, text: str, input_type: str) -> List[float]:
"""Call out to Cohere's embedding endpoint.
Args:
text: The text to embed.
Returns:
Embeddings for the text.
"""
payload = json.dumps(
{"input": text, "model": self.model, "input_type": input_type}
)
headers = {"Content-Type": "application/json"}
response = requests.request(
"POST", self.api_endpoint_url, headers=headers, data=payload
)
response_json = json.loads(response.text)
embedding = response_json["data"][0]["embedding"]
return embedding
def embed_documents(self, documents: List[str]) -> List[List[float]]:
"""Embed a list of document texts.
Args:
documents: The list of document texts to embed.
Returns:
List of embeddings, one for each text.
"""
return [self._embedding_func(text, input_type="passage") for text in documents]
def embed_query(self, text: str) -> List[float]:
return self._embedding_func(text, input_type="query")
async def aembed_query(self, text: str) -> List[float]:
"""Call out to NeMo's embedding endpoint async for embedding query text.
Args:
text: The text to embed.
Returns:
Embedding for the text.
"""
async with aiohttp.ClientSession() as session:
embedding = await self._aembedding_func(session, text, "query")
return embedding
async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
"""Call out to NeMo's embedding endpoint async for embedding search docs.
Args:
texts: The list of texts to embed.
Returns:
List of embeddings, one for each text.
"""
embeddings = []
async with aiohttp.ClientSession() as session:
for batch in range(0, len(texts), self.batch_size):
text_batch = texts[batch : batch + self.batch_size]
# Create embedding tasks for all texts in the batch
tasks = [
self._aembedding_func(session, text, "passage")
for text in text_batch
]
# Run all tasks concurrently
batch_results = await asyncio.gather(*tasks)
# Extend the embeddings list with results from this batch
embeddings.extend(batch_results)
return embeddings
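
A minimal usage sketch for the NeMoEmbeddings class above, assuming a NeMo embedding service is actually live at the default localhost URL (the URL and model name below are illustrative; constructing the class pings the endpoint, so this only runs against a reachable service):

import asyncio

from langchain_community.embeddings.nemo import NeMoEmbeddings

# Construction validates the endpoint with a test request (see validate_environment).
embeddings = NeMoEmbeddings(
    batch_size=16,
    model="NV-Embed-QA-003",
    api_endpoint_url="http://localhost:8088/v1/embeddings",
)

# Queries are embedded with input_type="query", documents with input_type="passage".
query_vector = embeddings.embed_query("What is the capital of France?")
doc_vectors = embeddings.embed_documents(["Paris is the capital of France."])

# The async path gathers one request per text, in batches of `batch_size`.
doc_vectors = asyncio.run(
    embeddings.aembed_documents(["Paris is the capital of France."] * 32)
)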

View File

@@ -1,208 +0,0 @@
from typing import Any, Dict, List, Optional
from langchain_core.embeddings import Embeddings
from langchain_core.pydantic_v1 import BaseModel, Extra
class QuantizedBiEncoderEmbeddings(BaseModel, Embeddings):
"""Quantized bi-encoders embedding models.
Please ensure that you have installed optimum-intel and ipex.
Input:
model_name: str = Model name.
max_seq_len: int = The maximum sequence length for tokenization. (default 512)
pooling_strategy: str =
"mean" or "cls", pooling strategy for the final layer. (default "mean")
query_instruction: Optional[str] =
An instruction to add to the query before embedding. (default None)
document_instruction: Optional[str] =
An instruction to add to each document before embedding. (default None)
padding: Optional[bool] =
Whether to add padding during tokenization or not. (default True)
model_kwargs: Optional[Dict] =
Parameters to add to the model during initialization. (default {})
encode_kwargs: Optional[Dict] =
Parameters to add during the embedding forward pass. (default {})
Example:
from langchain_community.embeddings import QuantizedBiEncoderEmbeddings
model_name = "Intel/bge-small-en-v1.5-rag-int8-static"
encode_kwargs = {'normalize_embeddings': True}
hf = QuantizedBiEncoderEmbeddings(
model_name,
encode_kwargs=encode_kwargs,
query_instruction="Represent this sentence for searching relevant passages: "
)
"""
def __init__(
self,
model_name: str,
max_seq_len: int = 512,
pooling_strategy: str = "mean", # "mean" or "cls"
query_instruction: Optional[str] = None,
document_instruction: Optional[str] = None,
padding: bool = True,
model_kwargs: Optional[Dict] = None,
encode_kwargs: Optional[Dict] = None,
**kwargs: Any,
) -> None:
super().__init__(**kwargs)
self.model_name_or_path = model_name
self.max_seq_len = max_seq_len
self.pooling = pooling_strategy
self.padding = padding
self.encode_kwargs = encode_kwargs or {}
self.model_kwargs = model_kwargs or {}
self.normalize = self.encode_kwargs.get("normalize_embeddings", False)
self.batch_size = self.encode_kwargs.get("batch_size", 32)
self.query_instruction = query_instruction
self.document_instruction = document_instruction
self.load_model()
def load_model(self) -> None:
try:
from transformers import AutoTokenizer
except ImportError as e:
raise ImportError(
"Unable to import transformers, please install with "
"`pip install -U transformers`."
) from e
try:
from optimum.intel import IPEXModel
self.transformer_model = IPEXModel.from_pretrained(
self.model_name_or_path, **self.model_kwargs
)
except Exception as e:
raise Exception(
f"""
Failed to load model {self.model_name_or_path} due to the following error:
{e}
Please ensure that you have installed optimum-intel and ipex correctly, using:
pip install optimum[neural-compressor]
pip install intel_extension_for_pytorch
For more information, please visit:
* Install optimum-intel as shown here: https://github.com/huggingface/optimum-intel.
* Install IPEX as shown here: https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=cpu&version=v2.2.0%2Bcpu.
"""
)
self.transformer_tokenizer = AutoTokenizer.from_pretrained(
pretrained_model_name_or_path=self.model_name_or_path,
)
self.transformer_model.eval()
class Config:
"""Configuration for this pydantic object."""
extra = Extra.allow
def _embed(self, inputs: Any) -> Any:
try:
import torch
except ImportError as e:
raise ImportError(
"Unable to import torch, please install with `pip install -U torch`."
) from e
with torch.inference_mode():
outputs = self.transformer_model(**inputs)
if self.pooling == "mean":
emb = self._mean_pooling(outputs, inputs["attention_mask"])
elif self.pooling == "cls":
emb = self._cls_pooling(outputs)
else:
raise ValueError("pooling method no supported")
if self.normalize:
emb = torch.nn.functional.normalize(emb, p=2, dim=1)
return emb
@staticmethod
def _cls_pooling(outputs: Any) -> Any:
if isinstance(outputs, dict):
token_embeddings = outputs["last_hidden_state"]
else:
token_embeddings = outputs[0]
return token_embeddings[:, 0]
@staticmethod
def _mean_pooling(outputs: Any, attention_mask: Any) -> Any:
try:
import torch
except ImportError as e:
raise ImportError(
"Unable to import torch, please install with `pip install -U torch`."
) from e
if isinstance(outputs, dict):
token_embeddings = outputs["last_hidden_state"]
else:
# First element of model_output contains all token embeddings
token_embeddings = outputs[0]
input_mask_expanded = (
attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
)
sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
return sum_embeddings / sum_mask
def _embed_text(self, texts: List[str]) -> List[List[float]]:
inputs = self.transformer_tokenizer(
texts,
max_length=self.max_seq_len,
truncation=True,
padding=self.padding,
return_tensors="pt",
)
return self._embed(inputs).tolist()
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Embed a list of text documents using the Optimized Embedder model.
Input:
texts: List[str] = List of text documents to embed.
Output:
List[List[float]] = The embeddings of each text document.
"""
try:
import pandas as pd
except ImportError as e:
raise ImportError(
"Unable to import pandas, please install with `pip install -U pandas`."
) from e
try:
from tqdm import tqdm
except ImportError as e:
raise ImportError(
"Unable to import tqdm, please install with `pip install -U tqdm`."
) from e
docs = [
self.document_instruction + d if self.document_instruction else d
for d in texts
]
# group into batches
text_list_df = pd.DataFrame(docs, columns=["texts"]).reset_index()
# assign each example with its batch
text_list_df["batch_index"] = text_list_df["index"] // self.batch_size
# create groups
batches = list(text_list_df.groupby(["batch_index"])["texts"].apply(list))
vectors = []
for batch in tqdm(batches, desc="Batches"):
vectors += self._embed_text(batch)
return vectors
def embed_query(self, text: str) -> List[float]:
if self.query_instruction:
text = self.query_instruction + text
return self._embed_text([text])[0]
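
Continuing the construction example in the docstring above, a short sketch of how the quantized embedder is used once loaded (this assumes optimum-intel, intel_extension_for_pytorch, torch, pandas, and tqdm are installed and the Intel model can be fetched):

from langchain_community.embeddings import QuantizedBiEncoderEmbeddings

model = QuantizedBiEncoderEmbeddings(
    model_name="Intel/bge-small-en-v1.5-rag-int8-static",
    encode_kwargs={"normalize_embeddings": True},
    query_instruction="Represent this sentence for searching relevant passages: ",
)

# Documents are grouped into batches (default batch_size=32) and mean-pooled.
doc_vectors = model.embed_documents(["LangChain integrates many embedding models."])
# The query instruction is prepended to the query text before embedding.
query_vector = model.embed_query("Which embedding models does LangChain support?")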

View File

@@ -1,184 +0,0 @@
import base64
import hashlib
import hmac
import json
import logging
from datetime import datetime
from time import mktime
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode
from wsgiref.handlers import format_date_time
import numpy as np
import requests
from langchain_core.embeddings import Embeddings
from langchain_core.pydantic_v1 import BaseModel, SecretStr, root_validator
from langchain_core.utils import convert_to_secret_str, get_from_dict_or_env
from numpy import ndarray
# Used for document and knowledge embedding
EMBEDDING_P_API_URL: str = "https://cn-huabei-1.xf-yun.com/v1/private/sa8a05c27"
# Used for user questions embedding
EMBEDDING_Q_API_URL: str = "https://cn-huabei-1.xf-yun.com/v1/private/s50d55a16"
# SparkLLMTextEmbeddings is an embedding model provided by iFLYTEK Co., Ltd. (https://iflytek.com/en/).
# Official Website: https://www.xfyun.cn/doc/spark/Embedding_new_api.html
# Developers need to create an application in the console first, use the appid, APIKey,
# and APISecret provided in the application for authentication,
# and generate an authentication URL for the handshake.
# You can get one by registering at https://console.xfyun.cn/services/bm3.
# SparkLLMTextEmbeddings supports a 2K token window and produces vectors with
# 2560 dimensions.
logger = logging.getLogger(__name__)
class Url:
def __init__(self, host: str, path: str, schema: str) -> None:
self.host = host
self.path = path
self.schema = schema
class SparkLLMTextEmbeddings(BaseModel, Embeddings):
"""SparkLLM Text Embedding models."""
spark_app_id: SecretStr
spark_api_key: SecretStr
spark_api_secret: SecretStr
@root_validator(pre=True, allow_reuse=True)
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that auth credentials exist in the constructor arguments or environment."""
values["spark_app_id"] = convert_to_secret_str(
get_from_dict_or_env(values, "spark_app_id", "SPARK_APP_ID")
)
values["spark_api_key"] = convert_to_secret_str(
get_from_dict_or_env(values, "spark_api_key", "SPARK_API_KEY")
)
values["spark_api_secret"] = convert_to_secret_str(
get_from_dict_or_env(values, "spark_api_secret", "SPARK_API_SECRET")
)
return values
def _embed(self, texts: List[str], host: str) -> Optional[List[List[float]]]:
url = self._assemble_ws_auth_url(
request_url=host,
method="POST",
api_key=self.spark_api_key.get_secret_value(),
api_secret=self.spark_api_secret.get_secret_value(),
)
content = self._get_body(self.spark_app_id.get_secret_value(), texts)
response = requests.post(
url, json=content, headers={"content-type": "application/json"}
).text
res_arr = self._parser_message(response)
if res_arr is not None:
return res_arr.tolist()
return None
def embed_documents(self, texts: List[str]) -> Optional[List[List[float]]]: # type: ignore[override]
"""Public method to get embeddings for a list of documents.
Args:
texts: The list of texts to embed.
Returns:
A list of embeddings, one for each text, or None if an error occurs.
"""
return self._embed(texts, EMBEDDING_P_API_URL)
def embed_query(self, text: str) -> Optional[List[float]]: # type: ignore[override]
"""Public method to get embedding for a single query text.
Args:
text: The text to embed.
Returns:
Embeddings for the text, or None if an error occurs.
"""
result = self._embed([text], EMBEDDING_Q_API_URL)
return result[0] if result is not None else None
@staticmethod
def _assemble_ws_auth_url(
request_url: str, method: str = "GET", api_key: str = "", api_secret: str = ""
) -> str:
u = SparkLLMTextEmbeddings._parse_url(request_url)
host = u.host
path = u.path
now = datetime.now()
date = format_date_time(mktime(now.timetuple()))
signature_origin = "host: {}\ndate: {}\n{} {} HTTP/1.1".format(
host, date, method, path
)
signature_sha = hmac.new(
api_secret.encode("utf-8"),
signature_origin.encode("utf-8"),
digestmod=hashlib.sha256,
).digest()
signature_sha_str = base64.b64encode(signature_sha).decode(encoding="utf-8")
authorization_origin = (
'api_key="%s", algorithm="%s", headers="%s", signature="%s"'
% (api_key, "hmac-sha256", "host date request-line", signature_sha_str)
)
authorization = base64.b64encode(authorization_origin.encode("utf-8")).decode(
encoding="utf-8"
)
values = {"host": host, "date": date, "authorization": authorization}
return request_url + "?" + urlencode(values)
@staticmethod
def _parse_url(request_url: str) -> Url:
stidx = request_url.index("://")
host = request_url[stidx + 3 :]
schema = request_url[: stidx + 3]
edidx = host.index("/")
if edidx <= 0:
raise AssembleHeaderException("invalid request url: " + request_url)
path = host[edidx:]
host = host[:edidx]
u = Url(host, path, schema)
return u
@staticmethod
def _get_body(appid: str, text: List[str]) -> Dict[str, Any]:
body = {
"header": {"app_id": appid, "uid": "39769795890", "status": 3},
"parameter": {"emb": {"feature": {"encoding": "utf8"}}},
"payload": {
"messages": {
"text": base64.b64encode(json.dumps(text).encode("utf-8")).decode()
}
},
}
return body
@staticmethod
def _parser_message(
message: str,
) -> Optional[ndarray]:
data = json.loads(message)
code = data["header"]["code"]
if code != 0:
logger.warning(f"Request error: {code}, {data}")
return None
else:
text_base = data["payload"]["feature"]["text"]
text_data = base64.b64decode(text_base)
dt = np.dtype(np.float32)
dt = dt.newbyteorder("<")
text = np.frombuffer(text_data, dtype=dt)
if len(text) > 2560:
array = text[:2560]
else:
array = text
return array
class AssembleHeaderException(Exception):
def __init__(self, msg: str) -> None:
self.message = msg
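
A minimal usage sketch for SparkLLMTextEmbeddings, assuming valid iFLYTEK credentials (the values below are placeholders; they can also be passed directly as constructor arguments):

import os

from langchain_community.embeddings.sparkllm import SparkLLMTextEmbeddings

# Credentials are resolved from the constructor or from these environment variables.
os.environ["SPARK_APP_ID"] = "your-app-id"
os.environ["SPARK_API_KEY"] = "your-api-key"
os.environ["SPARK_API_SECRET"] = "your-api-secret"

embeddings = SparkLLMTextEmbeddings()

# Documents go to the document-embedding endpoint, queries to the question-embedding
# endpoint; both return 2560-dimensional vectors, or None if the request fails.
doc_vectors = embeddings.embed_documents(["iFLYTEK Spark provides text embeddings."])
query_vector = embeddings.embed_query("What does Spark provide?")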

View File

@@ -86,15 +86,6 @@ class VoyageEmbeddings(BaseModel, Embeddings):
show_progress_bar: bool = False
"""Whether to show a progress bar when embedding. Must have tqdm installed if set
to True."""
truncation: Optional[bool] = None
"""Whether to truncate the input texts to fit within the context length.
If True, over-length input texts will be truncated to fit within the context
length before being vectorized by the embedding model. If False, an error will be
raised if any given text exceeds the context length. If not specified
(defaults to None), we will truncate the input text before sending it to the
embedding model if it slightly exceeds the context window length. If it
significantly exceeds the context window length, an error will be raised."""
class Config:
"""Configuration for this pydantic object."""
@@ -113,14 +104,12 @@ class VoyageEmbeddings(BaseModel, Embeddings):
self, input: List[str], input_type: Optional[str] = None
) -> Dict:
api_key = cast(SecretStr, self.voyage_api_key).get_secret_value()
params: Dict = {
params = {
"url": self.voyage_api_base,
"headers": {"Authorization": f"Bearer {api_key}"},
"json": {"model": self.model, "input": input, "input_type": input_type},
"timeout": self.request_timeout,
}
if self.truncation is not None:
params["json"]["truncation"] = self.truncation
return params
def _get_embeddings(

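For reference, a sketch of how the truncation option removed above was used before this change (the API key and model name below are placeholders):

from langchain_community.embeddings import VoyageEmbeddings

# truncation=True asks the API to clip over-length inputs instead of raising;
# leaving it unset (None) omits the field and defers to the service default.
embeddings = VoyageEmbeddings(
    voyage_api_key="your-api-key", model="voyage-2", truncation=True
)
vectors = embeddings.embed_documents(["Voyage AI builds embedding models."])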
View File

@@ -36,7 +36,10 @@ class KuzuGraph:
def query(self, query: str, params: dict = {}) -> List[Dict[str, Any]]:
"""Query Kùzu database"""
result = self.conn.execute(query, params)
params_list = []
for param_name in params:
params_list.append([param_name, params[param_name]])
result = self.conn.execute(query, params_list)
column_names = result.get_column_names()
return_list = []
while result.has_next():
@@ -76,16 +79,20 @@ class KuzuGraph:
rel_properties = []
for table in rel_tables:
table_name = table["name"]
current_table_schema = {"properties": [], "label": table_name}
query_result = self.conn.execute(
f"CALL table_info('{table_name}') RETURN *;"
)
while query_result.has_next():
row = query_result.get_next()
prop_name = row[1]
prop_type = row[2]
current_table_schema["properties"].append((prop_name, prop_type))
current_table_schema = {"properties": [], "label": table["name"]}
properties_text = self.conn._connection.get_rel_property_names(
table["name"]
).split("\n")
for i, line in enumerate(properties_text):
# The first 3 lines define src, dst and name, so we skip them
if i < 3:
continue
if not line:
continue
property_name, property_type = line.strip().split(" ")
current_table_schema["properties"].append(
(property_name, property_type)
)
rel_properties.append(current_table_schema)
self.schema = (

View File

@@ -582,12 +582,6 @@ def _import_volcengine_maas() -> Any:
return VolcEngineMaasLLM
def _import_sparkllm() -> Any:
from langchain_community.llms.sparkllm import SparkLLM
return SparkLLM
def __getattr__(name: str) -> Any:
if name == "AI21":
return _import_ai21()
@@ -775,8 +769,6 @@ def __getattr__(name: str) -> Any:
k: v() for k, v in get_type_to_cls_dict().items()
}
return type_to_cls_dict
elif name == "SparkLLM":
return _import_sparkllm()
else:
raise AttributeError(f"Could not find: {name}")
@@ -869,7 +861,6 @@ __all__ = [
"YandexGPT",
"Yuan2",
"VolcEngineMaasLLM",
"SparkLLM",
]
@@ -959,5 +950,4 @@ def get_type_to_cls_dict() -> Dict[str, Callable[[], Type[BaseLLM]]]:
"yandex_gpt": _import_yandex_gpt,
"yuan2": _import_yuan2,
"VolcEngineMaasLLM": _import_volcengine_maas,
"SparkLLM": _import_sparkllm(),
}
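
Both removals above touch the module's lazy-import machinery: a module-level __getattr__ resolves names through small import thunks, and get_type_to_cls_dict maps type names to the thunks themselves, never to their results. A stripped-down sketch of that pattern, with hypothetical FooLLM / _import_foo_llm names:

from typing import Any, Callable, Dict


def _import_foo_llm() -> Any:
    # Import lazily so optional dependencies load only when actually used.
    from langchain_community.llms.fake import FakeListLLM

    return FakeListLLM


def __getattr__(name: str) -> Any:  # PEP 562 module-level attribute hook
    if name == "FooLLM":
        return _import_foo_llm()
    raise AttributeError(f"Could not find: {name}")


def get_type_to_cls_dict() -> Dict[str, Callable[[], Any]]:
    # Store the thunk itself; callers invoke it when they need the class.
    return {"foo": _import_foo_llm}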

View File

@@ -1,17 +1,12 @@
import json
import logging
from typing import Any, AsyncIterator, Dict, Iterator, List, Mapping, Optional
from typing import Any, Dict, List, Mapping, Optional
from langchain_core.callbacks import (
AsyncCallbackManagerForLLMRun,
CallbackManagerForLLMRun,
)
import requests
from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models.llms import LLM
from langchain_core.outputs import GenerationChunk
from langchain_core.pydantic_v1 import Extra, Field, root_validator
from langchain_core.utils import get_from_dict_or_env, get_pydantic_field_names
from langchain_core.pydantic_v1 import Extra, root_validator
from langchain_core.utils import get_from_dict_or_env
logger = logging.getLogger(__name__)
from langchain_community.llms.utils import enforce_stop_tokens
VALID_TASKS = (
"text2text-generation",
@@ -22,198 +17,70 @@ VALID_TASKS = (
class HuggingFaceEndpoint(LLM):
"""
HuggingFace Endpoint.
"""HuggingFace Endpoint models.
To use this class, you should have installed the ``huggingface_hub`` package, and
the environment variable ``HUGGINGFACEHUB_API_TOKEN`` set with your API token,
or given as a named parameter to the constructor.
To use, you should have the ``huggingface_hub`` python package installed, and the
environment variable ``HUGGINGFACEHUB_API_TOKEN`` set with your API token, or pass
it as a named parameter to the constructor.
Only supports `text-generation` and `text2text-generation` for now.
Example:
.. code-block:: python
# Basic Example (no streaming)
llm = HuggingFaceEndpoint(
endpoint_url="http://localhost:8010/",
max_new_tokens=512,
top_k=10,
top_p=0.95,
typical_p=0.95,
temperature=0.01,
repetition_penalty=1.03,
from langchain_community.llms import HuggingFaceEndpoint
endpoint_url = (
"https://abcdefghijklmnop.us-east-1.aws.endpoints.huggingface.cloud"
)
hf = HuggingFaceEndpoint(
endpoint_url=endpoint_url,
huggingfacehub_api_token="my-api-key"
)
print(llm("What is Deep Learning?"))
# Streaming response example
from langchain_community.callbacks import streaming_stdout
callbacks = [streaming_stdout.StreamingStdOutCallbackHandler()]
llm = HuggingFaceEndpoint(
endpoint_url="http://localhost:8010/",
max_new_tokens=512,
top_k=10,
top_p=0.95,
typical_p=0.95,
temperature=0.01,
repetition_penalty=1.03,
callbacks=callbacks,
streaming=True,
huggingfacehub_api_token="my-api-key"
)
print(llm("What is Deep Learning?"))
"""
endpoint_url: Optional[str] = None
endpoint_url: str = ""
"""Endpoint URL to use."""
repo_id: Optional[str] = None
"""Repo to use."""
huggingfacehub_api_token: Optional[str] = None
max_new_tokens: int = 512
"""Maximum number of generated tokens"""
top_k: Optional[int] = None
"""The number of highest probability vocabulary tokens to keep for
top-k-filtering."""
top_p: Optional[float] = 0.95
"""If set to < 1, only the smallest set of most probable tokens with probabilities
that add up to `top_p` or higher are kept for generation."""
typical_p: Optional[float] = 0.95
"""Typical Decoding mass. See [Typical Decoding for Natural Language
Generation](https://arxiv.org/abs/2202.00666) for more information."""
temperature: Optional[float] = 0.8
"""The value used to module the logits distribution."""
repetition_penalty: Optional[float] = None
"""The parameter for repetition penalty. 1.0 means no penalty.
See [this paper](https://arxiv.org/pdf/1909.05858.pdf) for more details."""
return_full_text: bool = False
"""Whether to prepend the prompt to the generated text"""
truncate: Optional[int] = None
"""Truncate inputs tokens to the given size"""
stop_sequences: List[str] = Field(default_factory=list)
"""Stop generating tokens if a member of `stop_sequences` is generated"""
seed: Optional[int] = None
"""Random sampling seed"""
inference_server_url: str = ""
"""text-generation-inference instance base url"""
timeout: int = 120
"""Timeout in seconds"""
streaming: bool = False
"""Whether to generate a stream of tokens asynchronously"""
do_sample: bool = False
"""Activate logits sampling"""
watermark: bool = False
"""Watermarking with [A Watermark for Large Language Models]
(https://arxiv.org/abs/2301.10226)"""
server_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Holds any text-generation-inference server parameters not explicitly specified"""
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Holds any model parameters valid for `call` not explicitly specified"""
model: str
client: Any
async_client: Any
task: Optional[str] = None
"""Task to call the model with.
Should be a task that returns `generated_text` or `summary_text`."""
model_kwargs: Optional[dict] = None
"""Keyword arguments to pass to the model."""
huggingfacehub_api_token: Optional[str] = None
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
@root_validator(pre=True)
def build_extra(cls, values: Dict[str, Any]) -> Dict[str, Any]:
"""Build extra kwargs from additional params that were passed in."""
all_required_field_names = get_pydantic_field_names(cls)
extra = values.get("model_kwargs", {})
for field_name in list(values):
if field_name in extra:
raise ValueError(f"Found {field_name} supplied twice.")
if field_name not in all_required_field_names:
logger.warning(
f"""WARNING! {field_name} is not default parameter.
{field_name} was transferred to model_kwargs.
Please make sure that {field_name} is what you intended."""
)
extra[field_name] = values.pop(field_name)
invalid_model_kwargs = all_required_field_names.intersection(extra.keys())
if invalid_model_kwargs:
raise ValueError(
f"Parameters {invalid_model_kwargs} should be specified explicitly. "
f"Instead they were passed in as part of `model_kwargs` parameter."
)
values["model_kwargs"] = extra
if "endpoint_url" not in values and "repo_id" not in values:
raise ValueError(
"Please specify an `endpoint_url` or `repo_id` for the model."
)
if "endpoint_url" in values and "repo_id" in values:
raise ValueError(
"Please specify either an `endpoint_url` OR a `repo_id`, not both."
)
values["model"] = values.get("endpoint_url") or values.get("repo_id")
return values
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that package is installed and that the API token is valid."""
"""Validate that api key and python package exists in environment."""
huggingfacehub_api_token = get_from_dict_or_env(
values, "huggingfacehub_api_token", "HUGGINGFACEHUB_API_TOKEN"
)
try:
from huggingface_hub import login
from huggingface_hub.hf_api import HfApi
try:
HfApi(
endpoint="https://huggingface.co", # Can be a Private Hub endpoint.
token=huggingfacehub_api_token,
).whoami()
except Exception as e:
raise ValueError(
"Could not authenticate with huggingface_hub. "
"Please check your API token."
) from e
except ImportError:
raise ImportError(
"Could not import huggingface_hub python package. "
"Please install it with `pip install huggingface_hub`."
)
try:
huggingfacehub_api_token = get_from_dict_or_env(
values, "huggingfacehub_api_token", "HUGGINGFACEHUB_API_TOKEN"
)
login(token=huggingfacehub_api_token)
except Exception as e:
raise ValueError(
"Could not authenticate with huggingface_hub. "
"Please check your API token."
) from e
from huggingface_hub import AsyncInferenceClient, InferenceClient
values["client"] = InferenceClient(
model=values["model"],
timeout=values["timeout"],
token=huggingfacehub_api_token,
**values["server_kwargs"],
)
values["async_client"] = AsyncInferenceClient(
model=values["model"],
timeout=values["timeout"],
token=huggingfacehub_api_token,
**values["server_kwargs"],
)
values["huggingfacehub_api_token"] = huggingfacehub_api_token
return values
@property
def _default_params(self) -> Dict[str, Any]:
"""Get the default parameters for calling text generation inference API."""
return {
"max_new_tokens": self.max_new_tokens,
"top_k": self.top_k,
"top_p": self.top_p,
"typical_p": self.typical_p,
"temperature": self.temperature,
"repetition_penalty": self.repetition_penalty,
"return_full_text": self.return_full_text,
"truncate": self.truncate,
"stop_sequences": self.stop_sequences,
"seed": self.seed,
"do_sample": self.do_sample,
"watermark": self.watermark,
**self.model_kwargs,
}
@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""
@@ -228,13 +95,6 @@ class HuggingFaceEndpoint(LLM):
"""Return type of llm."""
return "huggingface_endpoint"
def _invocation_params(
self, runtime_stop: Optional[List[str]], **kwargs: Any
) -> Dict[str, Any]:
params = {**self._default_params, **kwargs}
params["stop_sequences"] = params["stop_sequences"] + (runtime_stop or [])
return params
def _call(
self,
prompt: str,
@@ -242,129 +102,62 @@ class HuggingFaceEndpoint(LLM):
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> str:
"""Call out to HuggingFace Hub's inference endpoint."""
invocation_params = self._invocation_params(stop, **kwargs)
if self.streaming:
completion = ""
for chunk in self._stream(prompt, stop, run_manager, **invocation_params):
completion += chunk.text
return completion
else:
invocation_params["stop"] = invocation_params[
"stop_sequences"
] # porting 'stop_sequences' into the 'stop' argument
response = self.client.post(
json={"inputs": prompt, "parameters": invocation_params},
stream=False,
task=self.task,
"""Call out to HuggingFace Hub's inference endpoint.
Args:
prompt: The prompt to pass into the model.
stop: Optional list of stop words to use when generating.
Returns:
The string generated by the model.
Example:
.. code-block:: python
response = hf("Tell me a joke.")
"""
_model_kwargs = self.model_kwargs or {}
# payload samples
params = {**_model_kwargs, **kwargs}
parameter_payload = {"inputs": prompt, "parameters": params}
# HTTP headers for authorization
headers = {
"Authorization": f"Bearer {self.huggingfacehub_api_token}",
"Content-Type": "application/json",
}
# send request
try:
response = requests.post(
self.endpoint_url, headers=headers, json=parameter_payload
)
response_text = json.loads(response.decode())[0]["generated_text"]
# Maybe the generation has stopped at one of the stop sequences:
# then we remove this stop sequence from the end of the generated text
for stop_seq in invocation_params["stop_sequences"]:
if response_text[-len(stop_seq) :] == stop_seq:
response_text = response_text[: -len(stop_seq)]
return response_text
async def _acall(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> str:
invocation_params = self._invocation_params(stop, **kwargs)
if self.streaming:
completion = ""
async for chunk in self._astream(
prompt, stop, run_manager, **invocation_params
):
completion += chunk.text
return completion
else:
invocation_params["stop"] = invocation_params["stop_sequences"]
response = await self.async_client.post(
json={"inputs": prompt, "parameters": invocation_params},
stream=False,
task=self.task,
except requests.exceptions.RequestException as e: # This is the correct syntax
raise ValueError(f"Error raised by inference endpoint: {e}")
generated_text = response.json()
if "error" in generated_text:
raise ValueError(
f"Error raised by inference API: {generated_text['error']}"
)
response_text = json.loads(response.decode())[0]["generated_text"]
# Maybe the generation has stopped at one of the stop sequences:
# then remove this stop sequence from the end of the generated text
for stop_seq in invocation_params["stop_sequences"]:
if response_text[-len(stop_seq) :] == stop_seq:
response_text = response_text[: -len(stop_seq)]
return response_text
def _stream(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> Iterator[GenerationChunk]:
invocation_params = self._invocation_params(stop, **kwargs)
for response in self.client.text_generation(
prompt, **invocation_params, stream=True
):
# identify stop sequence in generated text, if any
stop_seq_found: Optional[str] = None
for stop_seq in invocation_params["stop_sequences"]:
if stop_seq in response:
stop_seq_found = stop_seq
# identify text to yield
text: Optional[str] = None
if stop_seq_found:
text = response[: response.index(stop_seq_found)]
else:
text = response
# yield text, if any
if text:
chunk = GenerationChunk(text=text)
yield chunk
if run_manager:
run_manager.on_llm_new_token(chunk.text)
# break if stop sequence found
if stop_seq_found:
break
async def _astream(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> AsyncIterator[GenerationChunk]:
invocation_params = self._invocation_params(stop, **kwargs)
async for response in await self.async_client.text_generation(
prompt, **invocation_params, stream=True
):
# identify stop sequence in generated text, if any
stop_seq_found: Optional[str] = None
for stop_seq in invocation_params["stop_sequences"]:
if stop_seq in response:
stop_seq_found = stop_seq
# identify text to yield
text: Optional[str] = None
if stop_seq_found:
text = response[: response.index(stop_seq_found)]
else:
text = response
# yield text, if any
if text:
chunk = GenerationChunk(text=text)
yield chunk
if run_manager:
await run_manager.on_llm_new_token(chunk.text)
# break if stop sequence found
if stop_seq_found:
break
if self.task == "text-generation":
text = generated_text[0]["generated_text"]
# Remove prompt if included in generated text.
if text.startswith(prompt):
text = text[len(prompt) :]
elif self.task == "text2text-generation":
text = generated_text[0]["generated_text"]
elif self.task == "summarization":
text = generated_text[0]["summary_text"]
elif self.task == "conversational":
text = generated_text["response"][1]
else:
raise ValueError(
f"Got invalid task {self.task}, "
f"currently only {VALID_TASKS} are supported"
)
if stop is not None:
# This is a bit hacky, but I can't figure out a better way to enforce
# stop tokens when making calls to huggingface_hub.
text = enforce_stop_tokens(text, stop)
return text
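
A minimal usage sketch for the endpoint class above, assuming a deployed text-generation endpoint and a valid Hub token (the URL and token below are placeholders):

from langchain_community.llms import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    endpoint_url="https://abcdefghijklmnop.us-east-1.aws.endpoints.huggingface.cloud",
    task="text-generation",
    huggingfacehub_api_token="my-api-key",
)

# Stop sequences are stripped from the end of the generated text by _call above.
print(llm("What is Deep Learning?", stop=["\n\n"]))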

View File

@@ -1,7 +1,6 @@
import json
from typing import Any, Dict, List, Mapping, Optional
from langchain_core._api.deprecation import deprecated
from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models.llms import LLM
from langchain_core.pydantic_v1 import Extra, root_validator
@@ -20,10 +19,8 @@ VALID_TASKS_DICT = {
}
@deprecated("0.0.21", removal="0.2.0", alternative="HuggingFaceEndpoint")
class HuggingFaceHub(LLM):
"""HuggingFaceHub models.
! This class is deprecated; use HuggingFaceEndpoint instead.
To use, you should have the ``huggingface_hub`` python package installed, and the
environment variable ``HUGGINGFACEHUB_API_TOKEN`` set with your API token, or pass

Some files were not shown because too many files have changed in this diff.