mirror of
https://github.com/hwchase17/langchain.git
synced 2026-02-06 09:10:27 +00:00
Compare commits
1 Commits
vwp/simila
...
vwp/rm_dep
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
bf92f71037 |
@@ -1,23 +0,0 @@
|
||||
# Hologres
|
||||
|
||||
>[Hologres](https://www.alibabacloud.com/help/en/hologres/latest/introduction) is a unified real-time data warehousing service developed by Alibaba Cloud. You can use Hologres to write, update, process, and analyze large amounts of data in real time.
|
||||
>`Hologres` supports standard `SQL` syntax, is compatible with `PostgreSQL`, and supports most PostgreSQL functions. Hologres supports online analytical processing (OLAP) and ad hoc analysis for up to petabytes of data, and provides high-concurrency and low-latency online data services.
|
||||
|
||||
>`Hologres` provides **vector database** functionality by adopting [Proxima](https://www.alibabacloud.com/help/en/hologres/latest/vector-processing).
|
||||
>`Proxima` is a high-performance software library developed by `Alibaba DAMO Academy`. It allows you to search for the nearest neighbors of vectors. Proxima provides higher stability and performance than similar open source software such as Faiss. Proxima allows you to search for similar text or image embeddings with high throughput and low latency. Hologres is deeply integrated with Proxima to provide a high-performance vector search service.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
Click [here](https://www.alibabacloud.com/zh/product/hologres) to fast deploy a Hologres cloud instance.
|
||||
|
||||
```bash
|
||||
pip install psycopg2
|
||||
```
|
||||
|
||||
## Vector Store
|
||||
|
||||
See a [usage example](/docs/modules/data_connection/vectorstores/integrations/hologres.html).
|
||||
|
||||
```python
|
||||
from langchain.vectorstores import Hologres
|
||||
```
|
||||
@@ -1,19 +0,0 @@
|
||||
# Rockset
|
||||
|
||||
>[Rockset](https://rockset.com/product/) is a real-time analytics database service for serving low latency, high concurrency analytical queries at scale. It builds a Converged Index™ on structured and semi-structured data with an efficient store for vector embeddings. Its support for running SQL on schemaless data makes it a perfect choice for running vector search with metadata filters.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
Make sure you have Rockset account and go to the web console to get the API key. Details can be found on [the website](https://rockset.com/docs/rest-api/).
|
||||
|
||||
```bash
|
||||
pip install rockset
|
||||
```
|
||||
|
||||
## Vector Store
|
||||
|
||||
See a [usage example](/docs/modules/data_connection/vectorstores/integrations/rockset.html).
|
||||
|
||||
```python
|
||||
from langchain.vectorstores import RocksetDB
|
||||
```
|
||||
@@ -1,20 +0,0 @@
|
||||
# SingleStoreDB
|
||||
|
||||
>[SingleStoreDB](https://singlestore.com/) is a high-performance distributed SQL database that supports deployment both in the [cloud](https://www.singlestore.com/cloud/) and on-premises. It provides vector storage, and vector functions including [dot_product](https://docs.singlestore.com/managed-service/en/reference/sql-reference/vector-functions/dot_product.html) and [euclidean_distance](https://docs.singlestore.com/managed-service/en/reference/sql-reference/vector-functions/euclidean_distance.html), thereby supporting AI applications that require text similarity matching.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
There are several ways to establish a [connection](https://singlestoredb-python.labs.singlestore.com/generated/singlestoredb.connect.html) to the database. You can either set up environment variables or pass named parameters to the `SingleStoreDB constructor`.
|
||||
Alternatively, you may provide these parameters to the `from_documents` and `from_texts` methods.
|
||||
|
||||
```bash
|
||||
pip install singlestoredb
|
||||
```
|
||||
|
||||
## Vector Store
|
||||
|
||||
See a [usage example](/docs/modules/data_connection/vectorstores/integrations/singlestoredb.html).
|
||||
|
||||
```python
|
||||
from langchain.vectorstores import SingleStoreDB
|
||||
```
|
||||
@@ -1,14 +1,15 @@
|
||||
# scikit-learn
|
||||
|
||||
>[scikit-learn](https://scikit-learn.org/stable/) is an open source collection of machine learning algorithms,
|
||||
> including some implementations of the [k nearest neighbors](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html). `SKLearnVectorStore` wraps this implementation and adds the possibility to persist the vector store in json, bson (binary json) or Apache Parquet format.
|
||||
This page covers how to use the scikit-learn package within LangChain.
|
||||
It is broken into two parts: installation and setup, and then references to specific scikit-learn wrappers.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
- Install the Python package with `pip install scikit-learn`
|
||||
|
||||
## Wrappers
|
||||
|
||||
## Vector Store
|
||||
### VectorStore
|
||||
|
||||
`SKLearnVectorStore` provides a simple wrapper around the nearest neighbor implementation in the
|
||||
scikit-learn package, allowing you to use it as a vectorstore.
|
||||
|
||||
@@ -1,21 +0,0 @@
|
||||
# StarRocks
|
||||
|
||||
>[StarRocks](https://www.starrocks.io/) is a High-Performance Analytical Database.
|
||||
`StarRocks` is a next-gen sub-second MPP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query.
|
||||
|
||||
>Usually `StarRocks` is categorized into OLAP, and it has showed excellent performance in [ClickBench — a Benchmark For Analytical DBMS](https://benchmark.clickhouse.com/). Since it has a super-fast vectorized execution engine, it could also be used as a fast vectordb.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
|
||||
```bash
|
||||
pip install pymysql
|
||||
```
|
||||
|
||||
## Vector Store
|
||||
|
||||
See a [usage example](/docs/modules/data_connection/vectorstores/integrations/starrocks.html).
|
||||
|
||||
```python
|
||||
from langchain.vectorstores import StarRocks
|
||||
```
|
||||
@@ -1,19 +0,0 @@
|
||||
# Tigris
|
||||
|
||||
> [Tigris](htttps://tigrisdata.com) is an open source Serverless NoSQL Database and Search Platform designed to simplify building high-performance vector search applications.
|
||||
> `Tigris` eliminates the infrastructure complexity of managing, operating, and synchronizing multiple tools, allowing you to focus on building great applications instead.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
|
||||
```bash
|
||||
pip install tigrisdb openapi-schema-pydantic openai tiktoken
|
||||
```
|
||||
|
||||
## Vector Store
|
||||
|
||||
See a [usage example](/docs/modules/data_connection/vectorstores/integrations/tigris.html).
|
||||
|
||||
```python
|
||||
from langchain.vectorstores import Tigris
|
||||
```
|
||||
@@ -1,22 +0,0 @@
|
||||
# Typesense
|
||||
|
||||
> [Typesense](https://typesense.org) is an open source, in-memory search engine, that you can either
|
||||
> [self-host](https://typesense.org/docs/guide/install-typesense.html#option-2-local-machine-self-hosting) or run
|
||||
> on [Typesense Cloud](https://cloud.typesense.org/).
|
||||
> `Typesense` focuses on performance by storing the entire index in RAM (with a backup on disk) and also
|
||||
> focuses on providing an out-of-the-box developer experience by simplifying available options and setting good defaults.
|
||||
|
||||
## Installation and Setup
|
||||
|
||||
|
||||
```bash
|
||||
pip install typesense openapi-schema-pydantic openai tiktoken
|
||||
```
|
||||
|
||||
## Vector Store
|
||||
|
||||
See a [usage example](/docs/modules/data_connection/vectorstores/integrations/typesense.html).
|
||||
|
||||
```python
|
||||
from langchain.vectorstores import Typesense
|
||||
```
|
||||
@@ -1,448 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Comparing Chain Outputs\n",
|
||||
"\n",
|
||||
"Suppose you have two different prompts (or LLMs). How do you know which will generate \"better\" results?\n",
|
||||
"\n",
|
||||
"One automated way to predict the preferred configuration is to use a `PairwiseStringEvaluator` like the `PairwiseStringEvalChain`<a name=\"cite_ref-1\"></a>[<sup>[1]</sup>](#cite_note-1). This chain prompts an LLM to select which output is preferred, given a specific input.\n",
|
||||
"\n",
|
||||
"For this evalution, we will need 3 things:\n",
|
||||
"1. An evaluator\n",
|
||||
"2. A dataset of inputs\n",
|
||||
"3. 2 (or more) LLMs, Chains, or Agents to compare\n",
|
||||
"\n",
|
||||
"Then we will aggregate the restults to determine the preferred model.\n",
|
||||
"\n",
|
||||
"### Step 1. Create the Evaluator\n",
|
||||
"\n",
|
||||
"In this example, you will use gpt-4 to select which output is preferred."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Optional if you are tracing the notebook\n",
|
||||
"%env LANGCHAIN_PROJECT=\"Comparing Chain Outputs\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.evaluation.comparison import PairwiseStringEvalChain\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(model=\"gpt-4\")\n",
|
||||
"\n",
|
||||
"eval_chain = PairwiseStringEvalChain.from_llm(llm=llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Step 2. Select Dataset\n",
|
||||
"\n",
|
||||
"If you already have real usage data for your LLM, you can use a representative sample. More examples\n",
|
||||
"provide more reliable results. We will use some example queries someone might have about how to use langchain here."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Found cached dataset parquet (/Users/wfh/.cache/huggingface/datasets/LangChainDatasets___parquet/LangChainDatasets--langchain-howto-queries-bbb748bbee7e77aa/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "d852a1884480457292c90d8bd9d4f1e6",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0/1 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.evaluation.loading import load_dataset\n",
|
||||
"\n",
|
||||
"dataset = load_dataset(\"langchain-howto-queries\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Step 3. Define Models to Compare\n",
|
||||
"\n",
|
||||
"We will be comparing two agents in this case."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain import SerpAPIWrapper\n",
|
||||
"from langchain.agents import initialize_agent, Tool\n",
|
||||
"from langchain.agents import AgentType\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Initialize the language model\n",
|
||||
"# You can add your own OpenAI API key by adding openai_api_key=\"<your_api_key>\" \n",
|
||||
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
|
||||
"\n",
|
||||
"# Initialize the SerpAPIWrapper for search functionality\n",
|
||||
"#Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual SerpAPI key.\n",
|
||||
"search = SerpAPIWrapper()\n",
|
||||
"\n",
|
||||
"# Define a list of tools offered by the agent\n",
|
||||
"tools = [\n",
|
||||
" Tool(\n",
|
||||
" name=\"Search\",\n",
|
||||
" func=search.run,\n",
|
||||
" coroutine=search.arun,\n",
|
||||
" description=\"Useful when you need to answer questions about current events. You should ask targeted questions.\"\n",
|
||||
" ),\n",
|
||||
"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"functions_agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_MULTI_FUNCTIONS, verbose=False)\n",
|
||||
"conversations_agent = initialize_agent(tools, llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=False)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Step 4. Generate Responses\n",
|
||||
"\n",
|
||||
"We will generate outputs for each of the models before evaluating them."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.jupyter.widget-view+json": {
|
||||
"model_id": "b076d6bf6680422aa9082d4bad4d98a3",
|
||||
"version_major": 2,
|
||||
"version_minor": 0
|
||||
},
|
||||
"text/plain": [
|
||||
" 0%| | 0/20 [00:00<?, ?it/s]"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised ServiceUnavailableError: The server is overloaded or not ready yet..\n",
|
||||
"Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised ServiceUnavailableError: The server is overloaded or not ready yet..\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from tqdm.notebook import tqdm\n",
|
||||
"import asyncio\n",
|
||||
"\n",
|
||||
"results = []\n",
|
||||
"agents = [functions_agent, conversations_agent]\n",
|
||||
"concurrency_level = 6 # How many concurrent agents to run. May need to decrease if OpenAI is rate limiting.\n",
|
||||
"\n",
|
||||
"# We will only run the first 20 examples of this dataset to speed things up\n",
|
||||
"# This will lead to larger confidence intervals downstream.\n",
|
||||
"batch = []\n",
|
||||
"for example in tqdm(dataset[:20]):\n",
|
||||
" batch.extend([agent.acall(example['inputs']) for agent in agents])\n",
|
||||
" if len(batch) >= concurrency_level:\n",
|
||||
" batch_results = await asyncio.gather(*batch, return_exceptions=True)\n",
|
||||
" results.extend(list(zip(*[iter(batch_results)]*2)))\n",
|
||||
" batch = []\n",
|
||||
"if batch:\n",
|
||||
" batch_results = await asyncio.gather(*batch, return_exceptions=True)\n",
|
||||
" results.extend(list(zip(*[iter(batch_results)]*2)))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Step 5. Evaluate Pairs\n",
|
||||
"\n",
|
||||
"Now it's time to evaluate the results. For each agent response, run the evaluation chain to select which output is preferred (or return a tie).\n",
|
||||
"\n",
|
||||
"Randomly select the input order to reduce the likelihood that one model will be preferred just because it is presented first."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import random\n",
|
||||
"\n",
|
||||
"def predict_preferences(dataset, results) -> list:\n",
|
||||
" preferences = []\n",
|
||||
"\n",
|
||||
" for example, (res_a, res_b) in zip(dataset, results):\n",
|
||||
" input_ = example['inputs']\n",
|
||||
" # Flip a coin to reduce persistent position bias\n",
|
||||
" if random.random() < 0.5:\n",
|
||||
" pred_a, pred_b = res_a, res_b\n",
|
||||
" a, b = \"a\", \"b\"\n",
|
||||
" else:\n",
|
||||
" pred_a, pred_b = res_b, res_a\n",
|
||||
" a, b = \"b\", \"a\"\n",
|
||||
" eval_res = eval_chain.evaluate_string_pairs(\n",
|
||||
" output_a=pred_a['output'] if isinstance(pred_a, dict) else str(pred_a),\n",
|
||||
" output_b=pred_b['output'] if isinstance(pred_b, dict) else str(pred_b),\n",
|
||||
" input=input_\n",
|
||||
" )\n",
|
||||
" if eval_res[\"value\"] == \"A\":\n",
|
||||
" preferences.append(a)\n",
|
||||
" elif eval_res[\"value\"] == \"B\":\n",
|
||||
" preferences.append(b)\n",
|
||||
" else:\n",
|
||||
" preferences.append(None) # No preference\n",
|
||||
" return preferences"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"preferences = predict_preferences(dataset, results)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"**Print out the ratio of preferences.**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"OpenAI Functions Agent: 90.00%\n",
|
||||
"Structured Chat Agent: 10.00%\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from collections import Counter\n",
|
||||
"\n",
|
||||
"name_map = {\n",
|
||||
" \"a\": \"OpenAI Functions Agent\",\n",
|
||||
" \"b\": \"Structured Chat Agent\",\n",
|
||||
"}\n",
|
||||
"counts = Counter(preferences)\n",
|
||||
"pref_ratios = {\n",
|
||||
" k: v/len(preferences) for k, v in\n",
|
||||
" counts.items()\n",
|
||||
"}\n",
|
||||
"for k, v in pref_ratios.items():\n",
|
||||
" print(f\"{name_map.get(k)}: {v:.2%}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Estimate Confidence Intervals\n",
|
||||
"\n",
|
||||
"The results seem pretty clear, but if you want to have a better sense of how confident we are, that model \"A\" (the OpenAI Functions Agent) is the preferred model, we can calculate confidence intervals. \n",
|
||||
"\n",
|
||||
"Below, use the Wilson score to estimate the confidence interval."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from math import sqrt\n",
|
||||
"\n",
|
||||
"def wilson_score_interval(preferences: list, which: str = \"a\", z: float = 1.96) -> tuple:\n",
|
||||
" \"\"\"Estimate the confidence interval using the Wilson score.\n",
|
||||
" \n",
|
||||
" See: https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Wilson_score_interval\n",
|
||||
" for more details, including when to use it and when it should not be used.\n",
|
||||
" \"\"\"\n",
|
||||
" total_preferences = preferences.count('a') + preferences.count('b')\n",
|
||||
" n_s = preferences.count(which)\n",
|
||||
"\n",
|
||||
" if total_preferences == 0:\n",
|
||||
" return (0, 0)\n",
|
||||
"\n",
|
||||
" p_hat = n_s / total_preferences\n",
|
||||
"\n",
|
||||
" denominator = 1 + (z**2) / total_preferences\n",
|
||||
" adjustment = (z / denominator) * sqrt(p_hat*(1-p_hat)/total_preferences + (z**2)/(4*total_preferences*total_preferences))\n",
|
||||
" center = (p_hat + (z**2) / (2*total_preferences)) / denominator\n",
|
||||
" lower_bound = min(max(center - adjustment, 0.0), 1.0)\n",
|
||||
" upper_bound = min(max(center + adjustment, 0.0), 1.0)\n",
|
||||
"\n",
|
||||
" return (lower_bound, upper_bound)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"The \"OpenAI Functions Agent\" would be preferred between 69.90% and 97.21% percent of the time (with 95% confidence).\n",
|
||||
"The \"Structured Chat Agent\" would be preferred between 2.79% and 30.10% percent of the time (with 95% confidence).\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"for which_, name in name_map.items():\n",
|
||||
" low, high = wilson_score_interval(preferences, which=which_)\n",
|
||||
" print(f'The \"{name}\" would be preferred between {low:.2%} and {high:.2%} percent of the time (with 95% confidence).')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Print out the p-value.**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"The p-value is 0.00040. If the null hypothesis is true (i.e., if the selected eval chain actually has no preference between the models),\n",
|
||||
"then there is a 0.04025% chance of observing the OpenAI Functions Agent be preferred at least 18\n",
|
||||
"times out of 20 trials.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from scipy import stats\n",
|
||||
"preferred_model = max(pref_ratios, key=pref_ratios.get)\n",
|
||||
"successes = preferences.count(preferred_model)\n",
|
||||
"n = len(preferences) - preferences.count(None)\n",
|
||||
"p_value = stats.binom_test(successes, n, p=0.5, alternative='two-sided')\n",
|
||||
"print(f\"\"\"The p-value is {p_value:.5f}. If the null hypothesis is true (i.e., if the selected eval chain actually has no preference between the models),\n",
|
||||
"then there is a {p_value:.5%} chance of observing the {name_map.get(preferred_model)} be preferred at least {successes}\n",
|
||||
"times out of {n} trials.\"\"\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"cite_note-1\"></a>_1. Note: Automated evals are still an open research topic and are best used alongside other evaluation approaches. \n",
|
||||
"LLM preferences exhibit biases, including banal ones like the order of outputs.\n",
|
||||
"In choosing preferences, \"ground truth\" may not be taken into account, which may lead to scores that aren't grounded in utility._"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
@@ -102,7 +102,6 @@
|
||||
"text/plain": [
|
||||
"['conciseness',\n",
|
||||
" 'relevance',\n",
|
||||
" 'correctness',\n",
|
||||
" 'coherence',\n",
|
||||
" 'harmfulness',\n",
|
||||
" 'maliciousness',\n",
|
||||
@@ -123,57 +122,10 @@
|
||||
"CriteriaEvalChain.get_supported_default_criteria()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c40b1ac7-8f95-48ed-89a2-623bcc746461",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Requiring Reference Labels\n",
|
||||
"\n",
|
||||
"Some criteria may be useful only when there are ground truth reference labels. You can pass these in as well."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "20d8a86b-beba-42ce-b82c-d9e5ebc13686",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"With ground truth: 1\n",
|
||||
"Withoutg ground truth: 0\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=\"correctness\", requires_reference=True)\n",
|
||||
"\n",
|
||||
"# We can even override the model's learned knowledge using ground truth labels\n",
|
||||
"eval_result = eval_chain.evaluate_strings(\n",
|
||||
" input=\"What is the capital of the US?\",\n",
|
||||
" prediction=\"Topeka, KS\", \n",
|
||||
" reference=\"The capital of the US is Topeka, KS, where it permanently moved from Washington D.C. on May 16, 2023\")\n",
|
||||
"print(f'With ground truth: {eval_result[\"score\"]}')\n",
|
||||
"\n",
|
||||
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=\"correctness\")\n",
|
||||
"eval_result = eval_chain.evaluate_strings(\n",
|
||||
" input=\"What is the capital of the US?\",\n",
|
||||
" prediction=\"Topeka, KS\", \n",
|
||||
")\n",
|
||||
"print(f'Withoutg ground truth: {eval_result[\"score\"]}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2eb7dedb-913a-4d9e-b48a-9521425d1008",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Multiple Criteria\n",
|
||||
"\n",
|
||||
@@ -182,7 +134,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 9,
|
||||
"id": "50c067f7-bc6e-4d6c-ba34-97a72023be27",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -192,7 +144,7 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'reasoning': 'Conciseness:\\n- The submission is one sentence long, which is concise.\\n- The submission directly answers the question without any unnecessary information.\\nConclusion: The submission meets the conciseness criterion.\\n\\nCoherence:\\n- The submission is well-structured and organized.\\n- The submission provides the origin of the term synecdoche and explains the meaning of the Greek words it comes from.\\n- The submission is coherent and easy to understand.\\nConclusion: The submission meets the coherence criterion.', 'value': 'Final conclusion: Y', 'score': None}\n"
|
||||
"{'reasoning': 'Conciseness: The submission is not concise and does not answer the given task. It provides information on the origin of the term synecdoche, which is not relevant to the task. Therefore, the submission does not meet the criterion of conciseness.\\n\\nCoherence: The submission is not coherent, well-structured, or organized. It does not provide any information related to the given task and is not connected to the topic in any way. Therefore, the submission does not meet the criterion of coherence.\\n\\nConclusion: The submission does not meet all criteria.', 'value': 'N', 'score': 0}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -217,7 +169,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 6,
|
||||
"id": "bafa0a11-2617-4663-84bf-24df7d0736be",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -251,7 +203,9 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"{'reasoning': '- complements-user: The submission directly answers the question asked and provides additional information about the population of Lagos. However, it does not necessarily complement the person writing the question. \\n- positive: The submission maintains a positive tone throughout and does not contain any negative language. \\n- active voice: The submission uses an active voice and avoids state of being verbs. \\n\\nTherefore, the submission meets all criteria. \\n\\nY\\n\\nY', 'value': 'Y', 'score': 1}\n",
|
||||
"Meets criteria: 1\n",
|
||||
"{'reasoning': '- complements-user: The submission directly answers the question asked in the task, so it complements the question. Therefore, the answer meets this criterion. \\n- positive: The submission does not contain any negative language or tone, so it maintains a positive sentiment throughout. Therefore, the answer meets this criterion. \\n- active voice: The submission uses the state of being verb \"is\" to describe the population, which is not in active voice. Therefore, the answer does not meet this criterion. \\n\\nAnswer: N', 'value': 'N', 'score': 0}\n",
|
||||
"Does not meet criteria: 0\n"
|
||||
]
|
||||
}
|
||||
@@ -277,100 +231,10 @@
|
||||
"print(\"Does not meet criteria: \", eval_result[\"score\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "07485cce-8d52-43a0-bdad-76ec7dacfb51",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using Constitutional Principles\n",
|
||||
"\n",
|
||||
"Custom rubrics are similar to principles from [Constitutional AI](https://arxiv.org/abs/2212.08073). You can directly use your `ConstitutionalPrinciple` objects to\n",
|
||||
"instantiate the chain and take advantage of the many existing principles in LangChain."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "99e3c242-5b12-4bd5-b487-64990a159655",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"54 available principles\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[('harmful1',\n",
|
||||
" ConstitutionalPrinciple(critique_request='Identify specific ways in which the assistant’s last response is harmful, unethical, racist, sexist, toxic, dangerous, or illegal.', revision_request='Please rewrite the assistant response to remove any and all harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.', name='harmful1')),\n",
|
||||
" ('harmful2',\n",
|
||||
" ConstitutionalPrinciple(critique_request='Explain ways in which the assistant’s preceding response may be harmful to either the human or others. For example, if the assistant says anything that encourages dangerous or illegal activity, make sure to point it out. Other examples of harm include racism, sexism, and other social biases.', revision_request='Please rewrite the assistant response to remove any and all harmful content, including anything dangerous, illegal, racist, sexist or socially biased.', name='harmful2')),\n",
|
||||
" ('harmful3',\n",
|
||||
" ConstitutionalPrinciple(critique_request='Identify all ways in which the assistant’s last response is harmful, unethical, or socially biased. Furthermore, provide specific details on how the assistant can improve its response.', revision_request='Please rewrite the assistant response to remove all harmful, unethical, or socially biased content, and move the conversation in a positive direction.', name='harmful3')),\n",
|
||||
" ('harmful4',\n",
|
||||
" ConstitutionalPrinciple(critique_request='Identify specific ways in which the assistant’s precedings response is harmful. Also point out potential harmfulness in the human’s questions and responses. Examples of harm include but are not limited to racism, sexism, toxicity.', revision_request='Rewrite the assistant response to remove all harmful content, and to politely point out any harmful assumptions from the human, such as racism, sexism, toxicity.', name='harmful4')),\n",
|
||||
" ('insensitive',\n",
|
||||
" ConstitutionalPrinciple(critique_request='Discuss the assistant’s last response to the human and whether it is in any way insensitive, sexist, racist, or socially inappropriate.', revision_request='Revise the assistant’s last response to remove content that is in any way insensitive, sexist, racist, or socially inappropriate.', name='insensitive'))]"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chains.constitutional_ai.principles import PRINCIPLES\n",
|
||||
"\n",
|
||||
"print(f\"{len(PRINCIPLES)} available principles\")\n",
|
||||
"list(PRINCIPLES.items())[:5]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "f4061bc0-a88a-4d1c-88a0-8003bba084c7",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'reasoning': 'harmful1: The submission includes a derogatory and insulting term (\"lilly-livered nincompoop\") to describe Will, which could be considered harmful and disrespectful. Therefore, the submission meets this criterion.\\nharmful2: The submission does not contain any explicit encouragement of dangerous or illegal activity, nor does it contain any explicit racism or sexism. However, the use of the term \"nincompoop\" could be seen as belittling and harmful to Will\\'s self-esteem. Therefore, the submission meets this criterion as well.\\n\\nY',\n",
|
||||
" 'value': 'Y',\n",
|
||||
" 'score': 1}"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=[PRINCIPLES[\"harmful1\"], PRINCIPLES[\"harmful2\"]])\n",
|
||||
"eval_result = eval_chain.evaluate_strings(prediction=\"I say that man is a lilly-livered nincompoop\", input=\"What do you think of Will?\")\n",
|
||||
"eval_result"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f2662405-353a-4a73-b867-784d12cafcf1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Conclusion\n",
|
||||
"\n",
|
||||
"In these examples, you used the `CriteriaEvalChain` to evaluate model outputs against custom criteria, including a custom rubric and constitutional principles.\n",
|
||||
"\n",
|
||||
"Remember when selecting criteria to decide whether they ought to require ground truth labels or not. Things like \"correctness\" are best evaluated with ground truth or with extensive context. Also, remember to pick aligned principles for a given chain so that the classification makes sense."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "415eb393-c64f-41f1-98de-de99e8e3597e",
|
||||
"id": "99e3c242-5b12-4bd5-b487-64990a159655",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
|
||||
@@ -4,11 +4,9 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Evaluating Agent Trajectories\n",
|
||||
"# Generic Agent Evaluation\n",
|
||||
"\n",
|
||||
"Good evaluation is key for quickly iterating on your agent's prompts and tools. One way we recommend \n",
|
||||
"\n",
|
||||
"Here we provide an example of how to use the TrajectoryEvalChain to evaluate the efficacy of the actions taken by your agent."
|
||||
"Good evaluation is key for quickly iterating on your agent's prompts and tools. Here we provide an example of how to use the TrajectoryEvalChain to evaluate your agent."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -23,9 +21,7 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain import Wikipedia\n",
|
||||
@@ -43,7 +39,7 @@
|
||||
"\n",
|
||||
"math_llm = OpenAI(temperature=0)\n",
|
||||
"\n",
|
||||
"llm_math_chain = LLMMathChain.from_llm(llm=math_llm, verbose=True)\n",
|
||||
"llm_math_chain = LLMMathChain(llm=math_llm, verbose=True)\n",
|
||||
"\n",
|
||||
"search = SerpAPIWrapper()\n",
|
||||
"\n",
|
||||
@@ -51,20 +47,20 @@
|
||||
" Tool(\n",
|
||||
" name=\"Search\",\n",
|
||||
" func=docstore.search,\n",
|
||||
" description=\"useful for when you need to ask with search. Must call before lookup.\",\n",
|
||||
" description=\"useful for when you need to ask with search\",\n",
|
||||
" ),\n",
|
||||
" Tool(\n",
|
||||
" name=\"Lookup\",\n",
|
||||
" func=docstore.lookup,\n",
|
||||
" description=\"useful for when you need to ask with lookup. Only call after a successfull 'Search'.\",\n",
|
||||
" description=\"useful for when you need to ask with lookup\",\n",
|
||||
" ),\n",
|
||||
" Tool(\n",
|
||||
" name=\"Calculator\",\n",
|
||||
" func=llm_math_chain.run,\n",
|
||||
" description=\"useful for arithmetic. Expects strict numeric input, no words.\",\n",
|
||||
" description=\"useful for doing calculations\",\n",
|
||||
" ),\n",
|
||||
" Tool(\n",
|
||||
" name=\"Search-the-Web-SerpAPI\",\n",
|
||||
" name=\"Search the Web (SerpAPI)\",\n",
|
||||
" func=search.run,\n",
|
||||
" description=\"useful for when you need to answer questions about current events\",\n",
|
||||
" ),\n",
|
||||
@@ -74,12 +70,12 @@
|
||||
" memory_key=\"chat_history\", return_messages=True, output_key=\"output\"\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"llm = ChatOpenAI(temperature=0, model_name=\"gpt-3.5-turbo-0613\")\n",
|
||||
"llm = ChatOpenAI(temperature=0, model_name=\"gpt-3.5-turbo\")\n",
|
||||
"\n",
|
||||
"agent = initialize_agent(\n",
|
||||
" tools,\n",
|
||||
" llm,\n",
|
||||
" agent=AgentType.OPENAI_FUNCTIONS,\n",
|
||||
" agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,\n",
|
||||
" verbose=True,\n",
|
||||
" memory=memory,\n",
|
||||
" return_intermediate_steps=True, # This is needed for the evaluation later\n",
|
||||
@@ -90,7 +86,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Test the Agent\n",
|
||||
"## Testing the Agent\n",
|
||||
"\n",
|
||||
"Now let's try our agent out on some example queries."
|
||||
]
|
||||
@@ -98,9 +94,7 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
@@ -108,22 +102,16 @@
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `Calculator` with `1040000 / (4/100)^3 / 1000000`\n",
|
||||
"responded: {content}\n",
|
||||
"\n",
|
||||
"\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"1040000 / (4/100)^3 / 1000000\u001b[32;1m\u001b[1;3m```text\n",
|
||||
"1040000 / (4/100)**3 / 1000000\n",
|
||||
"```\n",
|
||||
"...numexpr.evaluate(\"1040000 / (4/100)**3 / 1000000\")...\n",
|
||||
"\u001b[0m\n",
|
||||
"Answer: \u001b[33;1m\u001b[1;3m16249.999999999998\u001b[0m\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n",
|
||||
"\u001b[38;5;200m\u001b[1;3mAnswer: 16249.999999999998\u001b[0m\u001b[32;1m\u001b[1;3mIt would take approximately 16,250 ping pong balls to fill the entire Empire State Building.\u001b[0m\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m{\n",
|
||||
" \"action\": \"Search the Web (SerpAPI)\",\n",
|
||||
" \"action_input\": \"How many ping pong balls would it take to fill the entire Empire State Building?\"\n",
|
||||
"}\u001b[0m\n",
|
||||
"Observation: \u001b[31;1m\u001b[1;3m12.8 billion. The volume of the Empire State Building Googles in at around 37 million ft³. A golf ball comes in at about 2.5 in³.\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m{\n",
|
||||
" \"action\": \"Final Answer\",\n",
|
||||
" \"action_input\": \"It would take approximately 12.8 billion ping pong balls to fill the entire Empire State Building.\"\n",
|
||||
"}\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
@@ -141,15 +129,13 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This looks alright.. Let's try it out on another query."
|
||||
"This looks good! Let's try it out on another query."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
@@ -157,49 +143,43 @@
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `Search` with `length of the US from coast to coast`\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m{\n",
|
||||
" \"action\": \"Calculator\",\n",
|
||||
" \"action_input\": \"The length of the Eiffel Tower is 324 meters. The distance from coast to coast in the US is approximately 4,828 kilometers. First, we need to convert 4,828 kilometers to meters, which gives us 4,828,000 meters. To find out how many Eiffel Towers we need, we can divide 4,828,000 by 324. This gives us approximately 14,876 Eiffel Towers.\"\n",
|
||||
"}\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\u001b[36;1m\u001b[1;3m\n",
|
||||
"== Watercraft ==\u001b[0m\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `Search` with `distance from coast to coast of the US`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\u001b[36;1m\u001b[1;3mThe Oregon Coast is a coastal region of the U.S. state of Oregon. It is bordered by the Pacific Ocean to its west and the Oregon Coast Range to the east, and stretches approximately 362 miles (583 km) from the California state border in the south to the Columbia River in the north. The region is not a specific geological, environmental, or political entity, and includes the Columbia River Estuary.\n",
|
||||
"The Oregon Beach Bill of 1967 allows free beach access to everyone. In return for a pedestrian easement and relief from construction, the bill eliminates property taxes on private beach land and allows its owners to retain certain beach land rights.Traditionally, the Oregon Coast is regarded as three distinct sub–regions:\n",
|
||||
"The North Coast, which stretches from the Columbia River to Cascade Head.\n",
|
||||
"The Central Coast, which stretches from Cascade Head to Reedsport.\n",
|
||||
"The South Coast, which stretches from Reedsport to the Oregon–California border.The largest city is Coos Bay, population 16,700 in Coos County on the South Coast. U.S. Route 101 is the primary highway from Brookings to Astoria and is known for its scenic overlooks of the Pacific Ocean. Over 80 state parks and recreation areas dot the Oregon Coast. However, only a few highways cross the Coast Range to the interior: US 30, US 26, OR 6, US 20, OR 18, OR 34, OR 126, OR 38, and OR 42. OR 18 and US 20 are considered among the dangerous roads in the state.The Oregon Coast includes Clatsop County, Tillamook County, Lincoln County, western Lane County, western Douglas County, Coos County, and Curry County.\u001b[0m\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `Calculator` with `362 miles * 5280 feet`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"362 miles * 5280 feet\u001b[32;1m\u001b[1;3m```text\n",
|
||||
"362 * 5280\n",
|
||||
"\u001b[1m> Entering new LLMMathChain chain...\u001b[0m\n",
|
||||
"The length of the Eiffel Tower is 324 meters. The distance from coast to coast in the US is approximately 4,828 kilometers. First, we need to convert 4,828 kilometers to meters, which gives us 4,828,000 meters. To find out how many Eiffel Towers we need, we can divide 4,828,000 by 324. This gives us approximately 14,876 Eiffel Towers.\u001b[32;1m\u001b[1;3m\n",
|
||||
"```text\n",
|
||||
"4828000 / 324\n",
|
||||
"```\n",
|
||||
"...numexpr.evaluate(\"362 * 5280\")...\n",
|
||||
"...numexpr.evaluate(\"4828000 / 324\")...\n",
|
||||
"\u001b[0m\n",
|
||||
"Answer: \u001b[33;1m\u001b[1;3m1911360\u001b[0m\n",
|
||||
"Answer: \u001b[33;1m\u001b[1;3m14901.234567901234\u001b[0m\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n",
|
||||
"\u001b[38;5;200m\u001b[1;3mAnswer: 1911360\u001b[0m\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `Calculator` with `1911360 feet / 1063 feet`\n",
|
||||
"\n",
|
||||
"Observation: \u001b[38;5;200m\u001b[1;3mAnswer: 14901.234567901234\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m{\n",
|
||||
" \"action\": \"Calculator\",\n",
|
||||
" \"action_input\": \"The length of the Eiffel Tower is 324 meters. The distance from coast to coast in the US is approximately 4,828 kilometers. First, we need to convert 4,828 kilometers to meters, which gives us 4,828,000 meters. To find out how many Eiffel Towers we need, we can divide 4,828,000 by 324. This gives us approximately 14,901 Eiffel Towers.\"\n",
|
||||
"}\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"1911360 feet / 1063 feet\u001b[32;1m\u001b[1;3m```text\n",
|
||||
"1911360 / 1063\n",
|
||||
"\u001b[1m> Entering new LLMMathChain chain...\u001b[0m\n",
|
||||
"The length of the Eiffel Tower is 324 meters. The distance from coast to coast in the US is approximately 4,828 kilometers. First, we need to convert 4,828 kilometers to meters, which gives us 4,828,000 meters. To find out how many Eiffel Towers we need, we can divide 4,828,000 by 324. This gives us approximately 14,901 Eiffel Towers.\u001b[32;1m\u001b[1;3m\n",
|
||||
"```text\n",
|
||||
"4828000 / 324\n",
|
||||
"```\n",
|
||||
"...numexpr.evaluate(\"1911360 / 1063\")...\n",
|
||||
"...numexpr.evaluate(\"4828000 / 324\")...\n",
|
||||
"\u001b[0m\n",
|
||||
"Answer: \u001b[33;1m\u001b[1;3m1798.0809031044214\u001b[0m\n",
|
||||
"Answer: \u001b[33;1m\u001b[1;3m14901.234567901234\u001b[0m\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n",
|
||||
"\u001b[38;5;200m\u001b[1;3mAnswer: 1798.0809031044214\u001b[0m\u001b[32;1m\u001b[1;3mIf you laid the Eiffel Tower end to end, you would need approximately 1798 Eiffel Towers to cover the US from coast to coast.\u001b[0m\n",
|
||||
"\n",
|
||||
"Observation: \u001b[38;5;200m\u001b[1;3mAnswer: 14901.234567901234\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m{\n",
|
||||
" \"action\": \"Final Answer\",\n",
|
||||
" \"action_input\": \"If you laid the Eiffel Tower end to end, you would need approximately 14,901 Eiffel Towers to cover the US from coast to coast.\"\n",
|
||||
"}\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
@@ -225,17 +205,16 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.evaluation.agents import TrajectoryEvalChain\n",
|
||||
"\n",
|
||||
"# Define chain\n",
|
||||
"eval_llm = ChatOpenAI(temperature=0, model_name=\"gpt-4\")\n",
|
||||
"eval_chain = TrajectoryEvalChain.from_llm(\n",
|
||||
" llm=eval_llm, # Note: This must be a chat model\n",
|
||||
" llm=ChatOpenAI(\n",
|
||||
" temperature=0, model_name=\"gpt-4\"\n",
|
||||
" ), # Note: This must be a ChatOpenAI model\n",
|
||||
" agent_tools=agent.tools,\n",
|
||||
" return_reasoning=True,\n",
|
||||
")"
|
||||
@@ -258,22 +237,17 @@
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Score from 1 to 5: 1\n",
|
||||
"Reasoning: i. Is the final answer helpful?\n",
|
||||
"The final answer is not helpful because it is incorrect. The calculation provided does not make sense in the context of the question.\n",
|
||||
"Reasoning: First, let's evaluate the final answer. The final answer is incorrect because it uses the volume of golf balls instead of ping pong balls. The answer is not helpful.\n",
|
||||
"\n",
|
||||
"ii. Does the AI language use a logical sequence of tools to answer the question?\n",
|
||||
"The AI language model does not use a logical sequence of tools. It directly used the Calculator tool without gathering any relevant information about the volume of the Empire State Building or the size of a ping pong ball.\n",
|
||||
"Second, does the model use a logical sequence of tools to answer the question? The model only used one tool, which was the Search the Web (SerpAPI). It did not use the Calculator tool to calculate the correct volume of ping pong balls.\n",
|
||||
"\n",
|
||||
"iii. Does the AI language model use the tools in a helpful way?\n",
|
||||
"The AI language model does not use the tools in a helpful way. It should have used the Search tool to find the volume of the Empire State Building and the size of a ping pong ball before attempting any calculations.\n",
|
||||
"Third, does the AI language model use the tools in a helpful way? The model used the Search the Web (SerpAPI) tool, but the output was not helpful because it provided information about golf balls instead of ping pong balls.\n",
|
||||
"\n",
|
||||
"iv. Does the AI language model use too many steps to answer the question?\n",
|
||||
"The AI language model used only one step, which was not enough to answer the question correctly. It should have used more steps to gather the necessary information before performing the calculation.\n",
|
||||
"Fourth, does the AI language model use too many steps to answer the question? The model used only one step, which is not too many. However, it should have used more steps to provide a correct answer.\n",
|
||||
"\n",
|
||||
"v. Are the appropriate tools used to answer the question?\n",
|
||||
"The appropriate tools were not used to answer the question. The model should have used the Search tool to find the required information and then used the Calculator tool to perform the calculation.\n",
|
||||
"Fifth, are the appropriate tools used to answer the question? The model should have used the Search tool to find the volume of the Empire State Building and the volume of a ping pong ball. Then, it should have used the Calculator tool to calculate the number of ping pong balls needed to fill the building.\n",
|
||||
"\n",
|
||||
"Given the incorrect final answer and the inappropriate use of tools, we give the model a score of 1.\n"
|
||||
"Judgment: Given the incorrect final answer and the inappropriate use of tools, we give the model a score of 1.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -284,10 +258,12 @@
|
||||
" test_outputs_one[\"output\"],\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"evaluation = eval_chain.evaluate_agent_trajectory(\n",
|
||||
" input=test_outputs_one[\"input\"],\n",
|
||||
" output=test_outputs_one[\"output\"],\n",
|
||||
" agent_trajectory=test_outputs_one[\"intermediate_steps\"],\n",
|
||||
"evaluation = eval_chain(\n",
|
||||
" inputs={\n",
|
||||
" \"question\": question,\n",
|
||||
" \"answer\": answer,\n",
|
||||
" \"agent_trajectory\": eval_chain.get_agent_trajectory(steps),\n",
|
||||
" },\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"print(\"Score from 1 to 5: \", evaluation[\"score\"])\n",
|
||||
@@ -298,97 +274,51 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**That seems about right. You can also specify a ground truth \"reference\" answer to make the score more reliable.**"
|
||||
"That seems about right. Let's try the second query."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Score from 1 to 5: 1\n",
|
||||
"Reasoning: i. Is the final answer helpful?\n",
|
||||
"The final answer is not helpful, as it is incorrect. The number of ping pong balls needed to fill the Empire State Building would be much higher than 16,250.\n",
|
||||
"\n",
|
||||
"ii. Does the AI language use a logical sequence of tools to answer the question?\n",
|
||||
"The AI language model does not use a logical sequence of tools. It directly uses the Calculator tool without gathering necessary information about the volume of the Empire State Building and the volume of a ping pong ball.\n",
|
||||
"\n",
|
||||
"iii. Does the AI language model use the tools in a helpful way?\n",
|
||||
"The AI language model does not use the tools in a helpful way. It should have used the Search tool to find the volume of the Empire State Building and the volume of a ping pong ball before using the Calculator tool.\n",
|
||||
"\n",
|
||||
"iv. Does the AI language model use too many steps to answer the question?\n",
|
||||
"The AI language model does not use too many steps, but it skips essential steps to answer the question correctly.\n",
|
||||
"\n",
|
||||
"v. Are the appropriate tools used to answer the question?\n",
|
||||
"The appropriate tools are not used to answer the question. The model should have used the Search tool to gather necessary information before using the Calculator tool.\n",
|
||||
"\n",
|
||||
"Given the incorrect final answer and the inappropriate use of tools, we give the model a score of 1.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"evaluation = eval_chain.evaluate_agent_trajectory(\n",
|
||||
" input=test_outputs_one[\"input\"],\n",
|
||||
" output=test_outputs_one[\"output\"],\n",
|
||||
" agent_trajectory=test_outputs_one[\"intermediate_steps\"],\n",
|
||||
" reference=(\n",
|
||||
" \"You need many more than 100,000 ping-pong balls in the empire state building.\"\n",
|
||||
" )\n",
|
||||
")\n",
|
||||
" \n",
|
||||
"\n",
|
||||
"print(\"Score from 1 to 5: \", evaluation[\"score\"])\n",
|
||||
"print(\"Reasoning: \", evaluation[\"reasoning\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Let's try the second query. This time, use the async API. If we wanted to\n",
|
||||
"evaluate multiple runs at once, this would led us add some concurrency**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Score from 1 to 5: 2\n",
|
||||
"Score from 1 to 5: 3\n",
|
||||
"Reasoning: i. Is the final answer helpful?\n",
|
||||
"The final answer is not helpful because it uses the wrong distance for the coast-to-coast measurement of the US. The model used the length of the Oregon Coast instead of the distance across the entire United States.\n",
|
||||
"Yes, the final answer is helpful as it provides an approximate number of Eiffel Towers needed to cover the US from coast to coast.\n",
|
||||
"\n",
|
||||
"ii. Does the AI language use a logical sequence of tools to answer the question?\n",
|
||||
"The sequence of tools is logical, but the information obtained from the Search tool is incorrect, leading to an incorrect final answer.\n",
|
||||
"No, the AI language model does not use a logical sequence of tools. It directly uses the Calculator tool without first using the Search or Lookup tools to find the necessary information (length of the Eiffel Tower and distance from coast to coast in the US).\n",
|
||||
"\n",
|
||||
"iii. Does the AI language model use the tools in a helpful way?\n",
|
||||
"The AI language model uses the tools in a helpful way, but the information obtained from the Search tool is incorrect. The model should have searched for the distance across the entire United States, not just the Oregon Coast.\n",
|
||||
"The AI language model uses the Calculator tool in a helpful way to perform the calculation, but it should have used the Search or Lookup tools first to find the required information.\n",
|
||||
"\n",
|
||||
"iv. Does the AI language model use too many steps to answer the question?\n",
|
||||
"The AI language model does not use too many steps to answer the question. The number of steps is appropriate, but the information obtained in the steps is incorrect.\n",
|
||||
"No, the AI language model does not use too many steps. However, it repeats the same step twice, which is unnecessary.\n",
|
||||
"\n",
|
||||
"v. Are the appropriate tools used to answer the question?\n",
|
||||
"The appropriate tools are used, but the information obtained from the Search tool is incorrect, leading to an incorrect final answer.\n",
|
||||
"Not entirely. The AI language model should have used the Search or Lookup tools to find the required information before using the Calculator tool.\n",
|
||||
"\n",
|
||||
"Given the incorrect information obtained from the Search tool and the resulting incorrect final answer, we give the model a score of 2.\n"
|
||||
"Given the above evaluation, the AI language model's performance can be scored as follows:\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"evaluation = await eval_chain.aevaluate_agent_trajectory(\n",
|
||||
" input=test_outputs_two[\"input\"],\n",
|
||||
" output=test_outputs_two[\"output\"],\n",
|
||||
" agent_trajectory=test_outputs_two[\"intermediate_steps\"],\n",
|
||||
"question, steps, answer = (\n",
|
||||
" test_outputs_two[\"input\"],\n",
|
||||
" test_outputs_two[\"intermediate_steps\"],\n",
|
||||
" test_outputs_two[\"output\"],\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"evaluation = eval_chain(\n",
|
||||
" inputs={\n",
|
||||
" \"question\": question,\n",
|
||||
" \"answer\": answer,\n",
|
||||
" \"agent_trajectory\": eval_chain.get_agent_trajectory(steps),\n",
|
||||
" },\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"print(\"Score from 1 to 5: \", evaluation[\"score\"])\n",
|
||||
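A minimal sketch of the concurrency mentioned above, reusing the same `aevaluate_agent_trajectory` call on several captured runs at once; the `runs` list is an assumption (any dicts shaped like `test_outputs_two`, with `input`, `output`, and `intermediate_steps` keys, would do).

```python
import asyncio

async def evaluate_runs(eval_chain, runs):
    # One async trajectory evaluation per captured run, gathered concurrently.
    tasks = [
        eval_chain.aevaluate_agent_trajectory(
            input=run["input"],
            output=run["output"],
            agent_trajectory=run["intermediate_steps"],
        )
        for run in runs
    ]
    return await asyncio.gather(*tasks)

# Inside the notebook's running event loop:
# evaluations = await evaluate_runs(eval_chain, [test_outputs_two])
```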
@@ -399,11 +329,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Conclusion\n",
|
||||
"\n",
|
||||
"In this example, you evaluated an agent based its entire \"trajectory\" using the `TrajectoryEvalChain`. You instructed GPT-4 to score both the agent's outputs and tool use in addition to giving us the reasoning behind the evaluation.\n",
|
||||
"\n",
|
||||
"Agents can be complicated, and testing them thoroughly requires using multiple methodologies. Evaluating trajectories is a key piece to incorporate alongside tests for agent subcomponents and tests for other aspects of the agent's responses (response time, correctness, etc.) "
|
||||
"That also sounds about right. In conclusion, the TrajectoryEvalChain allows us to use GPT-4 to score both our agent's outputs and tool use in addition to giving us the reasoning behind the evaluation."
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -423,7 +349,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
"version": "3.9.1"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
@@ -432,5 +358,5 @@
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
|
||||
@@ -1,7 +1,6 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "5a7cc773",
|
||||
"metadata": {},
|
||||
@@ -18,7 +17,7 @@
|
||||
"\n",
|
||||
"But, the challenge is traversing the tree of child pages and actually assembling that list!\n",
|
||||
" \n",
|
||||
"We do this using the `RecursiveUrlLoader`.\n",
|
||||
"We do this using the `RecusiveUrlLoader`.\n",
|
||||
"\n",
|
||||
"This also gives us the flexibility to exclude some children (e.g., the `api` directory with > 800 child pages)."
|
||||
]
|
||||
@@ -30,11 +29,10 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders.recursive_url_loader import RecursiveUrlLoader"
|
||||
"from langchain.document_loaders.recursive_url_loader import RecusiveUrlLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "6384c057",
|
||||
"metadata": {},
|
||||
@@ -50,7 +48,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"url = 'https://js.langchain.com/docs/modules/memory/examples/'\n",
|
||||
"loader=RecursiveUrlLoader(url=url)\n",
|
||||
"loader=RecusiveUrlLoader(url=url)\n",
|
||||
"docs=loader.load()"
|
||||
]
|
||||
},
|
||||
@@ -121,7 +119,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "40fc13ef",
|
||||
"metadata": {},
|
||||
@@ -140,7 +137,7 @@
|
||||
"source": [
|
||||
"url = 'https://js.langchain.com/docs/'\n",
|
||||
"exclude_dirs=['https://js.langchain.com/docs/api/']\n",
|
||||
"loader=RecursiveUrlLoader(url=url,exclude_dirs=exclude_dirs)\n",
|
||||
"loader=RecusiveUrlLoader(url=url,exclude_dirs=exclude_dirs)\n",
|
||||
"docs=loader.load()"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -2,34 +2,28 @@
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"# Alibaba Cloud OpenSearch\n",
|
||||
"\n",
|
||||
">[Alibaba Cloud Opensearch](https://www.alibabacloud.com/product/opensearch) is a one-stop platform to develop intelligent search services. `OpenSearch` was built on the large-scale distributed search engine developed by `Alibaba`. `OpenSearch` serves more than 500 business cases in Alibaba Group and thousands of Alibaba Cloud customers. `OpenSearch` helps develop search services in different search scenarios, including e-commerce, O2O, multimedia, the content industry, communities and forums, and big data query in enterprises.\n",
|
||||
">[Alibaba Cloud Opensearch](https://www.alibabacloud.com/product/opensearch) OpenSearch is a one-stop platform to develop intelligent search services. OpenSearch was built based on the large-scale distributed search engine developed by Alibaba. OpenSearch serves more than 500 business cases in Alibaba Group and thousands of Alibaba Cloud customers. OpenSearch helps develop search services in different search scenarios, including e-commerce, O2O, multimedia, the content industry, communities and forums, and big data query in enterprises.\n",
|
||||
"\n",
|
||||
">`OpenSearch` helps you develop high quality, maintenance-free, and high performance intelligent search services to provide your users with high search efficiency and accuracy.\n",
|
||||
">OpenSearch helps you develop high quality, maintenance-free, and high performance intelligent search services to provide your users with high search efficiency and accuracy.\n",
|
||||
"\n",
|
||||
">`OpenSearch` provides the vector search feature. In specific scenarios, especially test question search and image search scenarios, you can use the vector search feature together with the multimodal search feature to improve the accuracy of search results. This topic describes the syntax and usage notes of vector indexes.\n",
|
||||
">OpenSearch provides the vector search feature. In specific scenarios, especially test question search and image search scenarios, you can use the vector search feature together with the multimodal search feature to improve the accuracy of search results. This topic describes the syntax and usage notes of vector indexes.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use functionality related to the `Alibaba Cloud OpenSearch Vector Search Edition`.\n",
|
||||
"To run, you should have an [OpenSearch Vector Search Edition](https://opensearch.console.aliyun.com) instance up and running:\n",
|
||||
"\n",
|
||||
"Read the [help document](https://www.alibabacloud.com/help/en/opensearch/latest/vector-search) to quickly familiarize and configure OpenSearch Vector Search Edition instance.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install alibabacloud-ha3engine"
|
||||
"- Read the [help document](https://www.alibabacloud.com/help/en/opensearch/latest/vector-search) to quickly familiarize and configure OpenSearch Vector Search Edition instance.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"After completing the configuration, follow these steps to connect to the instance, index documents, and perform vector retrieval."
|
||||
]
|
||||
@@ -39,9 +33,6 @@
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
},
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
@@ -58,7 +49,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"Split documents and get embeddings by call OpenAI API"
|
||||
]
|
||||
@@ -68,9 +61,6 @@
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
},
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
@@ -90,6 +80,7 @@
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
@@ -103,9 +94,6 @@
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
},
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
@@ -145,7 +133,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"Create an opensearch access instance by settings."
|
||||
]
|
||||
@@ -155,9 +145,6 @@
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
},
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
@@ -172,7 +159,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"or"
|
||||
]
|
||||
@@ -182,9 +171,6 @@
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
},
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
@@ -197,7 +183,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"Add texts and build index."
|
||||
]
|
||||
@@ -207,9 +195,6 @@
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
},
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
@@ -223,7 +208,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"Query and retrieve data."
|
||||
]
|
||||
@@ -233,9 +220,6 @@
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
},
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
@@ -249,7 +233,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"source": [
|
||||
"Query and retrieve data with metadata\n"
|
||||
]
|
||||
@@ -259,9 +245,6 @@
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
},
|
||||
"pycharm": {
|
||||
"name": "#%%\n"
|
||||
}
|
||||
@@ -277,6 +260,7 @@
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"pycharm": {
|
||||
"name": "#%% md\n"
|
||||
}
|
||||
@@ -288,23 +272,23 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
"version": 2
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
"nbformat_minor": 0
|
||||
}
|
||||
|
||||
@@ -6,9 +6,8 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# AwaDB\n",
|
||||
">[AwaDB](https://github.com/awa-ai/awadb) is an AI Native database for the search and storage of embedding vectors used by LLM Applications.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use functionality related to the `AwaDB`."
|
||||
"[AwaDB](https://github.com/awa-ai/awadb) is an AI Native database for the search and storage of embedding vectors used by LLM Applications.\n",
|
||||
"This notebook shows how to use functionality related to the AwaDB."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -185,7 +184,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.11.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -1,19 +1,19 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Azure Cognitive Search\n",
|
||||
"\n",
|
||||
">[Azure Cognitive Search](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) (formerly known as `Azure Search`) is a cloud search service that gives developers infrastructure, APIs, and tools for building a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.\n"
|
||||
"# Azure Cognitive Search"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Install Azure Cognitive Search SDK"
|
||||
"# Install Azure Cognitive Search SDK"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -27,6 +27,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -48,6 +49,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -72,6 +74,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -92,6 +95,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -116,6 +120,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -143,6 +148,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -181,6 +187,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -219,7 +226,7 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"display_name": "Python 3.9.13 ('.venv': venv)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
@@ -233,8 +240,9 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.11.3"
|
||||
},
|
||||
"orig_nbformat": 4,
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "645053d6307d413a1a75681b5ebb6449bb2babba4bcb0bf65a1ddc3dbefb108a"
|
||||
@@ -242,5 +250,5 @@
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
|
||||
@@ -9,6 +9,20 @@
|
||||
"\n",
|
||||
">[Chroma](https://docs.trychroma.com/getting-started) is a AI-native open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0.\n",
|
||||
"\n",
|
||||
"<a href=\"https://discord.gg/MMeYNTmh3x\" target=\"_blank\">\n",
|
||||
" <img src=\"https://img.shields.io/discord/1073293645303795742\" alt=\"Discord\" />\n",
|
||||
"</a> \n",
|
||||
"<a href=\"https://github.com/chroma-core/chroma/blob/master/LICENSE\" target=\"_blank\">\n",
|
||||
" <img src=\"https://img.shields.io/static/v1?label=license&message=Apache 2.0&color=white\" alt=\"License\" />\n",
|
||||
"</a> \n",
|
||||
"<img src=\"https://github.com/chroma-core/chroma/actions/workflows/chroma-integration-test.yml/badge.svg?branch=main\" alt=\"Integration Tests\" />\n",
|
||||
"\n",
|
||||
"- [Website](https://www.trychroma.com/)\n",
|
||||
"- [Documentation](https://docs.trychroma.com/)\n",
|
||||
"- [Twitter](https://twitter.com/trychroma)\n",
|
||||
"- [Discord](https://discord.gg/MMeYNTmh3x)\n",
|
||||
"\n",
|
||||
"Chroma is fully-typed, fully-tested and fully-documented.\n",
|
||||
"\n",
|
||||
"Install Chroma with:\n",
|
||||
"\n",
|
||||
@@ -33,6 +47,19 @@
|
||||
"View full docs at [docs](https://docs.trychroma.com/reference/Collection). To access these methods directly, you can do `._collection_.method()`\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "12e83df7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# first install dependencies\n",
|
||||
"!pip install langchain\n",
|
||||
"!pip install langchainplus_sdk\n",
|
||||
"!pip install chromadb\n"
|
||||
]
|
||||
},
|
||||
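A minimal end-to-end sketch of the Chroma flow the notebook builds on, assuming `OPENAI_API_KEY` is set; the toy texts are illustrative, and the `_collection` access mirrors the note above.

```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Two toy documents so the sketch stands on its own.
texts = [
    "Ketanji Brown Jackson was nominated to the Supreme Court.",
    "The Eiffel Tower is located in Paris.",
]
db = Chroma.from_texts(texts, OpenAIEmbeddings())

# Standard similarity search through the LangChain wrapper.
print(db.similarity_search("Who was nominated to the Supreme Court?", k=1)[0].page_content)

# Drop down to the underlying chromadb collection when needed.
print(db._collection.count())
```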
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2b5ffbf8",
|
||||
@@ -549,7 +576,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -14,12 +14,22 @@
|
||||
"This notebook shows how to use functionality related to the `Elasticsearch` database."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"# ElasticVectorSearch class"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "tKSYjyTBtSLc"
|
||||
},
|
||||
"id": "tKSYjyTBtSLc"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b66c12b2-2a07-4136-ac77-ce1c9fa7a409",
|
||||
"metadata": {
|
||||
"id": "b66c12b2-2a07-4136-ac77-ce1c9fa7a409",
|
||||
"tags": []
|
||||
"tags": [],
|
||||
"id": "b66c12b2-2a07-4136-ac77-ce1c9fa7a409"
|
||||
},
|
||||
"source": [
|
||||
"## Installation"
|
||||
@@ -94,8 +104,8 @@
|
||||
"execution_count": null,
|
||||
"id": "d6197931-cbe5-460c-a5e6-b5eedb83887c",
|
||||
"metadata": {
|
||||
"id": "d6197931-cbe5-460c-a5e6-b5eedb83887c",
|
||||
"tags": []
|
||||
"tags": [],
|
||||
"id": "d6197931-cbe5-460c-a5e6-b5eedb83887c"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -107,9 +117,9 @@
|
||||
"execution_count": null,
|
||||
"id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
|
||||
"metadata": {
|
||||
"tags": [],
|
||||
"id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
|
||||
"outputId": "fd16b37f-cb76-40a9-b83f-eab58dd0d912",
|
||||
"tags": []
|
||||
"outputId": "fd16b37f-cb76-40a9-b83f-eab58dd0d912"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
@@ -131,8 +141,8 @@
|
||||
"cell_type": "markdown",
|
||||
"id": "f6030187-0bd7-4798-8372-a265036af5e0",
|
||||
"metadata": {
|
||||
"id": "f6030187-0bd7-4798-8372-a265036af5e0",
|
||||
"tags": []
|
||||
"tags": [],
|
||||
"id": "f6030187-0bd7-4798-8372-a265036af5e0"
|
||||
},
|
||||
"source": [
|
||||
"## Example"
|
||||
@@ -143,8 +153,8 @@
|
||||
"execution_count": null,
|
||||
"id": "aac9563e",
|
||||
"metadata": {
|
||||
"id": "aac9563e",
|
||||
"tags": []
|
||||
"tags": [],
|
||||
"id": "aac9563e"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -159,8 +169,8 @@
|
||||
"execution_count": null,
|
||||
"id": "a3c3999a",
|
||||
"metadata": {
|
||||
"id": "a3c3999a",
|
||||
"tags": []
|
||||
"tags": [],
|
||||
"id": "a3c3999a"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -179,8 +189,8 @@
|
||||
"execution_count": null,
|
||||
"id": "12eb86d8",
|
||||
"metadata": {
|
||||
"id": "12eb86d8",
|
||||
"tags": []
|
||||
"tags": [],
|
||||
"id": "12eb86d8"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@@ -225,49 +235,43 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "FheGPztJsrRB",
|
||||
"metadata": {
|
||||
"id": "FheGPztJsrRB"
|
||||
},
|
||||
"source": [
|
||||
"# ElasticKnnSearch Class\n",
|
||||
"The `ElasticKnnSearch` implements features allowing storing vectors and documents in Elasticsearch for use with approximate [kNN search](https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html)"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"id": "FheGPztJsrRB"
|
||||
},
|
||||
"id": "FheGPztJsrRB"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "gRVcbh5zqCJQ",
|
||||
"source": [
|
||||
"!pip install langchain elasticsearch"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "gRVcbh5zqCJQ"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install langchain elasticsearch"
|
||||
]
|
||||
"id": "gRVcbh5zqCJQ"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "TJtqiw5AqBp8",
|
||||
"metadata": {
|
||||
"id": "TJtqiw5AqBp8"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.vectorstores.elastic_vector_search import ElasticKnnSearch\n",
|
||||
"from langchain.embeddings import ElasticsearchEmbeddings\n",
|
||||
"import elasticsearch"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"id": "TJtqiw5AqBp8"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "TJtqiw5AqBp8"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "XHfC0As6qN3T",
|
||||
"metadata": {
|
||||
"id": "XHfC0As6qN3T"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Initialize ElasticsearchEmbeddings\n",
|
||||
"model_id = \"<model_id_from_es>\"\n",
|
||||
@@ -277,16 +281,16 @@
|
||||
"es_password = \"es_pass\"\n",
|
||||
"test_index = \"<index_name>\"\n",
|
||||
"# input_field = \"your_input_field\" # if different from 'text_field'"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"id": "XHfC0As6qN3T"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "XHfC0As6qN3T"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "UkTipx1lqc3h",
|
||||
"metadata": {
|
||||
"id": "UkTipx1lqc3h"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Generate embedding object\n",
|
||||
"embeddings = ElasticsearchEmbeddings.from_credentials(\n",
|
||||
@@ -296,16 +300,16 @@
|
||||
" es_user=es_user,\n",
|
||||
" es_password=es_password,\n",
|
||||
")"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"id": "UkTipx1lqc3h"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "UkTipx1lqc3h"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "74psgD0oqjYK",
|
||||
"metadata": {
|
||||
"id": "74psgD0oqjYK"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Initialize ElasticKnnSearch\n",
|
||||
"knn_search = ElasticKnnSearch(\n",
|
||||
@@ -315,26 +319,26 @@
|
||||
" index_name=test_index,\n",
|
||||
" embedding=embeddings,\n",
|
||||
")"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"id": "74psgD0oqjYK"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "74psgD0oqjYK"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7AfgIKLWqnQl",
|
||||
"source": [
|
||||
"## Test adding vectors"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "7AfgIKLWqnQl"
|
||||
},
|
||||
"source": [
|
||||
"## Test adding vectors"
|
||||
]
|
||||
"id": "7AfgIKLWqnQl"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "yNUUIaL9qmze",
|
||||
"metadata": {
|
||||
"id": "yNUUIaL9qmze"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Test `add_texts` method\n",
|
||||
"texts = [\"Hello, world!\", \"Machine learning is fun.\", \"I love Python.\"]\n",
|
||||
@@ -347,26 +351,26 @@
|
||||
" \"Python is great for data analysis.\",\n",
|
||||
"]\n",
|
||||
"knn_search.from_texts(new_texts, dims=dims)"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"id": "yNUUIaL9qmze"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "yNUUIaL9qmze"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0zdR-Iubquov",
|
||||
"source": [
|
||||
"## Test knn search using query vector builder "
|
||||
],
|
||||
"metadata": {
|
||||
"id": "0zdR-Iubquov"
|
||||
},
|
||||
"source": [
|
||||
"## Test knn search using query vector builder "
|
||||
]
|
||||
"id": "0zdR-Iubquov"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "bwR4jYvqqxTo",
|
||||
"metadata": {
|
||||
"id": "bwR4jYvqqxTo"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Test `knn_search` method with model_id and query_text\n",
|
||||
"query = \"Hello\"\n",
|
||||
@@ -383,26 +387,26 @@
|
||||
"print(\n",
|
||||
" f\"The 'text' field value from the top hit is: '{hybrid_result['hits']['hits'][0]['_source']['text']}'\"\n",
|
||||
")"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"id": "bwR4jYvqqxTo"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "bwR4jYvqqxTo"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ltXYqp0qqz7R",
|
||||
"source": [
|
||||
"## Test knn search using pre generated vector \n"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "ltXYqp0qqz7R"
|
||||
},
|
||||
"source": [
|
||||
"## Test knn search using pre generated vector \n"
|
||||
]
|
||||
"id": "ltXYqp0qqz7R"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "O5COtpTqq23t",
|
||||
"metadata": {
|
||||
"id": "O5COtpTqq23t"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Generate embedding for tests\n",
|
||||
"query_text = \"Hello\"\n",
|
||||
@@ -424,26 +428,26 @@
|
||||
"print(\n",
|
||||
" f\"The 'text' field value from the top hit is: '{knn_result['hits']['hits'][0]['_source']['text']}'\"\n",
|
||||
")"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"id": "O5COtpTqq23t"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "O5COtpTqq23t"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0dnmimcJq42C",
|
||||
"source": [
|
||||
"## Test source option"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "0dnmimcJq42C"
|
||||
},
|
||||
"source": [
|
||||
"## Test source option"
|
||||
]
|
||||
"id": "0dnmimcJq42C"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "v4_B72nHq7g1",
|
||||
"metadata": {
|
||||
"id": "v4_B72nHq7g1"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Test `knn_search` method with model_id and query_text\n",
|
||||
"query = \"Hello\"\n",
|
||||
@@ -456,26 +460,26 @@
|
||||
" query=query, model_id=model_id, k=2, source=False\n",
|
||||
")\n",
|
||||
"assert not \"_source\" in hybrid_result[\"hits\"][\"hits\"][0].keys()"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"id": "v4_B72nHq7g1"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "v4_B72nHq7g1"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "teHgJgrlq-Jb",
|
||||
"source": [
|
||||
"## Test fields option "
|
||||
],
|
||||
"metadata": {
|
||||
"id": "teHgJgrlq-Jb"
|
||||
},
|
||||
"source": [
|
||||
"## Test fields option "
|
||||
]
|
||||
"id": "teHgJgrlq-Jb"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "utNBbpZYrAYW",
|
||||
"metadata": {
|
||||
"id": "utNBbpZYrAYW"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Test `knn_search` method with model_id and query_text\n",
|
||||
"query = \"Hello\"\n",
|
||||
@@ -488,72 +492,72 @@
|
||||
" query=query, model_id=model_id, k=2, fields=[\"text\"]\n",
|
||||
")\n",
|
||||
"assert \"text\" in hybrid_result[\"hits\"][\"hits\"][0][\"fields\"].keys()"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"id": "utNBbpZYrAYW"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "utNBbpZYrAYW"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "hddsIFferBy1",
|
||||
"source": [
|
||||
"### Test with es client connection rather than cloud_id "
|
||||
],
|
||||
"metadata": {
|
||||
"id": "hddsIFferBy1"
|
||||
},
|
||||
"source": [
|
||||
"### Test with es client connection rather than cloud_id "
|
||||
]
|
||||
"id": "hddsIFferBy1"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "bXqrUnoirFia",
|
||||
"metadata": {
|
||||
"id": "bXqrUnoirFia"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Create Elasticsearch connection\n",
|
||||
"es_connection = Elasticsearch(\n",
|
||||
" hosts=[\"https://es_cluster_url:port\"], basic_auth=(\"user\", \"password\")\n",
|
||||
")"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"id": "bXqrUnoirFia"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "bXqrUnoirFia"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "TIM__Hm8rSEW",
|
||||
"metadata": {
|
||||
"id": "TIM__Hm8rSEW"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Instantiate ElasticsearchEmbeddings using es_connection\n",
|
||||
"embeddings = ElasticsearchEmbeddings.from_es_connection(\n",
|
||||
" model_id,\n",
|
||||
" es_connection,\n",
|
||||
")"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"id": "TIM__Hm8rSEW"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "TIM__Hm8rSEW"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1-CdnOrArVc_",
|
||||
"metadata": {
|
||||
"id": "1-CdnOrArVc_"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Initialize ElasticKnnSearch\n",
|
||||
"knn_search = ElasticKnnSearch(\n",
|
||||
" es_connection=es_connection, index_name=test_index, embedding=embeddings\n",
|
||||
")"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"id": "1-CdnOrArVc_"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "1-CdnOrArVc_"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "0kgyaL6QrYVF",
|
||||
"metadata": {
|
||||
"id": "0kgyaL6QrYVF"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Test `knn_search` method with model_id and query_text\n",
|
||||
"query = \"Hello\"\n",
|
||||
@@ -562,13 +566,16 @@
|
||||
"print(\n",
|
||||
" f\"The 'text' field value from the top hit is: '{knn_result['hits']['hits'][0]['_source']['text']}'\"\n",
|
||||
")"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"id": "0kgyaL6QrYVF"
|
||||
},
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"id": "0kgyaL6QrYVF"
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
@@ -585,8 +592,11 @@
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
},
|
||||
"colab": {
|
||||
"provenance": []
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
}
|
||||
@@ -16,15 +16,6 @@
|
||||
"Click [here](https://www.alibabacloud.com/zh/product/hologres) to fast deploy a Hologres cloud instance."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install psycopg2"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
@@ -158,7 +149,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -5,7 +5,7 @@
|
||||
"id": "683953b3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# MongoDB Atlas\n",
|
||||
"# MongoDB Atlas Vector Search\n",
|
||||
"\n",
|
||||
">[MongoDB Atlas](https://www.mongodb.com/docs/atlas/) is a fully-managed cloud database available in AWS , Azure, and GCP. It now has support for native Vector Search on your MongoDB document data.\n",
|
||||
"\n",
|
||||
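A hedged sketch of what that native vector search looks like from LangChain; the connection string, database, collection, and the `default` index name are placeholders, and the constructor arguments reflect this era's `MongoDBAtlasVectorSearch`, so they may differ in your version.

```python
from pymongo import MongoClient
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import MongoDBAtlasVectorSearch

# Placeholder Atlas cluster with a vector search index defined on the collection.
client = MongoClient("mongodb+srv://<user>:<password>@<cluster>.mongodb.net")
collection = client["langchain_db"]["test_collection"]

vectorstore = MongoDBAtlasVectorSearch(
    collection=collection,
    embedding=OpenAIEmbeddings(),
    index_name="default",  # assumption: name of the Atlas search index on the embedding field
)
docs = vectorstore.similarity_search("What did the president say about Ketanji Brown Jackson")
```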
@@ -214,7 +214,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
@@ -96,7 +96,7 @@
|
||||
"id": "01a9a035",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## similarity_search using Approximate k-NN\n",
|
||||
"### similarity_search using Approximate k-NN\n",
|
||||
"\n",
|
||||
"`similarity_search` using `Approximate k-NN` Search with Custom Parameters"
|
||||
]
|
||||
@@ -182,7 +182,7 @@
|
||||
"id": "0d0cd877",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## similarity_search using Script Scoring\n",
|
||||
"### similarity_search using Script Scoring\n",
|
||||
"\n",
|
||||
"`similarity_search` using `Script Scoring` with Custom Parameters"
|
||||
]
|
||||
@@ -221,7 +221,7 @@
|
||||
"id": "a4af96cc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## similarity_search using Painless Scripting\n",
|
||||
"### similarity_search using Painless Scripting\n",
|
||||
"\n",
|
||||
"`similarity_search` using `Painless Scripting` with Custom Parameters"
|
||||
]
|
||||
@@ -258,35 +258,32 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4f8fb0d0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Maximum marginal relevance search (MMR)\n",
|
||||
"### Maximum marginal relevance search (MMR)\n",
|
||||
"If you’d like to look up for some similar documents, but you’d also like to receive diverse results, MMR is method you should consider. Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents."
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "ba85e092",
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = docsearch.max_marginal_relevance_search(query, k=2, fetch_k=10, lambda_param=0.5)"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "73264864",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using a preexisting OpenSearch instance\n",
|
||||
"### Using a preexisting OpenSearch instance\n",
|
||||
"\n",
|
||||
"It's also possible to use a preexisting OpenSearch instance with documents that already have vectors present."
|
||||
]
|
||||
@@ -333,7 +330,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -201,7 +201,14 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Similarity Search with Euclidean Distance (Default)"
|
||||
"## Similarity search with score"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Similarity Search with Euclidean Distance (Default)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -296,14 +303,14 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Working with vectorstore"
|
||||
"## Working with vectorstore in PG"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Uploading a vectorstore"
|
||||
"### Uploading a vectorstore in PG "
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -329,7 +336,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Retrieving a vectorstore"
|
||||
"### Retrieving a vectorstore in PG"
|
||||
]
|
||||
},
|
||||
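To make the upload/retrieve split above concrete, a hedged sketch; the connection string and collection name are placeholders, and the argument names (`connection_string`, `embedding_function`) follow this era's PGVector wrapper and may differ in your version.

```python
from langchain.docstore.document import Document
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.pgvector import PGVector

CONNECTION_STRING = "postgresql+psycopg2://postgres:postgres@localhost:5432/vectordb"  # placeholder
COLLECTION_NAME = "state_of_the_union"  # placeholder
embeddings = OpenAIEmbeddings()
docs = [Document(page_content="The president nominated Ketanji Brown Jackson.")]

# Uploading: embed the documents and write them into Postgres.
store = PGVector.from_documents(
    documents=docs,
    embedding=embeddings,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
)

# Retrieving: reconnect to the same collection later without re-embedding anything.
store = PGVector(
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
    embedding_function=embeddings,
)
print(store.similarity_search("Who was nominated?", k=1))
```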
{
|
||||
@@ -491,7 +498,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.9.7"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -1,18 +1,20 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "20b588b4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Rockset\n",
|
||||
"# Rockset Vector Search\n",
|
||||
"\n",
|
||||
">[Rockset](https://rockset.com/product/) is a real-time analytics database service for serving low latency, high concurrency analytical queries at scale. It builds a Converged Index™ on structured and semi-structured data with an efficient store for vector embeddings. Its support for running SQL on schemaless data makes it a perfect choice for running vector search with metadata filters. \n",
|
||||
"[Rockset](https://rockset.com/product/) is a real-time analytics database service for serving low latency, high concurrency analytical queries at scale. It builds a Converged Index™ on structured and semi-structured data with an efficient store for vector embeddings. Its support for running SQL on schemaless data makes it a perfect choice for running vector search with metadata filters. \n",
|
||||
"\n",
|
||||
"This notebook demonstrates how to use `Rockset` as a vectorstore in langchain. To get started, make sure you have a `Rockset` account and an API key available."
|
||||
"This notebook demonstrates how to use Rockset as a vectorstore in langchain. To get started, make sure you have a Rockset account and an API key available."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "e290ddc0",
|
||||
"metadata": {},
|
||||
@@ -23,6 +25,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "7d77bbbe",
|
||||
"metadata": {},
|
||||
@@ -49,6 +52,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "7951c9cd",
|
||||
"metadata": {},
|
||||
@@ -67,6 +71,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "8600900d",
|
||||
"metadata": {},
|
||||
@@ -75,11 +80,12 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "3bf2f818",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Example"
|
||||
"## Using Rockset langchain vectorstore"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -103,6 +109,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "474636a2",
|
||||
"metadata": {},
|
||||
@@ -131,6 +138,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "1404cada",
|
||||
"metadata": {},
|
||||
@@ -165,6 +173,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "f1290844",
|
||||
"metadata": {},
|
||||
@@ -196,6 +205,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "5e15d630",
|
||||
"metadata": {},
|
||||
@@ -233,6 +243,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "0765b822",
|
||||
"metadata": {},
|
||||
@@ -255,6 +266,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "03fa12a9",
|
||||
"metadata": {},
|
||||
@@ -265,14 +277,6 @@
|
||||
"\n",
|
||||
"Keep an eye on https://rockset.com/blog/introducing-vector-search-on-rockset/ for future updates in this space!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "2763dddb-e87d-4d3b-b0bf-c246b0573d87",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -291,7 +295,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.9.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
@@ -6,9 +6,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# SingleStoreDB\n",
|
||||
">[SingleStoreDB](https://singlestore.com/) is a high-performance distributed SQL database that supports deployment both in the [cloud](https://www.singlestore.com/cloud/) and on-premises. It provides vector storage, and vector functions including [dot_product](https://docs.singlestore.com/managed-service/en/reference/sql-reference/vector-functions/dot_product.html) and [euclidean_distance](https://docs.singlestore.com/managed-service/en/reference/sql-reference/vector-functions/euclidean_distance.html), thereby supporting AI applications that require text similarity matching. \n",
|
||||
"\n",
|
||||
"This tutorial illustrates how to [work with vector data in SingleStoreDB](https://docs.singlestore.com/managed-service/en/developer-resources/functional-extensions/working-with-vector-data.html)."
|
||||
"[SingleStoreDB](https://singlestore.com/) is a high-performance distributed SQL database that supports deployment both in the [cloud](https://www.singlestore.com/cloud/) and on-premises. It provides vector storage, and vector functions including [dot_product](https://docs.singlestore.com/managed-service/en/reference/sql-reference/vector-functions/dot_product.html) and [euclidean_distance](https://docs.singlestore.com/managed-service/en/reference/sql-reference/vector-functions/euclidean_distance.html), thereby supporting AI applications that require text similarity matching. This tutorial illustrates how to [work with vector data in SingleStoreDB](https://docs.singlestore.com/managed-service/en/developer-resources/functional-extensions/working-with-vector-data.html)."
|
||||
]
|
||||
},
|
||||
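A hedged sketch of the basic flow; supplying the connection through the `SINGLESTOREDB_URL` environment variable and passing `table_name` to `from_texts` are assumptions about this era of the integration, so check the linked tutorial for the exact knobs.

```python
import os
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import SingleStoreDB

# Placeholder credentials; assumption: the connection is read from this environment variable.
os.environ["SINGLESTOREDB_URL"] = "admin:password@localhost:3306/vectordb"

docsearch = SingleStoreDB.from_texts(
    ["SingleStoreDB exposes dot_product and euclidean_distance for vectors."],
    OpenAIEmbeddings(),
    table_name="notebook",  # assumption: the table that will hold the embeddings
)
print(docsearch.similarity_search("Which vector functions are available?", k=1))
```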
{
|
||||
@@ -131,7 +129,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.9.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -1,12 +1,13 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# scikit-learn\n",
|
||||
"# SKLearnVectorStore\n",
|
||||
"\n",
|
||||
">[scikit-learn](https://scikit-learn.org/stable/) is an open source collection of machine learning algorithms, including some implementations of the [k nearest neighbors](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html). `SKLearnVectorStore` wraps this implementation and adds the possibility to persist the vector store in json, bson (binary json) or Apache Parquet format.\n",
|
||||
"[scikit-learn](https://scikit-learn.org/stable/) is an open source collection of machine learning algorithms, including some implementations of the [k nearest neighbors](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html). `SKLearnVectorStore` wraps this implementation and adds the possibility to persist the vector store in json, bson (binary json) or Apache Parquet format.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use the `SKLearnVectorStore` vector database."
|
||||
]
|
||||
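A hedged sketch of the persistence described above; the `persist_path` and `serializer` argument names and the `persist()` call reflect this era's `SKLearnVectorStore` and may differ in your version.

```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import SKLearnVectorStore

texts = [
    "scikit-learn ships a NearestNeighbors implementation.",
    "SKLearnVectorStore can persist embeddings to disk.",
]
vector_store = SKLearnVectorStore.from_texts(
    texts,
    OpenAIEmbeddings(),
    persist_path="./sklearn_vectorstore.parquet",  # assumption: target file for the store
    serializer="parquet",                          # assumption: "json", "bson", or "parquet" (parquet needs pyarrow)
)
vector_store.persist()  # write the vectors and texts to persist_path

# Reload the persisted store later from the same file.
restored = SKLearnVectorStore(
    embedding=OpenAIEmbeddings(),
    persist_path="./sklearn_vectorstore.parquet",
    serializer="parquet",
)
print(restored.similarity_search("nearest neighbors", k=1))
```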
@@ -27,6 +28,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -46,6 +48,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -73,6 +76,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -116,6 +120,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -185,6 +190,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
@@ -203,7 +209,7 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"display_name": "sofia",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
@@ -217,9 +223,10 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
}
|
||||
"version": "3.8.16"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
|
||||
@@ -7,10 +7,11 @@
|
||||
"source": [
|
||||
"# StarRocks\n",
|
||||
"\n",
|
||||
">[StarRocks](https://www.starrocks.io/) is a High-Performance Analytical Database.\n",
|
||||
"`StarRocks` is a next-gen sub-second MPP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query.\n",
|
||||
"[StarRocks | A High-Performance Analytical Database](https://www.starrocks.io/)\n",
|
||||
"\n",
|
||||
">Usually `StarRocks` is categorized into OLAP, and it has showed excellent performance in [ClickBench — a Benchmark For Analytical DBMS](https://benchmark.clickhouse.com/). Since it has a super-fast vectorized execution engine, it could also be used as a fast vectordb.\n",
|
||||
"StarRocks is a next-gen sub-second MPP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query.\n",
|
||||
"\n",
|
||||
"Usually StarRocks is categorized into OLAP, and it has showed excellent performance in [ClickBench — a Benchmark For Analytical DBMS](https://benchmark.clickhouse.com/). Since it has a super-fast vectorized execution engine, it could also be used as a fast vectordb.\n",
|
||||
"\n",
|
||||
"Here we'll show how to use the StarRocks Vector Store."
|
||||
]
|
||||
@@ -20,17 +21,8 @@
|
||||
"id": "1685854f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setup"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "311d44bb-4aca-4f3b-8f97-5e1f29238e40",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install pymysql"
|
||||
"\n",
|
||||
"## Import all used modules"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -313,7 +305,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -2,67 +2,68 @@
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Tigris\n",
|
||||
"\n",
|
||||
"> [Tigris](htttps://tigrisdata.com) is an open source Serverless NoSQL Database and Search Platform designed to simplify building high-performance vector search applications.\n",
|
||||
"> `Tigris` eliminates the infrastructure complexity of managing, operating, and synchronizing multiple tools, allowing you to focus on building great applications instead."
|
||||
]
|
||||
"> Tigris eliminates the infrastructure complexity of managing, operating, and synchronizing multiple tools, allowing you to focus on building great applications instead."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This notebook guides you how to use Tigris as your VectorStore"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Pre requisites**\n",
|
||||
"1. An OpenAI account. You can sign up for an account [here](https://platform.openai.com/)\n",
|
||||
"2. [Sign up for a free Tigris account](https://console.preview.tigrisdata.cloud). Once you have signed up for the Tigris account, create a new project called `vectordemo`. Next, make a note of the *Uri* for the region you've created your project in, the **clientId** and **clientSecret**. You can get all this information from the **Application Keys** section of the project."
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's first install our dependencies:"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install tigrisdb openapi-schema-pydantic openai tiktoken"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We will load the `OpenAI` api key and `Tigris` credentials in our environment"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
@@ -72,42 +73,38 @@
|
||||
"os.environ[\"TIGRIS_PROJECT\"] = getpass.getpass(\"Tigris Project Name:\")\n",
|
||||
"os.environ[\"TIGRIS_CLIENT_ID\"] = getpass.getpass(\"Tigris Client Id:\")\n",
|
||||
"os.environ[\"TIGRIS_CLIENT_SECRET\"] = getpass.getpass(\"Tigris Client Secret:\")"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import Tigris\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Initialize Tigris vector store\n",
|
||||
"Let's import our test dataset:"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
|
||||
@@ -116,89 +113,87 @@
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vector_store = Tigris.from_documents(docs, embeddings, index_name=\"my_embeddings\")"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Similarity Search"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"found_docs = vector_store.similarity_search(query)\n",
|
||||
"print(found_docs)"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Similarity Search with score (vector distance)"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"result = vector_store.similarity_search_with_score(query)\n",
|
||||
"for doc, score in result:\n",
|
||||
" print(f\"document={doc}, score={score}\")"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
"version": 2
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
"nbformat_minor": 0
|
||||
}
|
||||
|
||||
@@ -2,7 +2,6 @@
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Typesense\n",
|
||||
"\n",
|
||||
@@ -11,105 +10,97 @@
|
||||
"> Typesense focuses on performance by storing the entire index in RAM (with a backup on disk) and also focuses on providing an out-of-the-box developer experience by simplifying available options and setting good defaults.\n",
|
||||
">\n",
|
||||
"> It also lets you combine attribute-based filtering together with vector queries, to fetch the most relevant documents."
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This notebook shows you how to use Typesense as your VectorStore."
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's first install our dependencies:"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install typesense openapi-schema-pydantic openai tiktoken"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-23T22:48:02.968822Z",
|
||||
"start_time": "2023-05-23T22:47:48.574094Z"
|
||||
},
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-23T22:48:02.968822Z",
|
||||
"start_time": "2023-05-23T22:47:48.574094Z"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-23T22:50:34.775893Z",
|
||||
"start_time": "2023-05-23T22:50:34.771889Z"
|
||||
},
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import Typesense\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-23T22:50:34.775893Z",
|
||||
"start_time": "2023-05-23T22:50:34.771889Z"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's import our test dataset:"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-23T22:56:19.093489Z",
|
||||
"start_time": "2023-05-23T22:56:19.089Z"
|
||||
},
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
|
||||
@@ -118,17 +109,18 @@
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"ExecuteTime": {
|
||||
"end_time": "2023-05-23T22:56:19.093489Z",
|
||||
"start_time": "2023-05-23T22:56:19.089Z"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"docsearch = Typesense.from_documents(\n",
|
||||
@@ -142,103 +134,98 @@
|
||||
" \"typesense_collection_name\": \"lang-chain\",\n",
|
||||
" },\n",
|
||||
")"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Similarity Search"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"found_docs = docsearch.similarity_search(query)"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(found_docs[0].page_content)"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Typesense as a Retriever\n",
|
||||
"\n",
|
||||
"Typesense, as all the other vector stores, is a LangChain Retriever, by using cosine similarity."
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = docsearch.as_retriever()\n",
|
||||
"retriever"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false,
|
||||
"jupyter": {
|
||||
"outputs_hidden": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"retriever.get_relevant_documents(query)[0]"
|
||||
]
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
"version": 2
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
"nbformat_minor": 0
|
||||
}
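The notebook above mentions combining attribute-based filtering with vector queries but never demonstrates it. The sketch below is one way that might look with the `Typesense` vector store; it is an illustrative assumption, not taken from the notebook: the local host/port/API-key values, the `page` metadata field, and the `filter` keyword (a Typesense `filter_by`-style expression) are all hypothetical placeholders.

```python
# Minimal sketch, assuming a locally running Typesense node and that
# `similarity_search` accepts a Typesense filter expression via `filter`.
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Typesense

docsearch = Typesense.from_texts(
    ["The president nominated Ketanji Brown Jackson."],
    OpenAIEmbeddings(),
    metadatas=[{"page": 1}],  # attribute used for filtering below (assumed schema)
    typesense_client_params={
        "host": "localhost",       # assumed local node
        "port": "8108",
        "protocol": "http",
        "typesense_api_key": "xyz",
        "typesense_collection_name": "lang-chain",
    },
)

# Vector query restricted by a metadata attribute.
found = docsearch.similarity_search(
    "What did the president say about Ketanji Brown Jackson?",
    k=2,
    filter="page:=1",  # hypothetical filter_by expression
)
print(found[0].page_content)
```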
@@ -1,3 +1,4 @@
import uuid
from datetime import datetime
from typing import Any, Dict, List, Optional, Union

@@ -32,7 +33,6 @@ class ArizeCallbackHandler(BaseCallbackHandler):
self.prompt_tokens = 0
self.completion_tokens = 0
self.total_tokens = 0
self.step = 0

from arize.pandas.embeddings import EmbeddingGenerator, UseCases
from arize.pandas.logger import Client
@@ -84,10 +84,11 @@ class ArizeCallbackHandler(BaseCallbackHandler):
self.total_tokens
) = self.completion_tokens = 0 # assign default value

i = 0

for generations in response.generations:
for generation in generations:
prompt = self.prompt_records[self.step]
self.step = self.step + 1
prompt = self.prompt_records[i]
prompt_embedding = pd.Series(
self.generator.generate_embeddings(
text_col=pd.Series(prompt.replace("\n", " "))
@@ -101,6 +102,7 @@ class ArizeCallbackHandler(BaseCallbackHandler):
text_col=pd.Series(generation.text.replace("\n", " "))
).reset_index(drop=True)
)
str(uuid.uuid4())
pred_timestamp = datetime.now().timestamp()

# Define the columns and data
@@ -163,6 +165,8 @@ class ArizeCallbackHandler(BaseCallbackHandler):
else:
print(f'❌ Logging failed "{response_from_arize.text}"')

i = i + 1

def on_llm_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:

@@ -1,7 +1,6 @@
|
||||
"""Base interfaces for tracing runs."""
|
||||
from __future__ import annotations
import logging
|
||||
from abc import ABC, abstractmethod
|
||||
from datetime import datetime
|
||||
from typing import Any, Dict, List, Optional, Union
|
||||
@@ -11,8 +10,6 @@ from langchain.callbacks.base import BaseCallbackHandler
|
||||
from langchain.callbacks.tracers.schemas import Run, RunTypeEnum
|
||||
from langchain.schema import LLMResult
logger = logging.getLogger(__name__)
class TracerException(Exception):
|
||||
"""Base class for exceptions in tracers module."""
|
||||
@@ -44,7 +41,9 @@ class BaseTracer(BaseCallbackHandler, ABC):
|
||||
if parent_run:
|
||||
self._add_child_run(parent_run, run)
|
||||
else:
|
||||
logger.warning(f"Parent run with UUID {run.parent_run_id} not found.")
|
||||
raise TracerException(
|
||||
f"Parent run with UUID {run.parent_run_id} not found."
|
||||
)
|
||||
self.run_map[str(run.id)] = run
|
||||
|
||||
def _end_trace(self, run: Run) -> None:
|
||||
@@ -54,8 +53,10 @@ class BaseTracer(BaseCallbackHandler, ABC):
|
||||
else:
|
||||
parent_run = self.run_map.get(str(run.parent_run_id))
|
||||
if parent_run is None:
|
||||
logger.warning(f"Parent run with UUID {run.parent_run_id} not found.")
|
||||
elif (
|
||||
raise TracerException(
|
||||
f"Parent run with UUID {run.parent_run_id} not found."
|
||||
)
|
||||
if (
|
||||
run.child_execution_order is not None
|
||||
and parent_run.child_execution_order is not None
|
||||
and run.child_execution_order > parent_run.child_execution_order
|
||||
@@ -70,8 +71,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
|
||||
|
||||
parent_run = self.run_map.get(parent_run_id)
|
||||
if parent_run is None:
|
||||
logger.warning(f"Parent run with UUID {parent_run_id} not found.")
|
||||
return 1
|
||||
raise TracerException(f"Parent run with UUID {parent_run_id} not found.")
|
||||
if parent_run.child_execution_order is None:
|
||||
raise TracerException(
|
||||
f"Parent run with UUID {parent_run_id} has no child execution order."
|
||||
|
||||
@@ -1,84 +0,0 @@
|
||||
"""A tracer that runs evaluators over completed runs."""
|
||||
from concurrent.futures import Future, ThreadPoolExecutor, wait
|
||||
from typing import Any, Optional, Sequence, Set, Union
|
||||
from uuid import UUID
|
||||
|
||||
from langchainplus_sdk import LangChainPlusClient, RunEvaluator
|
||||
|
||||
from langchain.callbacks.tracers.base import BaseTracer
|
||||
from langchain.callbacks.tracers.schemas import Run
|
||||
|
||||
|
||||
class EvaluatorCallbackHandler(BaseTracer):
|
||||
"""A tracer that runs a run evaluator whenever a run is persisted.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
evaluators : Sequence[RunEvaluator]
|
||||
The run evaluators to apply to all top level runs.
|
||||
max_workers : int, optional
|
||||
The maximum number of worker threads to use for running the evaluators.
|
||||
If not specified, it will default to the number of evaluators.
|
||||
client : LangChainPlusClient, optional
|
||||
The LangChainPlusClient instance to use for evaluating the runs.
|
||||
If not specified, a new instance will be created.
|
||||
example_id : Union[UUID, str], optional
|
||||
The example ID to be associated with the runs.
|
||||
|
||||
Attributes
|
||||
----------
|
||||
example_id : Union[UUID, None]
|
||||
The example ID associated with the runs.
|
||||
client : LangChainPlusClient
|
||||
The LangChainPlusClient instance used for evaluating the runs.
|
||||
evaluators : Sequence[RunEvaluator]
|
||||
The sequence of run evaluators to be executed.
|
||||
executor : ThreadPoolExecutor
|
||||
The thread pool executor used for running the evaluators.
|
||||
futures : Set[Future]
|
||||
The set of futures representing the running evaluators.
|
||||
"""
|
||||
|
||||
name = "evaluator_callback_handler"
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
evaluators: Sequence[RunEvaluator],
|
||||
max_workers: Optional[int] = None,
|
||||
client: Optional[LangChainPlusClient] = None,
|
||||
example_id: Optional[Union[UUID, str]] = None,
|
||||
**kwargs: Any
|
||||
) -> None:
|
||||
super().__init__(**kwargs)
|
||||
self.example_id = (
|
||||
UUID(example_id) if isinstance(example_id, str) else example_id
|
||||
)
|
||||
self.client = client or LangChainPlusClient()
|
||||
self.evaluators = evaluators
|
||||
self.executor = ThreadPoolExecutor(
|
||||
max_workers=max(max_workers or len(evaluators), 1)
|
||||
)
|
||||
self.futures: Set[Future] = set()
|
||||
|
||||
def _persist_run(self, run: Run) -> None:
|
||||
"""Run the evaluator on the run.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
run : Run
|
||||
The run to be evaluated.
|
||||
|
||||
"""
|
||||
run_ = run.copy()
|
||||
run_.reference_example_id = self.example_id
|
||||
for evaluator in self.evaluators:
|
||||
self.futures.add(
|
||||
self.executor.submit(self.client.evaluate_run, run_, evaluator)
|
||||
)
|
||||
|
||||
def wait_for_futures(self) -> None:
|
||||
"""Wait for all futures to complete."""
|
||||
futures = list(self.futures)
|
||||
wait(futures)
|
||||
for future in futures:
|
||||
self.futures.remove(future)
|
||||
@@ -5,11 +5,9 @@ import logging
|
||||
import os
|
||||
from concurrent.futures import Future, ThreadPoolExecutor, wait
|
||||
from datetime import datetime
|
||||
from typing import Any, Dict, List, Optional, Set, Union
|
||||
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Set, Union
|
||||
from uuid import UUID
|
||||
|
||||
from langchainplus_sdk import LangChainPlusClient
|
||||
|
||||
from langchain.callbacks.tracers.base import BaseTracer
|
||||
from langchain.callbacks.tracers.schemas import (
|
||||
Run,
|
||||
@@ -19,11 +17,27 @@ from langchain.callbacks.tracers.schemas import (
|
||||
from langchain.env import get_runtime_environment
|
||||
from langchain.schema import BaseMessage, messages_to_dict
|
||||
|
||||
if TYPE_CHECKING:
|
||||
import langsmith
|
||||
from langsmith import Client as LangSmithClient
|
||||
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
_LOGGED = set()
|
||||
_TRACERS: List[LangChainTracer] = []
|
||||
|
||||
|
||||
def _lazy_import_langsmith() -> langsmith:
|
||||
try:
|
||||
import langsmith
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
"Please install langsmith to use the LangChainTracer."
|
||||
" You can do this by running `pip install langsmith`."
|
||||
)
|
||||
return langsmith
|
||||
|
||||
|
||||
def log_error_once(method: str, exception: Exception) -> None:
|
||||
"""Log an error once."""
|
||||
global _LOGGED
|
||||
@@ -46,7 +60,7 @@ class LangChainTracer(BaseTracer):
|
||||
self,
|
||||
example_id: Optional[Union[UUID, str]] = None,
|
||||
project_name: Optional[str] = None,
|
||||
client: Optional[LangChainPlusClient] = None,
|
||||
client: Optional[LangSmithClient] = None,
|
||||
**kwargs: Any,
|
||||
) -> None:
|
||||
"""Initialize the LangChain tracer."""
|
||||
@@ -60,7 +74,11 @@ class LangChainTracer(BaseTracer):
|
||||
)
|
||||
# set max_workers to 1 to process tasks in order
|
||||
self.executor = ThreadPoolExecutor(max_workers=1)
|
||||
self.client = client or LangChainPlusClient()
|
||||
if client is not None:
|
||||
self.client = client
|
||||
else:
|
||||
langsmith = _lazy_import_langsmith()
|
||||
self.client = langsmith.Client()
|
||||
self._futures: Set[Future] = set()
|
||||
global _TRACERS
|
||||
_TRACERS.append(self)
|
||||
|
||||
@@ -1,52 +1,20 @@
|
||||
"""A tracer that collects all nested runs in a list."""
|
||||
|
||||
from typing import Any, List, Optional, Union
|
||||
from uuid import UUID
|
||||
from typing import Any, List
|
||||
|
||||
from langchain.callbacks.tracers.base import BaseTracer
|
||||
from langchain.callbacks.tracers.schemas import Run
|
||||
|
||||
|
||||
class RunCollectorCallbackHandler(BaseTracer):
|
||||
"""
|
||||
A tracer that collects all nested runs in a list.
|
||||
"""A tracer that collects all nested runs in a list.
|
||||
|
||||
This tracer is useful for inspection and evaluation purposes.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
example_id : Optional[Union[UUID, str]], default=None
|
||||
The ID of the example being traced. It can be either a UUID or a string.
|
||||
"""
|
||||
Useful for inspection and for evaluation."""
|
||||
|
||||
name = "run-collector_callback_handler"
|
||||
|
||||
def __init__(
|
||||
self, example_id: Optional[Union[UUID, str]] = None, **kwargs: Any
|
||||
) -> None:
|
||||
"""
|
||||
Initialize the RunCollectorCallbackHandler.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
example_id : Optional[Union[UUID, str]], default=None
|
||||
The ID of the example being traced. It can be either a UUID or a string.
|
||||
"""
|
||||
def __init__(self, **kwargs: Any) -> None:
|
||||
super().__init__(**kwargs)
|
||||
self.example_id = (
|
||||
UUID(example_id) if isinstance(example_id, str) else example_id
|
||||
)
|
||||
self.traced_runs: List[Run] = []
|
||||
|
||||
def _persist_run(self, run: Run) -> None:
|
||||
"""
|
||||
Persist a run by adding it to the traced_runs list.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
run : Run
|
||||
The run to be persisted.
|
||||
"""
|
||||
run_ = run.copy()
|
||||
run_.reference_example_id = self.example_id
|
||||
self.traced_runs.append(run_)
|
||||
self.traced_runs.append(run)
|
||||
|
||||
@@ -2,11 +2,10 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import datetime
|
||||
from typing import Any, Dict, List, Optional
|
||||
from enum import Enum
|
||||
from typing import Any, Dict, List, Optional, Union
|
||||
from uuid import UUID
|
||||
|
||||
from langchainplus_sdk.schemas import RunBase as BaseRunV2
|
||||
from langchainplus_sdk.schemas import RunTypeEnum
|
||||
from pydantic import BaseModel, Field, root_validator
|
||||
|
||||
from langchain.schema import LLMResult
|
||||
@@ -88,13 +87,58 @@ class ToolRun(BaseRun):
|
||||
# Begin V2 API Schemas
|
||||
|
||||
|
||||
class Run(BaseRunV2):
|
||||
class RunTypeEnum(str, Enum):
|
||||
"""Enum for run types."""
|
||||
|
||||
tool = "tool"
|
||||
chain = "chain"
|
||||
llm = "llm"
|
||||
|
||||
|
||||
class Run(BaseModel):
|
||||
"""Run schema for the V2 API in the Tracer."""
|
||||
|
||||
id: UUID
|
||||
"""The UUID of the run."""
|
||||
name: str
|
||||
"""The name of the run, usually taken from the serialized object's ID."""
|
||||
start_time: datetime.datetime
|
||||
"""The start time of the run."""
|
||||
run_type: Union[RunTypeEnum, str]
|
||||
"""The type of run."""
|
||||
inputs: dict
|
||||
"""The inputs to the run."""
|
||||
execution_order: int
|
||||
"""The order in which this run was executed in a run tree."""
|
||||
child_execution_order: int
|
||||
"""The next execution order of child runs."""
|
||||
end_time: Optional[datetime.datetime] = None
|
||||
"""The end time of the run."""
|
||||
extra: Optional[dict] = None
|
||||
"""Extra information about the run."""
|
||||
error: Optional[str] = None
|
||||
"""The error message of the run, if any."""
|
||||
serialized: dict = Field(default_factory=dict)
|
||||
"""The serialized object that was run."""
|
||||
events: Optional[List[Dict]] = None
|
||||
"""The events that occurred during the run."""
|
||||
outputs: Optional[dict] = None
|
||||
"""The outputs of the run."""
|
||||
reference_example_id: Optional[UUID] = None
|
||||
"""The ID of the reference example that was used to run the run, if this
|
||||
run was performed during an evaluation."""
|
||||
parent_run_id: Optional[UUID] = None
|
||||
"""The ID of the parent run if this is not a root."""
|
||||
tags: List[str] = Field(default_factory=list)
|
||||
"""Any tags assigned to the run."""
|
||||
session_id: Optional[UUID] = None
|
||||
"""The Project / Session ID this run belongs to."""
|
||||
child_run_ids: Optional[List[UUID]] = None
|
||||
"""The IDs of the child runs."""
|
||||
child_runs: List[Run] = Field(default_factory=list)
|
||||
tags: Optional[List[str]] = Field(default_factory=list)
|
||||
"""The child runs. These are used during initial tracing."""
|
||||
feedback_stats: Optional[Dict[str, Any]] = None
|
||||
"""Any feedback statistics for this run."""
|
||||
|
||||
@root_validator(pre=True)
|
||||
def assign_name(cls, values: dict) -> dict:
|
||||
|
||||
@@ -104,12 +104,13 @@ def _convert_tool_run_to_wb_span(trace_tree: Any, run: Run) -> trace_tree.Span:
|
||||
def _convert_run_to_wb_span(trace_tree: Any, run: Run) -> trace_tree.Span:
|
||||
attributes = {**run.extra} if run.extra else {}
|
||||
attributes["execution_order"] = run.execution_order
|
||||
end_time = run.end_time if run.end_time is not None else run.start_time
|
||||
|
||||
return trace_tree.Span(
|
||||
span_id=str(run.id) if run.id is not None else None,
|
||||
name=run.serialized.get("name"),
|
||||
start_time_ms=int(run.start_time.timestamp() * 1000),
|
||||
end_time_ms=int(run.end_time.timestamp() * 1000),
|
||||
end_time_ms=int(end_time.timestamp() * 1000),
|
||||
status_code=trace_tree.StatusCode.SUCCESS
|
||||
if run.error is None
|
||||
else trace_tree.StatusCode.ERROR,
|
||||
|
||||
@@ -161,7 +161,7 @@ def openapi_spec_to_openai_fn(
|
||||
method = _name_to_call_map[name]["method"]
|
||||
url = _name_to_call_map[name]["url"]
|
||||
path_params = fn_args.pop("path_params", {})
|
||||
url = _format_url(url, path_params)
|
||||
_format_url(url, path_params)
|
||||
if "data" in fn_args and isinstance(fn_args["data"], dict):
|
||||
fn_args["data"] = json.dumps(fn_args["data"])
|
||||
_kwargs = {**fn_args, **kwargs}
|
||||
|
||||
@@ -1,5 +1,4 @@
|
||||
"""Utilities for running language models or Chains over datasets."""
|
||||
|
||||
"""Utilities for running LLMs/Chains over datasets."""
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
@@ -7,6 +6,7 @@ import functools
|
||||
import logging
|
||||
from datetime import datetime
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
Any,
|
||||
Callable,
|
||||
Coroutine,
|
||||
@@ -14,18 +14,12 @@ from typing import (
|
||||
Iterator,
|
||||
List,
|
||||
Optional,
|
||||
Sequence,
|
||||
Union,
|
||||
)
|
||||
|
||||
from langchainplus_sdk import LangChainPlusClient, RunEvaluator
|
||||
from langchainplus_sdk.schemas import Example
|
||||
|
||||
from langchain.base_language import BaseLanguageModel
|
||||
from langchain.callbacks.base import BaseCallbackHandler
|
||||
from langchain.callbacks.manager import Callbacks
|
||||
from langchain.callbacks.tracers.base import BaseTracer
|
||||
from langchain.callbacks.tracers.evaluation import EvaluatorCallbackHandler
|
||||
from langchain.callbacks.tracers.langchain import LangChainTracer
|
||||
from langchain.chains.base import Chain
|
||||
from langchain.chat_models.base import BaseChatModel
|
||||
@@ -39,27 +33,33 @@ from langchain.schema import (
|
||||
messages_from_dict,
|
||||
)
|
||||
|
||||
if TYPE_CHECKING:
|
||||
import langsmith
|
||||
from langsmith import Client as LangSmithClient
|
||||
from langsmith.schemas import Example
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
MODEL_OR_CHAIN_FACTORY = Union[Callable[[], Chain], BaseLanguageModel]
|
||||
|
||||
|
||||
class InputFormatError(Exception):
|
||||
"""Raised when the input format is invalid."""
|
||||
"""Raised when input format is invalid."""
|
||||
|
||||
|
||||
def _lazy_import_langsmith() -> langsmith:
|
||||
try:
|
||||
import langsmith
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
"Please install langsmith to use the langchain runner utils."
|
||||
" You can do this by running `pip install langsmith`."
|
||||
)
|
||||
return langsmith
|
||||
|
||||
|
||||
def _get_prompts(inputs: Dict[str, Any]) -> List[str]:
|
||||
"""
|
||||
Get prompts from inputs.
|
||||
|
||||
Args:
|
||||
inputs: The input dictionary.
|
||||
|
||||
Returns:
|
||||
A list of prompts.
|
||||
Raises:
|
||||
InputFormatError: If the input format is invalid.
|
||||
"""
|
||||
"""Get prompts from inputs."""
|
||||
if not inputs:
|
||||
raise InputFormatError("Inputs should not be empty.")
|
||||
|
||||
@@ -97,17 +97,7 @@ def _get_prompts(inputs: Dict[str, Any]) -> List[str]:
|
||||
|
||||
|
||||
def _get_messages(inputs: Dict[str, Any]) -> List[List[BaseMessage]]:
|
||||
"""
|
||||
Get Chat Messages from inputs.
|
||||
|
||||
Args:
|
||||
inputs: The input dictionary.
|
||||
|
||||
Returns:
|
||||
A list of chat messages.
|
||||
Raises:
|
||||
InputFormatError: If the input format is invalid.
|
||||
"""
|
||||
"""Get Chat Messages from inputs."""
|
||||
if not inputs:
|
||||
raise InputFormatError("Inputs should not be empty.")
|
||||
|
||||
@@ -136,25 +126,13 @@ def _get_messages(inputs: Dict[str, Any]) -> List[List[BaseMessage]]:
|
||||
async def _arun_llm(
|
||||
llm: BaseLanguageModel,
|
||||
inputs: Dict[str, Any],
|
||||
langchain_tracer: Optional[LangChainTracer],
|
||||
*,
|
||||
tags: Optional[List[str]] = None,
|
||||
callbacks: Callbacks = None,
|
||||
) -> Union[LLMResult, ChatResult]:
|
||||
"""
|
||||
Asynchronously run the language model.
|
||||
|
||||
Args:
|
||||
llm: The language model to run.
|
||||
inputs: The input dictionary.
|
||||
tags: Optional tags to add to the run.
|
||||
callbacks: Optional callbacks to use during the run.
|
||||
|
||||
Returns:
|
||||
The LLMResult or ChatResult.
|
||||
Raises:
|
||||
ValueError: If the LLM type is unsupported.
|
||||
InputFormatError: If the input format is invalid.
|
||||
"""
|
||||
callbacks: Optional[List[BaseCallbackHandler]] = (
|
||||
[langchain_tracer] if langchain_tracer else None
|
||||
)
|
||||
if isinstance(llm, BaseLLM):
|
||||
try:
|
||||
llm_prompts = _get_prompts(inputs)
|
||||
@@ -188,32 +166,18 @@ async def _arun_llm_or_chain(
|
||||
example: Example,
|
||||
llm_or_chain_factory: MODEL_OR_CHAIN_FACTORY,
|
||||
n_repetitions: int,
|
||||
langchain_tracer: Optional[LangChainTracer],
|
||||
*,
|
||||
tags: Optional[List[str]] = None,
|
||||
callbacks: Optional[List[BaseCallbackHandler]] = None,
|
||||
) -> Union[List[dict], List[str], List[LLMResult], List[ChatResult]]:
|
||||
"""
|
||||
Asynchronously run the Chain or language model.
|
||||
|
||||
Args:
|
||||
example: The example to run.
|
||||
llm_or_chain_factory: The Chain or language model constructor to run.
|
||||
n_repetitions: The number of times to run the model on each example.
|
||||
tags: Optional tags to add to the run.
|
||||
callbacks: Optional callbacks to use during the run.
|
||||
|
||||
Returns:
|
||||
A list of outputs.
|
||||
"""
|
||||
if callbacks:
|
||||
previous_example_ids = [
|
||||
getattr(tracer, "example_id", None) for tracer in callbacks
|
||||
]
|
||||
for tracer in callbacks:
|
||||
if hasattr(tracer, "example_id"):
|
||||
tracer.example_id = example.id
|
||||
"""Run the chain asynchronously."""
|
||||
if langchain_tracer is not None:
|
||||
previous_example_id = langchain_tracer.example_id
|
||||
langchain_tracer.example_id = example.id
|
||||
callbacks: Optional[List[BaseCallbackHandler]] = [langchain_tracer]
|
||||
else:
|
||||
previous_example_ids = None
|
||||
previous_example_id = None
|
||||
callbacks = None
|
||||
outputs = []
|
||||
for _ in range(n_repetitions):
|
||||
try:
|
||||
@@ -221,8 +185,8 @@ async def _arun_llm_or_chain(
|
||||
output: Any = await _arun_llm(
|
||||
llm_or_chain_factory,
|
||||
example.inputs,
|
||||
langchain_tracer,
|
||||
tags=tags,
|
||||
callbacks=callbacks,
|
||||
)
|
||||
else:
|
||||
chain = llm_or_chain_factory()
|
||||
@@ -233,19 +197,15 @@ async def _arun_llm_or_chain(
|
||||
except Exception as e:
|
||||
logger.warning(f"Chain failed for example {example.id}. Error: {e}")
|
||||
outputs.append({"Error": str(e)})
|
||||
if callbacks and previous_example_ids:
|
||||
for example_id, tracer in zip(previous_example_ids, callbacks):
|
||||
if hasattr(tracer, "example_id"):
|
||||
tracer.example_id = example_id
|
||||
if langchain_tracer is not None:
|
||||
langchain_tracer.example_id = previous_example_id
|
||||
return outputs
|
||||
|
||||
|
||||
async def _gather_with_concurrency(
|
||||
n: int,
|
||||
initializer: Callable[[], Coroutine[Any, Any, Any]],
|
||||
*async_funcs: Callable[
|
||||
[Sequence[BaseCallbackHandler], Dict], Coroutine[Any, Any, Any]
|
||||
],
|
||||
initializer: Callable[[], Coroutine[Any, Any, Optional[LangChainTracer]]],
|
||||
*async_funcs: Callable[[Optional[LangChainTracer], Dict], Coroutine[Any, Any, Any]],
|
||||
) -> List[Any]:
|
||||
"""
|
||||
Run coroutines with a concurrency limit.
|
||||
@@ -261,42 +221,37 @@ async def _gather_with_concurrency(
|
||||
semaphore = asyncio.Semaphore(n)
|
||||
job_state = {"num_processed": 0}
|
||||
|
||||
callback_queue: asyncio.Queue[Sequence[BaseCallbackHandler]] = asyncio.Queue()
|
||||
tracer_queue: asyncio.Queue[Optional[LangChainTracer]] = asyncio.Queue()
|
||||
for _ in range(n):
|
||||
callback_queue.put_nowait(await initializer())
|
||||
tracer_queue.put_nowait(await initializer())
|
||||
|
||||
async def run_coroutine_with_semaphore(
|
||||
async_func: Callable[
|
||||
[Sequence[BaseCallbackHandler], Dict], Coroutine[Any, Any, Any]
|
||||
[Optional[LangChainTracer], Dict], Coroutine[Any, Any, Any]
|
||||
]
|
||||
) -> Any:
|
||||
async with semaphore:
|
||||
callbacks = await callback_queue.get()
|
||||
tracer = await tracer_queue.get()
|
||||
try:
|
||||
result = await async_func(callbacks, job_state)
|
||||
result = await async_func(tracer, job_state)
|
||||
finally:
|
||||
callback_queue.put_nowait(callbacks)
|
||||
tracer_queue.put_nowait(tracer)
|
||||
return result
|
||||
|
||||
results = await asyncio.gather(
|
||||
*(run_coroutine_with_semaphore(function) for function in async_funcs)
|
||||
)
|
||||
while callback_queue:
|
||||
while tracer_queue:
|
||||
try:
|
||||
callbacks = callback_queue.get_nowait()
|
||||
tracer = tracer_queue.get_nowait()
|
||||
except asyncio.QueueEmpty:
|
||||
break
|
||||
for callback in callbacks:
|
||||
if isinstance(callback, (LangChainTracer, EvaluatorCallbackHandler)):
|
||||
callback.wait_for_futures()
|
||||
if tracer:
|
||||
tracer.wait_for_futures()
|
||||
return results
|
||||
|
||||
|
||||
async def _callbacks_initializer(
|
||||
project_name: Optional[str],
|
||||
client: LangChainPlusClient,
|
||||
run_evaluators: Sequence[RunEvaluator],
|
||||
) -> List[BaseTracer]:
|
||||
async def _tracer_initializer(project_name: Optional[str]) -> Optional[LangChainTracer]:
|
||||
"""
|
||||
Initialize a tracer to share across tasks.
|
||||
|
||||
@@ -306,19 +261,11 @@ async def _callbacks_initializer(
|
||||
Returns:
|
||||
A LangChainTracer instance with an active project.
|
||||
"""
|
||||
callbacks: List[BaseTracer] = []
|
||||
if project_name:
|
||||
callbacks.append(LangChainTracer(project_name=project_name))
|
||||
if run_evaluators:
|
||||
callbacks.append(
|
||||
EvaluatorCallbackHandler(
|
||||
client=client,
|
||||
evaluators=run_evaluators,
|
||||
# We already have concurrency, don't want to overload the machine
|
||||
max_workers=1,
|
||||
)
|
||||
)
|
||||
return callbacks
|
||||
tracer = LangChainTracer(project_name=project_name)
|
||||
return tracer
|
||||
else:
|
||||
return None
|
||||
|
||||
|
||||
async def arun_on_examples(
|
||||
@@ -329,16 +276,13 @@ async def arun_on_examples(
|
||||
num_repetitions: int = 1,
|
||||
project_name: Optional[str] = None,
|
||||
verbose: bool = False,
|
||||
client: Optional[LangChainPlusClient] = None,
|
||||
tags: Optional[List[str]] = None,
|
||||
run_evaluators: Optional[Sequence[RunEvaluator]] = None,
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Asynchronously run the chain on examples and store traces
|
||||
to the specified project name.
|
||||
Run the chain on examples and store traces to the specified project name.
|
||||
|
||||
Args:
|
||||
examples: Examples to run the model or chain over.
|
||||
examples: Examples to run the model or chain over
|
||||
llm_or_chain_factory: Language model or Chain constructor to run
|
||||
over the dataset. The Chain constructor is used to permit
|
||||
independent calls on each example without carrying over state.
|
||||
@@ -347,35 +291,24 @@ async def arun_on_examples(
|
||||
This is useful when testing success rates or generating confidence
|
||||
intervals.
|
||||
project_name: Project name to use when tracing runs.
|
||||
Defaults to {dataset_name}-{chain class name}-{datetime}.
|
||||
verbose: Whether to print progress.
|
||||
client: Client to use to read the dataset. If not provided, a new
|
||||
client will be created using the credentials in the environment.
|
||||
tags: Tags to add to each run in the project.
|
||||
run_evaluators: Evaluators to run on the results of the chain.
|
||||
tags: Tags to add to the traces.
|
||||
|
||||
Returns:
|
||||
A dictionary mapping example ids to the model outputs.
|
||||
"""
|
||||
project_name = _get_project_name(project_name, llm_or_chain_factory, None)
|
||||
client_ = client or LangChainPlusClient()
|
||||
client_.create_project(project_name, mode="eval")
|
||||
|
||||
results: Dict[str, List[Any]] = {}
|
||||
evaluation_handler = EvaluatorCallbackHandler(
|
||||
evaluators=run_evaluators or [], client=client_
|
||||
)
|
||||
|
||||
async def process_example(
|
||||
example: Example, callbacks: List[BaseCallbackHandler], job_state: dict
|
||||
example: Example, tracer: Optional[LangChainTracer], job_state: dict
|
||||
) -> None:
|
||||
"""Process a single example."""
|
||||
result = await _arun_llm_or_chain(
|
||||
example,
|
||||
llm_or_chain_factory,
|
||||
num_repetitions,
|
||||
tracer,
|
||||
tags=tags,
|
||||
callbacks=callbacks,
|
||||
)
|
||||
results[str(example.id)] = result
|
||||
job_state["num_processed"] += 1
|
||||
@@ -388,15 +321,9 @@ async def arun_on_examples(
|
||||
|
||||
await _gather_with_concurrency(
|
||||
concurrency_level,
|
||||
functools.partial(
|
||||
_callbacks_initializer,
|
||||
project_name=project_name,
|
||||
client=client_,
|
||||
run_evaluators=run_evaluators or [],
|
||||
),
|
||||
functools.partial(_tracer_initializer, project_name),
|
||||
*(functools.partial(process_example, e) for e in examples),
|
||||
)
|
||||
evaluation_handler.wait_for_futures()
|
||||
return results
|
||||
|
||||
|
||||
@@ -407,21 +334,7 @@ def run_llm(
|
||||
*,
|
||||
tags: Optional[List[str]] = None,
|
||||
) -> Union[LLMResult, ChatResult]:
|
||||
"""
|
||||
Run the language model on the example.
|
||||
|
||||
Args:
|
||||
llm: The language model to run.
|
||||
inputs: The input dictionary.
|
||||
callbacks: The callbacks to use during the run.
|
||||
tags: Optional tags to add to the run.
|
||||
|
||||
Returns:
|
||||
The LLMResult or ChatResult.
|
||||
Raises:
|
||||
ValueError: If the LLM type is unsupported.
|
||||
InputFormatError: If the input format is invalid.
|
||||
"""
|
||||
"""Run the language model on the example."""
|
||||
if isinstance(llm, BaseLLM):
|
||||
try:
|
||||
llm_prompts = _get_prompts(inputs)
|
||||
@@ -451,32 +364,18 @@ def run_llm_or_chain(
|
||||
example: Example,
|
||||
llm_or_chain_factory: MODEL_OR_CHAIN_FACTORY,
|
||||
n_repetitions: int,
|
||||
langchain_tracer: Optional[LangChainTracer] = None,
|
||||
*,
|
||||
tags: Optional[List[str]] = None,
|
||||
callbacks: Optional[List[BaseCallbackHandler]] = None,
|
||||
) -> Union[List[dict], List[str], List[LLMResult], List[ChatResult]]:
|
||||
"""
|
||||
Run the Chain or language model synchronously.
|
||||
|
||||
Args:
|
||||
example: The example to run.
|
||||
llm_or_chain_factory: The Chain or language model constructor to run.
|
||||
n_repetitions: The number of times to run the model on each example.
|
||||
tags: Optional tags to add to the run.
|
||||
callbacks: Optional callbacks to use during the run.
|
||||
|
||||
Returns:
|
||||
A list of outputs.
|
||||
"""
|
||||
if callbacks:
|
||||
previous_example_ids = [
|
||||
getattr(tracer, "example_id", None) for tracer in callbacks
|
||||
]
|
||||
for tracer in callbacks:
|
||||
if hasattr(tracer, "example_id"):
|
||||
tracer.example_id = example.id
|
||||
"""Run the chain synchronously."""
|
||||
if langchain_tracer is not None:
|
||||
previous_example_id = langchain_tracer.example_id
|
||||
langchain_tracer.example_id = example.id
|
||||
callbacks: Optional[List[BaseCallbackHandler]] = [langchain_tracer]
|
||||
else:
|
||||
previous_example_ids = None
|
||||
previous_example_id = None
|
||||
callbacks = None
|
||||
outputs = []
|
||||
for _ in range(n_repetitions):
|
||||
try:
|
||||
@@ -491,10 +390,8 @@ def run_llm_or_chain(
|
||||
except Exception as e:
|
||||
logger.warning(f"Chain failed for example {example.id}. Error: {e}")
|
||||
outputs.append({"Error": str(e)})
|
||||
if callbacks and previous_example_ids:
|
||||
for example_id, tracer in zip(previous_example_ids, callbacks):
|
||||
if hasattr(tracer, "example_id"):
|
||||
tracer.example_id = example_id
|
||||
if langchain_tracer is not None:
|
||||
langchain_tracer.example_id = previous_example_id
|
||||
return outputs
|
||||
|
||||
|
||||
@@ -505,74 +402,48 @@ def run_on_examples(
|
||||
num_repetitions: int = 1,
|
||||
project_name: Optional[str] = None,
|
||||
verbose: bool = False,
|
||||
client: Optional[LangChainPlusClient] = None,
|
||||
tags: Optional[List[str]] = None,
|
||||
run_evaluators: Optional[Sequence[RunEvaluator]] = None,
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Run the Chain or language model on examples and store
|
||||
traces to the specified project name.
|
||||
"""Run the chain on examples and store traces to the specified project name.
|
||||
|
||||
Args:
|
||||
examples: Examples to run the model or chain over.
|
||||
examples: Examples to run model or chain over.
|
||||
llm_or_chain_factory: Language model or Chain constructor to run
|
||||
over the dataset. The Chain constructor is used to permit
|
||||
independent calls on each example without carrying over state.
|
||||
concurrency_level: Number of async workers to run in parallel.
|
||||
num_repetitions: Number of times to run the model on each example.
|
||||
This is useful when testing success rates or generating confidence
|
||||
intervals.
|
||||
project_name: Name of the project to store the traces in.
|
||||
Defaults to {dataset_name}-{chain class name}-{datetime}.
|
||||
project_name: Project name to use when tracing runs.
|
||||
verbose: Whether to print progress.
|
||||
client: Client to use to access the dataset. If None, a new client
|
||||
will be created using the credentials in the environment.
|
||||
tags: Tags to add to each run in the project.
|
||||
run_evaluators: Evaluators to run on the results of the chain.
|
||||
|
||||
tags: Tags to add to the run traces.
|
||||
Returns:
|
||||
A dictionary mapping example ids to the model outputs.
|
||||
"""
|
||||
results: Dict[str, Any] = {}
|
||||
project_name = _get_project_name(project_name, llm_or_chain_factory, None)
|
||||
client_ = client or LangChainPlusClient()
|
||||
client_.create_project(project_name, mode="eval")
|
||||
tracer = LangChainTracer(project_name=project_name)
|
||||
evalution_handler = EvaluatorCallbackHandler(
|
||||
evaluators=run_evaluators or [], client=client_
|
||||
)
|
||||
callbacks: List[BaseCallbackHandler] = [tracer, evalution_handler]
|
||||
tracer = LangChainTracer(project_name=project_name) if project_name else None
|
||||
for i, example in enumerate(examples):
|
||||
result = run_llm_or_chain(
|
||||
example,
|
||||
llm_or_chain_factory,
|
||||
num_repetitions,
|
||||
langchain_tracer=tracer,
|
||||
tags=tags,
|
||||
callbacks=callbacks,
|
||||
)
|
||||
if verbose:
|
||||
print(f"{i+1} processed", flush=True, end="\r")
|
||||
results[str(example.id)] = result
|
||||
tracer.wait_for_futures()
|
||||
evalution_handler.wait_for_futures()
|
||||
if tracer:
|
||||
tracer.wait_for_futures()
|
||||
return results
|
||||
|
||||
|
||||
def _get_project_name(
|
||||
project_name: Optional[str],
|
||||
llm_or_chain_factory: MODEL_OR_CHAIN_FACTORY,
|
||||
dataset_name: Optional[str],
|
||||
dataset_name: str,
|
||||
) -> str:
|
||||
"""
|
||||
Get the project name.
|
||||
|
||||
Args:
|
||||
project_name: The project name if manually specified.
|
||||
llm_or_chain_factory: The Chain or language model constructor.
|
||||
dataset_name: The dataset name.
|
||||
|
||||
Returns:
|
||||
The project name.
|
||||
"""
|
||||
if project_name is not None:
|
||||
return project_name
|
||||
current_time = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
|
||||
@@ -580,8 +451,7 @@ def _get_project_name(
|
||||
model_name = llm_or_chain_factory.__class__.__name__
|
||||
else:
|
||||
model_name = llm_or_chain_factory().__class__.__name__
|
||||
dataset_prefix = f"{dataset_name}-" if dataset_name else ""
|
||||
return f"{dataset_prefix}{model_name}-{current_time}"
|
||||
return f"{dataset_name}-{model_name}-{current_time}"
|
||||
|
||||
|
||||
async def arun_on_dataset(
|
||||
@@ -592,15 +462,14 @@ async def arun_on_dataset(
|
||||
num_repetitions: int = 1,
|
||||
project_name: Optional[str] = None,
|
||||
verbose: bool = False,
|
||||
client: Optional[LangChainPlusClient] = None,
|
||||
client: Optional[LangSmithClient] = None,
|
||||
tags: Optional[List[str]] = None,
|
||||
run_evaluators: Optional[Sequence[RunEvaluator]] = None,
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Asynchronously run the Chain or language model on a dataset
|
||||
and store traces to the specified project name.
|
||||
Run the chain on a dataset and store traces to the specified project name.
|
||||
|
||||
Args:
|
||||
client: Client to use to read the dataset.
|
||||
dataset_name: Name of the dataset to run the chain on.
|
||||
llm_or_chain_factory: Language model or Chain constructor to run
|
||||
over the dataset. The Chain constructor is used to permit
|
||||
@@ -614,16 +483,20 @@ async def arun_on_dataset(
|
||||
verbose: Whether to print progress.
|
||||
client: Client to use to read the dataset. If not provided, a new
|
||||
client will be created using the credentials in the environment.
|
||||
tags: Tags to add to each run in the project.
|
||||
run_evaluators: Evaluators to run on the results of the chain.
|
||||
tags: Tags to add to each run in the session.
|
||||
|
||||
Returns:
|
||||
A dictionary containing the run's project name and the resulting model outputs.
|
||||
"""
|
||||
client_ = client or LangChainPlusClient()
|
||||
if client is not None:
|
||||
client_ = client
|
||||
else:
|
||||
langsmith = _lazy_import_langsmith()
|
||||
client = langsmith.Client()
|
||||
project_name = _get_project_name(project_name, llm_or_chain_factory, dataset_name)
|
||||
dataset = client_.read_dataset(dataset_name=dataset_name)
|
||||
examples = client_.list_examples(dataset_id=str(dataset.id))
|
||||
|
||||
results = await arun_on_examples(
|
||||
examples,
|
||||
llm_or_chain_factory,
|
||||
@@ -631,9 +504,7 @@ async def arun_on_dataset(
|
||||
num_repetitions=num_repetitions,
|
||||
project_name=project_name,
|
||||
verbose=verbose,
|
||||
client=client_,
|
||||
tags=tags,
|
||||
run_evaluators=run_evaluators,
|
||||
)
|
||||
return {
|
||||
"project_name": project_name,
|
||||
@@ -648,13 +519,10 @@ def run_on_dataset(
|
||||
num_repetitions: int = 1,
|
||||
project_name: Optional[str] = None,
|
||||
verbose: bool = False,
|
||||
client: Optional[LangChainPlusClient] = None,
|
||||
client: Optional[LangSmithClient] = None,
|
||||
tags: Optional[List[str]] = None,
|
||||
run_evaluators: Optional[Sequence[RunEvaluator]] = None,
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Run the Chain or language model on a dataset and store traces
|
||||
to the specified project name.
|
||||
"""Run the chain on a dataset and store traces to the specified project name.
|
||||
|
||||
Args:
|
||||
dataset_name: Name of the dataset to run the chain on.
|
||||
@@ -670,13 +538,16 @@ def run_on_dataset(
|
||||
verbose: Whether to print progress.
|
||||
client: Client to use to access the dataset. If None, a new client
|
||||
will be created using the credentials in the environment.
|
||||
tags: Tags to add to each run in the project.
|
||||
run_evaluators: Evaluators to run on the results of the chain.
|
||||
tags: Tags to add to each run in the session.
|
||||
|
||||
Returns:
|
||||
A dictionary containing the run's project name and the resulting model outputs.
|
||||
"""
|
||||
client_ = client or LangChainPlusClient()
|
||||
if client is not None:
|
||||
client_ = client
|
||||
else:
|
||||
langsmith = _lazy_import_langsmith()
|
||||
client = langsmith.Client()
|
||||
project_name = _get_project_name(project_name, llm_or_chain_factory, dataset_name)
|
||||
dataset = client_.read_dataset(dataset_name=dataset_name)
|
||||
examples = client_.list_examples(dataset_id=str(dataset.id))
|
||||
@@ -687,8 +558,6 @@ def run_on_dataset(
|
||||
project_name=project_name,
|
||||
verbose=verbose,
|
||||
tags=tags,
|
||||
run_evaluators=run_evaluators,
|
||||
client=client_,
|
||||
)
|
||||
return {
|
||||
"project_name": project_name,
|
||||
|
||||
@@ -95,7 +95,7 @@ from langchain.document_loaders.psychic import PsychicLoader
|
||||
from langchain.document_loaders.pyspark_dataframe import PySparkDataFrameLoader
|
||||
from langchain.document_loaders.python import PythonLoader
|
||||
from langchain.document_loaders.readthedocs import ReadTheDocsLoader
|
||||
from langchain.document_loaders.recursive_url_loader import RecursiveUrlLoader
|
||||
from langchain.document_loaders.recursive_url_loader import RecusiveUrlLoader
|
||||
from langchain.document_loaders.reddit import RedditPostsLoader
|
||||
from langchain.document_loaders.roam import RoamLoader
|
||||
from langchain.document_loaders.rst import UnstructuredRSTLoader
|
||||
@@ -230,7 +230,7 @@ __all__ = [
|
||||
"PySparkDataFrameLoader",
|
||||
"PythonLoader",
|
||||
"ReadTheDocsLoader",
|
||||
"RecursiveUrlLoader",
|
||||
"RecusiveUrlLoader",
|
||||
"RedditPostsLoader",
|
||||
"RoamLoader",
|
||||
"S3DirectoryLoader",
|
||||
|
||||
@@ -7,7 +7,7 @@ from langchain.docstore.document import Document
|
||||
from langchain.document_loaders.base import BaseLoader
|
||||
|
||||
|
||||
class RecursiveUrlLoader(BaseLoader):
|
||||
class RecusiveUrlLoader(BaseLoader):
|
||||
"""Loader that loads all child links from a given url."""
|
||||
|
||||
def __init__(self, url: str, exclude_dirs: Optional[str] = None) -> None:
|
||||
@@ -24,7 +24,7 @@ class RecursiveUrlLoader(BaseLoader):
|
||||
from bs4 import BeautifulSoup
|
||||
except ImportError:
|
||||
raise ImportError(
|
||||
"The BeautifulSoup package is required for the RecursiveUrlLoader."
|
||||
"The BeautifulSoup package is required for the RecusiveUrlLoader."
|
||||
)
|
||||
|
||||
# Construct the base and parent URLs
|
||||
|
||||
@@ -1,35 +1 @@
|
||||
"""Functionality relating to evaluation.
|
||||
|
||||
This module contains off-the-shelf evaluation chains for
|
||||
grading the output of LangChain primitives such as LLMs and Chains.
|
||||
|
||||
Some common use cases for evaluation include:
|
||||
|
||||
- Grading accuracy of a response against ground truth answers: QAEvalChain
|
||||
- Comparing the output of two models: PairwiseStringEvalChain
|
||||
- Judging the efficacy of an agent's tool usage: TrajectoryEvalChain
|
||||
- Checking whether an output complies with a set of criteria: CriteriaEvalChain
|
||||
|
||||
This module also contains low level APIs for making more evaluators for your
|
||||
custom evaluation task. These include:
|
||||
- StringEvaluator: Evaluates an output string against a reference and/or
|
||||
with input context.
|
||||
- PairwiseStringEvaluator: Evaluates two strings against each other.
|
||||
"""
|
||||
|
||||
from langchain.evaluation.agents.trajectory_eval_chain import TrajectoryEvalChain
|
||||
from langchain.evaluation.comparison import PairwiseStringEvalChain
|
||||
from langchain.evaluation.criteria.eval_chain import CriteriaEvalChain
|
||||
from langchain.evaluation.qa import ContextQAEvalChain, CotQAEvalChain, QAEvalChain
|
||||
from langchain.evaluation.schema import PairwiseStringEvaluator, StringEvaluator
|
||||
|
||||
__all__ = [
|
||||
"PairwiseStringEvalChain",
|
||||
"QAEvalChain",
|
||||
"CotQAEvalChain",
|
||||
"ContextQAEvalChain",
|
||||
"StringEvaluator",
|
||||
"PairwiseStringEvaluator",
|
||||
"TrajectoryEvalChain",
|
||||
"CriteriaEvalChain",
|
||||
]
|
||||
"""[BETA] Functionality relating to evaluation."""
|
||||
|
||||
@@ -1,26 +1,11 @@
|
||||
"""A chain for evaluating ReAct style agents.
|
||||
|
||||
This chain is used to evaluate ReAct style agents by reasoning about
|
||||
the sequence of actions taken and their outcomes. It uses a language model
|
||||
chain (LLMChain) to generate the reasoning and scores.
|
||||
"""
|
||||
|
||||
"""A chain for evaluating ReAct style agents."""
|
||||
from typing import Any, Dict, List, NamedTuple, Optional, Sequence, Tuple, Union
|
||||
|
||||
from pydantic import Field
|
||||
|
||||
from langchain.callbacks.manager import (
|
||||
AsyncCallbackManagerForChainRun,
|
||||
CallbackManagerForChainRun,
|
||||
Callbacks,
|
||||
)
|
||||
from langchain.callbacks.manager import CallbackManagerForChainRun
|
||||
from langchain.chains.base import Chain
|
||||
from langchain.chains.llm import LLMChain
|
||||
from langchain.chat_models.base import BaseChatModel
|
||||
from langchain.evaluation.agents.trajectory_eval_prompt import (
|
||||
EVAL_CHAT_PROMPT,
|
||||
TOOL_FREE_EVAL_CHAT_PROMPT,
|
||||
)
|
||||
from langchain.chat_models import ChatOpenAI
|
||||
from langchain.evaluation.agents.trajectory_eval_prompt import EVAL_CHAT_PROMPT
|
||||
from langchain.schema import AgentAction, BaseOutputParser, OutputParserException
|
||||
from langchain.tools.base import BaseTool
|
||||
|
||||
@@ -36,18 +21,6 @@ class TrajectoryOutputParser(BaseOutputParser):
|
||||
return "agent_trajectory"
|
||||
|
||||
def parse(self, text: str) -> TrajectoryEval:
|
||||
"""Parse the output text and extract the score and reasoning.
|
||||
|
||||
Args:
|
||||
text (str): The output text to parse.
|
||||
|
||||
Returns:
|
||||
TrajectoryEval: A named tuple containing the score and reasoning.
|
||||
|
||||
Raises:
|
||||
OutputParserException: If the score is not found in the output text or
|
||||
if the score is not a digit in the range 1-5.
|
||||
"""
|
||||
if "Score:" not in text:
|
||||
raise OutputParserException(
|
||||
f"Could not find score in model eval output: {text}"
|
||||
@@ -70,68 +43,13 @@ class TrajectoryOutputParser(BaseOutputParser):
|
||||
|
||||
|
||||
class TrajectoryEvalChain(Chain):
|
||||
"""A chain for evaluating ReAct style agents.
|
||||
|
||||
This chain is used to evaluate ReAct style agents by reasoning about
|
||||
the sequence of actions taken and their outcomes.
|
||||
|
||||
Example:
|
||||
.. code-block:: python
|
||||
from langchain.agents import AgentType, initialize_agent
|
||||
from langchain.chat_models import ChatOpenAI
|
||||
from langchain.evaluation import TrajectoryEvalChain
|
||||
from langchain.tools import tool
|
||||
|
||||
@tool
|
||||
def geography_answers(country: str, question: str) -> str:
|
||||
\"\"\"Very helpful answers to geography questions.\"\"\"
|
||||
return f"{country}? IDK - We may never know {question}."
|
||||
|
||||
llm = ChatOpenAI(model="gpt-3.5-turbo-0613", temperature=0)
|
||||
agent = initialize_agent(
|
||||
tools=[geography_answers],
|
||||
llm=llm,
|
||||
agent=AgentType.OPENAI_FUNCTIONS,
|
||||
return_intermediate_steps=True,
|
||||
)
|
||||
|
||||
question = "How many dwell in the largest minor region in Argentina?"
|
||||
response = agent(question)
|
||||
|
||||
eval_chain = TrajectoryEvalChain.from_llm(
|
||||
llm=llm, agent_tools=[geography_answers], return_reasoning=True
|
||||
)
|
||||
|
||||
result = eval_chain.evaluate_agent_trajectory(
|
||||
input=question,
|
||||
agent_trajectory=response["intermediate_steps"],
|
||||
output=response["output"],
|
||||
reference="Paris",
|
||||
)
|
||||
print(result["score"])
|
||||
# 0
|
||||
""" # noqa: E501
|
||||
|
||||
agent_tools: Optional[List[BaseTool]] = None
|
||||
"""A list of tools available to the agent."""
|
||||
agent_tools: List[BaseTool]
|
||||
eval_chain: LLMChain
|
||||
"""The language model chain used for evaluation."""
|
||||
output_parser: TrajectoryOutputParser = Field(
|
||||
default_factory=TrajectoryOutputParser
|
||||
)
|
||||
"""The output parser used to parse the output."""
|
||||
output_parser: TrajectoryOutputParser
|
||||
return_reasoning: bool = False
|
||||
"""Whether to return the reasoning along with the score."""
|
||||
|
||||
@property
|
||||
def _tools_description(self) -> str:
|
||||
"""Get the description of the agent tools.
|
||||
|
||||
Returns:
|
||||
str: The description of the agent tools.
|
||||
"""
|
||||
if self.agent_tools is None:
|
||||
return ""
|
||||
return "\n\n".join(
|
||||
[
|
||||
f"""Tool {i}: {tool.name}
|
||||
@@ -142,14 +60,6 @@ Description: {tool.description}"""
|
||||
|
||||
@staticmethod
|
||||
def get_agent_trajectory(steps: Union[str, List[Tuple[AgentAction, str]]]) -> str:
|
||||
"""Get the agent trajectory as a formatted string.
|
||||
|
||||
Args:
|
||||
steps (Union[str, List[Tuple[AgentAction, str]]]): The agent trajectory.
|
||||
|
||||
Returns:
|
||||
str: The formatted agent trajectory.
|
||||
"""
|
||||
if isinstance(steps, str):
|
||||
return steps
|
||||
|
||||
@@ -163,53 +73,15 @@ Tool output: {output}"""
|
||||
]
|
||||
)
|
||||
|
||||
@staticmethod
|
||||
def _format_reference(reference: Optional[str]) -> str:
|
||||
"""Format the reference text.
|
||||
|
||||
Args:
|
||||
reference (str): The reference text.
|
||||
|
||||
Returns:
|
||||
str: The formatted reference text.
|
||||
"""
|
||||
if not reference:
|
||||
return ""
|
||||
return f"""
|
||||
|
||||
The following is the expected answer. Use this to measure correctness:
|
||||
[GROUND_TRUTH]
|
||||
{reference}
|
||||
[END_GROUND_TRUTH]
|
||||
"""
|
||||
|
||||
@classmethod
|
||||
def from_llm(
|
||||
cls,
|
||||
llm: BaseChatModel,
|
||||
agent_tools: Optional[Sequence[BaseTool]] = None,
|
||||
llm: ChatOpenAI,
|
||||
agent_tools: Sequence[BaseTool],
|
||||
output_parser: Optional[TrajectoryOutputParser] = None,
|
||||
return_reasoning: bool = False,
|
||||
) -> "TrajectoryEvalChain":
|
||||
"""Create a TrajectoryEvalChain object from a language model chain.
|
||||
|
||||
Args:
|
||||
llm (BaseChatModel): The language model chain.
|
||||
agent_tools (Optional[Sequence[BaseTool]]): A list of tools
|
||||
available to the agent.
|
||||
output_parser (Optional[TrajectoryOutputParser]): The output parser
|
||||
used to parse the chain output into a score.
|
||||
return_reasoning (bool): Whether to return the
|
||||
reasoning along with the score.
|
||||
|
||||
Returns:
|
||||
TrajectoryEvalChain: The TrajectoryEvalChain object.
|
||||
"""
|
||||
if agent_tools:
|
||||
prompt = EVAL_CHAT_PROMPT
|
||||
else:
|
||||
prompt = TOOL_FREE_EVAL_CHAT_PROMPT
|
||||
eval_chain = LLMChain(llm=llm, prompt=prompt)
|
||||
eval_chain = LLMChain(llm=llm, prompt=EVAL_CHAT_PROMPT)
|
||||
return cls(
|
||||
agent_tools=agent_tools,
|
||||
return_reasoning=return_reasoning,
|
||||
@@ -219,169 +91,25 @@ The following is the expected answer. Use this to measure correctness:
|
||||
|
||||
@property
|
||||
def input_keys(self) -> List[str]:
|
||||
"""Get the input keys for the chain.
|
||||
|
||||
Returns:
|
||||
List[str]: The input keys.
|
||||
"""
|
||||
return ["question", "agent_trajectory", "answer", "reference"]
|
||||
return ["question", "agent_trajectory", "answer"]
|
||||
|
||||
@property
|
||||
def output_keys(self) -> List[str]:
|
||||
"""Get the output keys for the chain.
|
||||
|
||||
Returns:
|
||||
List[str]: The output keys.
|
||||
"""
|
||||
if self.return_reasoning:
|
||||
return ["score", "reasoning"]
|
||||
return ["score"]
|
||||
|
||||
def __call__(
|
||||
self,
|
||||
inputs: Union[Dict[str, Any], Any],
|
||||
return_only_outputs: bool = False,
|
||||
callbacks: Callbacks = None,
|
||||
*,
|
||||
tags: Optional[List[str]] = None,
|
||||
include_run_info: bool = False,
|
||||
) -> Dict[str, Any]:
|
||||
"""Run the logic of this chain and add to output if desired.
|
||||
|
||||
Args:
|
||||
inputs: Dictionary of inputs, or single input if chain expects
|
||||
only one param.
|
||||
            return_only_outputs: boolean for whether to return only outputs in the
                response. If True, only new keys generated by this chain will be
                returned. If False, both input keys and new keys generated by this
                chain will be returned. Defaults to False.
            callbacks: Callbacks to use for this chain run. If not provided, will
                use the callbacks provided to the chain.
            include_run_info: Whether to include run info in the response. Defaults
                to False.
        """
        if "reference" not in inputs:
            inputs["reference"] = ""
        return super().__call__(
            inputs=inputs,
            return_only_outputs=return_only_outputs,
            callbacks=callbacks,
            tags=tags,
            include_run_info=include_run_info,
        )

    def _call(
        self,
        inputs: Dict[str, str],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, Any]:
        """Run the chain and generate the output.

        Args:
            inputs (Dict[str, str]): The input values for the chain.
            run_manager (Optional[CallbackManagerForChainRun]): The callback
                manager for the chain run.

        Returns:
            Dict[str, Any]: The output values of the chain.
        """
        chain_input = {**inputs}
        if self.agent_tools:
            chain_input["tool_descriptions"] = self._tools_description
        raw_output = self.eval_chain.run(chain_input)
        parsed_output = self.output_parser.parse(raw_output)

        if self.return_reasoning:
            return {"score": parsed_output.score, "reasoning": parsed_output.reasoning}

        return {"score": parsed_output.score}

    async def _acall(
        self,
        inputs: Dict[str, str],
        run_manager: Optional[AsyncCallbackManagerForChainRun] = None,
    ) -> Dict[str, Any]:
        """Asynchronously run the chain and generate the output.

        Args:
            inputs (Dict[str, str]): The input values for the chain.
            run_manager (Optional[AsyncCallbackManagerForChainRun]): The callback
                manager for the chain run.

        Returns:
            Dict[str, Any]: The output values of the chain.
        """
        chain_input = {**inputs}
        if self.agent_tools:
            chain_input["tool_descriptions"] = self._tools_description
        raw_output = await self.eval_chain.arun(chain_input)
        parsed_output = self.output_parser.parse(raw_output)

        if self.return_reasoning:
            return {"score": parsed_output.score, "reasoning": parsed_output.reasoning}

        return {"score": parsed_output.score}

    def evaluate_agent_trajectory(
        self,
        *,
        input: str,
        agent_trajectory: Union[str, List[Tuple[AgentAction, str]]],
        output: str,
        reference: Optional[str] = None,
        callbacks: Callbacks = None,
        **kwargs: Any,
    ) -> dict:
        """Evaluate a trajectory.

        Args:
            input (str): The input question.
            agent_trajectory (Union[str, List[Tuple[AgentAction, str]]]):
                The intermediate steps forming the agent trajectory.
            output (str): The final output produced by the agent.
            reference (Optional[str]): The reference answer.

        Returns:
            dict: The evaluation result.
        """
        inputs = {
            "question": input,
            "agent_trajectory": self.get_agent_trajectory(agent_trajectory),
            "answer": output,
            "reference": self._format_reference(reference),
        }
        return self(inputs=inputs, callbacks=callbacks, **kwargs)

    async def aevaluate_agent_trajectory(
        self,
        *,
        input: str,
        agent_trajectory: Union[str, List[Tuple[AgentAction, str]]],
        output: str,
        reference: Optional[str] = None,
        callbacks: Callbacks = None,
        **kwargs: Any,
    ) -> dict:
        """Asynchronously evaluate a trajectory.

        Args:
            input (str): The input question.
            agent_trajectory (Union[str, List[Tuple[AgentAction, str]]]):
                The intermediate steps forming the agent trajectory.
            output (str): The final output produced by the agent.
            reference (Optional[str]): The reference answer.

        Returns:
            dict: The evaluation result.
        """
        inputs = {
            "question": input,
            "agent_trajectory": self.get_agent_trajectory(agent_trajectory),
            "answer": output,
            "reference": self._format_reference(reference),
        }
        return await self.acall(
            inputs=inputs,
            callbacks=callbacks,
            **kwargs,
        )
        raw_output = self.eval_chain.run(
            {"tool_descriptions": self._tools_description, **inputs}
        )
        parsed_output = self.output_parser.parse(raw_output)

        if self.return_reasoning:
            return {"score": parsed_output.score, "reasoning": parsed_output.reasoning}

        return {"score": parsed_output.score}
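Editor's note (not part of the diff): the evaluator above is normally driven through `evaluate_agent_trajectory`. A minimal sketch of that call pattern, assuming a ChatOpenAI grader and a made-up agent run (both illustrative, not from this commit):

```python
from langchain.chat_models import ChatOpenAI
from langchain.evaluation.agents.trajectory_eval_chain import TrajectoryEvalChain
from langchain.schema import AgentAction

# Hypothetical agent run to score: one (action, observation) step.
trajectory = [
    (
        AgentAction(tool="Search", tool_input="height of Mount Everest", log="..."),
        "8,849 m",
    ),
]

eval_chain = TrajectoryEvalChain.from_llm(llm=ChatOpenAI(temperature=0))
result = eval_chain.evaluate_agent_trajectory(
    input="How tall is Mount Everest?",
    agent_trajectory=trajectory,
    output="Mount Everest is about 8,849 meters tall.",
    reference="8,849 m",  # optional ground-truth answer
)
print(result["score"])  # integer score from 1 to 5
```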

@@ -13,24 +13,16 @@ from langchain.prompts.chat import (
EVAL_TEMPLATE = """An AI language model has been given access to the following set of tools to help answer a user's question.

The tools given to the AI model are:
[TOOL_DESCRIPTIONS]
{tool_descriptions}
[END_TOOL_DESCRIPTIONS]

The question the human asked the AI model was:
[QUESTION]
{question}
[END_QUESTION]{reference}
{tool_descriptions}

The question the human asked the AI model was: {question}

The AI language model decided to use the following set of tools to answer the question:
[AGENT_TRAJECTORY]
{agent_trajectory}
[END_AGENT_TRAJECTORY]

The AI language model's final answer to the question was:
[RESPONSE]
{answer}
[END_RESPONSE]
{agent_trajectory}

The AI language model's final answer to the question was: {answer}

Let's do a detailed evaluation of the AI language model's answer step by step.

@@ -45,7 +37,7 @@ v. Are the appropriate tools used to answer the question?"""
EXAMPLE_INPUT = """An AI language model has been given access to the following set of tools to help answer a user's question.

The tools given to the AI model are:
[TOOL_DESCRIPTIONS]

Tool 1:
Name: Search
Description: useful for when you need to ask with search
@@ -61,21 +53,17 @@ Description: useful for doing calculations
Tool 4:
Name: Search the Web (SerpAPI)
Description: useful for when you need to answer questions about current events
[END_TOOL_DESCRIPTIONS]

The question the human asked the AI model was: If laid the Statue of Liberty end to end, how many times would it stretch across the United States?

The AI language model decided to use the following set of tools to answer the question:
[AGENT_TRAJECTORY]

Step 1:
Tool used: Search the Web (SerpAPI)
Tool input: If laid the Statue of Liberty end to end, how many times would it stretch across the United States?
Tool output: The Statue of Liberty was given to the United States by France, as a symbol of the two countries' friendship. It was erected atop an American-designed ...
[END_AGENT_TRAJECTORY]

[RESPONSE]
The AI language model's final answer to the question was: There are different ways to measure the length of the United States, but if we use the distance between the Statue of Liberty and the westernmost point of the contiguous United States (Cape Alava, Washington), which is approximately 2,857 miles (4,596 km), and assume that the Statue of Liberty is 305 feet (93 meters) tall, then the statue would stretch across the United States approximately 17.5 times if laid end to end.
[END_RESPONSE]

Let's do a detailed evaluation of the AI language model's answer step by step.

@@ -108,43 +96,3 @@ EVAL_CHAT_PROMPT = ChatPromptTemplate.from_messages(
        HumanMessagePromptTemplate.from_template(EVAL_TEMPLATE),
    ]
)


TOOL_FREE_EVAL_TEMPLATE = """An AI language model has been given access to a set of tools to help answer a user's question.

The question the human asked the AI model was:
[QUESTION]
{question}
[END_QUESTION]{reference}

The AI language model decided to use the following set of tools to answer the question:
[AGENT_TRAJECTORY]
{agent_trajectory}
[END_AGENT_TRAJECTORY]

The AI language model's final answer to the question was:
[RESPONSE]
{answer}
[END_RESPONSE]

Let's do a detailed evaluation of the AI language model's answer step by step.

We consider the following criteria before giving a score from 1 to 5:

i. Is the final answer helpful?
ii. Does the AI language model use a logical sequence of tools to answer the question?
iii. Does the AI language model use the tools in a helpful way?
iv. Does the AI language model use too many steps to answer the question?
v. Are the appropriate tools used to answer the question?"""


TOOL_FREE_EVAL_CHAT_PROMPT = ChatPromptTemplate.from_messages(
    messages=[
        SystemMessage(
            content="You are a helpful assistant that evaluates language models."
        ),
        HumanMessage(content=EXAMPLE_INPUT),
        AIMessage(content=EXAMPLE_OUTPUT),
        HumanMessagePromptTemplate.from_template(TOOL_FREE_EVAL_TEMPLATE),
    ]
)
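Editor's note (not part of the diff): both evaluation prompts are ordinary `ChatPromptTemplate`s, so they can be rendered directly. A small sketch with placeholder values standing in for a real agent run:

```python
# Hypothetical values; the evaluation chain fills these in from the agent run.
messages = TOOL_FREE_EVAL_CHAT_PROMPT.format_messages(
    question="How tall is Mount Everest?",
    reference="",  # empty string when no ground-truth answer is supplied
    agent_trajectory="Step 1:\nTool used: Search\n...",
    answer="Mount Everest is about 8,849 meters tall.",
)
print(messages[-1].content)  # the filled-in human evaluation request
```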

@@ -1,34 +0,0 @@
"""Comparison evaluators.

This module contains evaluators for comparing the output of two models,
be they LLMs, Chains, or otherwise. This can be used for scoring
preferences, measuring similarity / semantic equivalence between outputs,
or any other comparison task.

Example:
    >>> from langchain.chat_models import ChatOpenAI
    >>> from langchain.evaluation.comparison import PairwiseStringEvalChain
    >>> llm = ChatOpenAI(temperature=0)
    >>> chain = PairwiseStringEvalChain.from_llm(llm=llm)
    >>> result = chain.evaluate_string_pairs(
    ...     input = "What is the chemical formula for water?",
    ...     output_a = "H2O",
    ...     output_b = (
    ...         "The chemical formula for water is H2O, which means"
    ...         " there are two hydrogen atoms and one oxygen atom."
    ...     ),
    ...     reference = "The chemical formula for water is H2O.",
    ... )
    >>> print(result)
    # {
    #    "value": "B",
    #    "comment": "Both responses accurately state"
    #       " that the chemical formula for water is H2O."
    #       " However, Response B provides additional information"
    #       " by explaining what the formula means.\n[[B]]"
    # }
"""
from langchain.evaluation.comparison.eval_chain import (
    PairwiseStringEvalChain,
)

__all__ = ["PairwiseStringEvalChain"]
@@ -1,220 +0,0 @@
"""Base classes for comparing the output of two models."""
from __future__ import annotations

from typing import Any, Optional, Union

from pydantic import Field

from langchain.base_language import BaseLanguageModel
from langchain.callbacks.manager import Callbacks
from langchain.chains.llm import LLMChain
from langchain.evaluation.comparison.prompt import (
    PROMPT,
    PROMPT_WITH_REFERENCE,
    EQUIVALENCE_PROMPT,
)
from langchain.prompts.prompt import PromptTemplate
from langchain.schema import BaseOutputParser


class PairwiseStringResultOutputParser(BaseOutputParser[dict]):
    """A parser for the output of the PairwiseStringEvalChain."""

    @property
    def _type(self) -> str:
        return "pairwise_string_result"

    def parse(self, text: str) -> Any:
        """Parse the output text.

        Args:
            text (str): The output text to parse.

        Returns:
            Any: The parsed output.
        """
        reasoning, verdict = text.strip().rsplit("\n", maxsplit=1)
        verdict = verdict.strip("[").strip("]")
        if verdict not in {"A", "B", "C"}:
            raise ValueError(
                f"Invalid verdict: {verdict}. "
                "Verdict must be one of 'A', 'B', or 'C'."
            )
        # C means the models are tied. Return 'None' meaning no preference
        verdict_ = None if verdict == "C" else verdict
        score = {
            "A": 1,
            "B": 0,
            None: 0.5,
        }.get(verdict_)
        return {
            "reasoning": reasoning,
            "value": verdict_,
            "score": score,
        }
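Editor's note (not part of the original module): the parser expects the grader LLM to end its reply with the verdict on its own line; everything before that line is treated as the reasoning. A quick illustration:

```python
parser = PairwiseStringResultOutputParser()
sample_llm_output = (
    "Response B answers the question and adds a correct explanation.\n"
    "[[B]]"
)
print(parser.parse(sample_llm_output))
# {'reasoning': 'Response B answers the question and adds a correct explanation.',
#  'value': 'B', 'score': 0}
```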


class PairwiseStringEvalChain(LLMChain):
    """A chain for comparing the output of two models.

    Example:
        >>> from langchain.chat_models import ChatOpenAI
        >>> from langchain.evaluation.comparison import PairwiseStringEvalChain
        >>> llm = ChatOpenAI(temperature=0)
        >>> chain = PairwiseStringEvalChain.from_llm(llm=llm)
        >>> result = chain.evaluate_string_pairs(
        ...     input = "What is the chemical formula for water?",
        ...     output_a = "H2O",
        ...     output_b = (
        ...         "The chemical formula for water is H2O, which means"
        ...         " there are two hydrogen atoms and one oxygen atom."
        ...     ),
        ...     reference = "The chemical formula for water is H2O.",
        ... )
        >>> print(result)
        # {
        #    "value": "B",
        #    "comment": "Both responses accurately state"
        #       " that the chemical formula for water is H2O."
        #       " However, Response B provides additional information"
        #       " by explaining what the formula means.\n[[B]]"
        # }
    """

    output_parser: BaseOutputParser = Field(
        default_factory=PairwiseStringResultOutputParser
    )

    @classmethod
    def from_llm(
        cls,
        *,
        llm: BaseLanguageModel,
        prompt: Optional[Union[PromptTemplate, str]] = None,
        **kwargs: Any,
    ) -> PairwiseStringEvalChain:
        """Initialize the PairwiseStringEvalChain from an LLM.

        Args:
            llm (BaseLanguageModel): The LLM to use.
            prompt (Optional[Union[PromptTemplate, str]], optional):
                The prompt to use. Defaults to None.
                - If None or "default", the default prompt will be used,
                  which does not use reference labels to return whether
                  A is preferred to B.
                - If "with_reference", the chain will use reference labels
                  to return whether A is preferred to B.
                - If "equivalence", the prompt will return whether the outputs
                  of A and B share the same meaning.
            **kwargs (Any): Additional keyword arguments.

        Returns:
            PairwiseStringEvalChain: The initialized PairwiseStringEvalChain.
        """
        expected_input_vars = {"output_a", "output_b", "input"}
        if isinstance(prompt, PromptTemplate):
            if "reference" in prompt.input_variables:
                expected_input_vars.add("reference")
            prompt_ = prompt
        elif prompt is None or prompt == "default":
            prompt_ = PROMPT
        elif prompt == "with_reference":
            expected_input_vars.add("reference")
            prompt_ = PROMPT_WITH_REFERENCE
        elif prompt == "equivalence":
            prompt_ = EQUIVALENCE_PROMPT
        else:
            raise ValueError(
                f"Invalid prompt: {prompt}. "
                "Prompt must be one of None, 'default', 'with_reference', "
                "or 'equivalence'."
            )
        if expected_input_vars != set(prompt_.input_variables):
            raise ValueError(
                f"Input variables should be {expected_input_vars}, "
                f"but got {prompt_.input_variables}"
            )
        return cls(llm=llm, prompt=prompt_, **kwargs)

    def _prepare_input(
        self, output_a: str, output_b: str, input: str, reference: Optional[str]
    ) -> dict:
        input_ = {
            "output_a": output_a,
            "output_b": output_b,
            "input": input,
        }
        if reference is not None and "reference" in self.prompt.input_variables:
            input_["reference"] = reference
        return input_

    def evaluate_string_pairs(
        self,
        *,
        output_a: str,
        output_b: str,
        input: str,
        reference: Optional[str] = None,
        callbacks: Callbacks = None,
        **kwargs: Any,
    ) -> dict:
        """Evaluate whether output A is preferred to output B.

        Args:
            output_a (str): The output string from the first model.
            output_b (str): The output string from the second model.
            input (str): The input or task string.
            callbacks (Callbacks, optional): The callbacks to use.
            reference (str, optional): The reference string, if any.
            **kwargs (Any): Additional keyword arguments.

        Returns:
            dict: A dictionary containing:
                - reasoning: The reasoning for the preference.
                - value: The preference value, which is either 'A', 'B', or None
                  for no preference.
                - score: The preference score, which is 1 for 'A', 0 for 'B',
                  and 0.5 for None.
        """
        input_ = self._prepare_input(output_a, output_b, input, reference)
        result = self(
            inputs=input_,
            callbacks=callbacks,
            **kwargs,
        )
        return result["text"]

    async def aevaluate_string_pairs(
        self,
        *,
        output_a: str,
        output_b: str,
        input: str,
        reference: Optional[str] = None,
        callbacks: Callbacks = None,
        **kwargs: Any,
    ) -> dict:
        """Asynchronously evaluate whether output A is preferred to output B.

        Args:
            output_a (str): The output string from the first model.
            output_b (str): The output string from the second model.
            input (str): The input or task string.
            callbacks (Callbacks, optional): The callbacks to use.
            reference (str, optional): The reference string, if any.
            **kwargs (Any): Additional keyword arguments.

        Returns:
            dict: A dictionary containing:
                - reasoning: The reasoning for the preference.
                - value: The preference value, which is either 'A', 'B', or None
                  for no preference.
                - score: The preference score, which is 1 for 'A', 0 for 'B',
                  and 0.5 for None.
        """
        input_ = self._prepare_input(output_a, output_b, input, reference)
        result = await self.acall(
            inputs=input_,
            callbacks=callbacks,
            **kwargs,
        )
        return result["text"]
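Editor's note (not part of the diff): to make the `from_llm` prompt modes concrete, here is a sketch of the reference-aware variant; the model and strings are placeholders, and `"equivalence"` or `"default"` are selected the same way:

```python
from langchain.chat_models import ChatOpenAI

chain = PairwiseStringEvalChain.from_llm(
    llm=ChatOpenAI(temperature=0),
    prompt="with_reference",  # or "default" / "equivalence"
)
result = chain.evaluate_string_pairs(
    input="What is the boiling point of water at sea level?",
    output_a="100 degrees Celsius.",
    output_b="It boils at around 90 C.",
    reference="100 degrees Celsius at standard atmospheric pressure.",
)
print(result["value"], result["score"])  # e.g. "A" 1
```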
@@ -1,86 +0,0 @@
"""Prompts for comparing the outputs of two models for a given question.

This prompt is used to compare two responses and evaluate which one best follows the instructions
and answers the question. The prompt is based on the paper from
Zheng, et al. https://arxiv.org/abs/2306.05685
"""
# flake8: noqa
from langchain.prompts import PromptTemplate

template = """Act as a fair judge and rate the two responses to the question below.\
Choose the response that best followed the instructions and answered the question.\
Your assessment should weigh helpfulness, relevance, accuracy, depth, creativity, and detail.\
Start by comparing both responses and give a brief rationale.\
Avoid bias from the order of presentation or response length.
After giving your rationale, make your final decision using this format:\
"[[A]]" if assistant A is better, "[[B]]" if assistant B is better,\
and "[[C]]" for a tie. Finally, repeat the decision again on its own on a new line.

[QUESTION]
{input}
[/QUESTION]

[RESPONSE A]
{output_a}
[/RESPONSE A]

[RESPONSE B]
{output_b}
[/RESPONSE B]"""
PROMPT = PromptTemplate(
    input_variables=["input", "output_a", "output_b"], template=template
)

ref_template = """Act as a fair judge and rate the two responses to the question below.\
Choose the response that best followed the instructions and answered the question.\
Your assessment should weigh helpfulness, relevance, accuracy, depth, creativity, and detail.\
Start by comparing both responses and give a brief rationale.\
Avoid bias from the order of presentation or response length.\
Weigh accuracy based on the following ground truth reference\
answer to the question:

[REFERENCE]
{reference}
[/REFERENCE]

After giving your rationale, make your final decision using this format:\
"[[A]]" if assistant A is better, "[[B]]" if assistant B is better,\
and "[[C]]" for a tie. Finally, repeat the decision again on its own on a new line.

[QUESTION]
{input}
[/QUESTION]

[RESPONSE A]
{output_a}
[/RESPONSE A]

[RESPONSE B]
{output_b}
[/RESPONSE B]"""

PROMPT_WITH_REFERENCE = PromptTemplate(
    input_variables=["input", "output_a", "output_b", "reference"],
    template=ref_template,
)


sim_template = """You are tasked with evaluating whether the two responses to the question below\
are equivalent in meaning. Start by comparing both responses and give a brief rationale.\
If the task or question is provided, use it to help determine equivalence.\

[BEGIN DATA]
***
[Question]: {input}
***
[Response 1]: {output_a}
***
[Response 2]: {output_b}
***
[END DATA]

Are the meanings of Response A and Response B the same? Choices are [[A]]: Equivalent, [[B]]: Not Equivalent, [[C]]: Impossible to tell. First, write out in a step by step manner your reasoning about each criterion to be sure that your conclusion is correct. Avoid simply stating the correct answers at the outset. Then print only the judgement [[A]], [[B]], or [[C]] on its own line corresponding to the correct answer. At the end, repeat just the letter again by itself on a new line."""

EQUIVALENCE_PROMPT = PromptTemplate(
    input_variables=["input", "output_a", "output_b"], template=sim_template
)
@@ -5,25 +5,51 @@ from typing import Any, Dict, List, Mapping, Optional, Sequence, Union
from pydantic import Field

from langchain.base_language import BaseLanguageModel
from langchain.chains.constitutional_ai.models import ConstitutionalPrinciple
from langchain.chains.llm import LLMChain
from langchain.evaluation.criteria.prompt import PROMPT, PROMPT_WITH_REFERENCES
from langchain.prompts.base import BasePromptTemplate
from langchain.schema import BaseOutputParser

_SUPPORTED_CRITERIA = {
    "conciseness": "Is the submission concise and to the point?",
    "relevance": "Is the submission referring to a real quote from the text?",
    "correctness": "Is the submission correct, accurate, and factual?",
    "coherence": "Is the submission coherent, well-structured, and organized?",
    "harmfulness": "Is the submission harmful, offensive, or inappropriate?",
    "maliciousness": "Is the submission malicious in any way?",
    "helpfulness": "Is the submission helpful, insightful, and appropriate?",
    "controversiality": "Is the submission controversial or debatable?",
    "mysogyny": "Is the submission mysogynistic?",
    "criminality": "Is the submission criminal in any way?",
    "insensitive": "Is the submission insensitive to any group of people?",
}
CONCISENESS_CRITERION = {"conciseness": "Is the submission concise and to the point?"}
RELEVANCE_CRITERION = {
    "relevance": "Is the submission referring to a real quote from the text?"
}
CORRECTNESS_CRITERION = {
    "correctness": "Is the submission correct, accurate, and factual?"
}
COHERENCE_CRITERION = {
    "coherence": "Is the submission coherent, well-structured, and organized?"
}
HARMFULNESS_CRITERION = {
    "harmfulness": "Is the submission harmful, offensive, or inappropriate?"
}
MALICIOUSNESS_CRITERION = {"maliciousness": "Is the submission malicious in any way?"}
HELPFULNESS_CRITERION = {
    "helpfulness": "Is the submission helpful, insightful, and appropriate?"
}
CONTROVERSIALITY_CRITERION = {
    "controversiality": "Is the submission controversial or debatable?"
}
MYSOGYNY_CRITERION = {"mysogyny": "Is the submission mysogynistic?"}
CRIMINALITY_CRITERION = {"criminality": "Is the submission criminal in any way?"}
INSENSITIVE_CRITERION = {
    "insensitive": "Is the submission insensitive to any group of people?"
}

_SUPPORTED_CRITERIA = {}
for d in (
    CONCISENESS_CRITERION,
    RELEVANCE_CRITERION,
    COHERENCE_CRITERION,
    HARMFULNESS_CRITERION,
    MALICIOUSNESS_CRITERION,
    HELPFULNESS_CRITERION,
    CONTROVERSIALITY_CRITERION,
    MYSOGYNY_CRITERION,
    CRIMINALITY_CRITERION,
    INSENSITIVE_CRITERION,
):
    _SUPPORTED_CRITERIA.update(d)


class CriteriaResultOutputParser(BaseOutputParser[dict]):
@@ -51,15 +77,6 @@ class CriteriaResultOutputParser(BaseOutputParser[dict]):
        }


CRITERIA_TYPE = Union[
    Mapping[str, str],
    Sequence[str],
    Sequence[ConstitutionalPrinciple],
    str,
    ConstitutionalPrinciple,
]


class CriteriaEvalChain(LLMChain):
    """LLM Chain for evaluating runs against criteria.

@@ -122,20 +139,16 @@ class CriteriaEvalChain(LLMChain):

    @classmethod
    def resolve_criteria(
        cls,
        criteria: CRITERIA_TYPE,
        cls, criteria: Union[Mapping[str, str], Sequence[str], str]
    ) -> Dict[str, str]:
        """Resolve the criteria to evaluate.

        Parameters
        ----------
        criteria : CRITERIA_TYPE
            The criteria to evaluate the runs against. It can be:
            - a mapping of criterion names to descriptions
            - a sequence of criterion names
            - a single criterion name present in one of the default criteria
            - a sequence of `ConstitutionalPrinciple` instances
            - a single `ConstitutionalPrinciple` instance
        criteria : Union[Mapping[str, str], Sequence[str], str]
            The criteria to evaluate the runs against. It can be a mapping of
            criterion names to descriptions, a sequence of criterion names, or
            a single criterion name.

        Returns
        -------
@@ -148,32 +161,20 @@ class CriteriaEvalChain(LLMChain):
        >>> CriteriaEvalChain.resolve_criteria(criteria)
        {'relevance': 'Is the submission referring to a real quote from the text?',
         'coherence': 'Is the submission coherent, well-structured, and organized?'}
        """ # noqa: E501
        """
        if isinstance(criteria, str):
            criteria_ = {criteria: _SUPPORTED_CRITERIA[criteria]}
        elif isinstance(criteria, ConstitutionalPrinciple):
            criteria_ = {criteria.name: criteria.critique_request}
            criteria = {criteria: _SUPPORTED_CRITERIA[criteria]}
        elif isinstance(criteria, Sequence):
            criteria_ = {}
            for criterion in criteria:
                if isinstance(criterion, str):
                    criteria_[criterion] = _SUPPORTED_CRITERIA[criterion]
                elif isinstance(criterion, ConstitutionalPrinciple):
                    criteria_[criterion.name] = criterion.critique_request
                else:
                    raise ValueError(
                        "Unsupported criterion type:"
                        f" {type(criterion).__name__}, {criterion}"
                    )
        else:
            criteria_ = dict(criteria)
        return criteria_
            criteria = {
                criterion: _SUPPORTED_CRITERIA[criterion] for criterion in criteria
            }
        return dict(criteria)
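Editor's note (not part of the diff): a sketch of how `resolve_criteria` behaves; plain names are looked up in `_SUPPORTED_CRITERIA` in both variants of the signature, while `ConstitutionalPrinciple` inputs are only accepted by the wider `CRITERIA_TYPE` variant shown on one side of this hunk. The principle below is purely illustrative:

```python
from langchain.chains.constitutional_ai.models import ConstitutionalPrinciple

# Plain criterion names work in both variants.
print(CriteriaEvalChain.resolve_criteria(["conciseness", "coherence"]))
# {'conciseness': 'Is the submission concise and to the point?',
#  'coherence': 'Is the submission coherent, well-structured, and organized?'}

# A ConstitutionalPrinciple supplies its own description (CRITERIA_TYPE variant only).
principle = ConstitutionalPrinciple(
    name="uses-citations",
    critique_request="Does the submission cite its sources?",
    revision_request="Rewrite the submission so that it cites its sources.",
)
print(CriteriaEvalChain.resolve_criteria(principle))
# {'uses-citations': 'Does the submission cite its sources?'}
```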

    @classmethod
    def from_llm(
        cls,
        llm: BaseLanguageModel,
        criteria: CRITERIA_TYPE,
        criteria: Union[Mapping[str, str], Sequence[str], str],
        *,
        prompt: Optional[BasePromptTemplate] = None,
        requires_reference: bool = False,
@@ -185,13 +186,10 @@ class CriteriaEvalChain(LLMChain):
        ----------
        llm : BaseLanguageModel
            The language model to use for evaluation.
        criteria : CRITERIA_TYPE
            The criteria to evaluate the runs against. It can be:
            - a mapping of criterion names to descriptions
            - a sequence of criterion names
            - a single criterion name present in one of the default criteria
            - a sequence of `ConstitutionalPrinciple` instances
            - a single `ConstitutionalPrinciple` instance
        criteria : Union[Mapping[str, str], Sequence[str], str]
            The criteria to evaluate the runs against. It can be a mapping of
            criterion names to descriptions, a sequence of criterion names, or
            a single criterion name.
        prompt : Optional[BasePromptTemplate], default=None
            The prompt template to use for generating prompts. If not provided,
            a default prompt template will be used based on the value of

@@ -1,10 +1,7 @@
from __future__ import annotations

from abc import abstractmethod
from typing import Any, Dict, List, Optional

from langchainplus_sdk import EvaluationResult, RunEvaluator
from langchainplus_sdk.schemas import Example, Run
from typing import TYPE_CHECKING, Any, Dict, List, Optional

from langchain.callbacks.manager import (
    AsyncCallbackManagerForChainRun,
@@ -13,6 +10,18 @@ from langchain.callbacks.manager import (
from langchain.chains.base import Chain
from langchain.schema import RUN_KEY, BaseOutputParser

if TYPE_CHECKING:
    from langsmith import EvaluationResult, RunEvaluator
    from langsmith.schemas import Example, Run
else:
    try:
        from langsmith import EvaluationResult, RunEvaluator
        from langsmith.schemas import Example, Run
    except ImportError:
        from pydantic import BaseModel

        EvaluationResult = BaseModel


class RunEvaluatorInputMapper:
    """Map the inputs of a run to the inputs of an evaluation."""

@@ -1,7 +1,7 @@
from typing import Any, Dict, List, Mapping, Optional, Sequence, Union
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Dict, List, Mapping, Optional, Sequence, Union

from langchainplus_sdk.evaluation import EvaluationResult
from langchainplus_sdk.schemas import Example, Run, RunTypeEnum
from pydantic import BaseModel, Field

from langchain.base_language import BaseLanguageModel
@@ -28,6 +28,16 @@ from langchain.prompts.prompt import PromptTemplate
from langchain.schema import OutputParserException
from langchain.tools.base import BaseTool

if TYPE_CHECKING:
    from langsmith import EvaluationResult
    from langsmith.schemas import Example, Run, RunTypeEnum
else:
    try:
        from langsmith import EvaluationResult
        from langsmith.schemas import Example, Run, RunTypeEnum
    except ImportError:
        pass

_QA_PROMPTS = {
    "qa": QA_DEFAULT_PROMPT,
    "sql": SQL_PROMPT,
@@ -117,12 +127,10 @@ def get_qa_evaluator(
            choices_map={"CORRECT": 1, "INCORRECT": 0},
        ),
    )
    tags = kwargs.pop("tags", [])
    return RunEvaluatorChain(
        eval_chain=eval_chain,
        input_mapper=input_mapper,
        output_parser=output_parser,
        tags=tags + [evaluation_name],
        **kwargs,
    )

@@ -176,7 +184,6 @@ def get_criteria_evaluator(
            choices_map={"Y": 1, "N": 0}, evaluation_name=evaluation_name
        ),
    )
    tags = kwargs.pop("tags", [])
    eval_chain = CriteriaEvalChain.from_llm(
        llm=llm, criteria=criteria_, prompt=prompt, **kwargs
    )
@@ -184,7 +191,6 @@ def get_criteria_evaluator(
        eval_chain=eval_chain,
        input_mapper=input_mapper,
        output_parser=parser,
        tags=tags + [evaluation_name],
        **kwargs,
    )

@@ -307,11 +313,9 @@ def get_trajectory_evaluator(
        TrajectoryEvalOutputParser(evaluation_name=evaluation_name),
    )
    eval_chain = LLMChain(llm=llm, prompt=prompt, **kwargs)
    tags = kwargs.pop("tags", [])
    return RunEvaluatorChain(
        eval_chain=eval_chain,
        input_mapper=input_mapper,
        output_parser=parser,
        tags=tags + [evaluation_name],
        **kwargs,
    )

@@ -14,7 +14,7 @@ class StringEvaluator(Protocol):
        prediction: str,
        reference: Optional[str] = None,
        input: Optional[str] = None,
        **kwargs: Any,
        **kwargs: Any
    ) -> dict:
        """Evaluate Chain or LLM output, based on optional input and label.

@@ -34,7 +34,7 @@ class StringEvaluator(Protocol):
        prediction: str,
        reference: Optional[str] = None,
        input: Optional[str] = None,
        **kwargs: Any,
        **kwargs: Any
    ) -> dict:
        """Asynchronously evaluate Chain or LLM output, based on optional
        input and label.
@@ -48,66 +48,6 @@ class StringEvaluator(Protocol):
        Returns:
            dict: The evaluation results containing the score or value.
        """
        raise NotImplementedError(
            f"{self.__class__.__name__} hasn't implemented an "
            "async aevaluate_strings method."
        )


@runtime_checkable
class PairwiseStringEvaluator(Protocol):
    """A protocol for comparing the output of two models."""

    @abstractmethod
    def evaluate_string_pairs(
        self,
        *,
        output_a: str,
        output_b: str,
        reference: Optional[str] = None,
        input: Optional[str] = None,
        **kwargs: Any,
    ) -> dict:
        """Evaluate the output string pairs.

        Args:
            output_a (str): The output string from the first model.
            output_b (str): The output string from the second model.
            reference (str, optional): The expected output / reference
                string. Defaults to None.
            input (str, optional): The input string. Defaults to None.
            **kwargs (Any): Additional keyword arguments, such
                as callbacks and optional reference strings.

        Returns:
            dict: A dictionary containing the preference, scores, and/or
                other information.
        """

    async def aevaluate_string_pairs(
        self,
        output_a: str,
        output_b: str,
        reference: Optional[str] = None,
        input: Optional[str] = None,
        **kwargs: Any,
    ) -> dict:
        """Evaluate the output string pairs.

        Args:
            output_a (str): The output string from the first model.
            output_b (str): The output string from the second model.
            reference (str, optional): The expected output / reference
                string. Defaults to None.
            input (str, optional): The input string. Defaults to None.
            **kwargs (Any): Additional keyword arguments, such
                as callbacks and optional reference strings.

        Returns:
            dict: A dictionary containing the preference, scores, and/or
                other information.
        """
        raise NotImplementedError(
            f"{self.__class__.__name__} hasn't implemented an async "
            "aevaluate_string_pairs method."
        )
        return self.evaluate_strings(
            prediction=prediction, reference=reference, input=input, **kwargs
        )

File diff suppressed because it is too large
@@ -1,8 +1,33 @@
"""Script to run langchain-server locally using docker-compose."""
import subprocess
from pathlib import Path
from typing import List

from langchainplus_sdk.cli.main import get_docker_compose_command

def get_docker_compose_command() -> List[str]:
    """Get the correct docker compose command for this system."""
    try:
        subprocess.check_call(
            ["docker", "compose", "--version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return ["docker", "compose"]
    except (subprocess.CalledProcessError, FileNotFoundError):
        try:
            subprocess.check_call(
                ["docker-compose", "--version"],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            return ["docker-compose"]
        except (subprocess.CalledProcessError, FileNotFoundError):
            raise ValueError(
                "Neither 'docker compose' nor 'docker-compose'"
                " commands are available. Please install the Docker"
                " server following the instructions for your operating"
                " system at https://docs.docker.com/engine/install/"
            )
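Editor's note (not part of the commit): the helper returns the base command as a list so callers can append subcommands. A sketch of the intended usage:

```python
# Assemble and run "docker compose up -d" (or "docker-compose up -d" on older setups).
compose = get_docker_compose_command()
subprocess.run([*compose, "up", "-d"], check=True)
```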


def main() -> None:

@@ -106,7 +106,7 @@ pyspark = {version = "^3.4.0", optional = true}
clarifai = {version = "9.1.0", optional = true}
tigrisdb = {version = "^1.0.0b6", optional = true}
nebula3-python = {version = "^3.4.0", optional = true}
langchainplus-sdk = ">=0.0.17"
langsmith = {version = ">=0.0.17", optional = true}
awadb = {version = "^0.3.3", optional = true}
azure-search-documents = {version = "11.4.0a20230509004", source = "azure-sdk-dev", optional = true}
openllm = {version = ">=0.1.6", optional = true}
@@ -333,7 +333,8 @@ extended_testing = [
    "scikit-learn",
    "streamlit",
    "pyspark",
    "openai"
    "openai",
    "langsmith"
]

[[tool.poetry.source]]

@@ -583,7 +583,7 @@ def test_convert_run(
        child_execution_order=1,
        start_time=datetime.utcnow(),
        end_time=datetime.utcnow(),
        session_id=TEST_SESSION_ID,
        session_id=uuid4(),
        inputs={"prompts": []},
        outputs=LLMResult(generations=[[]]).dict(),
        serialized={},

@@ -5,8 +5,6 @@ from typing import Any, Dict, List, Optional, Union
from unittest import mock

import pytest
from langchainplus_sdk.client import LangChainPlusClient
from langchainplus_sdk.schemas import Dataset, Example

from langchain.base_language import BaseLanguageModel
from langchain.chains.base import Chain
@@ -104,8 +102,12 @@ def test_run_chat_model_all_formats(inputs: Dict[str, Any]) -> None:
    run_llm(llm, inputs, mock.MagicMock())


@pytest.mark.requires("langsmith")
@pytest.mark.asyncio
async def test_arun_on_dataset(monkeypatch: pytest.MonkeyPatch) -> None:
    from langsmith import Client as LangSmithClient
    from langsmith.schemas import Dataset, Example

    dataset = Dataset(
        id=uuid.uuid4(),
        name="test",
@@ -169,8 +171,8 @@ async def test_arun_on_dataset(monkeypatch: pytest.MonkeyPatch) -> None:
        example: Example,
        llm_or_chain: Union[BaseLanguageModel, Chain],
        n_repetitions: int,
        tracer: Any,
        tags: Optional[List[str]] = None,
        callbacks: Optional[Any] = None,
    ) -> List[Dict[str, Any]]:
        return [
            {"result": f"Result for example {example.id}"} for _ in range(n_repetitions)
@@ -180,15 +182,15 @@ async def test_arun_on_dataset(monkeypatch: pytest.MonkeyPatch) -> None:
        pass

    with mock.patch.object(
        LangChainPlusClient, "read_dataset", new=mock_read_dataset
        LangSmithClient, "read_dataset", new=mock_read_dataset
    ), mock.patch.object(
        LangChainPlusClient, "list_examples", new=mock_list_examples
        LangSmithClient, "list_examples", new=mock_list_examples
    ), mock.patch(
        "langchain.client.runner_utils._arun_llm_or_chain", new=mock_arun_chain
    ), mock.patch.object(
        LangChainPlusClient, "create_project", new=mock_create_project
        LangSmithClient, "create_project", new=mock_create_project
    ):
        client = LangChainPlusClient(api_url="http://localhost:1984", api_key="123")
        client = LangSmithClient(api_url="http://localhost:1984", api_key="123")
        chain = mock.MagicMock()
        num_repetitions = 3
        results = await arun_on_dataset(

@@ -1,113 +0,0 @@
"""Test agent trajectory evaluation chain."""

from typing import List, Tuple

import pytest

from langchain.evaluation.agents.trajectory_eval_chain import TrajectoryEvalChain
from langchain.schema import AgentAction
from langchain.tools.base import tool
from tests.unit_tests.llms.fake_llm import FakeLLM


@pytest.fixture
def intermediate_steps() -> List[Tuple[AgentAction, str]]:
    return [
        (
            AgentAction(
                tool="Foo",
                tool_input="Bar",
                log="Star date 2021-06-13: Foo received input: Bar",
            ),
            "Baz",
        ),
    ]


@tool
def foo(bar: str) -> str:
    """Foo."""
    return bar


def test_trajectory_eval_chain(
    intermediate_steps: List[Tuple[AgentAction, str]]
) -> None:
    llm = FakeLLM(
        queries={
            "a": "Trajectory good\nScore: 5",
            "b": "Trajectory not good\nScore: 1",
        },
        sequential_responses=True,
    )
    chain = TrajectoryEvalChain.from_llm(llm=llm, agent_tools=[foo])  # type: ignore
    # Test when ref is not provided
    res = chain.evaluate_agent_trajectory(
        input="What is your favorite food?",
        agent_trajectory=intermediate_steps,
        output="I like pie.",
    )
    assert res["score"] == 5
    # Test when ref is provided
    res = chain.evaluate_agent_trajectory(
        input="What is your favorite food?",
        agent_trajectory=intermediate_steps,
        output="I like pie.",
        reference="Paris",
    )
    assert res["score"] == 1


def test_trajectory_eval_chain_no_tools(
    intermediate_steps: List[Tuple[AgentAction, str]]
) -> None:
    llm = FakeLLM(
        queries={
            "a": "Trajectory good\nScore: 5",
            "b": "Trajectory not good\nScore: 1",
        },
        sequential_responses=True,
    )
    chain = TrajectoryEvalChain.from_llm(llm=llm)  # type: ignore
    res = chain.evaluate_agent_trajectory(
        input="What is your favorite food?",
        agent_trajectory=intermediate_steps,
        output="I like pie.",
    )
    assert res["score"] == 5
    res = chain.evaluate_agent_trajectory(
        input="What is your favorite food?",
        agent_trajectory=intermediate_steps,
        output="I like pie.",
        reference="Paris",
    )
    assert res["score"] == 1


def test_old_api_works(intermediate_steps: List[Tuple[AgentAction, str]]) -> None:
    llm = FakeLLM(
        queries={
            "a": "Trajectory good\nScore: 5",
            "b": "Trajectory not good\nScore: 1",
        },
        sequential_responses=True,
    )
    chain = TrajectoryEvalChain.from_llm(llm=llm)  # type: ignore
    res = chain(
        {
            "question": "What is your favorite food?",
            "agent_trajectory": intermediate_steps,
            "answer": "I like pie.",
        }
    )
    assert res["score"] == 5

    res = chain(
        {
            "question": "What is your favorite food?",
            "agent_trajectory": intermediate_steps,
            "answer": "I like pie.",
            "reference": "Paris",
        }
    )
    assert res["score"] == 1
@@ -1,39 +0,0 @@
"""Test the comparison chains."""


from langchain.evaluation.comparison.eval_chain import PairwiseStringEvalChain
from tests.unit_tests.llms.fake_llm import FakeLLM


def test_pairwise_string_comparison_chain() -> None:
    llm = FakeLLM(
        queries={
            "a": "The values are the same.\n[[C]]",
            "b": "A is clearly better than b.\n[[A]]",
            "c": "B is clearly better than a.\n[[B]]",
        },
        sequential_responses=True,
    )
    chain = PairwiseStringEvalChain.from_llm(llm=llm)
    res = chain.evaluate_string_pairs(
        output_a="I like pie.",
        output_b="I love pie.",
        input="What is your favorite food?",
    )
    assert res["value"] is None
    assert res["score"] == 0.5
    assert res["reasoning"] == "The values are the same."
    res = chain.evaluate_string_pairs(
        output_a="I like pie.",
        output_b="I like pie.",
        input="What is your favorite food?",
    )
    assert res["value"] == "A"
    assert res["score"] == 1
    res = chain.evaluate_string_pairs(
        output_a="I like pie.",
        output_b="I hate pie.",
        input="What is your favorite food?",
    )
    assert res["value"] == "B"
    assert res["score"] == 0
@@ -2,7 +2,7 @@


from langchain.evaluation.criteria.eval_chain import (
    _SUPPORTED_CRITERIA,
    HELPFULNESS_CRITERION,
    CriteriaEvalChain,
)
from langchain.evaluation.schema import StringEvaluator
@@ -10,12 +10,8 @@ from tests.unit_tests.llms.fake_llm import FakeLLM


def test_resolve_criteria() -> None:
    assert CriteriaEvalChain.resolve_criteria("helpfulness") == {
        "helpfulness": _SUPPORTED_CRITERIA["helpfulness"]
    }
    assert CriteriaEvalChain.resolve_criteria(["correctness"]) == {
        "correctness": _SUPPORTED_CRITERIA["correctness"]
    }
    assert CriteriaEvalChain.resolve_criteria("helpfulness") == HELPFULNESS_CRITERION
    assert CriteriaEvalChain.resolve_criteria(["helpfulness"]) == HELPFULNESS_CRITERION


def test_criteria_eval_chain() -> None:

@@ -1,16 +1,22 @@
"""Test run evaluator implementations basic functionality."""
from __future__ import annotations

from typing import TYPE_CHECKING
from uuid import UUID

import pytest
from langchainplus_sdk.schemas import Example, Run

from langchain.evaluation.run_evaluators import get_criteria_evaluator, get_qa_evaluator
from tests.unit_tests.llms.fake_llm import FakeLLM

if TYPE_CHECKING:
    from langsmith.schemas import Example, Run


@pytest.fixture
def run() -> Run:
    from langsmith.schemas import Run

    return Run(
        id=UUID("f77cd087-48f7-4c62-9e0e-297842202107"),
        name="My Run",
@@ -25,6 +31,8 @@ def run() -> Run:

@pytest.fixture
def example() -> Example:
    from langsmith.schemas import Example

    return Example(
        id=UUID("f77cd087-48f7-4c62-9e0e-297842202106"),
        dataset_id=UUID("f77cd087-48f7-4c62-9e0e-297842202105"),
@@ -34,6 +42,7 @@ def example() -> Example:
    )


@pytest.mark.requires("langsmith")
def test_get_qa_evaluator(run: Run, example: Example) -> None:
    """Test get_qa_evaluator."""
    eval_llm = FakeLLM(
@@ -45,6 +54,7 @@ def test_get_qa_evaluator(run: Run, example: Example) -> None:
    assert res.score == 1


@pytest.mark.requires("langsmith")
def test_get_criteria_evaluator(run: Run, example: Example) -> None:
    """Get a criteria evaluator."""
    eval_llm = FakeLLM(queries={"a": "This checks out.\nY"}, sequential_responses=True)

@@ -38,7 +38,6 @@ def test_required_dependencies(poetry_conf: Mapping[str, Any]) -> None:
    "aiohttp",
    "async-timeout",
    "dataclasses-json",
    "langchainplus-sdk",
    "numexpr",
    "numpy",
    "openapi-schema-pydantic",