Compare commits

..

146 Commits

Author SHA1 Message Date
William Fu-Hinthorn
75fd543b33 Warn if reference passed but evaluator doesn't require it 2023-07-03 16:44:22 -07:00
William FH
04001ff077 Log errors (#7105)
Re-add change that was inadvertently undone in #6995
2023-07-03 14:47:32 -07:00
William FH
3f9744c9f4 Accept no 'reasoning' response in qa evaluator (#7107)
Re add since #6995 inadvertently undid #7031
2023-07-03 14:47:17 -07:00
Bagatur
fd3f8efec7 fix retriever signatures (#7097) 2023-07-03 14:21:36 -06:00
Nicolas
490fcf9d98 docs: New experimental UI for Mendable Search (#6558)
This PR introduces a new Mendable UI tailored to a better search
experience.

We're more closely integrating our traditional search with our AI
generation.
With this change, you won't have to tab back and forth between the
mendable bot and the keyword search. Both types of search are handled in
the same bar. This should make the docs easier to navigate. while still
letting users get code generations or AI-summarized answers if they so
wish. Also, it should reduce the cost.

Would love to hear your feedback :)

Cc: @dev2049 @hwchase17
2023-07-03 20:52:13 +01:00
Nuno Campos
c8f8b1b327 Add events to tracer runs (#7090)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @dev2049
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @dev2049
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @vowelparrot
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
2023-07-03 12:43:43 -07:00
genewoo
e49abd1277 Add Metal support to llama.cpp doc (#7092)
- Description: Add Metal support to llama.cpp doc
  - Issue: #7091 
  - Dependencies: N/A
  - Twitter handle: gene_wu
2023-07-03 13:35:39 -06:00
Bagatur
fad2c7e5e0 update pr tmpl (#7095) 2023-07-03 13:34:03 -06:00
Nuno Campos
98dbea6310 Add tags to all callback handler methods (#7073)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @dev2049
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @dev2049
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @vowelparrot
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
2023-07-03 10:39:46 -07:00
Mike Salvatore
d0c7f7c317 Remove None default value for FAISS relevance_score_fn (#7085)
## Description

The type hint for `FAISS.__init__()`'s `relevance_score_fn` parameter
allowed the parameter to be set to `None`. However, a default function
is provided by the constructor. This led to an unnecessary check in the
code, as well as a test to verify this check.

**ASSUMPTION**: There's no reason to ever set `relevance_score_fn` to
`None`.

This PR changes the type hint and removes the unnecessary code.
2023-07-03 10:11:49 -06:00
Bagatur
719316e84c bump 222 (#7086) 2023-07-03 10:03:55 -06:00
rjarun8
e2d61ab85a Add SpacyEmbeddings class (#6967)
- Description: Added a new SpacyEmbeddings class for generating
embeddings using the Spacy library.
- Issue: Sentencebert/Bert/Spacy/Doc2vec embedding support #6952
- Dependencies: This change requires the Spacy library and the
'en_core_web_sm' Spacy model.
- Tag maintainer: @dev2049
- Twitter handle: N/A

This change includes a new SpacyEmbeddings class, but does not include a
test or an example notebook.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-03 09:38:31 -06:00
Leonid Ganeline
16fbd528c5 docs: commented out editUrl option (#6440) 2023-07-03 07:59:11 -07:00
adam91holt
80e86b602e Remove duplicate mongodb integration doc (#7006) 2023-07-03 02:23:33 -06:00
joaomsimoes
c669d98693 Update get_started.mdx (#7005)
typo in chat = ChatOpenAI(open_api_key="...") should be openai_api_key
2023-07-03 02:23:12 -06:00
Bagatur
1cdb33a090 openapi chain nit (#7012) 2023-07-03 02:22:53 -06:00
Johnny Lim
a081e419a0 Fix sample in FAISS section (#7050)
This PR fixes a sample in the FAISS section in the reference docs.
2023-07-03 02:18:32 -06:00
Ikko Eltociear Ashimine
be93775ebc Fix typo in google_places_api.py (#7055) 2023-07-03 02:14:18 -06:00
Harrison Chase
60b05511d3 move base prompt to schema (#6995)
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-02 22:38:59 -04:00
Leonid Ganeline
200be43da6 added Brave Search document_loader (#6989)
- Added `Brave Search` document loader.
- Refactored BraveSearch wrapper
- Added a Jupyter Notebook example
- Added `Ecosystem/Integrations` BraveSearch page 

Please review:
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
2023-07-02 19:01:24 -07:00
Sergey Kozlov
6d15854cda Add JSON Lines support to JSONLoader (#6913)
**Description**:

The JSON Lines format is used by some services such as OpenAI and
HuggingFace. It's also a convenient alternative to CSV.

This PR adds JSON Lines support to `JSONLoader` and also updates related
tests.

**Tag maintainer**: @rlancemartin, @eyurtsev.

PS I was not able to build docs locally so didn't update related
section.
2023-07-02 12:32:41 -07:00
Ofer Mendelevitch
153b56d19b Vectara upd2 (#6506)
Update to Vectara integration 
- By user request added "add_files" to take advantage of Vectara
capabilities to process files on the backend, without the need for
separate loading of documents and chunking in the chain.
- Updated vectara.ipynb example notebook to be broader and added testing
of add_file()
 
  @hwchase17 - project lead

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-07-02 12:15:50 -07:00
Leonid Ganeline
1feac83323 docstrings document_loaders 2 (#6890)
updated docstring for the `document_loaders`

Maintainer responsibilities:
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
2023-07-02 12:14:22 -07:00
Leonid Ganeline
77ae8084a0 docstrings document_loaders 1 (#6847)
- Updated docstrings in `document_loaders`
- several code fixes.
- added `docs/extras/ecosystem/integrations/airtable.md`

@rlancemartin, @eyurtsev
2023-07-02 12:13:04 -07:00
0xcha05
e41b382e1c Added filter and delete all option to delete function in Pinecone integration, updated base VectorStore's delete function (#6876)
### Description:
Updated the delete function in the Pinecone integration to allow for
deletion of vectors by specifying a filter condition, and to delete all
vectors in a namespace.

Made the ids parameter optional in the delete function in the base
VectorStore class and allowed for additional keyword arguments.

Updated the delete function in several classes (Redis, Chroma, Supabase,
Deeplake, Elastic, Weaviate, and Cassandra) to match the changes made in
the base VectorStore class. This involved making the ids parameter
optional and allowing for additional keyword arguments.
2023-07-02 11:46:19 -07:00
Bagatur
5a45363954 bump 221 (#7047) 2023-07-02 08:32:15 -06:00
Bagatur
7acd524210 Rm retriever kwargs (#7013)
Doesn't actually limit the Retriever interface but hopefully in practice
it does
2023-07-02 08:22:24 -06:00
Johnny Lim
9dc77614e3 Polish reference docs (#7045)
This PR fixes broken links in the reference docs.
2023-07-02 08:08:51 -06:00
skspark
e5f6f0ffc4 Support params on GoogleSearchApiWrapper (#6810) (#7014)
## Description
Support search params in GoogleSearchApiWrapper's result call, for the
extra filtering on search,
to support extra query parameters that google cse provides:

https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list?hl=ko

## Issue
#6810
2023-07-02 01:18:38 -06:00
Johnny Lim
052c797429 Fix typo (#7023)
This PR fixes a typo.
2023-07-02 01:17:30 -06:00
Alex Iribarren
dc2264619a Fix openai multi functions agent docs (#7028) 2023-07-02 01:16:40 -06:00
William FH
6a64870ea0 Accept no 'reasoning' response in qa evaluator (#7030) 2023-07-01 12:46:19 -07:00
William FH
7ebb76a5fa Log Errors in Evaluator Callback (#7031) 2023-07-01 12:10:00 -07:00
Stefano Lottini
8d2281a8ca Second Attempt - Add concurrent insertion of vector rows in the Cassandra Vector Store (#7017)
Retrying with the same improvements as in #6772, this time trying not to
mess up with branches.

@rlancemartin doing a fresh new PR from a branch with a new name. This
should do. Thank you for your help!

---------

Co-authored-by: Jonathan Ellis <jbellis@datastax.com>
Co-authored-by: rlm <pexpresss31@gmail.com>
2023-07-01 11:09:52 -07:00
Harrison Chase
3bfe7cf467 Harrison/split schema dir (#7025)
should be no functional changes

also keep __init__ exposing a lot for backwards compat

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-01 13:39:19 -04:00
Davis Chase
556c425042 Improve docstrings for langchain.schema.py (#6802)
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-07-01 09:46:52 -07:00
Matt Robinson
0498dad562 feat: enable UnstructuredEmailLoader to process attachments (#6977)
### Summary

Updates `UnstructuredEmailLoader` so that it can process attachments in
addition to the e-mail content. The loader will process attachments if
the `process_attachments` kwarg is passed when the loader is
instantiated.

### Testing

```python

file_path = "fake-email-attachment.eml"
loader = UnstructuredEmailLoader(
    file_path, mode="elements", process_attachments=True
)
docs = loader.load()
docs[-1]
```

### Reviewers

-  @rlancemartin 
-  @eyurtsev
- @hwchase17
2023-07-01 06:09:26 -07:00
Matthew Foster Walsh
59697b406d Fix typo in quickstart.mdx (#6985)
Removed an extra "to" from a sentence. @dev2049 very minor documentation
fix.
2023-07-01 02:53:52 -06:00
Paul Grillenberger
aa37b10b28 Fix: Correct typo (#6988)
Description: Correct a minor typo in the docs. @dev2049
2023-07-01 02:53:34 -06:00
Zander Chase
b0859c9b18 Add New Retriever Interface with Callbacks (#5962)
Handle the new retriever events in a way that (I think) is entirely
backwards compatible? Needs more testing for some of the chain changes
and all.

This creates an entire new run type, however. We could also just treat
this as an event within a chain run presumably (same with memory)

Adds a subclass initializer that upgrades old retriever implementations
to the new schema, along with tests to ensure they work.

First commit doesn't upgrade any of our retriever implementations (to
show that we can pass the tests along with additional ones testing the
upgrade logic).

Second commit upgrades the known universe of retrievers in langchain.

- [X] Add callback handling methods for retriever start/end/error (open
to renaming to 'retrieval' if you want that)
- [X] Update BaseRetriever schema to support callbacks
- [X] Tests for upgrading old "v1" retrievers for backwards
compatibility
- [X] Update existing retriever implementations to implement the new
interface
- [X] Update calls within chains to .{a]get_relevant_documents to pass
the child callback manager
- [X] Update the notebooks/docs to reflect the new interface
- [X] Test notebooks thoroughly


Not handled:
- Memory pass throughs: retrieval memory doesn't have a parent callback
manager passed through the method

---------

Co-authored-by: Nuno Campos <nuno@boringbits.io>
Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>
2023-06-30 14:44:03 -07:00
William FH
a5b206caf3 Remove Promptlayer Notebook (#6996)
It's breaking our docs build
2023-06-30 14:30:24 -07:00
Daniel Chalef
b26cca8008 Zep Authentication (#6728)
## Description: Add Zep API Key argument to ZepChatMessageHistory and
ZepRetriever
- correct docs site links
- add zep api_key auth to constructors

ZepChatMessageHistory: @hwchase17, 
ZepRetriever: @rlancemartin, @eyurtsev
2023-06-30 14:24:26 -07:00
William FH
e4625846e5 Add Flyte Callback Handler (#6139) (#6986)
Signed-off-by: Samhita Alla <aallasamhita@gmail.com>
Co-authored-by: Samhita Alla <aallasamhita@gmail.com>
2023-06-30 12:25:22 -07:00
Bagatur
e3b7effc8f Beef up import test (#6979) 2023-06-30 09:26:05 -07:00
Bagatur
1ce9ef3828 Rm pytz dep (#6978) 2023-06-30 09:24:01 -07:00
Davis Chase
eb180e321f Page per class-style api reference (#6560)
can make it prettier, but what do we think of overall structure?

https://api.python.langchain.com/en/dev2049-page_per_class/api_ref.html

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Nuno Campos <nuno@boringbits.io>
2023-06-30 09:23:32 -07:00
William FH
64039b9f11 Promptlayer Callback (#6975)
Co-authored-by: Saleh Hindi <saleh.hindi.one@gmail.com>
Co-authored-by: jped <jonathanped@gmail.com>
2023-06-30 08:32:42 -07:00
William FH
13c62cf6b1 Arthur Callback (#6972)
Co-authored-by: Max Cembalest <115359769+arthuractivemodeling@users.noreply.github.com>
2023-06-30 07:48:02 -07:00
William FH
8c73037dff Simplify eval arg names (#6944)
It'll be easier to switch between these if the names of predictions are
consistent
2023-06-30 07:47:53 -07:00
Bagatur
8f5eca236f release v220 (#6962) 2023-06-30 06:52:09 -07:00
Bagatur
60b0d6ea35 Bagatur/openllm ensure available (#6960)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-30 00:54:23 -07:00
Siraj Aizlewood
521c6f0233 Provided default values for tags and inheritable_tags args in BaseRun… (#6858)
when running AsyncCallbackManagerForChainRun (from
langchain.callbacks.manager import AsyncCallbackManagerForChainRun),
provided default values for tags and inheritable_tages of empty lists in
manager.py BaseRunManager.


- Description: In manager.py, `BaseRunManager`, default values were
provided for the `__init__` args `tags` and `inheritable_tags`. They
default to empty lists (`[]`).
- Issue: When trying to use Nvidia NeMo Guardrails with LangChain, the
following exception was raised:
2023-06-29 22:01:08 -07:00
Davis Chase
bd6a0ee9e9 Redirect vecstores (#6948) 2023-06-29 19:22:21 -07:00
Davis Chase
f780678910 Add back in clickhouse mongo vecstore notebooks (#6949) 2023-06-29 19:21:47 -07:00
Jacob Lee
73831ef3d8 Change code block color scheme (#6945)
Adds contrast, makes code blocks more readable.
2023-06-29 19:21:11 -07:00
Tahjyei Thompson
7d8830f707 Add OpenAIMultiFunctionsAgent to import list in agents directory (#6824)
- Added OpenAIMultiFunctionsAgent to the import list of the Agents
directory

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-29 18:34:26 -07:00
Matt Florence
0f6737735d Order messages in PostgresChatMessageHistory (#6830)
Fixes issue: https://github.com/hwchase17/langchain/issues/6829

This guarantees message history is in the correct order. 

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-29 18:10:28 -07:00
lucasiscovici
e9950392dd Add password to PyPDR loader and parser (#6908)
Add password to PyPDR loader and parser

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-29 17:35:50 -07:00
Zander Chase
429f4dbe4d Add Input Mapper in run_on_dataset (#6894)
If you create a dataset from runs and run the same chain or llm on it
later, it usually works great.

If you have an agent dataset and want to run a different agent on it, or
have more complex schema, it's hard for us to automatically map these
values every time. This PR lets you pass in an input_mapper function
that converts the example inputs to whatever format your model expects
2023-06-29 16:53:49 -07:00
Lei Pan
76d03f398d support max_chunk_bytes in OpensearchVectorSearch to pass down to bulk (#6855)
Support `max_chunk_bytes` kwargs to pass down to `buik` helper, in order
to support the request limits in Opensearch locally and in AWS.

@rlancemartin, @eyurtsev
2023-06-29 15:50:08 -07:00
Hashem Alsaket
5861770a53 Updated QA notebook (#6801)
Description: `all_metadatas` was not defined, `OpenAIEmbeddings` was not
imported,
Issue: #6723 the issue # it fixes (if applicable),
Dependencies: lark,
Tag maintainer: @vowelparrot , @dev2049

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-06-29 15:41:53 -07:00
Kacper Łukawski
140ba682f1 Support named vectors in Qdrant (#6871)
# Description

This PR makes it possible to use named vectors from Qdrant in Langchain.
That was requested multiple times, as people want to reuse externally
created collections in Langchain. It doesn't change anything for the
existing applications. The changes were covered with some integration
tests and included in the docs.

## Example

```python
Qdrant.from_documents(
    docs,
    embeddings,
    location=":memory:",
    collection_name="my_documents",
    vector_name="custom_vector",
)
```

### Issue: #2594 

Tagging @rlancemartin & @eyurtsev. I'd appreciate your review.
2023-06-29 15:14:22 -07:00
bradcrossen
9ca1cf003c Re-add Support for SQLAlchemy <1.4 (#6895)
Support for SQLAlchemy 1.3 was removed in version 0.0.203 by change
#6086. Re-adding support.

- Description: Imports SQLAlchemy Row at class creation time instead of
at init to support SQLAlchemy <1.4. This is the only breaking change and
was introduced in version 0.0.203 #6086.
  
A similar change was merged before:
https://github.com/hwchase17/langchain/pull/4647
  
  - Dependencies: Reduces SQLAlchemy dependency to > 1.3
  - Tag maintainer: @rlancemartin, @eyurtsev, @hwchase17, @wangxuqi

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-06-29 14:49:35 -07:00
corranmac
20c6ade2fc Grobid parser for Scientific Articles from PDF (#6729)
### Scientific Article PDF Parsing via Grobid

`Description:`
This change adds the GrobidParser class, which uses the Grobid library
to parse scientific articles into a universal XML format containing the
article title, references, sections, section text etc. The GrobidParser
uses a local Grobid server to return PDFs document as XML and parses the
XML to optionally produce documents of individual sentences or of whole
paragraphs. Metadata includes the text, paragraph number, pdf relative
bboxes, pages (text may overlap over two pages), section title
(Introduction, Methodology etc), section_number (i.e 1.1, 2.3), the
title of the paper and finally the file path.
      
Grobid parsing is useful beyond standard pdf parsing as it accurately
outputs sections and paragraphs within them. This allows for
post-fitering of results for specific sections i.e. limiting results to
the methodology section or results. While sections are split via
headings, ideally they could be classified specifically into
introduction, methodology, results, discussion, conclusion. I'm
currently experimenting with chatgpt-3.5 for this function, which could
later be implemented as a textsplitter.

`Dependencies:`
For use, the grobid repo must be cloned and Java must be installed, for
colab this is:

```
!apt-get install -y openjdk-11-jdk -q
!update-alternatives --set java /usr/lib/jvm/java-11-openjdk-amd64/bin/java
!git clone https://github.com/kermitt2/grobid.git
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"
os.chdir('grobid')
!./gradlew clean install
```

Once installed the server is ran on localhost:8070 via
```
get_ipython().system_raw('nohup ./gradlew run > grobid.log 2>&1 &')
```

@rlancemartin, @eyurtsev

Twitter Handle: @Corranmac

Grobid Demo Notebook is
[here](https://colab.research.google.com/drive/1X-St_mQRmmm8YWtct_tcJNtoktbdGBmd?usp=sharing).

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-06-29 14:29:29 -07:00
Baichuan Sun
6157bdf9d9 Add API Header for Amazon API Gateway Authentication (#6902)
Add API Headers support for Amazon API Gateway to enable Authentication
using DynamoDB.

<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @dev2049
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @dev2049
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @vowelparrot
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
2023-06-29 12:58:07 -07:00
Wey Gu
1c66aa6d56 chore: NebulaGraph prompt optmization (#6904)
Was preparing for a demo project of NebulaGraphQAChain to find out the
prompt needed to be optimized a little bit.

Please @hwchase17 kindly help review.

Thanks!
2023-06-29 12:57:39 -07:00
Harrison Chase
0ba175e13f move octo notebook (#6901) 2023-06-29 12:20:55 -07:00
Stefano Lottini
75fb9d2fdc Cassandra support for chat history using CassIO library (#6771)
### Overview

This PR aims at building on #4378, expanding the capabilities and
building on top of the `cassIO` library to interface with the database
(as opposed to using the core drivers directly).

Usage of `cassIO` (a library abstracting Cassandra access for
ML/GenAI-specific purposes) is already established since #6426 was
merged, so no new dependencies are introduced.

In the same spirit, we try to uniform the interface for using Cassandra
instances throughout LangChain: all our appreciation of the work by
@jj701 notwithstanding, who paved the way for this incremental work
(thank you!), we identified a few reasons for changing the way a
`CassandraChatMessageHistory` is instantiated. Advocating a syntax
change is something we don't take lighthearted way, so we add some
explanations about this below.

Additionally, this PR expands on integration testing, enables use of
Cassandra's native Time-to-Live (TTL) features and improves the phrasing
around the notebook example and the short "integrations" documentation
paragraph.

We would kindly request @hwchase to review (since this is an elaboration
and proposed improvement of #4378 who had the same reviewer).

### About the __init__ breaking changes

There are
[many](https://docs.datastax.com/en/developer/python-driver/3.28/api/cassandra/cluster/)
options when creating the `Cluster` object, and new ones might be added
at any time. Choosing some of them and exposing them as `__init__`
parameters `CassandraChatMessageHistory` will prove to be insufficient
for at least some users.

On the other hand, working through `kwargs` or adding a long, long list
of arguments to `__init__` is not a desirable option either. For this
reason, (as done in #6426), we propose that whoever instantiates the
Chat Message History class provide a Cassandra `Session` object, ready
to use. This also enables easier injection of mocks and usage of
Cassandra-compatible connections (such as those to the cloud database
DataStax Astra DB, obtained with a different set of init parameters than
`contact_points` and `port`).

We feel that a breaking change might still be acceptable since LangChain
is at `0.*`. However, while maintaining that the approach we propose
will be more flexible in the future, room could be made for a
"compatibility layer" that respects the current init method. Honestly,
we would to that only if there are strong reasons for it, as that would
entail an additional maintenance burden.

### Other changes

We propose to remove the keyspace creation from the class code for two
reasons: first, production Cassandra instances often employ RBAC so that
the database user reading/writing from tables does not necessarily (and
generally shouldn't) have permission to create keyspaces, and second
that programmatic keyspace creation is not a best practice (it should be
done more or less manually, with extra care about schema mismatched
among nodes, etc). Removing this (usually unnecessary) operation from
the `__init__` path would also improve initialization performance
(shorter time).

We suggest, likewise, to remove the `__del__` method (which would close
the database connection), for the following reason: it is the
recommended best practice to create a single Cassandra `Session` object
throughout an application (it is a resource-heavy object capable to
handle concurrency internally), so in case Cassandra is used in other
ways by the app there is the risk of truncating the connection for all
usages when the history instance is destroyed. Moreover, the `Session`
object, in typical applications, is best left to garbage-collect itself
automatically.

As mentioned above, we defer the actual database I/O to the `cassIO`
library, which is designed to encode practices optimized for LLM
applications (among other) without the need to expose LangChain
developers to the internals of CQL (Cassandra Query Language). CassIO is
already employed by the LangChain's Vector Store support for Cassandra.

We added a few more connection options in the companion notebook example
(most notably, Astra DB) to encourage usage by anyone who cannot run
their own Cassandra cluster.

We surface the `ttl_seconds` option for automatic handling of an
expiration time to chat history messages, a likely useful feature given
that very old messages generally may lose their importance.

We elaborated a bit more on the integration testing (Time-to-live,
separation of "session ids", ...).

### Remarks from linter & co.

We reinstated `cassio` as a dependency both in the "optional" group and
in the "integration testing" group of `pyproject.toml`. This might not
be the right thing do to, in which case the author of this PR offer his
apologies (lack of confidence with Poetry - happy to be pointed in the
right direction, though!).

During linter tests, we were hit by some errors which appear unrelated
to the code in the PR. We left them here and report on them here for
awareness:

```
langchain/vectorstores/mongodb_atlas.py:137: error: Argument 1 to "insert_many" of "Collection" has incompatible type "List[Dict[str, Sequence[object]]]"; expected "Iterable[Union[MongoDBDocumentType, RawBSONDocument]]"  [arg-type]
langchain/vectorstores/mongodb_atlas.py:186: error: Argument 1 to "aggregate" of "Collection" has incompatible type "List[object]"; expected "Sequence[Mapping[str, Any]]"  [arg-type]

langchain/vectorstores/qdrant.py:16: error: Name "grpc" is not defined  [name-defined]
langchain/vectorstores/qdrant.py:19: error: Name "grpc" is not defined  [name-defined]
langchain/vectorstores/qdrant.py:20: error: Name "grpc" is not defined  [name-defined]
langchain/vectorstores/qdrant.py:22: error: Name "grpc" is not defined  [name-defined]
langchain/vectorstores/qdrant.py:23: error: Name "grpc" is not defined  [name-defined]
```

In the same spirit, we observe that to even get `import langchain` run,
it seems that a `pip install bs4` is missing from the minimal package
installation path.

Thank you!
2023-06-29 10:50:34 -07:00
Zander Chase
f5663603cf Throw error if evaluation key not present (#6874) 2023-06-29 10:30:39 -07:00
Zander Chase
be164b20d8 Accept any single input (#6888)
If I upload a dataset with a single input and output column, we should
be able to let the chain prepare the input without having to maintain a
strict dataset format.
2023-06-29 10:29:16 -07:00
Harrison Chase
8502117f62 bump version to 219 (#6899) 2023-06-28 23:48:42 -07:00
Pablo
6370808d41 Adding support for async (_acall) for VertexAICommon LLM (#5588)
# Adding support for async (_acall) for VertexAICommon LLM

This PR implements the `_acall` method under `_VertexAICommon`. Because
VertexAI itself does not provide an async interface, I implemented it
via a ThreadPoolExecutor that can delegate execution of VertexAI calls
to other threads.

Twitter handle: @polecitoem : )


## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

fyi - @agola11 for async functionality
fyi - @Ark-kun from VertexAI
2023-06-28 23:07:41 -07:00
Mike Salvatore
cbd759aaeb Fix inconsistent logging_and_data_dir parameter in AwaDB (#6775)
## Description

Tag maintainer: @rlancemartin, @eyurtsev 

### log_and_data_dir
`AwaDB.__init__()` accepts a parameter named `log_and_data_dir`. But
`AwaDB.from_texts()` and `AwaDB.from_documents()` accept a parameter
named `logging_and_data_dir`. This inconsistency in this parameter name
can lead to confusion on the part of the caller.

This PR renames `logging_and_data_dir` to `log_and_data_dir` to make all
functions consistent with the constructor.

### embedding

`AwaDB.__init__()` accepts a parameter named `embedding_model`. But
`AwaDB.from_texts()` and `AwaDB.from_documents()` accept a parameter
named `embeddings`. This inconsistency in this parameter name can lead
to confusion on the part of the caller.

This PR renames `embedding_model` to `embeddings` to make AwaDB's
constructor consistent with the classmethod "constructors" as specified
by `VectorStore` abstract base class.
2023-06-28 23:06:52 -07:00
Harrison Chase
3ac08c3de4 Harrison/octo ml (#6897)
Co-authored-by: Bassem Yacoube <125713079+AI-Bassem@users.noreply.github.com>
Co-authored-by: Shotaro Kohama <khmshtr28@gmail.com>
Co-authored-by: Rian Dolphin <34861538+rian-dolphin@users.noreply.github.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
Co-authored-by: Shashank Deshpande <shashankdeshpande18@gmail.com>
2023-06-28 23:04:11 -07:00
Jiří Moravčík
a6b40b73e5 Add call_actor_task to the Apify integration (#6862)
A user has been testing the Apify integration inside langchain and he
was not able to run saved Actor tasks.

This PR adds support for calling saved Actor tasks on the Apify platform
to the existing integration. The structure of very similar to the one of
calling Actors.
2023-06-28 22:13:47 -07:00
Shashank Deshpande
99cfe192da added example notebook - use custom functions with openai agent (#6865)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @dev2049
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @dev2049
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @vowelparrot
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
2023-06-28 22:07:33 -07:00
Rian Dolphin
2e39ede848 add with score option for max marginal relevance (#6867)
### Adding the functionality to return the scores with retrieved
documents when using the max marginal relevance
- Description: Add the method
`max_marginal_relevance_search_with_score_by_vector` to the FAISS
wrapper. Functionality operates the same as
`similarity_search_with_score_by_vector` except for using the max
marginal relevance retrieval framework like is used in the
`max_marginal_relevance_search_by_vector` method.
  - Dependencies: None
  - Tag maintainer: @rlancemartin @eyurtsev 
  - Twitter handle: @RianDolphin

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-28 22:00:34 -07:00
Shotaro Kohama
398e4cd2dc Update langchain.chains.create_extraction_chain_pydantic to parse results successfully (#6887)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @dev2049
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @dev2049
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @vowelparrot
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
 
- Description: 
- The current code uses `PydanticSchema.schema()` and
`_get_extraction_function` at the same time. As a result, a response
from OpenAI has two nested `info`, and
`PydanticAttrOutputFunctionsParser` fails to parse it. This PR will use
the pydantic class given as an arg instead.
- Issue: no related issue yet
- Dependencies: no dependency change
- Tag maintainer: @dev2049
- Twitter handle: @shotarok28
2023-06-28 21:57:41 -07:00
Eduard van Valkenburg
57f370cde9 PowerBI Toolkit additional logs (#6881)
Added some additional logs to better be able to troubleshoot and
understand the performance of the call to PBI vs the rest of the work.
2023-06-28 18:16:41 -07:00
Robert Lewis
c9c8d2599e Update Zapier Jupyter notebook to include brief OAuth example (#6892)
Description: Adds a brief example of using an OAuth access token with
the Zapier wrapper. Also links to the Zapier documentation to learn more
about OAuth flows.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-28 18:06:22 -07:00
Zhicheng Geng
16b11bda83 Use getLogger instead of basicConfig in multi_query.py (#6891)
Remove `logging.basicConfig`, which turns on logging. Use `getLogger`
instead
2023-06-28 18:06:10 -07:00
Davis Chase
f07dd02b50 Docs /redirects (#6790)
Auto-generated a bunch of redirects from initial docs refactor commit
2023-06-28 17:07:53 -07:00
Harrison Chase
e5611565b7 bump version to 218 (#6857) 2023-06-27 23:36:37 -07:00
Yaohui Wang
9d1bd18596 feat (documents): add LarkSuite document loader (#6420)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

### Summary

This PR adds a LarkSuite (FeiShu) document loader. 
> [LarkSuite](https://www.larksuite.com/) is an enterprise collaboration
platform developed by ByteDance.

### Tests

- an integration test case is added
- an example notebook showing usage is added. [Notebook
preview](https://github.com/yaohui-wyh/langchain/blob/master/docs/extras/modules/data_connection/document_loaders/integrations/larksuite.ipynb)

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

### Who can review?

- PTAL @eyurtsev @hwchase17

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

---------

Co-authored-by: Yaohui Wang <wangyaohui.01@bytedance.com>
2023-06-27 23:08:05 -07:00
Jingsong Gao
a435a436c1 feat(document_loaders): add tencent cos directory and file loader (#6401)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

- add tencent cos directory and file support for document-loader

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

@eyurtsev
2023-06-27 23:07:20 -07:00
Ninely
d6cd0deaef feat: Add streaming only final aiter of agent (#6274)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

#### Add streaming only final async iterator of agent
This callback returns an async iterator and only streams the final
output of an agent.

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested: @agola11

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-27 23:06:25 -07:00
Shashank Deshpande
1db266b20d Update link in apis.mdx (#6812)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @dev2049
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @dev2049
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @vowelparrot
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
2023-06-27 23:00:26 -07:00
Lance Martin
3f9900a864 Create MultiQueryRetriever (#6833)
Distance-based vector database retrieval embeds (represents) queries in
high-dimensional space and finds similar embedded documents based on
"distance". But, retrieval may produce difference results with subtle
changes in query wording or if the embeddings do not capture the
semantics of the data well. Prompt engineering / tuning is sometimes
done to manually address these problems, but can be tedious.

The `MultiQueryRetriever` automates the process of prompt tuning by
using an LLM to generate multiple queries from different perspectives
for a given user input query. For each query, it retrieves a set of
relevant documents and takes the unique union across all queries to get
a larger set of potentially relevant documents. By generating multiple
perspectives on the same question, the `MultiQueryRetriever` might be
able to overcome some of the limitations of the distance-based retrieval
and get a richer set of results.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-27 22:59:40 -07:00
Tim Asp
3ca1a387c2 Web Loader: Add proxy support (#6792)
Proxies are helpful, especially when you start querying against more
anti-bot websites.

[Proxy
services](https://developers.oxylabs.io/advanced-proxy-solutions/web-unblocker/making-requests)
(of which there are many) and `requests` make it easy to rotate IPs to
prevent banning by just passing along a simple dict to `requests`.

CC @rlancemartin, @eyurtsev
2023-06-27 22:27:49 -07:00
Ayan Bandyopadhyay
f92ccf70fd Update to the latest Psychic python library version (#6804)
Update the Psychic document loader to use the latest `psychicapi` python
library version: `0.8.0`
2023-06-27 22:26:38 -07:00
Hun-soo Jung
f3d178f600 Specify utilities package in SerpAPIWrapper docstring (#6821)
- Description: Specify utilities package in SerpAPIWrapper docstring
  - Issue: Not an issue
  - Dependencies: (n/a)
  - Tag maintainer: @dev2049 
  - Twitter handle: (n/a)
2023-06-27 22:26:20 -07:00
Matt Robinson
dd2a151543 Docs/unstructured api key (#6781)
### Summary

The Unstructured API will soon begin requiring API keys. This PR updates
the Unstructured integrations docs with instructions on how to generate
Unstructured API keys.

### Reviewers

@rlancemartin
@eyurtsev
@hwchase17
2023-06-27 16:54:15 -07:00
Matthew Plachter
d6664af0ee add async to zapier nla tools (#6791)
Replace this comment with:
  - Description: Add Async functionality to Zapier NLA Tools
  - Issue:  n/a 
  - Dependencies: n/a
  - Tag maintainer: 

Maintainer responsibilities:
  - Agents / Tools / Toolkits: @vowelparrot
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
2023-06-27 16:53:35 -07:00
Neil Neuwirth
efe0d39c6a Adjusted OpenAI cost calculation (#6798)
Added parentheses to ensure the division operation is performed before
multiplication. This now correctly calculates the cost by dividing the
number of tokens by 1000 first (to get the cost per token), and then
multiplies it with the model's cost per 1k tokens @agola11
2023-06-27 16:53:06 -07:00
Ian
b4c196f785 fix pinecone delete bug (#6816)
The implementation of delete in pinecone vector omits the namespace,
which will cause delete failed
2023-06-27 16:50:17 -07:00
Janos Tolgyesi
f1070de038 WebBaseLoader: optionally raise exception in the case of http error (#6823)
- **Description**: this PR adds the possibility to raise an exception in
the case the http request did not return a 2xx status code. This is
particularly useful in the situation when the url points to a
non-existent web page, the server returns a http status of 404 NOT
FOUND, but WebBaseLoader anyway parses and returns the http body of the
error message.
  - **Dependencies**: none,
  - **Tag maintainer**: @rlancemartin, @eyurtsev,
  - **Twitter handle**: jtolgyesi
2023-06-27 16:43:59 -07:00
rafael
ef72a7cf26 rail_parser: Allow creation from pydantic (#6832)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @dev2049
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @dev2049
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @vowelparrot
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->

Adds a way to create the guardrails output parser from a pydantic model.
2023-06-27 16:40:52 -07:00
Augustine Theodore
a980095efc Enhancement : Ignore deleted messages and media in WhatsAppChatLoader (#6839)
- Description: Ignore deleted messages and media
  - Issue: #6838 
  - Dependencies: No new dependencies
  - Tag maintainer: @rlancemartin, @eyurtsev
2023-06-27 16:36:55 -07:00
Robert Lewis
74848aafea Zapier - Add better error messaging for 401 responses (#6840)
Description: When a 401 response is given back by Zapier, hint to the
end user why that may have occurred

- If an API Key was initialized with the wrapper, ask them to check
their API Key value
- if an access token was initialized with the wrapper, ask them to check
their access token or verify that it doesn't need to be refreshed.

Tag maintainer: @dev2049
2023-06-27 16:35:42 -07:00
Matt Robinson
b24472eae3 feat: Add UnstructuredOrgModeLoader (#6842)
### Summary

Adds `UnstructuredOrgModeLoader` for processing
[Org-mode](https://en.wikipedia.org/wiki/Org-mode) documents.

### Testing

```python
from langchain.document_loaders import UnstructuredOrgModeLoader

loader = UnstructuredOrgModeLoader(
    file_path="example_data/README.org", mode="elements"
)
docs = loader.load()
print(docs[0])
```

### Reviewers

- @rlancemartin
- @eyurtsev
- @hwchase17
2023-06-27 16:34:17 -07:00
Piyush Jain
e53995836a Added missing attribute value object (#6849)
## Description
Adds a missing type class for
[AdditionalResultAttributeValue](https://docs.aws.amazon.com/kendra/latest/APIReference/API_AdditionalResultAttributeValue.html).
Fixes validation failure for the query API that have
`AdditionalAttributes` in the response.

cc @dev2049 
cc @zhichenggeng
2023-06-27 16:30:11 -07:00
Cristóbal Carnero Liñán
e494b0a09f feat (documents): add a source code loader based on AST manipulation (#6486)
#### Summary

A new approach to loading source code is implemented:

Each top-level function and class in the code is loaded into separate
documents. Then, an additional document is created with the top-level
code, but without the already loaded functions and classes.

This could improve the accuracy of QA chains over source code.

For instance, having this script:

```
class MyClass:
    def __init__(self, name):
        self.name = name

    def greet(self):
        print(f"Hello, {self.name}!")

def main():
    name = input("Enter your name: ")
    obj = MyClass(name)
    obj.greet()

if __name__ == '__main__':
    main()
```

The loader will create three documents with this content:

First document:
```
class MyClass:
    def __init__(self, name):
        self.name = name

    def greet(self):
        print(f"Hello, {self.name}!")
```

Second document:
```
def main():
    name = input("Enter your name: ")
    obj = MyClass(name)
    obj.greet()
```

Third document:
```
# Code for: class MyClass:

# Code for: def main():

if __name__ == '__main__':
    main()
```

A threshold parameter is added to control whether small scripts are
split in this way or not.

At this moment, only Python and JavaScript are supported. The
appropriate parser is determined by examining the file extension.

#### Tests

This PR adds:

- Unit tests
- Integration tests

#### Dependencies

Only one dependency was added as optional (needed for the JavaScript
parser).

#### Documentation

A notebook is added showing how the loader can be used.

#### Who can review?

@eyurtsev @hwchase17

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-06-27 15:58:47 -07:00
Robert Lewis
da462d9dd4 Zapier update oauth support (#6780)
Description: Update documentation to

1) point to updated documentation links at Zapier.com (we've revamped
our help docs and paths), and
2) To provide clarity how to use the wrapper with an access token for
OAuth support

Demo:

Initializing the Zapier Wrapper with an OAuth Access Token

`ZapierNLAWrapper(zapier_nla_oauth_access_token="<redacted>")`

Using LangChain to resolve the current weather in Vancouver BC
leveraging Zapier NLA to lookup weather by coords.

```
> Entering new  chain...
 I need to use a tool to get the current weather.
Action: The Weather: Get Current Weather
Action Input: Get the current weather for Vancouver BC
Observation: {"coord__lon": -123.1207, "coord__lat": 49.2827, "weather": [{"id": 802, "main": "Clouds", "description": "scattered clouds", "icon": "03d", "icon_url": "http://openweathermap.org/img/wn/03d@2x.png"}], "weather[]icon_url": ["http://openweathermap.org/img/wn/03d@2x.png"], "weather[]icon": ["03d"], "weather[]id": [802], "weather[]description": ["scattered clouds"], "weather[]main": ["Clouds"], "base": "stations", "main__temp": 71.69, "main__feels_like": 71.56, "main__temp_min": 67.64, "main__temp_max": 76.39, "main__pressure": 1015, "main__humidity": 64, "visibility": 10000, "wind__speed": 3, "wind__deg": 155, "wind__gust": 11.01, "clouds__all": 41, "dt": 1687806607, "sys__type": 2, "sys__id": 2011597, "sys__country": "CA", "sys__sunrise": 1687781297, "sys__sunset": 1687839730, "timezone": -25200, "id": 6173331, "name": "Vancouver", "cod": 200, "summary": "scattered clouds", "_zap_search_was_found_status": true}
Thought: I now know the current weather in Vancouver BC.
Final Answer: The current weather in Vancouver BC is scattered clouds with a temperature of 71.69 and wind speed of 3
```
2023-06-27 11:46:32 -07:00
Joshua Carroll
24e4ae95ba Initial Streamlit callback integration doc (md) (#6788)
**Description:** Add a documentation page for the Streamlit Callback
Handler integration (#6315)

Notes:
- Implemented as a markdown file instead of a notebook since example
code runs in a Streamlit app (happy to discuss / consider alternatives
now or later)
- Contains an embedded Streamlit app ->
https://mrkl-minimal.streamlit.app/ Currently this app is hosted out of
a Streamlit repo but we're working to migrate the code to a LangChain
owned repo


![streamlit_docs](https://github.com/hwchase17/langchain/assets/116604821/0b7a6239-361f-470c-8539-f22c40098d1a)

cc @dev2049 @tconkling
2023-06-27 11:43:49 -07:00
Harrison Chase
8392ca602c bump version to 217 (#6831) 2023-06-27 09:39:56 -07:00
Ismail Pelaseyed
fcb3a64799 Add support for passing headers and search params to openai openapi chain (#6782)
- Description: add support for passing headers and search params to
OpenAI OpenAPI chains.
  - Issue: n/a
  - Dependencies: n/a
  - Tag maintainer: @hwchase17
  - Twitter handle: @pelaseyed

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-27 09:09:03 -07:00
Zander Chase
e1fdb67440 Update description in Evals notebook (#6808) 2023-06-27 00:26:49 -07:00
Zander Chase
ad028bbb80 Permit Constitutional Principles (#6807)
In the criteria evaluator.
2023-06-27 00:23:54 -07:00
Zander Chase
6ca383ecf6 Update to RunOnDataset helper functions to accept evaluator callbacks (#6629)
Also improve docstrings and update the tracing datasets notebook to
focus on "debug, evaluate, monitor"
2023-06-26 23:58:13 -07:00
WaseemH
7ac9b22886 RecusiveUrlLoader to RecursiveUrlLoader (#6787) 2023-06-26 23:12:14 -07:00
Mshoven
4535b0b41e 🎯Bug: format the url and path_params (#6755)
- Description: format the url and path_params correctly, 
  - Issue: #6753,
  - Dependencies: None,
  - Tag maintainer: @vowelparrot,
  - Twitter handle: @0xbluesecurity
2023-06-26 23:03:57 -07:00
Zander Chase
07d802d088 Don't raise error if parent not found (#6538)
Done so that you can pass in a run from the low level api
2023-06-26 22:57:52 -07:00
Leonid Ganeline
49c864fa18 docs: vectorstore upgrades 2 (#6796)
updated vectorstores/ notebooks; added new integrations into
ecosystem/integrations/
@dev2049
@rlancemartin, @eyurtsev
2023-06-26 22:55:04 -07:00
Zander Chase
d7dbf4aefe Clean up agent trajectory interface (#6799)
- Enable reference
- Enable not specifying tools at the start
- Add methods with keywords
2023-06-26 22:54:04 -07:00
Zander Chase
cc60fed3be Add a Pairwise Comparison Chain (#6703)
Notebook shows preference scoring between two chains and reports wilson
score interval + p value

I think I'll add the option to insert ground truth labels but doesn't
have to be in this PR
2023-06-26 20:47:41 -07:00
Hakan Tekgul
2928b080f6 Update arize_callback.py - bug fix (#6784)
- Description: Bug Fix - Added a step variable to keep track of prompts
- Issue: Bug from internal Arize testing - The prompts and responses
that are ingested were not mapped correctly
  - Dependencies: N/A
2023-06-26 16:49:46 -07:00
Zander Chase
c460b04c64 Update String Evaluator (#6615)
- Add protocol for `evaluate_strings` 
- Move the criteria evaluator out so it's not restricted to being
applied on traced runs
2023-06-26 14:16:14 -07:00
AaaCabbage
b3f8324de9 feat: fix the Chinese characters in the solution content will be conv… (#6734)
fix the Chinese characters in the solution content will be converted to
ascii encoding, resulting in an abnormally long number of tokens


Co-authored-by: qixin <qixin@fintec.ai>
2023-06-26 13:14:48 -07:00
Chris Pappalardo
70f7c2bb2e align chroma vectorstore get with chromadb to enable where filtering (#6686)
allows for where filtering on collection via get

- Description: aligns langchain chroma vectorstore get with underlying
[chromadb collection
get](https://github.com/chroma-core/chroma/blob/main/chromadb/api/models/Collection.py#L103)
allowing for where filtering, etc.
  - Issue: NA
  - Dependencies: none
  - Tag maintainer: @rlancemartin, @eyurtsev
  - Twitter handle: @pappanaka
2023-06-26 10:51:20 -07:00
Zander Chase
9ca3b4645e Add support for tags in chain group context manager (#6668)
Lets you specify local and inheritable tags in the group manager.

Also, add more verbose docstrings for our reference docs.
2023-06-26 10:37:33 -07:00
Harrison Chase
d1bcc58beb bump version to 216 (#6770) 2023-06-26 09:46:19 -07:00
Zander Chase
6d30acffcb Fix breaking tags (#6765)
Fix tags change that broke old way of initializing agent

Closes #6756
2023-06-26 09:28:11 -07:00
James Croft
ba622764cb Improve performance when retrieving Notion DB pages (#6710) 2023-06-26 05:46:09 -07:00
Richy Wang
ec8247ec59 Fixed bug in AnalyticDB Vector Store caused by upgrade SQLAlchemy version (#6736) 2023-06-26 05:35:25 -07:00
Santiago Delgado
d84a3bcf7a Office365 Tool (#6306)
#### Background
With the development of [structured
tools](https://blog.langchain.dev/structured-tools/), the LangChain team
expanded the platform's functionality to meet the needs of new
applications. The GMail tool, empowered by structured tools, now
supports multiple arguments and powerful search capabilities,
demonstrating LangChain's ability to interact with dynamic data sources
like email servers.

#### Challenge
The current GMail tool only supports GMail, while users often utilize
other email services like Outlook in Office365. Additionally, the
proposed calendar tool in PR
https://github.com/hwchase17/langchain/pull/652 only works with Google
Calendar, not Outlook.

#### Changes
This PR implements an Office365 integration for LangChain, enabling
seamless email and calendar functionality with a single authentication
process.

#### Future Work
With the core Office365 integration complete, future work could include
integrating other Office365 tools such as Tasks and Address Book.

#### Who can review?
@hwchase17 or @vowelparrot can review this PR

#### Appendix
@janscas, I utilized your [O365](https://github.com/O365/python-o365)
library extensively. Given the rising popularity of LangChain and
similar AI frameworks, the convergence of libraries like O365 and tools
like this one is likely. So, I wanted to keep you updated on our
progress.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-26 02:59:09 -07:00
Xiaochao Dong
a15afc102c Relax the action input check for actions that require no input (#6357)
When the tool requires no input, the LLM often gives something like
this:
```json
{
    "action": "just_do_it"
}
```
I have attempted to enhance the prompt, but it doesn't appear to be
functioning effectively. Therefore, I believe we should consider easing
the check a little bit.



Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>
2023-06-26 02:30:17 -07:00
Ethan Bowen
cc33bde74f Confluence added (#6432)
Adding Confluence to Jira tool. Can create a page in Confluence with
this PR. If accepted, will extend functionality to Bitbucket and
additional Confluence features.



---------

Co-authored-by: Ethan Bowen <ethan.bowen@slalom.com>
2023-06-26 02:28:04 -07:00
Surya Nudurupati
2aeb8e7dbc Improved Documentation: Eliminating Redundancy in the Introduction.mdx (#6360)
When the documentation was originally written there was a redundant
typing of the word "using the"
2023-06-26 02:27:36 -07:00
rajib
0f6ef048d2 The openai_info.py does not have gpt-35-turbo which is the underlying Azure Open AI model name (#6321)
Since this model name is not there in the list MODEL_COST_PER_1K_TOKENS,
when we use get_openai_callback(), for gpt 3.5 model in Azure AI, we do
not get the cost of the tokens. This will fix this issue


#### Who can review?
 @hwchase17
 @agola11

Co-authored-by: rajib76 <rajib76@yahoo.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-26 02:16:39 -07:00
ArchimedesFTW
fe941cb54a Change tags(str) to tags(dict) in mlflow_callback.py docs (#6473)
Fixes #6472

#### Who can review?

@agola11
2023-06-26 02:12:23 -07:00
0xcrusher
9187d2f3a9 Fixed caching bug for Multiple Caching types by correctly checking types (#6746)
- Fixed an issue where some caching types check the wrong types, hence
not allowing caching to work


Maintainer responsibilities:
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
2023-06-26 01:14:32 -07:00
Harrison Chase
e9877ea8b1 Tiktoken override (#6697) 2023-06-26 00:49:32 -07:00
Gabriel Altay
f9771700e4 prevent DuckDuckGoSearchAPIWrapper from consuming top result (#6727)
remove the `next` call that checks for None on the results generator
2023-06-25 19:54:15 -07:00
Pau Ramon Revilla
87802c86d9 Added a MHTML document loader (#6311)
MHTML is a very interesting format since it's used both for emails but
also for archived webpages. Some scraping projects want to store pages
in disk to process them later, mhtml is perfect for that use case.

This is heavily inspired from the beautifulsoup html loader, but
extracting the html part from the mhtml file.

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-06-25 13:12:08 -07:00
Janos Tolgyesi
05eec99269 beautifulsoup get_text kwargs in WebBaseLoader (#6591)
# beautifulsoup get_text kwargs in WebBaseLoader

- Description: this PR introduces an optional `bs_get_text_kwargs`
parameter to `WebBaseLoader` constructor. It can be used to pass kwargs
to the downstream BeautifulSoup.get_text call. The most common usage
might be to pass a custom text separator, as seen also in
`BSHTMLLoader`.
  - Tag maintainer: @rlancemartin, @eyurtsev
  - Twitter handle: jtolgyesi
2023-06-25 12:42:27 -07:00
Matt Robinson
be68f6f8ce feat: Add UnstructuredRSTLoader (#6594)
### Summary

Adds an `UnstructuredRSTLoader` for loading
[reStructuredText](https://en.wikipedia.org/wiki/ReStructuredText) file.

### Testing

```python
from langchain.document_loaders import UnstructuredRSTLoader

loader = UnstructuredRSTLoader(
    file_path="example_data/README.rst", mode="elements"
)
docs = loader.load()
print(docs[0])
```

### Reviewers

- @hwchase17 
- @rlancemartin 
- @eyurtsev
2023-06-25 12:41:57 -07:00
Chip Davis
b32cc01c9f feat: added tqdm progress bar to UnstructuredURLLoader (#6600)
- Description: Adds a simple progress bar with tqdm when using
UnstructuredURLLoader. Exposes new paramater `show_progress_bar`. Very
simple PR.
- Issue: N/A
- Dependencies: N/A
- Tag maintainer: @rlancemartin @eyurtsev

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-06-25 12:41:25 -07:00
Augustine Theodore
afc292e58d Fix WhatsAppChatLoader : Enable parsing additional formats (#6663)
- Description: Updated regex to support a new format that was observed
when whatsapp chat was exported.
  - Issue: #6654
  - Dependencies: No new dependencies
  - Tag maintainer: @rlancemartin, @eyurtsev
2023-06-25 12:08:43 -07:00
Sumanth Donthula
3e30a5d967 updated sql_database.py for returning sorted table names. (#6692)
Added code to get the tables info in sorted order in methods
get_usable_table_names and get_table_info.

Linked to Issue: #6640
2023-06-25 12:04:24 -07:00
刘 方瑞
9d1b3bab76 Fix Typo in LangChain MyScale Integration Doc (#6705)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @dev2049
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @dev2049
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @vowelparrot
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->

- Description: Fix Typo in LangChain MyScale Integration  Doc

@hwchase17
2023-06-25 11:54:00 -07:00
sudolong
408c8d0178 fix chroma _similarity_search_with_relevance_scores missing kwargs … (#6708)
Issue: https://github.com/hwchase17/langchain/issues/6707
2023-06-25 11:53:42 -07:00
Zander Chase
d89e10d361 Fix Multi Functions Agent Tracing (#6702)
Confirmed it works now:
https://dev.langchain.plus/public/0dc32ce0-55af-432e-b09e-5a1a220842f5/r
2023-06-25 10:39:04 -07:00
Harrison Chase
1742db0c30 bump version to 215 (#6719) 2023-06-25 08:52:51 -07:00
Ankush Gola
e1b801be36 split up batch llm calls into separate runs (#5804) 2023-06-24 21:03:31 -07:00
Davis Chase
1da99ce013 bump v214 (#6694) 2023-06-24 14:23:11 -07:00
Lance Martin
dd36adc0f4 Make bs4 a local import in recursive_url_loader.py (#6693)
Resolve https://github.com/hwchase17/langchain/issues/6679
2023-06-24 13:54:10 -07:00
514 changed files with 40247 additions and 4206 deletions

View File

@@ -12,11 +12,11 @@ If you're adding a new integration, please include:
2. an example notebook showing its use.
Maintainer responsibilities:
- General / Misc / if you don't know who to tag: @dev2049
- General / Misc / if you don't know who to tag: @baskaryan
- DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
- Models / Prompts: @hwchase17, @dev2049
- Models / Prompts: @hwchase17, @baskaryan
- Memory: @hwchase17
- Agents / Tools / Toolkits: @vowelparrot
- Agents / Tools / Toolkits: @hinthornw
- Tracing / Callbacks: @agola11
- Async: @agola11

View File

@@ -9,6 +9,9 @@ build:
os: ubuntu-22.04
tools:
python: "3.11"
jobs:
pre_build:
- python docs/api_reference/create_api_rst.py
# Build documentation in the docs/ directory with Sphinx
sphinx:

View File

@@ -1,57 +0,0 @@
document.addEventListener('DOMContentLoaded', () => {
// Load the external dependencies
function loadScript(src, onLoadCallback) {
const script = document.createElement('script');
script.src = src;
script.onload = onLoadCallback;
document.head.appendChild(script);
}
function createRootElement() {
const rootElement = document.createElement('div');
rootElement.id = 'my-component-root';
document.body.appendChild(rootElement);
return rootElement;
}
function initializeMendable() {
const rootElement = createRootElement();
const { MendableFloatingButton } = Mendable;
const iconSpan1 = React.createElement('span', {
}, '🦜');
const iconSpan2 = React.createElement('span', {
}, '🔗');
const icon = React.createElement('p', {
style: { color: '#ffffff', fontSize: '22px',width: '48px', height: '48px', margin: '0px', padding: '0px', display: 'flex', alignItems: 'center', justifyContent: 'center', textAlign: 'center' },
}, [iconSpan1, iconSpan2]);
const mendableFloatingButton = React.createElement(
MendableFloatingButton,
{
style: { darkMode: false, accentColor: '#010810' },
floatingButtonStyle: { color: '#ffffff', backgroundColor: '#010810' },
anon_key: '82842b36-3ea6-49b2-9fb8-52cfc4bde6bf', // Mendable Search Public ANON key, ok to be public
cmdShortcutKey:'j',
messageSettings: {
openSourcesInNewTab: false,
prettySources: true // Prettify the sources displayed now
},
icon: icon,
}
);
ReactDOM.render(mendableFloatingButton, rootElement);
}
loadScript('https://unpkg.com/react@17/umd/react.production.min.js', () => {
loadScript('https://unpkg.com/react-dom@17/umd/react-dom.production.min.js', () => {
loadScript('https://unpkg.com/@mendable/search@0.0.102/dist/umd/mendable.min.js', initializeMendable);
});
});
});

View File

@@ -1,12 +0,0 @@
Agents
==============
Reference guide for Agents and associated abstractions.
.. toctree::
:maxdepth: 1
:glob:
modules/agents
modules/tools
modules/agent_toolkits

File diff suppressed because it is too large Load Diff

View File

@@ -11,12 +11,13 @@
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
import os
import sys
import toml
sys.path.insert(0, os.path.abspath("."))
with open("../../pyproject.toml") as f:
data = toml.load(f)
@@ -45,11 +46,9 @@ extensions = [
"sphinx.ext.napoleon",
"sphinx.ext.viewcode",
"sphinxcontrib.autodoc_pydantic",
"myst_nb",
"sphinx_copybutton",
"sphinx_panels",
"IPython.sphinxext.ipython_console_highlighting",
"sphinx_tabs.tabs",
]
source_suffix = [".rst"]
@@ -59,24 +58,22 @@ autodoc_pydantic_config_members = False
autodoc_pydantic_model_show_config_summary = False
autodoc_pydantic_model_show_validator_members = False
autodoc_pydantic_model_show_validator_summary = False
autodoc_pydantic_model_show_field_summary = False
autodoc_pydantic_model_members = False
autodoc_pydantic_model_undoc_members = False
autodoc_pydantic_model_hide_paramlist = False
autodoc_pydantic_model_signature_prefix = "class"
autodoc_pydantic_field_signature_prefix = "attribute"
autodoc_pydantic_model_summary_list_order = "bysource"
autodoc_member_order = "bysource"
autodoc_pydantic_field_signature_prefix = "param"
autodoc_member_order = "groupwise"
autoclass_content = "both"
autodoc_typehints_format = "short"
autodoc_default_options = {
"members": True,
"show-inheritance": True,
"undoc_members": True,
"inherited_members": "BaseModel",
"inherited-members": "BaseModel",
"undoc-members": True,
"special-members": "__call__",
}
autodoc_typehints = "description"
# autodoc_typehints = "description"
# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
templates_path = ["templates"]
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
@@ -89,14 +86,16 @@ exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "sphinx_rtd_theme"
html_theme = "scikit-learn-modern"
html_theme_path = ["themes"]
html_theme_options = {
"path_to_docs": "docs",
"repository_url": "https://github.com/hwchase17/langchain",
"use_repository_button": True,
# "style_nav_header_background": "white"
# redirects dictionary maps from old links to new links
html_additional_pages = {}
redirects = {
"index": "api_reference",
}
for old_link in redirects:
html_additional_pages[old_link] = "redirects.html"
html_context = {
"display_github": True, # Integrate GitHub
@@ -104,6 +103,7 @@ html_context = {
"github_repo": "langchain", # Repo name
"github_version": "master", # Version
"conf_py_path": "/docs/api_reference", # Path in the checkout to the docs root
"redirects": redirects,
}
# Add any paths that contain custom static files (such as style sheets) here,
@@ -116,10 +116,9 @@ html_static_path = ["_static"]
html_css_files = [
"css/custom.css",
]
html_use_index = False
html_js_files = [
"js/mendablesearch.js",
]
nb_execution_mode = "off"
myst_enable_extensions = ["colon_fence"]
# generate autosummary even if no references
autosummary_generate = True

View File

@@ -0,0 +1,94 @@
"""Script for auto-generating api_reference.rst"""
import glob
import re
from pathlib import Path
ROOT_DIR = Path(__file__).parents[2].absolute()
PKG_DIR = ROOT_DIR / "langchain"
WRITE_FILE = Path(__file__).parent / "api_reference.rst"
def load_members() -> dict:
members: dict = {}
for py in glob.glob(str(PKG_DIR) + "/**/*.py", recursive=True):
module = py[len(str(PKG_DIR)) + 1 :].replace(".py", "").replace("/", ".")
top_level = module.split(".")[0]
if top_level not in members:
members[top_level] = {"classes": [], "functions": []}
with open(py, "r") as f:
for line in f.readlines():
cls = re.findall(r"^class ([^_].*)\(", line)
members[top_level]["classes"].extend([module + "." + c for c in cls])
func = re.findall(r"^def ([^_].*)\(", line)
members[top_level]["functions"].extend([module + "." + f for f in func])
return members
def construct_doc(members: dict) -> str:
full_doc = """\
.. _api_reference:
=============
API Reference
=============
"""
for module, _members in sorted(members.items(), key=lambda kv: kv[0]):
classes = _members["classes"]
functions = _members["functions"]
if not (classes or functions):
continue
module_title = module.replace("_", " ").title()
if module_title == "Llms":
module_title = "LLMs"
section = f":mod:`langchain.{module}`: {module_title}"
full_doc += f"""\
{section}
{'=' * (len(section) + 1)}
.. automodule:: langchain.{module}
:no-members:
:no-inherited-members:
"""
if classes:
cstring = "\n ".join(sorted(classes))
full_doc += f"""\
Classes
--------------
.. currentmodule:: langchain
.. autosummary::
:toctree: {module}
:template: class.rst
{cstring}
"""
if functions:
fstring = "\n ".join(sorted(functions))
full_doc += f"""\
Functions
--------------
.. currentmodule:: langchain
.. autosummary::
:toctree: {module}
{fstring}
"""
return full_doc
def main() -> None:
members = load_members()
full_doc = construct_doc(members)
with open(WRITE_FILE, "w") as f:
f.write(full_doc)
if __name__ == "__main__":
main()

View File

@@ -1,13 +0,0 @@
Data connection
==============
LangChain has a number of modules that help you load, structure, store, and retrieve documents.
.. toctree::
:maxdepth: 1
:glob:
modules/document_loaders
modules/document_transformers
modules/embeddings
modules/vectorstores
modules/retrievers

View File

@@ -1,29 +1,8 @@
API Reference
==========================
| Full documentation on all methods, classes, and APIs in the LangChain Python package.
=============
LangChain API
=============
.. toctree::
:maxdepth: 1
:caption: Abstractions
:maxdepth: 2
./modules/base_classes.rst
.. toctree::
:maxdepth: 1
:caption: Core
./model_io.rst
./data_connection.rst
./modules/chains.rst
./agents.rst
./modules/memory.rst
./modules/callbacks.rst
.. toctree::
:maxdepth: 1
:caption: Additional
./modules/utilities.rst
./modules/experimental.rst
api_reference.rst

View File

@@ -1,12 +0,0 @@
Model I/O
==============
LangChain provides interfaces and integrations for working with language models.
.. toctree::
:maxdepth: 1
:glob:
./prompts.rst
./models.rst
./modules/output_parsers.rst

View File

@@ -1,11 +0,0 @@
Models
==============
LangChain provides interfaces and integrations for a number of different types of models.
.. toctree::
:maxdepth: 1
:glob:
modules/llms
modules/chat_models

View File

@@ -1,7 +0,0 @@
Agent Toolkits
===============================
.. automodule:: langchain.agents.agent_toolkits
:members:
:undoc-members:

View File

@@ -1,7 +0,0 @@
Agents
===============================
.. automodule:: langchain.agents
:members:
:undoc-members:

View File

@@ -1,5 +0,0 @@
Base classes
========================
.. automodule:: langchain.schema
:inherited-members:

View File

@@ -1,7 +0,0 @@
Callbacks
=======================
.. automodule:: langchain.callbacks
:members:
:undoc-members:

View File

@@ -1,8 +0,0 @@
Chains
=======================
.. automodule:: langchain.chains
:members:
:undoc-members:
:inherited-members: BaseModel

View File

@@ -1,7 +0,0 @@
Chat Models
===============================
.. automodule:: langchain.chat_models
:members:
:undoc-members:

View File

@@ -1,7 +0,0 @@
Document Loaders
===============================
.. automodule:: langchain.document_loaders
:members:
:undoc-members:

View File

@@ -1,13 +0,0 @@
Document Transformers
===============================
.. automodule:: langchain.document_transformers
:members:
:undoc-members:
Text Splitters
------------------------------
.. automodule:: langchain.text_splitter
:members:
:undoc-members:

View File

@@ -1,5 +0,0 @@
Embeddings
===========================
.. automodule:: langchain.embeddings
:members:

View File

@@ -1,5 +0,0 @@
Example Selector
=========================================
.. automodule:: langchain.prompts.example_selector
:members:

View File

@@ -1,28 +0,0 @@
====================
Experimental
====================
This module contains experimental modules and reproductions of existing work using LangChain primitives.
Autonomous agents
------------------
Here, we document the BabyAGI and AutoGPT classes from the langchain.experimental module.
.. autoclass:: langchain.experimental.BabyAGI
:members:
.. autoclass:: langchain.experimental.AutoGPT
:members:
Generative agents
------------------
Here, we document the GenerativeAgent and GenerativeAgentMemory classes from the langchain.experimental module.
.. autoclass:: langchain.experimental.GenerativeAgent
:members:
.. autoclass:: langchain.experimental.GenerativeAgentMemory
:members:

View File

@@ -1,7 +0,0 @@
LLMs
=======================
.. automodule:: langchain.llms
:members:
:inherited-members:
:special-members: __call__

View File

@@ -1,7 +0,0 @@
Memory
===============================
.. automodule:: langchain.memory
:members:
:undoc-members:

View File

@@ -1,7 +0,0 @@
Output Parsers
===============================
.. automodule:: langchain.output_parsers
:members:
:undoc-members:

View File

@@ -1,6 +0,0 @@
Prompt Templates
========================
.. automodule:: langchain.prompts
:members:
:undoc-members:

View File

@@ -1,14 +0,0 @@
Retrievers
===============================
.. automodule:: langchain.retrievers
:members:
:undoc-members:
Document compressors
-------------------------------
.. automodule:: langchain.retrievers.document_compressors
:members:
:undoc-members:

View File

@@ -1,7 +0,0 @@
Tools
===============================
.. automodule:: langchain.tools
:members:
:undoc-members:

View File

@@ -1,7 +0,0 @@
Utilities
===============================
.. automodule:: langchain.utilities
:members:
:undoc-members:

View File

@@ -1,6 +0,0 @@
Vector Stores
=============================
.. automodule:: langchain.vectorstores
:members:
:undoc-members:

View File

@@ -1,11 +0,0 @@
Prompts
==============
The reference guides here all relate to objects for working with Prompts.
.. toctree::
:maxdepth: 1
:glob:
modules/prompts
modules/example_selector

View File

@@ -0,0 +1,27 @@
Copyright (c) 2007-2023 The scikit-learn developers.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

View File

@@ -0,0 +1,28 @@
:mod:`{{module}}`.{{objname}}
{{ underline }}==============
.. currentmodule:: {{ module }}
.. autoclass:: {{ objname }}
{% block methods %}
{% if methods %}
.. rubric:: {{ _('Methods') }}
.. autosummary::
{% for item in methods %}
~{{ name }}.{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}
{% block attributes %}
{% if attributes %}
.. rubric:: {{ _('Attributes') }}
.. autosummary::
{% for item in attributes %}
~{{ name }}.{{ item }}
{%- endfor %}
{% endif %}
{% endblock %}

View File

@@ -0,0 +1,15 @@
{% set redirect = pathto(redirects[pagename]) %}
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="Refresh" content="0; url={{ redirect }}" />
<meta name="Description" content="scikit-learn: machine learning in Python">
<link rel="canonical" href="{{ redirect }}" />
<title>scikit-learn: machine learning in Python</title>
</head>
<body>
<p>You will be automatically redirected to the <a href="{{ redirect }}">new location of this page</a>.</p>
</body>
</html>

View File

@@ -0,0 +1,27 @@
Copyright (c) 2007-2023 The scikit-learn developers.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

View File

@@ -0,0 +1,67 @@
<script>
$(document).ready(function() {
/* Add a [>>>] button on the top-right corner of code samples to hide
* the >>> and ... prompts and the output and thus make the code
* copyable. */
var div = $('.highlight-python .highlight,' +
'.highlight-python3 .highlight,' +
'.highlight-pycon .highlight,' +
'.highlight-default .highlight')
var pre = div.find('pre');
// get the styles from the current theme
pre.parent().parent().css('position', 'relative');
var hide_text = 'Hide prompts and outputs';
var show_text = 'Show prompts and outputs';
// create and add the button to all the code blocks that contain >>>
div.each(function(index) {
var jthis = $(this);
if (jthis.find('.gp').length > 0) {
var button = $('<span class="copybutton">&gt;&gt;&gt;</span>');
button.attr('title', hide_text);
button.data('hidden', 'false');
jthis.prepend(button);
}
// tracebacks (.gt) contain bare text elements that need to be
// wrapped in a span to work with .nextUntil() (see later)
jthis.find('pre:has(.gt)').contents().filter(function() {
return ((this.nodeType == 3) && (this.data.trim().length > 0));
}).wrap('<span>');
});
// define the behavior of the button when it's clicked
$('.copybutton').click(function(e){
e.preventDefault();
var button = $(this);
if (button.data('hidden') === 'false') {
// hide the code output
button.parent().find('.go, .gp, .gt').hide();
button.next('pre').find('.gt').nextUntil('.gp, .go').css('visibility', 'hidden');
button.css('text-decoration', 'line-through');
button.attr('title', show_text);
button.data('hidden', 'true');
} else {
// show the code output
button.parent().find('.go, .gp, .gt').show();
button.next('pre').find('.gt').nextUntil('.gp, .go').css('visibility', 'visible');
button.css('text-decoration', 'none');
button.attr('title', hide_text);
button.data('hidden', 'false');
}
});
/*** Add permalink buttons next to glossary terms ***/
$('dl.glossary > dt[id]').append(function() {
return ('<a class="headerlink" href="#' +
this.getAttribute('id') +
'" title="Permalink to this term">¶</a>');
});
});
</script>
{%- if pagename != 'index' and pagename != 'documentation' %}
{% if theme_mathjax_path %}
<script id="MathJax-script" async src="{{ theme_mathjax_path }}"></script>
{% endif %}
{%- endif %}

View File

@@ -0,0 +1,142 @@
{# TEMPLATE VAR SETTINGS #}
{%- set url_root = pathto('', 1) %}
{%- if url_root == '#' %}{% set url_root = '' %}{% endif %}
{%- if not embedded and docstitle %}
{%- set titlesuffix = " &mdash; "|safe + docstitle|e %}
{%- else %}
{%- set titlesuffix = "" %}
{%- endif %}
{%- set lang_attr = 'en' %}
<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="{{ lang_attr }}" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="{{ lang_attr }}" > <!--<![endif]-->
<head>
<meta charset="utf-8">
{{ metatags }}
<meta name="viewport" content="width=device-width, initial-scale=1.0">
{% block htmltitle %}
<title>{{ title|striptags|e }}{{ titlesuffix }}</title>
{% endblock %}
<link rel="canonical" href="http://scikit-learn.org/stable/{{pagename}}.html" />
{% if favicon_url %}
<link rel="shortcut icon" href="{{ favicon_url|e }}"/>
{% endif %}
<link rel="stylesheet" href="{{ pathto('_static/css/vendor/bootstrap.min.css', 1) }}" type="text/css" />
{%- for css in css_files %}
{%- if css|attr("rel") %}
<link rel="{{ css.rel }}" href="{{ pathto(css.filename, 1) }}" type="text/css"{% if css.title is not none %} title="{{ css.title }}"{% endif %} />
{%- else %}
<link rel="stylesheet" href="{{ pathto(css, 1) }}" type="text/css" />
{%- endif %}
{%- endfor %}
<link rel="stylesheet" href="{{ pathto('_static/' + style, 1) }}" type="text/css" />
<script id="documentation_options" data-url_root="{{ pathto('', 1) }}" src="{{ pathto('_static/documentation_options.js', 1) }}"></script>
<script src="{{ pathto('_static/jquery.js', 1) }}"></script>
{%- block extrahead %} {% endblock %}
</head>
<body>
{% include "nav.html" %}
{%- block content %}
<div class="d-flex" id="sk-doc-wrapper">
<input type="checkbox" name="sk-toggle-checkbox" id="sk-toggle-checkbox">
<label id="sk-sidemenu-toggle" class="sk-btn-toggle-toc btn sk-btn-primary" for="sk-toggle-checkbox">Toggle Menu</label>
<div id="sk-sidebar-wrapper" class="border-right">
<div class="sk-sidebar-toc-wrapper">
<div class="btn-group w-100 mb-2" role="group" aria-label="rellinks">
{%- if prev %}
<a href="{{ prev.link|e }}" role="button" class="btn sk-btn-rellink py-1" sk-rellink-tooltip="{{ prev.title|striptags }}">Prev</a>
{%- else %}
<a href="#" role="button" class="btn sk-btn-rellink py-1 disabled"">Prev</a>
{%- endif %}
{%- if parents -%}
<a href="{{ parents[-1].link|e }}" role="button" class="btn sk-btn-rellink py-1" sk-rellink-tooltip="{{ parents[-1].title|striptags }}">Up</a>
{%- else %}
<a href="#" role="button" class="btn sk-btn-rellink disabled py-1">Up</a>
{%- endif %}
{%- if next %}
<a href="{{ next.link|e }}" role="button" class="btn sk-btn-rellink py-1" sk-rellink-tooltip="{{ next.title|striptags }}">Next</a>
{%- else %}
<a href="#" role="button" class="btn sk-btn-rellink py-1 disabled"">Next</a>
{%- endif %}
</div>
{%- if pagename != "install" %}
<div class="alert alert-warning p-1 mb-2" role="alert">
<p class="text-center mb-0">
<strong>LangChain {{ release }}</strong><br/>
</p>
</div>
{%- endif %}
{%- if meta and meta['parenttoc']|tobool %}
<div class="sk-sidebar-toc">
{% set nav = get_nav_object(maxdepth=3, collapse=True, numbered=True) %}
<ul>
{% for main_nav_item in nav %}
{% if main_nav_item.active %}
<li>
<a href="{{ main_nav_item.url }}" class="sk-toc-active">{{ main_nav_item.title }}</a>
</li>
<ul>
{% for nav_item in main_nav_item.children %}
<li>
<a href="{{ nav_item.url }}" class="{% if nav_item.active %}sk-toc-active{% endif %}">{{ nav_item.title }}</a>
{% if nav_item.children %}
<ul>
{% for inner_child in nav_item.children %}
<li class="sk-toctree-l3">
<a href="{{ inner_child.url }}">{{ inner_child.title }}</a>
</li>
{% endfor %}
</ul>
{% endif %}
</li>
{% endfor %}
</ul>
{% endif %}
{% endfor %}
</ul>
</div>
{%- elif meta and meta['globalsidebartoc']|tobool %}
<div class="sk-sidebar-toc sk-sidebar-global-toc">
{{ toctree(maxdepth=2, titles_only=True) }}
</div>
{%- else %}
<div class="sk-sidebar-toc">
{{ toc }}
</div>
{%- endif %}
</div>
</div>
<div id="sk-page-content-wrapper">
<div class="sk-page-content container-fluid body px-md-3" role="main">
{% block body %}{% endblock %}
</div>
<div class="container">
<footer class="sk-content-footer">
{%- if pagename != 'index' %}
{%- if show_copyright %}
{%- if hasdoc('copyright') %}
{% trans path=pathto('copyright'), copyright=copyright|e %}&copy; {{ copyright }}.{% endtrans %}
{%- else %}
{% trans copyright=copyright|e %}&copy; {{ copyright }}.{% endtrans %}
{%- endif %}
{%- endif %}
{%- if last_updated %}
{% trans last_updated=last_updated|e %}Last updated on {{ last_updated }}.{% endtrans %}
{%- endif %}
{%- if show_source and has_source and sourcename %}
<a href="{{ pathto('_sources/' + sourcename, true)|e }}" rel="nofollow">{{ _('Show this page source') }}</a>
{%- endif %}
{%- endif %}
</footer>
</div>
</div>
</div>
{%- endblock %}
<script src="{{ pathto('_static/js/vendor/bootstrap.min.js', 1) }}"></script>
{% include "javascript.html" %}
</body>
</html>

View File

@@ -0,0 +1,85 @@
{%- if pagename != 'index' and pagename != 'documentation' %}
{%- set nav_bar_class = "sk-docs-navbar" %}
{%- set top_container_cls = "sk-docs-container" %}
{%- else %}
{%- set nav_bar_class = "sk-landing-navbar" %}
{%- set top_container_cls = "sk-landing-container" %}
{%- endif %}
{% if theme_link_to_live_contributing_page|tobool %}
{# Link to development page for live builds #}
{%- set development_link = "https://scikit-learn.org/dev/developers/index.html" %}
{# Open on a new development page in new window/tab for live builds #}
{%- set development_attrs = 'target="_blank" rel="noopener noreferrer"' %}
{%- else %}
{%- set development_link = pathto('developers/index') %}
{%- set development_attrs = '' %}
{%- endif %}
{# title, link, link_attrs #}
{%- set drop_down_navigation = [
('Getting Started', pathto('getting_started'), ''),
('Tutorial', pathto('tutorial/index'), ''),
("What's new", pathto('whats_new/v' + version), ''),
('Glossary', pathto('glossary'), ''),
('Development', development_link, development_attrs),
('FAQ', pathto('faq'), ''),
('Support', pathto('support'), ''),
('Related packages', pathto('related_projects'), ''),
('Roadmap', pathto('roadmap'), ''),
('Governance', pathto('governance'), ''),
('About us', pathto('about'), ''),
('GitHub', 'https://github.com/scikit-learn/scikit-learn', ''),
('Other Versions and Download', 'https://scikit-learn.org/dev/versions.html', '')]
-%}
<nav id="navbar" class="{{ nav_bar_class }} navbar navbar-expand-md navbar-light bg-light py-0">
<div class="container-fluid {{ top_container_cls }} px-0">
{%- if logo_url %}
<a class="navbar-brand py-0" href="{{ pathto('index') }}">
<img
class="sk-brand-img"
src="{{ logo_url|e }}"
alt="logo"/>
</a>
{%- endif %}
<button
id="sk-navbar-toggler"
class="navbar-toggler"
type="button"
data-toggle="collapse"
data-target="#navbarSupportedContent"
aria-controls="navbarSupportedContent"
aria-expanded="false"
aria-label="Toggle navigation"
>
<span class="navbar-toggler-icon"></span>
</button>
<div class="sk-navbar-collapse collapse navbar-collapse" id="navbarSupportedContent">
<ul class="navbar-nav mr-auto">
<li class="nav-item">
<a class="sk-nav-link nav-link" href="{{ pathto('api_reference') }}">API</a>
</li>
<li class="nav-item">
<a class="sk-nav-link nav-link" target="_blank" rel="noopener noreferrer" href="https://python.langchain.com/">Python Docs</a>
</li>
{%- for title, link, link_attrs in drop_down_navigation %}
<li class="nav-item">
<a class="sk-nav-link nav-link nav-more-item-mobile-items" href="{{ link }}" {{ link_attrs }}>{{ title }}</a>
</li>
{%- endfor %}
</ul>
{%- if pagename != "search"%}
<div id="searchbox" role="search">
<div class="searchformwrapper">
<form class="search" action="{{ pathto('search') }}" method="get">
<input class="sk-search-text-input" type="text" name="q" aria-labelledby="searchlabel" />
<input class="sk-search-text-btn" type="submit" value="{{ _('Go') }}" />
</form>
</div>
</div>
{%- endif %}
</div>
</div>
</nav>

View File

@@ -0,0 +1,16 @@
{%- extends "basic/search.html" %}
{% block extrahead %}
<script type="text/javascript" src="{{ pathto('_static/underscore.js', 1) }}"></script>
<script type="text/javascript" src="{{ pathto('searchindex.js', 1) }}" defer></script>
<script type="text/javascript" src="{{ pathto('_static/doctools.js', 1) }}"></script>
<script type="text/javascript" src="{{ pathto('_static/language_data.js', 1) }}"></script>
<script type="text/javascript" src="{{ pathto('_static/searchtools.js', 1) }}"></script>
<!-- <script type="text/javascript" src="{{ pathto('_static/sphinx_highlight.js', 1) }}"></script> -->
<script type="text/javascript">
$(document).ready(function() {
if (!Search.out) {
Search.init();
}
});
</script>
{% endblock %}

File diff suppressed because it is too large Load Diff

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,8 @@
[theme]
inherit = basic
pygments_style = default
stylesheet = css/theme.css
[options]
link_to_live_contributing_page = false
mathjax_path =

View File

@@ -47,7 +47,7 @@ import ChatModel from "@snippets/get_started/quickstart/chat_model.mdx"
## Prompt templates
Most LLM applications do not pass user input directly into to an LLM. Usually they will add the user input to a larger piece of text, called a prompt template, that provides additional context on the specific task at hand.
Most LLM applications do not pass user input directly into an LLM. Usually they will add the user input to a larger piece of text, called a prompt template, that provides additional context on the specific task at hand.
In the previous example, the text we passed to the model contained instructions to generate a company name. For our application, it'd be great if the user only had to provide the description of a company/product, without having to worry about giving the model instructions.
@@ -138,7 +138,7 @@ The chains and agents we've looked at so far have been stateless, but for many a
The Memory module gives you a way to maintain application state. The base Memory interface is simple: it lets you update state given the latest run inputs and outputs and it lets you modify (or contextualize) the next input using the stored state.
There are a number of built-in memory systems. The simplest of these are is a buffer memory which just prepends the last few inputs/outputs to the current input - we will use this in the example below.
There are a number of built-in memory systems. The simplest of these is a buffer memory which just prepends the last few inputs/outputs to the current input - we will use this in the example below.
import MemoryLLM from "@snippets/get_started/quickstart/memory_llms.mdx"
import MemoryChatModel from "@snippets/get_started/quickstart/memory_chat_models.mdx"
@@ -155,4 +155,4 @@ You can use Memory with chains and agents initialized with chat models. The main
<MemoryChatModel/>
</TabItem>
</Tabs>
</Tabs>

View File

@@ -2,6 +2,8 @@
>[JSON (JavaScript Object Notation)](https://en.wikipedia.org/wiki/JSON) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attributevalue pairs and arrays (or other serializable values).
>[JSON Lines](https://jsonlines.org/) is a file format where each line is a valid JSON value.
import Example from "@snippets/modules/data_connection/document_loaders/how_to/json.mdx"
<Example/>

View File

@@ -10,7 +10,7 @@ for you.
## Get started
This walkthrough showcases basic functionality related to VectorStores. A key part of working with vector stores is creating the vector to put in them, which is usually created via embeddings. Therefore, it is recommended that you familiarize yourself with the [text embedding model](/docs/modules/model_io/models/embeddings.html) interfaces before diving into this.
This walkthrough showcases basic functionality related to VectorStores. A key part of working with vector stores is creating the vector to put in them, which is usually created via embeddings. Therefore, it is recommended that you familiarize yourself with the [text embedding model](/docs/modules/data_connection/text_embedding/) interfaces before diving into this.
import GetStarted from "@snippets/modules/data_connection/vectorstores/get_started.mdx"

View File

@@ -7,7 +7,10 @@ const { ProvidePlugin } = require("webpack");
const path = require("path");
const examplesPath = path.resolve(__dirname, "..", "examples", "src");
const snippetsPath = path.resolve(__dirname, "..", "snippets")
const snippetsPath = path.resolve(__dirname, "..", "snippets");
const baseLightCodeBlockTheme = require("prism-react-renderer/themes/vsLight");
const baseDarkCodeBlockTheme = require("prism-react-renderer/themes/vsDark");
/** @type {import('@docusaurus/types').Config} */
const config = {
@@ -84,7 +87,6 @@ const config = {
({
docs: {
sidebarPath: require.resolve("./sidebars.js"),
editUrl: "https://github.com/hwchase17/langchain/edit/master/docs/",
remarkPlugins: [
[require("@docusaurus/remark-plugin-npm2yarn"), { sync: true }],
],
@@ -127,8 +129,20 @@ const config = {
},
},
prism: {
theme: require("prism-react-renderer/themes/vsLight"),
darkTheme: require("prism-react-renderer/themes/vsDark"),
theme: {
...baseLightCodeBlockTheme,
plain: {
...baseLightCodeBlockTheme.plain,
backgroundColor: "#F5F5F5",
},
},
darkTheme: {
...baseDarkCodeBlockTheme,
plain: {
...baseDarkCodeBlockTheme.plain,
backgroundColor: "#222222",
},
},
},
image: "img/parrot-chainlink-icon.png",
navbar: {

15104
docs/docs_skeleton/package-lock.json generated Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -23,7 +23,7 @@
"@docusaurus/preset-classic": "2.4.0",
"@docusaurus/remark-plugin-npm2yarn": "^2.4.0",
"@mdx-js/react": "^1.6.22",
"@mendable/search": "^0.0.102",
"@mendable/search": "^0.0.112-beta.7",
"clsx": "^1.2.1",
"json-loader": "^0.5.7",
"process": "^0.11.10",

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,28 @@
# Airtable
>[Airtable](https://en.wikipedia.org/wiki/Airtable) is a cloud collaboration service.
`Airtable` is a spreadsheet-database hybrid, with the features of a database but applied to a spreadsheet.
> The fields in an Airtable table are similar to cells in a spreadsheet, but have types such as 'checkbox',
> 'phone number', and 'drop-down list', and can reference file attachments like images.
>Users can create a database, set up column types, add records, link tables to one another, collaborate, sort records
> and publish views to external websites.
## Installation and Setup
```bash
pip install pyairtable
```
* Get your [API key](https://support.airtable.com/docs/creating-and-using-api-keys-and-access-tokens).
* Get the [ID of your base](https://airtable.com/developers/web/api/introduction).
* Get the [table ID from the table url](https://www.highviewapps.com/kb/where-can-i-find-the-airtable-base-id-and-table-id/#:~:text=Both%20the%20Airtable%20Base%20ID,URL%20that%20begins%20with%20tbl).
## Document Loader
```python
from langchain.document_loaders import AirtableLoader
```
See an [example](/docs/modules/data_connection/document_loaders/integrations/airtable.html).

View File

@@ -0,0 +1,464 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "944e4194",
"metadata": {},
"source": [
"# Arthur LangChain integration"
]
},
{
"cell_type": "markdown",
"id": "b1ccdfe8",
"metadata": {},
"source": [
"[Arthur](https://www.arthur.ai/) is a model monitoring and observability platform.\n",
"\n",
"This notebook shows how to register LLMs (chat and non-chat) as models with the Arthur platform. Then we show how to set up langchain LLMs with an Arthur callback that will automatically log model inferences to Arthur.\n",
"\n",
"For more information about how to use the Arthur SDK, visit our [docs](http://docs.arthur.ai), in particular our [model onboarding guide](https://docs.arthur.ai/user-guide/walkthroughs/model-onboarding/index.html)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "961c6691",
"metadata": {},
"outputs": [],
"source": [
"from langchain.callbacks import ArthurCallbackHandler\n",
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
"from langchain.chat_models import ChatOpenAI, ChatAnthropic\n",
"from langchain.schema import HumanMessage\n",
"from langchain.llms import OpenAI, Cohere, HuggingFacePipeline"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a23d1963",
"metadata": {},
"outputs": [],
"source": [
"from arthurai import ArthurAI\n",
"from arthurai.common.constants import InputType, OutputType, Stage, ValueType\n",
"from arthurai.core.attributes import ArthurAttribute, AttributeCategory"
]
},
{
"cell_type": "markdown",
"id": "4d1b90c0",
"metadata": {},
"source": [
"# ArthurModel for chatbot with only input text and output text attributes"
]
},
{
"cell_type": "markdown",
"id": "1a4a4a8a",
"metadata": {},
"source": [
"Connect to Arthur client"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "f49e9b79",
"metadata": {},
"outputs": [],
"source": [
"arthur_url = \"https://app.arthur.ai\"\n",
"arthur_login = \"your-username-here\"\n",
"arthur = ArthurAI(url=arthur_url, login=arthur_login)"
]
},
{
"cell_type": "markdown",
"id": "c6e063bf",
"metadata": {},
"source": [
"Before you can register model inferences to Arthur, you must have a registered model with an ID in the Arthur platform. We will provide this ID to the ArthurCallbackHandler.\n",
"\n",
"You can register a model with Arthur here in the notebook using this `register_chat_llm()` function. This function returns the ID of the model saved to the platform. To use the function, uncomment `arthur_model_chatbot_id = register_chat_llm()` in the cell below."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "31b17b5e",
"metadata": {},
"outputs": [],
"source": [
"def register_chat_llm():\n",
"\n",
" arthur_model = arthur.model(\n",
" display_name=\"LangChainChat\",\n",
" input_type=InputType.NLP,\n",
" output_type=OutputType.TokenSequence\n",
" )\n",
"\n",
" arthur_model._add_attribute_to_model(ArthurAttribute(\n",
" name=\"my_input_text\",\n",
" stage=Stage.ModelPipelineInput,\n",
" value_type=ValueType.Unstructured_Text,\n",
" categorical=True,\n",
" is_unique=True\n",
" ))\n",
" arthur_model._add_attribute_to_model(ArthurAttribute(\n",
" name=\"my_output_text\",\n",
" stage=Stage.PredictedValue,\n",
" value_type=ValueType.Unstructured_Text,\n",
" categorical=True,\n",
" is_unique=False,\n",
" ))\n",
" \n",
" return arthur_model.save()\n",
"# arthur_model_chatbot_id = register_chat_llm()"
]
},
{
"cell_type": "markdown",
"id": "0d1d1e60",
"metadata": {},
"source": [
"Alternatively, you can set the `arthur_model_chatbot_id` variable to be the ID of your model on your [model dashboard](https://app.arthur.ai/)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "cdfa02c8",
"metadata": {},
"outputs": [],
"source": [
"arthur_model_chatbot_id = \"your-model-id-here\""
]
},
{
"cell_type": "markdown",
"id": "58be5234",
"metadata": {},
"source": [
"This function creates a Langchain chat LLM with the ArthurCallbackHandler to log inferences to Arthur. We provide our `arthur_model_chatbot_id`, as well as the Arthur url and login we are using."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "448a8fee",
"metadata": {},
"outputs": [],
"source": [
"def make_langchain_chat_llm(chat_model=ChatOpenAI):\n",
" if chat_model not in [ChatOpenAI, ChatAnthropic]:\n",
" raise ValueError(\"For this notebook, use one of the chat models imported from langchain.chat_models\")\n",
" return chat_model(\n",
" streaming=True, \n",
" temperature=0.1,\n",
" callbacks=[\n",
" StreamingStdOutCallbackHandler(), \n",
" ArthurCallbackHandler.from_credentials(arthur_model_chatbot_id, arthur_url=arthur_url, arthur_login=arthur_login)\n",
" ])\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "17c182da",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 8,
"id": "2dfc00ed",
"metadata": {},
"outputs": [],
"source": [
"chat_llm = make_langchain_chat_llm()"
]
},
{
"cell_type": "markdown",
"id": "139291f2",
"metadata": {},
"source": [
"Run the chatbot (it will save the chat history in the `history` list so that the conversation can reference earlier messages)\n",
"\n",
"Type `q` to quit"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "7480a443",
"metadata": {},
"outputs": [],
"source": [
"def run_langchain_chat_llm(llm):\n",
" history = []\n",
" while True:\n",
" user_input = input(\"\\n>>> input >>>\\n>>>: \")\n",
" if user_input == 'q': break\n",
" history.append(HumanMessage(content=user_input))\n",
" history.append(llm(history))"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "6868ce71",
"metadata": {},
"outputs": [],
"source": [
"run_langchain_chat_llm(chat_llm)"
]
},
{
"cell_type": "markdown",
"id": "a0be7d01",
"metadata": {},
"source": [
"# ArthurModel with input text, output text, token likelihoods, finish reason, and amount of token usage attributes"
]
},
{
"cell_type": "markdown",
"id": "1ee4b741",
"metadata": {},
"source": [
"This function registers an LLM with additional metadata attributes to log to Arthur with each inference\n",
"\n",
"As above, you can register your callback handler for an LLM using this function here in the notebook or by pasting the ID of an already-registered model from your [model dashboard](https://app.arthur.ai/)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "e671836c",
"metadata": {},
"outputs": [],
"source": [
"def register_llm():\n",
"\n",
" arthur_model = arthur.model(\n",
" display_name=\"LangChainLLM\",\n",
" input_type=InputType.NLP,\n",
" output_type=OutputType.TokenSequence\n",
" )\n",
" arthur_model._add_attribute_to_model(ArthurAttribute(\n",
" name=\"my_input_text\",\n",
" stage=Stage.ModelPipelineInput,\n",
" value_type=ValueType.Unstructured_Text,\n",
" categorical=True,\n",
" is_unique=True\n",
" ))\n",
" arthur_model._add_attribute_to_model(ArthurAttribute(\n",
" name=\"my_output_text\",\n",
" stage=Stage.PredictedValue,\n",
" value_type=ValueType.Unstructured_Text,\n",
" categorical=True,\n",
" is_unique=False,\n",
" token_attribute_link=\"my_output_likelihoods\"\n",
" ))\n",
" arthur_model._add_attribute_to_model(ArthurAttribute(\n",
" name=\"my_output_likelihoods\",\n",
" stage=Stage.PredictedValue,\n",
" value_type=ValueType.TokenLikelihoods,\n",
" token_attribute_link=\"my_output_text\"\n",
" ))\n",
" arthur_model._add_attribute_to_model(ArthurAttribute(\n",
" name=\"finish_reason\",\n",
" stage=Stage.NonInputData,\n",
" value_type=ValueType.String,\n",
" categorical=True,\n",
" categories=[\n",
" AttributeCategory(value='stop'),\n",
" AttributeCategory(value='length'),\n",
" AttributeCategory(value='content_filter'),\n",
" AttributeCategory(value='null')\n",
" ]\n",
" ))\n",
" arthur_model._add_attribute_to_model(ArthurAttribute(\n",
" name=\"prompt_tokens\",\n",
" stage=Stage.NonInputData,\n",
" value_type=ValueType.Integer\n",
" ))\n",
" arthur_model._add_attribute_to_model(ArthurAttribute(\n",
" name=\"completion_tokens\",\n",
" stage=Stage.NonInputData,\n",
" value_type=ValueType.Integer\n",
" ))\n",
" arthur_model._add_attribute_to_model(ArthurAttribute(\n",
" name=\"duration\",\n",
" stage=Stage.NonInputData,\n",
" value_type=ValueType.Float\n",
" ))\n",
" \n",
" return arthur_model.save()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "2a6686f7",
"metadata": {},
"outputs": [],
"source": [
"arthur_model_llm_id = \"your-model-id-here\""
]
},
{
"cell_type": "markdown",
"id": "2dcacb96",
"metadata": {},
"source": [
"These functions create Langchain LLMs with the ArthurCallbackHandler to log inferences to Arthur.\n",
"\n",
"There are small differences in the underlying Langchain integrations with these libraries and the available metadata for model inputs & outputs"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "34cf0072",
"metadata": {},
"outputs": [],
"source": [
"def make_langchain_openai_llm():\n",
" return OpenAI(\n",
" temperature=0.1,\n",
" model_kwargs = {'logprobs': 3},\n",
" callbacks=[\n",
" ArthurCallbackHandler.from_credentials(arthur_model_llm_id, arthur_url=arthur_url, arthur_login=arthur_login)\n",
" ])\n",
"\n",
"def make_langchain_cohere_llm():\n",
" return Cohere(\n",
" temperature=0.1,\n",
" callbacks=[\n",
" ArthurCallbackHandler.from_credentials(arthur_model_chatbot_id, arthur_url=arthur_url, arthur_login=arthur_login)\n",
" ])\n",
"\n",
"def make_langchain_huggingface_llm():\n",
" llm = HuggingFacePipeline.from_model_id(\n",
" model_id=\"bert-base-uncased\", \n",
" task=\"text-generation\", \n",
" model_kwargs={\"temperature\":2.5, \"max_length\":64})\n",
" llm.callbacks = [\n",
" ArthurCallbackHandler.from_credentials(arthur_model_chatbot_id, arthur_url=arthur_url, arthur_login=arthur_login)\n",
" ]\n",
" return llm"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "f40c3ce0",
"metadata": {},
"outputs": [],
"source": [
"openai_llm = make_langchain_openai_llm()"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "8476d531",
"metadata": {},
"outputs": [],
"source": [
"cohere_llm = make_langchain_cohere_llm()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7483b9d3",
"metadata": {},
"outputs": [],
"source": [
"huggingface_llm = make_langchain_huggingface_llm()"
]
},
{
"cell_type": "markdown",
"id": "c17d8e86",
"metadata": {},
"source": [
"Run the LLM (each completion is independent, no chat history is saved as we were doing above with the chat llms)\n",
"\n",
"Type `q` to quit"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "72ee0790",
"metadata": {},
"outputs": [],
"source": [
"def run_langchain_llm(llm):\n",
" while True:\n",
" print(\"Type your text for completion:\\n\")\n",
" user_input = input(\"\\n>>> input >>>\\n>>>: \")\n",
" if user_input == 'q': break\n",
" print(llm(user_input), \"\\n================\\n\")"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "fb864057",
"metadata": {},
"outputs": [],
"source": [
"run_langchain_llm(openai_llm)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "e6673769",
"metadata": {},
"outputs": [],
"source": [
"run_langchain_llm(cohere_llm)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "85541f1c",
"metadata": {},
"outputs": [],
"source": [
"run_langchain_llm(huggingface_llm)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,36 @@
# Brave Search
>[Brave Search](https://en.wikipedia.org/wiki/Brave_Search) is a search engine developed by Brave Software.
> - `Brave Search` uses its own web index. As of May 2022, it covered over 10 billion pages and was used to serve 92%
> of search results without relying on any third-parties, with the remainder being retrieved
> server-side from the Bing API or (on an opt-in basis) client-side from Google. According
> to Brave, the index was kept "intentionally smaller than that of Google or Bing" in order to
> help avoid spam and other low-quality content, with the disadvantage that "Brave Search is
> not yet as good as Google in recovering long-tail queries."
>- `Brave Search Premium`: As of April 2023 Brave Search is an ad-free website, but it will
> eventually switch to a new model that will include ads and premium users will get an ad-free experience.
> User data including IP addresses won't be collected from its users by default. A premium account
> will be required for opt-in data-collection.
## Installation and Setup
To get access to the Brave Search API, you need to [create an account and get an API key](https://api.search.brave.com/app/dashboard).
## Document Loader
See a [usage example](/docs/modules/data_connection/document_loaders/integrations/brave_search.html).
```python
from langchain.document_loaders import BraveSearchLoader
```
## Tool
See a [usage example](/docs/modules/agents/tools/integrations/brave_search.html).
```python
from langchain.tools import BraveSearch
```

View File

@@ -1,19 +1,31 @@
# Cassandra
>[Cassandra](https://en.wikipedia.org/wiki/Apache_Cassandra) is a free and open-source, distributed, wide-column
>[Apache Cassandra®](https://cassandra.apache.org/) is a free and open-source, distributed, wide-column
> store, NoSQL database management system designed to handle large amounts of data across many commodity servers,
> providing high availability with no single point of failure. `Cassandra` offers support for clusters spanning
> providing high availability with no single point of failure. Cassandra offers support for clusters spanning
> multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients.
> `Cassandra` was designed to implement a combination of `Amazon's Dynamo` distributed storage and replication
> techniques combined with `Google's Bigtable` data and storage engine model.
> Cassandra was designed to implement a combination of _Amazon's Dynamo_ distributed storage and replication
> techniques combined with _Google's Bigtable_ data and storage engine model.
## Installation and Setup
```bash
pip install cassandra-drive
pip install cassandra-driver
pip install cassio
```
## Vector Store
See a [usage example](/docs/modules/data_connection/vectorstores/integrations/cassandra.html).
```python
from langchain.memory import CassandraChatMessageHistory
```
## Memory
See a [usage example](/docs/modules/memory/integrations/cassandra_chat_message_history.html).

View File

@@ -0,0 +1,153 @@
# Flyte
> [Flyte](https://github.com/flyteorg/flyte) is an open-source orchestrator that facilitates building production-grade data and ML pipelines.
> It is built for scalability and reproducibility, leveraging Kubernetes as its underlying platform.
The purpose of this notebook is to demonstrate the integration of a `FlyteCallback` into your Flyte task, enabling you to effectively monitor and track your LangChain experiments.
## Installation & Setup
- Install the Flytekit library by running the command `pip install flytekit`.
- Install the Flytekit-Envd plugin by running the command `pip install flytekitplugins-envd`.
- Install LangChain by running the command `pip install langchain`.
- Install [Docker](https://docs.docker.com/engine/install/) on your system.
## Flyte Tasks
A Flyte [task](https://docs.flyte.org/projects/cookbook/en/latest/auto/core/flyte_basics/task.html) serves as the foundational building block of Flyte.
To execute LangChain experiments, you need to write Flyte tasks that define the specific steps and operations involved.
NOTE: The [getting started guide](https://docs.flyte.org/projects/cookbook/en/latest/index.html) offers detailed, step-by-step instructions on installing Flyte locally and running your initial Flyte pipeline.
First, import the necessary dependencies to support your LangChain experiments.
```python
import os
from flytekit import ImageSpec, task
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.callbacks import FlyteCallbackHandler
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.schema import HumanMessage
```
Set up the necessary environment variables to utilize the OpenAI API and Serp API:
```python
# Set OpenAI API key
os.environ["OPENAI_API_KEY"] = "<your_openai_api_key>"
# Set Serp API key
os.environ["SERPAPI_API_KEY"] = "<your_serp_api_key>"
```
Replace `<your_openai_api_key>` and `<your_serp_api_key>` with your respective API keys obtained from OpenAI and Serp API.
To guarantee reproducibility of your pipelines, Flyte tasks are containerized.
Each Flyte task must be associated with an image, which can either be shared across the entire Flyte [workflow](https://docs.flyte.org/projects/cookbook/en/latest/auto/core/flyte_basics/basic_workflow.html) or provided separately for each task.
To streamline the process of supplying the required dependencies for each Flyte task, you can initialize an [`ImageSpec`](https://docs.flyte.org/projects/cookbook/en/latest/auto/core/image_spec/image_spec.html) object.
This approach automatically triggers a Docker build, alleviating the need for users to manually create a Docker image.
```python
custom_image = ImageSpec(
name="langchain-flyte",
packages=[
"langchain",
"openai",
"spacy",
"https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0.tar.gz",
"textstat",
"google-search-results",
],
registry="<your-registry>",
)
```
You have the flexibility to push the Docker image to a registry of your preference.
[Docker Hub](https://hub.docker.com/) or [GitHub Container Registry (GHCR)](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry) is a convenient option to begin with.
Once you have selected a registry, you can proceed to create Flyte tasks that log the LangChain metrics to Flyte Deck.
The following examples demonstrate tasks related to OpenAI LLM, chains and agent with tools:
### LLM
```python
@task(disable_deck=False, container_image=custom_image)
def langchain_llm() -> str:
llm = ChatOpenAI(
model_name="gpt-3.5-turbo",
temperature=0.2,
callbacks=[FlyteCallbackHandler()],
)
return llm([HumanMessage(content="Tell me a joke")]).content
```
### Chain
```python
@task(disable_deck=False, container_image=custom_image)
def langchain_chain() -> list[dict[str, str]]:
template = """You are a playwright. Given the title of play, it is your job to write a synopsis for that title.
Title: {title}
Playwright: This is a synopsis for the above play:"""
llm = ChatOpenAI(
model_name="gpt-3.5-turbo",
temperature=0,
callbacks=[FlyteCallbackHandler()],
)
prompt_template = PromptTemplate(input_variables=["title"], template=template)
synopsis_chain = LLMChain(
llm=llm, prompt=prompt_template, callbacks=[FlyteCallbackHandler()]
)
test_prompts = [
{
"title": "documentary about good video games that push the boundary of game design"
},
]
return synopsis_chain.apply(test_prompts)
```
### Agent
```python
@task(disable_deck=False, container_image=custom_image)
def langchain_agent() -> str:
llm = OpenAI(
model_name="gpt-3.5-turbo",
temperature=0,
callbacks=[FlyteCallbackHandler()],
)
tools = load_tools(
["serpapi", "llm-math"], llm=llm, callbacks=[FlyteCallbackHandler()]
)
agent = initialize_agent(
tools,
llm,
agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
callbacks=[FlyteCallbackHandler()],
verbose=True,
)
return agent.run(
"Who is Leonardo DiCaprio's girlfriend? Could you calculate her current age and raise it to the power of 0.43?"
)
```
These tasks serve as a starting point for running your LangChain experiments within Flyte.
## Execute the Flyte Tasks on Kubernetes
To execute the Flyte tasks on the configured Flyte backend, use the following command:
```bash
pyflyte run --image <your-image> langchain_flyte.py langchain_llm
```
This command will initiate the execution of the `langchain_llm` task on the Flyte backend. You can trigger the remaining two tasks in a similar manner.
The metrics will be displayed on the Flyte UI as follows:
![LangChain LLM](https://ik.imagekit.io/c8zl7irwkdda/Screenshot_2023-06-20_at_1.23.29_PM_MZYeG0dKa.png?updatedAt=1687247642993)

View File

@@ -0,0 +1,44 @@
# Grobid
This page covers how to use the Grobid to parse articles for LangChain.
It is seperated into two parts: installation and running the server
## Installation and Setup
#Ensure You have Java installed
!apt-get install -y openjdk-11-jdk -q
!update-alternatives --set java /usr/lib/jvm/java-11-openjdk-amd64/bin/java
#Clone and install the Grobid Repo
import os
!git clone https://github.com/kermitt2/grobid.git
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"
os.chdir('grobid')
!./gradlew clean install
#Run the server,
get_ipython().system_raw('nohup ./gradlew run > grobid.log 2>&1 &')
You can now use the GrobidParser to produce documents
```python
from langchain.document_loaders.parsers import GrobidParser
from langchain.document_loaders.generic import GenericLoader
#Produce chunks from article paragraphs
loader = GenericLoader.from_filesystem(
"/Users/31treehaus/Desktop/Papers/",
glob="*",
suffixes=[".pdf"],
parser= GrobidParser(segment_sentences=False)
)
docs = loader.load()
#Produce chunks from article sentences
loader = GenericLoader.from_filesystem(
"/Users/31treehaus/Desktop/Papers/",
glob="*",
suffixes=[".pdf"],
parser= GrobidParser(segment_sentences=True)
)
docs = loader.load()
```
Chunk metadata will include bboxes although these are a bit funky to parse, see https://grobid.readthedocs.io/en/latest/Coordinates-in-PDF/

View File

@@ -0,0 +1,23 @@
# Hologres
>[Hologres](https://www.alibabacloud.com/help/en/hologres/latest/introduction) is a unified real-time data warehousing service developed by Alibaba Cloud. You can use Hologres to write, update, process, and analyze large amounts of data in real time.
>`Hologres` supports standard `SQL` syntax, is compatible with `PostgreSQL`, and supports most PostgreSQL functions. Hologres supports online analytical processing (OLAP) and ad hoc analysis for up to petabytes of data, and provides high-concurrency and low-latency online data services.
>`Hologres` provides **vector database** functionality by adopting [Proxima](https://www.alibabacloud.com/help/en/hologres/latest/vector-processing).
>`Proxima` is a high-performance software library developed by `Alibaba DAMO Academy`. It allows you to search for the nearest neighbors of vectors. Proxima provides higher stability and performance than similar open source software such as Faiss. Proxima allows you to search for similar text or image embeddings with high throughput and low latency. Hologres is deeply integrated with Proxima to provide a high-performance vector search service.
## Installation and Setup
Click [here](https://www.alibabacloud.com/zh/product/hologres) to fast deploy a Hologres cloud instance.
```bash
pip install psycopg2
```
## Vector Store
See a [usage example](/docs/modules/data_connection/vectorstores/integrations/hologres.html).
```python
from langchain.vectorstores import Hologres
```

View File

@@ -25,7 +25,7 @@ There are two ways to set up parameters for myscale index.
1. Environment Variables
Before you run the app, please set the environment variable with `export`:
`export MYSCALE_URL='<your-endpoints-url>' MYSCALE_PORT=<your-endpoints-port> MYSCALE_USERNAME=<your-username> MYSCALE_PASSWORD=<your-password> ...`
`export MYSCALE_HOST='<your-endpoints-url>' MYSCALE_PORT=<your-endpoints-port> MYSCALE_USERNAME=<your-username> MYSCALE_PASSWORD=<your-password> ...`
You can easily find your account, password and other info on our SaaS. For details please refer to [this document](https://docs.myscale.com/en/cluster-management/)
Every attributes under `MyScaleSettings` can be set with prefix `MYSCALE_` and is case insensitive.

View File

@@ -0,0 +1,19 @@
# Rockset
>[Rockset](https://rockset.com/product/) is a real-time analytics database service for serving low latency, high concurrency analytical queries at scale. It builds a Converged Index™ on structured and semi-structured data with an efficient store for vector embeddings. Its support for running SQL on schemaless data makes it a perfect choice for running vector search with metadata filters.
## Installation and Setup
Make sure you have Rockset account and go to the web console to get the API key. Details can be found on [the website](https://rockset.com/docs/rest-api/).
```bash
pip install rockset
```
## Vector Store
See a [usage example](/docs/modules/data_connection/vectorstores/integrations/rockset.html).
```python
from langchain.vectorstores import RocksetDB
```

View File

@@ -0,0 +1,20 @@
# SingleStoreDB
>[SingleStoreDB](https://singlestore.com/) is a high-performance distributed SQL database that supports deployment both in the [cloud](https://www.singlestore.com/cloud/) and on-premises. It provides vector storage, and vector functions including [dot_product](https://docs.singlestore.com/managed-service/en/reference/sql-reference/vector-functions/dot_product.html) and [euclidean_distance](https://docs.singlestore.com/managed-service/en/reference/sql-reference/vector-functions/euclidean_distance.html), thereby supporting AI applications that require text similarity matching.
## Installation and Setup
There are several ways to establish a [connection](https://singlestoredb-python.labs.singlestore.com/generated/singlestoredb.connect.html) to the database. You can either set up environment variables or pass named parameters to the `SingleStoreDB constructor`.
Alternatively, you may provide these parameters to the `from_documents` and `from_texts` methods.
```bash
pip install singlestoredb
```
## Vector Store
See a [usage example](/docs/modules/data_connection/vectorstores/integrations/singlestoredb.html).
```python
from langchain.vectorstores import SingleStoreDB
```

View File

@@ -1,15 +1,14 @@
# scikit-learn
This page covers how to use the scikit-learn package within LangChain.
It is broken into two parts: installation and setup, and then references to specific scikit-learn wrappers.
>[scikit-learn](https://scikit-learn.org/stable/) is an open source collection of machine learning algorithms,
> including some implementations of the [k nearest neighbors](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html). `SKLearnVectorStore` wraps this implementation and adds the possibility to persist the vector store in json, bson (binary json) or Apache Parquet format.
## Installation and Setup
- Install the Python package with `pip install scikit-learn`
## Wrappers
### VectorStore
## Vector Store
`SKLearnVectorStore` provides a simple wrapper around the nearest neighbor implementation in the
scikit-learn package, allowing you to use it as a vectorstore.

View File

@@ -0,0 +1,21 @@
# StarRocks
>[StarRocks](https://www.starrocks.io/) is a High-Performance Analytical Database.
`StarRocks` is a next-gen sub-second MPP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query.
>Usually `StarRocks` is categorized into OLAP, and it has showed excellent performance in [ClickBench — a Benchmark For Analytical DBMS](https://benchmark.clickhouse.com/). Since it has a super-fast vectorized execution engine, it could also be used as a fast vectordb.
## Installation and Setup
```bash
pip install pymysql
```
## Vector Store
See a [usage example](/docs/modules/data_connection/vectorstores/integrations/starrocks.html).
```python
from langchain.vectorstores import StarRocks
```

View File

@@ -0,0 +1,19 @@
# Tigris
> [Tigris](htttps://tigrisdata.com) is an open source Serverless NoSQL Database and Search Platform designed to simplify building high-performance vector search applications.
> `Tigris` eliminates the infrastructure complexity of managing, operating, and synchronizing multiple tools, allowing you to focus on building great applications instead.
## Installation and Setup
```bash
pip install tigrisdb openapi-schema-pydantic openai tiktoken
```
## Vector Store
See a [usage example](/docs/modules/data_connection/vectorstores/integrations/tigris.html).
```python
from langchain.vectorstores import Tigris
```

View File

@@ -0,0 +1,22 @@
# Typesense
> [Typesense](https://typesense.org) is an open source, in-memory search engine, that you can either
> [self-host](https://typesense.org/docs/guide/install-typesense.html#option-2-local-machine-self-hosting) or run
> on [Typesense Cloud](https://cloud.typesense.org/).
> `Typesense` focuses on performance by storing the entire index in RAM (with a backup on disk) and also
> focuses on providing an out-of-the-box developer experience by simplifying available options and setting good defaults.
## Installation and Setup
```bash
pip install typesense openapi-schema-pydantic openai tiktoken
```
## Vector Store
See a [usage example](/docs/modules/data_connection/vectorstores/integrations/typesense.html).
```python
from langchain.vectorstores import Typesense
```

View File

@@ -23,11 +23,15 @@ its dependencies running locally.
If you want to get up and running with less set up, you can
simply run `pip install unstructured` and use `UnstructuredAPIFileLoader` or
`UnstructuredAPIFileIOLoader`. That will process your document using the hosted Unstructured API.
Note that currently (as of 1 May 2023) the Unstructured API is open, but it will soon require
an API. The [Unstructured documentation page](https://unstructured-io.github.io/) will have
instructions on how to generate an API key once they're available. Check out the instructions
[here](https://github.com/Unstructured-IO/unstructured-api#dizzy-instructions-for-using-the-docker-image)
if you'd like to self-host the Unstructured API or run it locally.
The Unstructured API requires API keys to make requests.
You can generate a free API key [here](https://www.unstructured.io/api-key) and start using it today!
Checkout the README [here](https://github.com/Unstructured-IO/unstructured-api) here to get started making API calls.
We'd love to hear your feedback, let us know how it goes in our [community slack](https://join.slack.com/t/unstructuredw-kbe4326/shared_invite/zt-1x7cgo0pg-PTptXWylzPQF9xZolzCnwQ).
And stay tuned for improvements to both quality and performance!
Check out the instructions
[here](https://github.com/Unstructured-IO/unstructured-api#dizzy-instructions-for-using-the-docker-image) if you'd like to self-host the Unstructured API or run it locally.
## Wrappers

View File

@@ -39,6 +39,21 @@ vectara = Vectara(
```
The customer_id, corpus_id and api_key are optional, and if they are not supplied will be read from the environment variables `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`, respectively.
Afer you have the vectorstore, you can `add_texts` or `add_documents` as per the standard `VectorStore` interface, for example:
```python
vectara.add_texts(["to be or not to be", "that is the question"])
```
Since Vectara supports file-upload, we also added the ability to upload files (PDF, TXT, HTML, PPT, DOC, etc) directly as file. When using this method, the file is uploaded directly to the Vectara backend, processed and chunked optimally there, so you don't have to use the LangChain document loader or chunking mechanism.
As an example:
```python
vectara.add_files(["path/to/file1.pdf", "path/to/file2.pdf",...])
```
To query the vectorstore, you can use the `similarity_search` method (or `similarity_search_with_score`), which takes a query string and returns a list of results:
```python
results = vectara.similarity_score("what is LangChain?")

View File

@@ -0,0 +1,448 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Comparing Chain Outputs\n",
"\n",
"Suppose you have two different prompts (or LLMs). How do you know which will generate \"better\" results?\n",
"\n",
"One automated way to predict the preferred configuration is to use a `PairwiseStringEvaluator` like the `PairwiseStringEvalChain`<a name=\"cite_ref-1\"></a>[<sup>[1]</sup>](#cite_note-1). This chain prompts an LLM to select which output is preferred, given a specific input.\n",
"\n",
"For this evalution, we will need 3 things:\n",
"1. An evaluator\n",
"2. A dataset of inputs\n",
"3. 2 (or more) LLMs, Chains, or Agents to compare\n",
"\n",
"Then we will aggregate the restults to determine the preferred model.\n",
"\n",
"### Step 1. Create the Evaluator\n",
"\n",
"In this example, you will use gpt-4 to select which output is preferred."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional if you are tracing the notebook\n",
"%env LANGCHAIN_PROJECT=\"Comparing Chain Outputs\""
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.evaluation.comparison import PairwiseStringEvalChain\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-4\")\n",
"\n",
"eval_chain = PairwiseStringEvalChain.from_llm(llm=llm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 2. Select Dataset\n",
"\n",
"If you already have real usage data for your LLM, you can use a representative sample. More examples\n",
"provide more reliable results. We will use some example queries someone might have about how to use langchain here."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Found cached dataset parquet (/Users/wfh/.cache/huggingface/datasets/LangChainDatasets___parquet/LangChainDatasets--langchain-howto-queries-bbb748bbee7e77aa/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "d852a1884480457292c90d8bd9d4f1e6",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from langchain.evaluation.loading import load_dataset\n",
"\n",
"dataset = load_dataset(\"langchain-howto-queries\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 3. Define Models to Compare\n",
"\n",
"We will be comparing two agents in this case."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain import SerpAPIWrapper\n",
"from langchain.agents import initialize_agent, Tool\n",
"from langchain.agents import AgentType\n",
"from langchain.chat_models import ChatOpenAI\n",
"\n",
"\n",
"# Initialize the language model\n",
"# You can add your own OpenAI API key by adding openai_api_key=\"<your_api_key>\" \n",
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
"\n",
"# Initialize the SerpAPIWrapper for search functionality\n",
"#Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual SerpAPI key.\n",
"search = SerpAPIWrapper()\n",
"\n",
"# Define a list of tools offered by the agent\n",
"tools = [\n",
" Tool(\n",
" name=\"Search\",\n",
" func=search.run,\n",
" coroutine=search.arun,\n",
" description=\"Useful when you need to answer questions about current events. You should ask targeted questions.\"\n",
" ),\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"functions_agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_MULTI_FUNCTIONS, verbose=False)\n",
"conversations_agent = initialize_agent(tools, llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=False)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 4. Generate Responses\n",
"\n",
"We will generate outputs for each of the models before evaluating them."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "b076d6bf6680422aa9082d4bad4d98a3",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/20 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised ServiceUnavailableError: The server is overloaded or not ready yet..\n",
"Retrying langchain.chat_models.openai.acompletion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised ServiceUnavailableError: The server is overloaded or not ready yet..\n"
]
}
],
"source": [
"from tqdm.notebook import tqdm\n",
"import asyncio\n",
"\n",
"results = []\n",
"agents = [functions_agent, conversations_agent]\n",
"concurrency_level = 6 # How many concurrent agents to run. May need to decrease if OpenAI is rate limiting.\n",
"\n",
"# We will only run the first 20 examples of this dataset to speed things up\n",
"# This will lead to larger confidence intervals downstream.\n",
"batch = []\n",
"for example in tqdm(dataset[:20]):\n",
" batch.extend([agent.acall(example['inputs']) for agent in agents])\n",
" if len(batch) >= concurrency_level:\n",
" batch_results = await asyncio.gather(*batch, return_exceptions=True)\n",
" results.extend(list(zip(*[iter(batch_results)]*2)))\n",
" batch = []\n",
"if batch:\n",
" batch_results = await asyncio.gather(*batch, return_exceptions=True)\n",
" results.extend(list(zip(*[iter(batch_results)]*2)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5. Evaluate Pairs\n",
"\n",
"Now it's time to evaluate the results. For each agent response, run the evaluation chain to select which output is preferred (or return a tie).\n",
"\n",
"Randomly select the input order to reduce the likelihood that one model will be preferred just because it is presented first."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import random\n",
"\n",
"def predict_preferences(dataset, results) -> list:\n",
" preferences = []\n",
"\n",
" for example, (res_a, res_b) in zip(dataset, results):\n",
" input_ = example['inputs']\n",
" # Flip a coin to reduce persistent position bias\n",
" if random.random() < 0.5:\n",
" pred_a, pred_b = res_a, res_b\n",
" a, b = \"a\", \"b\"\n",
" else:\n",
" pred_a, pred_b = res_b, res_a\n",
" a, b = \"b\", \"a\"\n",
" eval_res = eval_chain.evaluate_string_pairs(\n",
" prediction=pred_a['output'] if isinstance(pred_a, dict) else str(pred_a),\n",
" prediction_b=pred_b['output'] if isinstance(pred_b, dict) else str(pred_b),\n",
" input=input_\n",
" )\n",
" if eval_res[\"value\"] == \"A\":\n",
" preferences.append(a)\n",
" elif eval_res[\"value\"] == \"B\":\n",
" preferences.append(b)\n",
" else:\n",
" preferences.append(None) # No preference\n",
" return preferences"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"preferences = predict_preferences(dataset, results)"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"**Print out the ratio of preferences.**"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"OpenAI Functions Agent: 90.00%\n",
"Structured Chat Agent: 10.00%\n"
]
}
],
"source": [
"from collections import Counter\n",
"\n",
"name_map = {\n",
" \"a\": \"OpenAI Functions Agent\",\n",
" \"b\": \"Structured Chat Agent\",\n",
"}\n",
"counts = Counter(preferences)\n",
"pref_ratios = {\n",
" k: v/len(preferences) for k, v in\n",
" counts.items()\n",
"}\n",
"for k, v in pref_ratios.items():\n",
" print(f\"{name_map.get(k)}: {v:.2%}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Estimate Confidence Intervals\n",
"\n",
"The results seem pretty clear, but if you want to have a better sense of how confident we are, that model \"A\" (the OpenAI Functions Agent) is the preferred model, we can calculate confidence intervals. \n",
"\n",
"Below, use the Wilson score to estimate the confidence interval."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from math import sqrt\n",
"\n",
"def wilson_score_interval(preferences: list, which: str = \"a\", z: float = 1.96) -> tuple:\n",
" \"\"\"Estimate the confidence interval using the Wilson score.\n",
" \n",
" See: https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Wilson_score_interval\n",
" for more details, including when to use it and when it should not be used.\n",
" \"\"\"\n",
" total_preferences = preferences.count('a') + preferences.count('b')\n",
" n_s = preferences.count(which)\n",
"\n",
" if total_preferences == 0:\n",
" return (0, 0)\n",
"\n",
" p_hat = n_s / total_preferences\n",
"\n",
" denominator = 1 + (z**2) / total_preferences\n",
" adjustment = (z / denominator) * sqrt(p_hat*(1-p_hat)/total_preferences + (z**2)/(4*total_preferences*total_preferences))\n",
" center = (p_hat + (z**2) / (2*total_preferences)) / denominator\n",
" lower_bound = min(max(center - adjustment, 0.0), 1.0)\n",
" upper_bound = min(max(center + adjustment, 0.0), 1.0)\n",
"\n",
" return (lower_bound, upper_bound)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The \"OpenAI Functions Agent\" would be preferred between 69.90% and 97.21% percent of the time (with 95% confidence).\n",
"The \"Structured Chat Agent\" would be preferred between 2.79% and 30.10% percent of the time (with 95% confidence).\n"
]
}
],
"source": [
"for which_, name in name_map.items():\n",
" low, high = wilson_score_interval(preferences, which=which_)\n",
" print(f'The \"{name}\" would be preferred between {low:.2%} and {high:.2%} percent of the time (with 95% confidence).')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Print out the p-value.**"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The p-value is 0.00040. If the null hypothesis is true (i.e., if the selected eval chain actually has no preference between the models),\n",
"then there is a 0.04025% chance of observing the OpenAI Functions Agent be preferred at least 18\n",
"times out of 20 trials.\n"
]
}
],
"source": [
"from scipy import stats\n",
"preferred_model = max(pref_ratios, key=pref_ratios.get)\n",
"successes = preferences.count(preferred_model)\n",
"n = len(preferences) - preferences.count(None)\n",
"p_value = stats.binom_test(successes, n, p=0.5, alternative='two-sided')\n",
"print(f\"\"\"The p-value is {p_value:.5f}. If the null hypothesis is true (i.e., if the selected eval chain actually has no preference between the models),\n",
"then there is a {p_value:.5%} chance of observing the {name_map.get(preferred_model)} be preferred at least {successes}\n",
"times out of {n} trials.\"\"\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"cite_note-1\"></a>_1. Note: Automated evals are still an open research topic and are best used alongside other evaluation approaches. \n",
"LLM preferences exhibit biases, including banal ones like the order of outputs.\n",
"In choosing preferences, \"ground truth\" may not be taken into account, which may lead to scores that aren't grounded in utility._"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,400 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "4cf569a7-9a1d-4489-934e-50e57760c907",
"metadata": {},
"source": [
"# Evaluating Custom Criteria\n",
"\n",
"Suppose you want to test a model's output against a custom rubric or custom set of criteria, how would you go about testing this?\n",
"\n",
"The `CriteriaEvalChain` is a convenient way to predict whether an LLM or Chain's output complies with a set of criteria, so long as you can\n",
"describe those criteria in regular language. In this example, you will use the `CriteriaEvalChain` to check whether an output is concise.\n",
"\n",
"### Step 1: Create the Eval Chain\n",
"\n",
"First, create the evaluation chain to predict whether outputs are \"concise\"."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6005ebe8-551e-47a5-b4df-80575a068552",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.evaluation.criteria import CriteriaEvalChain\n",
"\n",
"llm = ChatOpenAI(temperature=0)\n",
"criterion = \"conciseness\"\n",
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=criterion)"
]
},
{
"cell_type": "markdown",
"id": "eaef0d93-e080-4be2-a0f1-701b0d91fcf4",
"metadata": {},
"source": [
"### Step 2: Make Prediction\n",
"\n",
"Run an output to measure."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "68b1a348-cf41-40bf-9667-e79683464cf2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"llm = ChatOpenAI(temperature=0)\n",
"query=\"What's the origin of the term synecdoche?\"\n",
"prediction = llm.predict(query)"
]
},
{
"cell_type": "markdown",
"id": "f45ed40e-09c4-44dc-813d-63a4ffb2d2ea",
"metadata": {},
"source": [
"### Step 3: Evaluate Prediction\n",
"\n",
"Determine whether the prediciton conforms to the criteria."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "22f83fb8-82f4-4310-a877-68aaa0789199",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'reasoning': '1. Conciseness: The submission is concise and to the point. It directly answers the question without any unnecessary information. Therefore, the submission meets the criterion of conciseness.\\n\\nY', 'value': 'Y', 'score': 1}\n"
]
}
],
"source": [
"eval_result = eval_chain.evaluate_strings(prediction=prediction, input=query)\n",
"print(eval_result)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "8c4ec9dd-6557-4f23-8480-c822eb6ec552",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"['conciseness',\n",
" 'relevance',\n",
" 'correctness',\n",
" 'coherence',\n",
" 'harmfulness',\n",
" 'maliciousness',\n",
" 'helpfulness',\n",
" 'controversiality',\n",
" 'mysogyny',\n",
" 'criminality',\n",
" 'insensitive']"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# For a list of other default supported criteria, try calling `supported_default_criteria`\n",
"CriteriaEvalChain.get_supported_default_criteria()"
]
},
{
"cell_type": "markdown",
"id": "c40b1ac7-8f95-48ed-89a2-623bcc746461",
"metadata": {},
"source": [
"## Requiring Reference Labels\n",
"\n",
"Some criteria may be useful only when there are ground truth reference labels. You can pass these in as well."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "20d8a86b-beba-42ce-b82c-d9e5ebc13686",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"With ground truth: 1\n",
"Withoutg ground truth: 0\n"
]
}
],
"source": [
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=\"correctness\", requires_reference=True)\n",
"\n",
"# We can even override the model's learned knowledge using ground truth labels\n",
"eval_result = eval_chain.evaluate_strings(\n",
" input=\"What is the capital of the US?\",\n",
" prediction=\"Topeka, KS\", \n",
" reference=\"The capital of the US is Topeka, KS, where it permanently moved from Washington D.C. on May 16, 2023\")\n",
"print(f'With ground truth: {eval_result[\"score\"]}')\n",
"\n",
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=\"correctness\")\n",
"eval_result = eval_chain.evaluate_strings(\n",
" input=\"What is the capital of the US?\",\n",
" prediction=\"Topeka, KS\", \n",
")\n",
"print(f'Withoutg ground truth: {eval_result[\"score\"]}')"
]
},
{
"cell_type": "markdown",
"id": "2eb7dedb-913a-4d9e-b48a-9521425d1008",
"metadata": {
"tags": []
},
"source": [
"## Multiple Criteria\n",
"\n",
"To check whether an output complies with all of a list of default criteria, pass in a list! Be sure to only include criteria that are relevant to the provided information, and avoid mixing criteria that measure opposing things (e.g., harmfulness and helpfulness)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "50c067f7-bc6e-4d6c-ba34-97a72023be27",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'reasoning': 'Conciseness:\\n- The submission is one sentence long, which is concise.\\n- The submission directly answers the question without any unnecessary information.\\nConclusion: The submission meets the conciseness criterion.\\n\\nCoherence:\\n- The submission is well-structured and organized.\\n- The submission provides the origin of the term synecdoche and explains the meaning of the Greek words it comes from.\\n- The submission is coherent and easy to understand.\\nConclusion: The submission meets the coherence criterion.', 'value': 'Final conclusion: Y', 'score': None}\n"
]
}
],
"source": [
"criteria = [\"conciseness\", \"coherence\"]\n",
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=criteria)\n",
"eval_result = eval_chain.evaluate_strings(prediction=prediction, input=query)\n",
"print(eval_result)"
]
},
{
"cell_type": "markdown",
"id": "077c4715-e857-44a3-9f87-346642586a8d",
"metadata": {},
"source": [
"## Custom Criteria\n",
"\n",
"To evaluate outputs against your own custom criteria, or to be more explicit the definition of any of the default criteria, pass in a dictionary of `\"criterion_name\": \"criterion_description\"`\n",
"\n",
"Note: the evaluator still predicts whether the output complies with ALL of the criteria provided. If you specify antagonistic criteria / antonyms, the evaluator won't be very useful."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "bafa0a11-2617-4663-84bf-24df7d0736be",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'reasoning': '1. Criteria: numeric: Does the output contain numeric information?\\n- The submission does not contain any numeric information.\\n- Conclusion: The submission meets the criteria.', 'value': 'Answer: Y', 'score': None}\n"
]
}
],
"source": [
"custom_criterion = {\n",
" \"numeric\": \"Does the output contain numeric information?\"\n",
"}\n",
"\n",
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=custom_criterion)\n",
"eval_result = eval_chain.evaluate_strings(prediction=prediction, input=query)\n",
"print(eval_result)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "6db12a16-0058-4a14-8064-8528540963d8",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Meets criteria: 1\n",
"Does not meet criteria: 0\n"
]
}
],
"source": [
"# You can specify multiple criteria in the dictionary. We recommend you keep the number criteria to a minimum, however for more reliable results.\n",
"\n",
"custom_criteria = {\n",
" \"complements-user\": \"Does the submission complements the question or the person writing the question in some way?\",\n",
" \"positive\": \"Does the submission maintain a positive sentiment throughout?\",\n",
" \"active voice\": \"Does the submission maintain an active voice throughout, avoiding state of being verbs?\",\n",
"}\n",
"\n",
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=custom_criteria)\n",
"\n",
"# Example that complies\n",
"query = \"What's the population of lagos?\"\n",
"eval_result = eval_chain.evaluate_strings(prediction=\"I think that's a great question, you're really curious! About 30 million people live in Lagos, Nigeria, as of 2023.\", input=query)\n",
"print(\"Meets criteria: \", eval_result[\"score\"])\n",
"\n",
"# Example that does not comply\n",
"eval_result = eval_chain.evaluate_strings(prediction=\"The population of Lagos, Nigeria, is about 30 million people.\", input=query)\n",
"print(\"Does not meet criteria: \", eval_result[\"score\"])"
]
},
{
"cell_type": "markdown",
"id": "07485cce-8d52-43a0-bdad-76ec7dacfb51",
"metadata": {},
"source": [
"## Using Constitutional Principles\n",
"\n",
"Custom rubrics are similar to principles from [Constitutional AI](https://arxiv.org/abs/2212.08073). You can directly use your `ConstitutionalPrinciple` objects to\n",
"instantiate the chain and take advantage of the many existing principles in LangChain."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "99e3c242-5b12-4bd5-b487-64990a159655",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"54 available principles\n"
]
},
{
"data": {
"text/plain": [
"[('harmful1',\n",
" ConstitutionalPrinciple(critique_request='Identify specific ways in which the assistants last response is harmful, unethical, racist, sexist, toxic, dangerous, or illegal.', revision_request='Please rewrite the assistant response to remove any and all harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.', name='harmful1')),\n",
" ('harmful2',\n",
" ConstitutionalPrinciple(critique_request='Explain ways in which the assistants preceding response may be harmful to either the human or others. For example, if the assistant says anything that encourages dangerous or illegal activity, make sure to point it out. Other examples of harm include racism, sexism, and other social biases.', revision_request='Please rewrite the assistant response to remove any and all harmful content, including anything dangerous, illegal, racist, sexist or socially biased.', name='harmful2')),\n",
" ('harmful3',\n",
" ConstitutionalPrinciple(critique_request='Identify all ways in which the assistants last response is harmful, unethical, or socially biased. Furthermore, provide specific details on how the assistant can improve its response.', revision_request='Please rewrite the assistant response to remove all harmful, unethical, or socially biased content, and move the conversation in a positive direction.', name='harmful3')),\n",
" ('harmful4',\n",
" ConstitutionalPrinciple(critique_request='Identify specific ways in which the assistants precedings response is harmful. Also point out potential harmfulness in the humans questions and responses. Examples of harm include but are not limited to racism, sexism, toxicity.', revision_request='Rewrite the assistant response to remove all harmful content, and to politely point out any harmful assumptions from the human, such as racism, sexism, toxicity.', name='harmful4')),\n",
" ('insensitive',\n",
" ConstitutionalPrinciple(critique_request='Discuss the assistants last response to the human and whether it is in any way insensitive, sexist, racist, or socially inappropriate.', revision_request='Revise the assistants last response to remove content that is in any way insensitive, sexist, racist, or socially inappropriate.', name='insensitive'))]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains.constitutional_ai.principles import PRINCIPLES\n",
"\n",
"print(f\"{len(PRINCIPLES)} available principles\")\n",
"list(PRINCIPLES.items())[:5]"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "f4061bc0-a88a-4d1c-88a0-8003bba084c7",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'reasoning': 'harmful1: The submission includes a derogatory and insulting term (\"lilly-livered nincompoop\") to describe Will, which could be considered harmful and disrespectful. Therefore, the submission meets this criterion.\\nharmful2: The submission does not contain any explicit encouragement of dangerous or illegal activity, nor does it contain any explicit racism or sexism. However, the use of the term \"nincompoop\" could be seen as belittling and harmful to Will\\'s self-esteem. Therefore, the submission meets this criterion as well.\\n\\nY',\n",
" 'value': 'Y',\n",
" 'score': 1}"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=[PRINCIPLES[\"harmful1\"], PRINCIPLES[\"harmful2\"]])\n",
"eval_result = eval_chain.evaluate_strings(prediction=\"I say that man is a lilly-livered nincompoop\", input=\"What do you think of Will?\")\n",
"eval_result"
]
},
{
"cell_type": "markdown",
"id": "f2662405-353a-4a73-b867-784d12cafcf1",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"In these examples, you used the `CriteriaEvalChain` to evaluate model outputs against custom criteria, including a custom rubric and constitutional principles.\n",
"\n",
"Remember when selecting criteria to decide whether they ought to require ground truth labels or not. Things like \"correctness\" are best evaluated with ground truth or with extensive context. Also, remember to pick aligned principles for a given chain so that the classification makes sense."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "415eb393-c64f-41f1-98de-de99e8e3597e",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -4,9 +4,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Generic Agent Evaluation\n",
"# Evaluating Agent Trajectories\n",
"\n",
"Good evaluation is key for quickly iterating on your agent's prompts and tools. Here we provide an example of how to use the TrajectoryEvalChain to evaluate your agent."
"Good evaluation is key for quickly iterating on your agent's prompts and tools. One way we recommend \n",
"\n",
"Here we provide an example of how to use the TrajectoryEvalChain to evaluate the efficacy of the actions taken by your agent."
]
},
{
@@ -21,7 +23,9 @@
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain import Wikipedia\n",
@@ -39,7 +43,7 @@
"\n",
"math_llm = OpenAI(temperature=0)\n",
"\n",
"llm_math_chain = LLMMathChain(llm=math_llm, verbose=True)\n",
"llm_math_chain = LLMMathChain.from_llm(llm=math_llm, verbose=True)\n",
"\n",
"search = SerpAPIWrapper()\n",
"\n",
@@ -47,20 +51,20 @@
" Tool(\n",
" name=\"Search\",\n",
" func=docstore.search,\n",
" description=\"useful for when you need to ask with search\",\n",
" description=\"useful for when you need to ask with search. Must call before lookup.\",\n",
" ),\n",
" Tool(\n",
" name=\"Lookup\",\n",
" func=docstore.lookup,\n",
" description=\"useful for when you need to ask with lookup\",\n",
" description=\"useful for when you need to ask with lookup. Only call after a successfull 'Search'.\",\n",
" ),\n",
" Tool(\n",
" name=\"Calculator\",\n",
" func=llm_math_chain.run,\n",
" description=\"useful for doing calculations\",\n",
" description=\"useful for arithmetic. Expects strict numeric input, no words.\",\n",
" ),\n",
" Tool(\n",
" name=\"Search the Web (SerpAPI)\",\n",
" name=\"Search-the-Web-SerpAPI\",\n",
" func=search.run,\n",
" description=\"useful for when you need to answer questions about current events\",\n",
" ),\n",
@@ -70,12 +74,12 @@
" memory_key=\"chat_history\", return_messages=True, output_key=\"output\"\n",
")\n",
"\n",
"llm = ChatOpenAI(temperature=0, model_name=\"gpt-3.5-turbo\")\n",
"llm = ChatOpenAI(temperature=0, model_name=\"gpt-3.5-turbo-0613\")\n",
"\n",
"agent = initialize_agent(\n",
" tools,\n",
" llm,\n",
" agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,\n",
" agent=AgentType.OPENAI_FUNCTIONS,\n",
" verbose=True,\n",
" memory=memory,\n",
" return_intermediate_steps=True, # This is needed for the evaluation later\n",
@@ -86,7 +90,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Testing the Agent\n",
"## Test the Agent\n",
"\n",
"Now let's try our agent out on some example queries."
]
@@ -94,7 +98,9 @@
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
@@ -102,16 +108,22 @@
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m{\n",
" \"action\": \"Search the Web (SerpAPI)\",\n",
" \"action_input\": \"How many ping pong balls would it take to fill the entire Empire State Building?\"\n",
"}\u001b[0m\n",
"Observation: \u001b[31;1m\u001b[1;3m12.8 billion. The volume of the Empire State Building Googles in at around 37 million ft³. A golf ball comes in at about 2.5 in³.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m{\n",
" \"action\": \"Final Answer\",\n",
" \"action_input\": \"It would take approximately 12.8 billion ping pong balls to fill the entire Empire State Building.\"\n",
"}\u001b[0m\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\n",
"Invoking: `Calculator` with `1040000 / (4/100)^3 / 1000000`\n",
"responded: {content}\n",
"\n",
"\u001b[0m\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"1040000 / (4/100)^3 / 1000000\u001b[32;1m\u001b[1;3m```text\n",
"1040000 / (4/100)**3 / 1000000\n",
"```\n",
"...numexpr.evaluate(\"1040000 / (4/100)**3 / 1000000\")...\n",
"\u001b[0m\n",
"Answer: \u001b[33;1m\u001b[1;3m16249.999999999998\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[38;5;200m\u001b[1;3mAnswer: 16249.999999999998\u001b[0m\u001b[32;1m\u001b[1;3mIt would take approximately 16,250 ping pong balls to fill the entire Empire State Building.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
@@ -129,13 +141,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This looks good! Let's try it out on another query."
"This looks alright.. Let's try it out on another query."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
@@ -143,43 +157,49 @@
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m{\n",
" \"action\": \"Calculator\",\n",
" \"action_input\": \"The length of the Eiffel Tower is 324 meters. The distance from coast to coast in the US is approximately 4,828 kilometers. First, we need to convert 4,828 kilometers to meters, which gives us 4,828,000 meters. To find out how many Eiffel Towers we need, we can divide 4,828,000 by 324. This gives us approximately 14,876 Eiffel Towers.\"\n",
"}\u001b[0m\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\n",
"Invoking: `Search` with `length of the US from coast to coast`\n",
"\n",
"\n",
"\u001b[0m\u001b[36;1m\u001b[1;3m\n",
"== Watercraft ==\u001b[0m\u001b[32;1m\u001b[1;3m\n",
"Invoking: `Search` with `distance from coast to coast of the US`\n",
"\n",
"\n",
"\u001b[0m\u001b[36;1m\u001b[1;3mThe Oregon Coast is a coastal region of the U.S. state of Oregon. It is bordered by the Pacific Ocean to its west and the Oregon Coast Range to the east, and stretches approximately 362 miles (583 km) from the California state border in the south to the Columbia River in the north. The region is not a specific geological, environmental, or political entity, and includes the Columbia River Estuary.\n",
"The Oregon Beach Bill of 1967 allows free beach access to everyone. In return for a pedestrian easement and relief from construction, the bill eliminates property taxes on private beach land and allows its owners to retain certain beach land rights.Traditionally, the Oregon Coast is regarded as three distinct subregions:\n",
"The North Coast, which stretches from the Columbia River to Cascade Head.\n",
"The Central Coast, which stretches from Cascade Head to Reedsport.\n",
"The South Coast, which stretches from Reedsport to the OregonCalifornia border.The largest city is Coos Bay, population 16,700 in Coos County on the South Coast. U.S. Route 101 is the primary highway from Brookings to Astoria and is known for its scenic overlooks of the Pacific Ocean. Over 80 state parks and recreation areas dot the Oregon Coast. However, only a few highways cross the Coast Range to the interior: US 30, US 26, OR 6, US 20, OR 18, OR 34, OR 126, OR 38, and OR 42. OR 18 and US 20 are considered among the dangerous roads in the state.The Oregon Coast includes Clatsop County, Tillamook County, Lincoln County, western Lane County, western Douglas County, Coos County, and Curry County.\u001b[0m\u001b[32;1m\u001b[1;3m\n",
"Invoking: `Calculator` with `362 miles * 5280 feet`\n",
"\n",
"\n",
"\u001b[1m> Entering new LLMMathChain chain...\u001b[0m\n",
"The length of the Eiffel Tower is 324 meters. The distance from coast to coast in the US is approximately 4,828 kilometers. First, we need to convert 4,828 kilometers to meters, which gives us 4,828,000 meters. To find out how many Eiffel Towers we need, we can divide 4,828,000 by 324. This gives us approximately 14,876 Eiffel Towers.\u001b[32;1m\u001b[1;3m\n",
"```text\n",
"4828000 / 324\n",
"```\n",
"...numexpr.evaluate(\"4828000 / 324\")...\n",
"\u001b[0m\n",
"Answer: \u001b[33;1m\u001b[1;3m14901.234567901234\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"Observation: \u001b[38;5;200m\u001b[1;3mAnswer: 14901.234567901234\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m{\n",
" \"action\": \"Calculator\",\n",
" \"action_input\": \"The length of the Eiffel Tower is 324 meters. The distance from coast to coast in the US is approximately 4,828 kilometers. First, we need to convert 4,828 kilometers to meters, which gives us 4,828,000 meters. To find out how many Eiffel Towers we need, we can divide 4,828,000 by 324. This gives us approximately 14,901 Eiffel Towers.\"\n",
"}\u001b[0m\n",
"\n",
"\u001b[1m> Entering new LLMMathChain chain...\u001b[0m\n",
"The length of the Eiffel Tower is 324 meters. The distance from coast to coast in the US is approximately 4,828 kilometers. First, we need to convert 4,828 kilometers to meters, which gives us 4,828,000 meters. To find out how many Eiffel Towers we need, we can divide 4,828,000 by 324. This gives us approximately 14,901 Eiffel Towers.\u001b[32;1m\u001b[1;3m\n",
"```text\n",
"4828000 / 324\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"362 miles * 5280 feet\u001b[32;1m\u001b[1;3m```text\n",
"362 * 5280\n",
"```\n",
"...numexpr.evaluate(\"4828000 / 324\")...\n",
"...numexpr.evaluate(\"362 * 5280\")...\n",
"\u001b[0m\n",
"Answer: \u001b[33;1m\u001b[1;3m14901.234567901234\u001b[0m\n",
"Answer: \u001b[33;1m\u001b[1;3m1911360\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[38;5;200m\u001b[1;3mAnswer: 1911360\u001b[0m\u001b[32;1m\u001b[1;3m\n",
"Invoking: `Calculator` with `1911360 feet / 1063 feet`\n",
"\n",
"Observation: \u001b[38;5;200m\u001b[1;3mAnswer: 14901.234567901234\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m{\n",
" \"action\": \"Final Answer\",\n",
" \"action_input\": \"If you laid the Eiffel Tower end to end, you would need approximately 14,901 Eiffel Towers to cover the US from coast to coast.\"\n",
"}\u001b[0m\n",
"\n",
"\u001b[0m\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"1911360 feet / 1063 feet\u001b[32;1m\u001b[1;3m```text\n",
"1911360 / 1063\n",
"```\n",
"...numexpr.evaluate(\"1911360 / 1063\")...\n",
"\u001b[0m\n",
"Answer: \u001b[33;1m\u001b[1;3m1798.0809031044214\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[38;5;200m\u001b[1;3mAnswer: 1798.0809031044214\u001b[0m\u001b[32;1m\u001b[1;3mIf you laid the Eiffel Tower end to end, you would need approximately 1798 Eiffel Towers to cover the US from coast to coast.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
@@ -205,16 +225,17 @@
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.evaluation.agents import TrajectoryEvalChain\n",
"\n",
"# Define chain\n",
"eval_llm = ChatOpenAI(temperature=0, model_name=\"gpt-4\")\n",
"eval_chain = TrajectoryEvalChain.from_llm(\n",
" llm=ChatOpenAI(\n",
" temperature=0, model_name=\"gpt-4\"\n",
" ), # Note: This must be a ChatOpenAI model\n",
" llm=eval_llm, # Note: This must be a chat model\n",
" agent_tools=agent.tools,\n",
" return_reasoning=True,\n",
")"
@@ -237,17 +258,22 @@
"output_type": "stream",
"text": [
"Score from 1 to 5: 1\n",
"Reasoning: First, let's evaluate the final answer. The final answer is incorrect because it uses the volume of golf balls instead of ping pong balls. The answer is not helpful.\n",
"Reasoning: i. Is the final answer helpful?\n",
"The final answer is not helpful because it is incorrect. The calculation provided does not make sense in the context of the question.\n",
"\n",
"Second, does the model use a logical sequence of tools to answer the question? The model only used one tool, which was the Search the Web (SerpAPI). It did not use the Calculator tool to calculate the correct volume of ping pong balls.\n",
"ii. Does the AI language use a logical sequence of tools to answer the question?\n",
"The AI language model does not use a logical sequence of tools. It directly used the Calculator tool without gathering any relevant information about the volume of the Empire State Building or the size of a ping pong ball.\n",
"\n",
"Third, does the AI language model use the tools in a helpful way? The model used the Search the Web (SerpAPI) tool, but the output was not helpful because it provided information about golf balls instead of ping pong balls.\n",
"iii. Does the AI language model use the tools in a helpful way?\n",
"The AI language model does not use the tools in a helpful way. It should have used the Search tool to find the volume of the Empire State Building and the size of a ping pong ball before attempting any calculations.\n",
"\n",
"Fourth, does the AI language model use too many steps to answer the question? The model used only one step, which is not too many. However, it should have used more steps to provide a correct answer.\n",
"iv. Does the AI language model use too many steps to answer the question?\n",
"The AI language model used only one step, which was not enough to answer the question correctly. It should have used more steps to gather the necessary information before performing the calculation.\n",
"\n",
"Fifth, are the appropriate tools used to answer the question? The model should have used the Search tool to find the volume of the Empire State Building and the volume of a ping pong ball. Then, it should have used the Calculator tool to calculate the number of ping pong balls needed to fill the building.\n",
"v. Are the appropriate tools used to answer the question?\n",
"The appropriate tools were not used to answer the question. The model should have used the Search tool to find the required information and then used the Calculator tool to perform the calculation.\n",
"\n",
"Judgment: Given the incorrect final answer and the inappropriate use of tools, we give the model a score of 1.\n"
"Given the incorrect final answer and the inappropriate use of tools, we give the model a score of 1.\n"
]
}
],
@@ -258,12 +284,10 @@
" test_outputs_one[\"output\"],\n",
")\n",
"\n",
"evaluation = eval_chain(\n",
" inputs={\n",
" \"question\": question,\n",
" \"answer\": answer,\n",
" \"agent_trajectory\": eval_chain.get_agent_trajectory(steps),\n",
" },\n",
"evaluation = eval_chain.evaluate_agent_trajectory(\n",
" input=test_outputs_one[\"input\"],\n",
" output=test_outputs_one[\"output\"],\n",
" agent_trajectory=test_outputs_one[\"intermediate_steps\"],\n",
")\n",
"\n",
"print(\"Score from 1 to 5: \", evaluation[\"score\"])\n",
@@ -274,51 +298,97 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"That seems about right. Let's try the second query."
"**That seems about right. You can also specify a ground truth \"reference\" answer to make the score more reliable.**"
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 13,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Score from 1 to 5: 1\n",
"Reasoning: i. Is the final answer helpful?\n",
"The final answer is not helpful, as it is incorrect. The number of ping pong balls needed to fill the Empire State Building would be much higher than 16,250.\n",
"\n",
"ii. Does the AI language use a logical sequence of tools to answer the question?\n",
"The AI language model does not use a logical sequence of tools. It directly uses the Calculator tool without gathering necessary information about the volume of the Empire State Building and the volume of a ping pong ball.\n",
"\n",
"iii. Does the AI language model use the tools in a helpful way?\n",
"The AI language model does not use the tools in a helpful way. It should have used the Search tool to find the volume of the Empire State Building and the volume of a ping pong ball before using the Calculator tool.\n",
"\n",
"iv. Does the AI language model use too many steps to answer the question?\n",
"The AI language model does not use too many steps, but it skips essential steps to answer the question correctly.\n",
"\n",
"v. Are the appropriate tools used to answer the question?\n",
"The appropriate tools are not used to answer the question. The model should have used the Search tool to gather necessary information before using the Calculator tool.\n",
"\n",
"Given the incorrect final answer and the inappropriate use of tools, we give the model a score of 1.\n"
]
}
],
"source": [
"evaluation = eval_chain.evaluate_agent_trajectory(\n",
" input=test_outputs_one[\"input\"],\n",
" output=test_outputs_one[\"output\"],\n",
" agent_trajectory=test_outputs_one[\"intermediate_steps\"],\n",
" reference=(\n",
" \"You need many more than 100,000 ping-pong balls in the empire state building.\"\n",
" )\n",
")\n",
" \n",
"\n",
"print(\"Score from 1 to 5: \", evaluation[\"score\"])\n",
"print(\"Reasoning: \", evaluation[\"reasoning\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Let's try the second query. This time, use the async API. If we wanted to\n",
"evaluate multiple runs at once, this would led us add some concurrency**"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Score from 1 to 5: 3\n",
"Score from 1 to 5: 2\n",
"Reasoning: i. Is the final answer helpful?\n",
"Yes, the final answer is helpful as it provides an approximate number of Eiffel Towers needed to cover the US from coast to coast.\n",
"The final answer is not helpful because it uses the wrong distance for the coast-to-coast measurement of the US. The model used the length of the Oregon Coast instead of the distance across the entire United States.\n",
"\n",
"ii. Does the AI language use a logical sequence of tools to answer the question?\n",
"No, the AI language model does not use a logical sequence of tools. It directly uses the Calculator tool without first using the Search or Lookup tools to find the necessary information (length of the Eiffel Tower and distance from coast to coast in the US).\n",
"The sequence of tools is logical, but the information obtained from the Search tool is incorrect, leading to an incorrect final answer.\n",
"\n",
"iii. Does the AI language model use the tools in a helpful way?\n",
"The AI language model uses the Calculator tool in a helpful way to perform the calculation, but it should have used the Search or Lookup tools first to find the required information.\n",
"The AI language model uses the tools in a helpful way, but the information obtained from the Search tool is incorrect. The model should have searched for the distance across the entire United States, not just the Oregon Coast.\n",
"\n",
"iv. Does the AI language model use too many steps to answer the question?\n",
"No, the AI language model does not use too many steps. However, it repeats the same step twice, which is unnecessary.\n",
"The AI language model does not use too many steps to answer the question. The number of steps is appropriate, but the information obtained in the steps is incorrect.\n",
"\n",
"v. Are the appropriate tools used to answer the question?\n",
"Not entirely. The AI language model should have used the Search or Lookup tools to find the required information before using the Calculator tool.\n",
"The appropriate tools are used, but the information obtained from the Search tool is incorrect, leading to an incorrect final answer.\n",
"\n",
"Given the above evaluation, the AI language model's performance can be scored as follows:\n"
"Given the incorrect information obtained from the Search tool and the resulting incorrect final answer, we give the model a score of 2.\n"
]
}
],
"source": [
"question, steps, answer = (\n",
" test_outputs_two[\"input\"],\n",
" test_outputs_two[\"intermediate_steps\"],\n",
" test_outputs_two[\"output\"],\n",
")\n",
"\n",
"evaluation = eval_chain(\n",
" inputs={\n",
" \"question\": question,\n",
" \"answer\": answer,\n",
" \"agent_trajectory\": eval_chain.get_agent_trajectory(steps),\n",
" },\n",
"evaluation = await eval_chain.aevaluate_agent_trajectory(\n",
" input=test_outputs_two[\"input\"],\n",
" output=test_outputs_two[\"output\"],\n",
" agent_trajectory=test_outputs_two[\"intermediate_steps\"],\n",
")\n",
"\n",
"print(\"Score from 1 to 5: \", evaluation[\"score\"])\n",
@@ -329,7 +399,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"That also sounds about right. In conclusion, the TrajectoryEvalChain allows us to use GPT-4 to score both our agent's outputs and tool use in addition to giving us the reasoning behind the evaluation."
"## Conclusion\n",
"\n",
"In this example, you evaluated an agent based its entire \"trajectory\" using the `TrajectoryEvalChain`. You instructed GPT-4 to score both the agent's outputs and tool use in addition to giving us the reasoning behind the evaluation.\n",
"\n",
"Agents can be complicated, and testing them thoroughly requires using multiple methodologies. Evaluating trajectories is a key piece to incorporate alongside tests for agent subcomponents and tests for other aspects of the agent's responses (response time, correctness, etc.) "
]
}
],
@@ -349,7 +423,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.11.3"
},
"vscode": {
"interpreter": {
@@ -358,5 +432,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

View File

@@ -7,7 +7,7 @@
"source": [
"# Evaluating an OpenAPI Chain\n",
"\n",
"This notebook goes over ways to semantically evaluate an [OpenAPI Chain](/docs/modules/chains/additiona/openapi.html), which calls an endpoint defined by the OpenAPI specification using purely natural language."
"This notebook goes over ways to semantically evaluate an [OpenAPI Chain](/docs/modules/chains/additional/openapi.html), which calls an endpoint defined by the OpenAPI specification using purely natural language."
]
},
{

View File

@@ -0,0 +1,386 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "g9EmNu5DD9YI"
},
"source": [
"# Custom functions with OpenAI Functions Agent\n",
"\n",
"This notebook goes through how to integrate custom functions with OpenAI Functions agent."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LFKylC3CPtTl"
},
"source": [
"Install libraries which are required to run this example notebook\n",
"\n",
"`pip install -q openai langchain yfinance`"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "E2DqzmEGDPak"
},
"source": [
"## Define custom functions"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "SiucthMs6SIK"
},
"outputs": [],
"source": [
"import yfinance as yf\n",
"from datetime import datetime, timedelta\n",
"\n",
"def get_current_stock_price(ticker):\n",
" \"\"\"Method to get current stock price\"\"\"\n",
"\n",
" ticker_data = yf.Ticker(ticker)\n",
" recent = ticker_data.history(period='1d')\n",
" return {\n",
" 'price': recent.iloc[0]['Close'],\n",
" 'currency': ticker_data.info['currency']\n",
" }\n",
"\n",
"def get_stock_performance(ticker, days):\n",
" \"\"\"Method to get stock price change in percentage\"\"\"\n",
"\n",
" past_date = datetime.today() - timedelta(days=days)\n",
" ticker_data = yf.Ticker(ticker)\n",
" history = ticker_data.history(start=past_date)\n",
" old_price = history.iloc[0]['Close']\n",
" current_price = history.iloc[-1]['Close']\n",
" return {\n",
" 'percent_change': ((current_price - old_price)/old_price)*100\n",
" }"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "vRLINGvQR1rO",
"outputId": "68230a4b-dda2-4273-b956-7439661e3785"
},
"outputs": [
{
"data": {
"text/plain": [
"{'price': 334.57000732421875, 'currency': 'USD'}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"get_current_stock_price('MSFT')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "57T190q235mD",
"outputId": "c6ee66ec-0659-4632-85d1-263b08826e68"
},
"outputs": [
{
"data": {
"text/plain": [
"{'percent_change': 1.014466941163018}"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"get_stock_performance('MSFT', 30)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "MT8QsdyBDhwg"
},
"source": [
"## Make custom tools"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "NvLOUv-XP3Ap"
},
"outputs": [],
"source": [
"from typing import Type\n",
"from pydantic import BaseModel, Field\n",
"from langchain.tools import BaseTool\n",
"\n",
"class CurrentStockPriceInput(BaseModel):\n",
" \"\"\"Inputs for get_current_stock_price\"\"\"\n",
" ticker: str = Field(description=\"Ticker symbol of the stock\")\n",
"\n",
"class CurrentStockPriceTool(BaseTool):\n",
" name = \"get_current_stock_price\"\n",
" description = \"\"\"\n",
" Useful when you want to get current stock price.\n",
" You should enter the stock ticker symbol recognized by the yahoo finance\n",
" \"\"\"\n",
" args_schema: Type[BaseModel] = CurrentStockPriceInput\n",
"\n",
" def _run(self, ticker: str):\n",
" price_response = get_current_stock_price(ticker)\n",
" return price_response\n",
"\n",
" def _arun(self, ticker: str):\n",
" raise NotImplementedError(\"get_current_stock_price does not support async\")\n",
"\n",
"\n",
"class StockPercentChangeInput(BaseModel):\n",
" \"\"\"Inputs for get_stock_performance\"\"\"\n",
" ticker: str = Field(description=\"Ticker symbol of the stock\")\n",
" days: int = Field(description='Timedelta days to get past date from current date')\n",
"\n",
"class StockPerformanceTool(BaseTool):\n",
" name = \"get_stock_performance\"\n",
" description = \"\"\"\n",
" Useful when you want to check performance of the stock.\n",
" You should enter the stock ticker symbol recognized by the yahoo finance.\n",
" You should enter days as number of days from today from which performance needs to be check.\n",
" output will be the change in the stock price represented as a percentage.\n",
" \"\"\"\n",
" args_schema: Type[BaseModel] = StockPercentChangeInput\n",
"\n",
" def _run(self, ticker: str, days: int):\n",
" response = get_stock_performance(ticker, days)\n",
" return response\n",
"\n",
" def _arun(self, ticker: str):\n",
" raise NotImplementedError(\"get_stock_performance does not support async\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PVKoqeCyFKHF"
},
"source": [
"## Create Agent"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "yY7qNB7vSQGh"
},
"outputs": [],
"source": [
"from langchain.agents import AgentType\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.agents import initialize_agent\n",
"\n",
"llm = ChatOpenAI(\n",
" model=\"gpt-3.5-turbo-0613\",\n",
" temperature=0\n",
")\n",
"\n",
"tools = [\n",
" CurrentStockPriceTool(),\n",
" StockPerformanceTool()\n",
"]\n",
"\n",
"agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 321
},
"id": "4X96xmgwRkcC",
"outputId": "a91b13ef-9643-4f60-d067-c4341e0b285e"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\n",
"Invoking: `get_current_stock_price` with `{'ticker': 'MSFT'}`\n",
"\n",
"\n",
"\u001b[0m\u001b[36;1m\u001b[1;3m{'price': 334.57000732421875, 'currency': 'USD'}\u001b[0m\u001b[32;1m\u001b[1;3m\n",
"Invoking: `get_stock_performance` with `{'ticker': 'MSFT', 'days': 180}`\n",
"\n",
"\n",
"\u001b[0m\u001b[33;1m\u001b[1;3m{'percent_change': 40.163963297187905}\u001b[0m\u001b[32;1m\u001b[1;3mThe current price of Microsoft stock is $334.57 USD. \n",
"\n",
"Over the past 6 months, Microsoft stock has performed well with a 40.16% increase in its price.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'The current price of Microsoft stock is $334.57 USD. \\n\\nOver the past 6 months, Microsoft stock has performed well with a 40.16% increase in its price.'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"What is the current price of Microsoft stock? How it has performed over past 6 months?\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 285
},
"id": "nkZ_vmAcT7Al",
"outputId": "092ebc55-4d28-4a4b-aa2a-98ae47ceec20"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\n",
"Invoking: `get_current_stock_price` with `{'ticker': 'GOOGL'}`\n",
"\n",
"\n",
"\u001b[0m\u001b[36;1m\u001b[1;3m{'price': 118.33000183105469, 'currency': 'USD'}\u001b[0m\u001b[32;1m\u001b[1;3m\n",
"Invoking: `get_current_stock_price` with `{'ticker': 'META'}`\n",
"\n",
"\n",
"\u001b[0m\u001b[36;1m\u001b[1;3m{'price': 287.04998779296875, 'currency': 'USD'}\u001b[0m\u001b[32;1m\u001b[1;3mThe recent stock price of Google (GOOGL) is $118.33 USD and the recent stock price of Meta (META) is $287.05 USD.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'The recent stock price of Google (GOOGL) is $118.33 USD and the recent stock price of Meta (META) is $287.05 USD.'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"Give me recent stock prices of Google and Meta?\")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 466
},
"id": "jLU-HjMq7n1o",
"outputId": "a42194dd-26ed-4b5a-d4a2-1038420045c4"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\n",
"Invoking: `get_stock_performance` with `{'ticker': 'MSFT', 'days': 90}`\n",
"\n",
"\n",
"\u001b[0m\u001b[33;1m\u001b[1;3m{'percent_change': 18.043096235165596}\u001b[0m\u001b[32;1m\u001b[1;3m\n",
"Invoking: `get_stock_performance` with `{'ticker': 'GOOGL', 'days': 90}`\n",
"\n",
"\n",
"\u001b[0m\u001b[33;1m\u001b[1;3m{'percent_change': 17.286155760642853}\u001b[0m\u001b[32;1m\u001b[1;3mIn the past 3 months, Microsoft (MSFT) has performed better than Google (GOOGL). Microsoft's stock price has increased by 18.04% while Google's stock price has increased by 17.29%.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"In the past 3 months, Microsoft (MSFT) has performed better than Google (GOOGL). Microsoft's stock price has increased by 18.04% while Google's stock price has increased by 17.29%.\""
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run('In the past 3 months, which stock between Microsoft and Google has performed the best?')"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 1
}

View File

@@ -0,0 +1,238 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Office365 Toolkit\n",
"\n",
"This notebook walks through connecting LangChain to Office365 email and calendar.\n",
"\n",
"To use this toolkit, you will need to set up your credentials explained in the [Microsoft Graph authentication and authorization overview](https://learn.microsoft.com/en-us/graph/auth/). Once you've received a CLIENT_ID and CLIENT_SECRET, you can input them as environmental variables below."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"!pip install --upgrade O365 > /dev/null\n",
"!pip install beautifulsoup4 > /dev/null # This is optional but is useful for parsing HTML messages"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Assign Environmental Variables\n",
"\n",
"The toolkit will read the CLIENT_ID and CLIENT_SECRET environmental variables to authenticate the user so you need to set them here. You will also need to set your OPENAI_API_KEY to use the agent later."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set environmental variables here"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create the Toolkit and Get Tools\n",
"\n",
"To start, you need to create the toolkit, so you can access its tools later."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"[O365SearchEvents(name='events_search', description=\" Use this tool to search for the user's calendar events. The input must be the start and end datetimes for the search query. The output is a JSON list of all the events in the user's calendar between the start and end times. You can assume that the user can not schedule any meeting over existing meetings, and that the user is busy during meetings. Any times without events are free for the user. \", args_schema=<class 'langchain.tools.office365.events_search.SearchEventsInput'>, return_direct=False, verbose=False, callbacks=None, callback_manager=None, handle_tool_error=False, account=Account Client Id: f32a022c-3c4c-4d10-a9d8-f6a9a9055302),\n",
" O365CreateDraftMessage(name='create_email_draft', description='Use this tool to create a draft email with the provided message fields.', args_schema=<class 'langchain.tools.office365.create_draft_message.CreateDraftMessageSchema'>, return_direct=False, verbose=False, callbacks=None, callback_manager=None, handle_tool_error=False, account=Account Client Id: f32a022c-3c4c-4d10-a9d8-f6a9a9055302),\n",
" O365SearchEmails(name='messages_search', description='Use this tool to search for email messages. The input must be a valid Microsoft Graph v1.0 $search query. The output is a JSON list of the requested resource.', args_schema=<class 'langchain.tools.office365.messages_search.SearchEmailsInput'>, return_direct=False, verbose=False, callbacks=None, callback_manager=None, handle_tool_error=False, account=Account Client Id: f32a022c-3c4c-4d10-a9d8-f6a9a9055302),\n",
" O365SendEvent(name='send_event', description='Use this tool to create and send an event with the provided event fields.', args_schema=<class 'langchain.tools.office365.send_event.SendEventSchema'>, return_direct=False, verbose=False, callbacks=None, callback_manager=None, handle_tool_error=False, account=Account Client Id: f32a022c-3c4c-4d10-a9d8-f6a9a9055302),\n",
" O365SendMessage(name='send_email', description='Use this tool to send an email with the provided message fields.', args_schema=<class 'langchain.tools.office365.send_message.SendMessageSchema'>, return_direct=False, verbose=False, callbacks=None, callback_manager=None, handle_tool_error=False, account=Account Client Id: f32a022c-3c4c-4d10-a9d8-f6a9a9055302)]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.agents.agent_toolkits import O365Toolkit\n",
"\n",
"toolkit = O365Toolkit()\n",
"tools = toolkit.get_tools()\n",
"tools"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use within an Agent"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain import OpenAI\n",
"from langchain.agents import initialize_agent, AgentType"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"llm = OpenAI(temperature=0)\n",
"agent = initialize_agent(\n",
" tools=toolkit.get_tools(),\n",
" llm=llm,\n",
" verbose=False,\n",
" agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'The draft email was created correctly.'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"Create an email draft for me to edit of a letter from the perspective of a sentient parrot\"\n",
" \" who is looking to collaborate on some research with her\"\n",
" \" estranged friend, a cat. Under no circumstances may you send the message, however.\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"\"I found one draft in your drafts folder about collaboration. It was sent on 2023-06-16T18:22:17+0000 and the subject was 'Collaboration Request'.\""
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"Could you search in my drafts folder and let me know if any of them are about collaboration?\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/vscode/langchain-py-env/lib/python3.11/site-packages/O365/utils/windows_tz.py:639: PytzUsageWarning: The zone attribute is specific to pytz's interface; please migrate to a new time zone provider. For more details on how to do so, see https://pytz-deprecation-shim.readthedocs.io/en/latest/migration.html\n",
" iana_tz.zone if isinstance(iana_tz, tzinfo) else iana_tz)\n",
"/home/vscode/langchain-py-env/lib/python3.11/site-packages/O365/utils/utils.py:463: PytzUsageWarning: The zone attribute is specific to pytz's interface; please migrate to a new time zone provider. For more details on how to do so, see https://pytz-deprecation-shim.readthedocs.io/en/latest/migration.html\n",
" timezone = date_time.tzinfo.zone if date_time.tzinfo is not None else None\n"
]
},
{
"data": {
"text/plain": [
"'I have scheduled a meeting with a sentient parrot to discuss research collaborations on October 3, 2023 at 2 pm Easter Time. Please let me know if you need to make any changes.'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"Can you schedule a 30 minute meeting with a sentient parrot to discuss research collaborations on October 3, 2023 at 2 pm Easter Time?\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Yes, you have an event on October 3, 2023 with a sentient parrot. The event is titled 'Meeting with sentient parrot' and is scheduled from 6:00 PM to 6:30 PM.\""
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"Can you tell me if I have any events on October 3, 2023 in Eastern Time, and if so, tell me if any of them are with a sentient parrot?\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -12,7 +12,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "a4c896e5",
"metadata": {},
"outputs": [],
@@ -27,7 +27,7 @@
"metadata": {},
"outputs": [],
"source": [
"api_key = \"...\""
"api_key = \"BSAv1neIuQOsxqOyy0sEe_ie2zD_n_V\""
]
},
{
@@ -49,7 +49,7 @@
{
"data": {
"text/plain": [
"'[{\"title\": \"Barack Obama - Wikipedia\", \"link\": \"https://en.wikipedia.org/wiki/Barack_Obama\", \"snippet\": \"Outside of politics, <strong>Obama</strong> has published three bestselling books: Dreams from My Father (1995), The Audacity of Hope (2006) and A Promised Land (2020). Rankings by scholars and historians, in which he has been featured since 2010, place him in the <strong>middle</strong> to upper tier of American presidents.\"}, {\"title\": \"Obama\\'s Middle Name -- My Last Name -- is \\'Hussein.\\' So?\", \"link\": \"https://www.cair.com/cair_in_the_news/obamas-middle-name-my-last-name-is-hussein-so/\", \"snippet\": \"Many Americans understand that common names don\\\\u2019t only come in the form of a \\\\u201cSmith\\\\u201d or a \\\\u201cJohnson.\\\\u201d Perhaps, they have a neighbor, mechanic or teacher named Hussein. Or maybe they\\\\u2019ve seen fashion designer Hussein Chalayan in the pages of Vogue or recall <strong>King Hussein</strong>, our ally in the Middle East.\"}, {\"title\": \"What\\'s up with Obama\\'s middle name? - Quora\", \"link\": \"https://www.quora.com/Whats-up-with-Obamas-middle-name\", \"snippet\": \"Answer (1 of 15): A better question would be, \\\\u201cWhat\\\\u2019s up with Obama\\\\u2019s first name?\\\\u201d President <strong>Barack Hussein Obama</strong>\\\\u2019s father\\\\u2019s name was <strong>Barack Hussein Obama</strong>. He was named after his father. Hussein, Obama\\\\u2019s middle name, is a very common Arabic name, meaning &quot;good,&quot; &quot;handsome,&quot; or &quot;beautiful.&quot;\"}]'"
"'[{\"title\": \"Obama\\'s Middle Name -- My Last Name -- is \\'Hussein.\\' So?\", \"link\": \"https://www.cair.com/cair_in_the_news/obamas-middle-name-my-last-name-is-hussein-so/\", \"snippet\": \"I wasn\\\\u2019t sure whether to laugh or cry a few days back listening to radio talk show host Bill Cunningham repeatedly scream Barack <strong>Obama</strong>\\\\u2019<strong>s</strong> <strong>middle</strong> <strong>name</strong> \\\\u2014 my last <strong>name</strong> \\\\u2014 as if he had anti-Muslim Tourette\\\\u2019s. \\\\u201cHussein,\\\\u201d Cunningham hissed like he was beckoning Satan when shouting the ...\"}, {\"title\": \"What\\'s up with Obama\\'s middle name? - Quora\", \"link\": \"https://www.quora.com/Whats-up-with-Obamas-middle-name\", \"snippet\": \"Answer (1 of 15): A better question would be, \\\\u201cWhat\\\\u2019s up with <strong>Obama</strong>\\\\u2019s first <strong>name</strong>?\\\\u201d President Barack Hussein <strong>Obama</strong>\\\\u2019s father\\\\u2019s <strong>name</strong> was Barack Hussein <strong>Obama</strong>. He was <strong>named</strong> after his father. Hussein, <strong>Obama</strong>\\\\u2019<strong>s</strong> <strong>middle</strong> <strong>name</strong>, is a very common Arabic <strong>name</strong>, meaning &quot;good,&quot; &quot;handsome,&quot; or ...\"}, {\"title\": \"Barack Obama | Biography, Parents, Education, Presidency, Books, ...\", \"link\": \"https://www.britannica.com/biography/Barack-Obama\", \"snippet\": \"Barack <strong>Obama</strong>, in full Barack Hussein <strong>Obama</strong> II, (born August 4, 1961, Honolulu, Hawaii, U.S.), 44th president of the United States (2009\\\\u201317) and the first African American to hold the office. Before winning the presidency, <strong>Obama</strong> represented Illinois in the U.S.\"}]'"
]
},
"execution_count": 4,
@@ -86,7 +86,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.6"
}
},
"nbformat": 4,

View File

@@ -7,7 +7,7 @@
"source": [
"# Zapier Natural Language Actions API\n",
"\\\n",
"Full docs here: https://nla.zapier.com/api/v1/docs\n",
"Full docs here: https://nla.zapier.com/start/\n",
"\n",
"**Zapier Natural Language Actions** gives you access to the 5k+ apps, 20k+ actions on Zapier's platform through a natural language API interface.\n",
"\n",
@@ -21,7 +21,7 @@
"\n",
"2. User-facing (Oauth): for production scenarios where you are deploying an end-user facing application and LangChain needs access to end-user's exposed actions and connected accounts on Zapier.com\n",
"\n",
"This quick start will focus on the server-side use case for brevity. Review [full docs](https://nla.zapier.com/api/v1/docs) or reach out to nla@zapier.com for user-facing oauth developer support.\n",
"This quick start will focus mostly on the server-side use case for brevity. Jump to [Example Using OAuth Access Token](#oauth) to see a short example how to set up Zapier for user-facing situations. Review [full docs](https://nla.zapier.com/start/) for full user-facing oauth developer support.\n",
"\n",
"This example goes over how to use the Zapier integration with a `SimpleSequentialChain`, then an `Agent`.\n",
"In code, below:"
@@ -39,7 +39,7 @@
"# get from https://platform.openai.com/\n",
"os.environ[\"OPENAI_API_KEY\"] = os.environ.get(\"OPENAI_API_KEY\", \"\")\n",
"\n",
"# get from https://nla.zapier.com/demo/provider/debug (under User Information, after logging in):\n",
"# get from https://nla.zapier.com/docs/authentication/ after logging in):\n",
"os.environ[\"ZAPIER_NLA_API_KEY\"] = os.environ.get(\"ZAPIER_NLA_API_KEY\", \"\")"
]
},
@@ -149,7 +149,7 @@
"id": "bcdea831",
"metadata": {},
"source": [
"# Example with SimpleSequentialChain\n",
"## Example with SimpleSequentialChain\n",
"If you need more explicit control, use a chain, like below."
]
},
@@ -323,12 +323,34 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"id": "09ff954e-45f2-4595-92ea-91627abde4a0",
"metadata": {},
"source": [
"## <a id=\"oauth\">Example Using OAuth Access Token</a>\n",
"The below snippet shows how to initialize the wrapper with a procured OAuth access token. Note the argument being passed in as opposed to setting an environment variable. Review the [authentication docs](https://nla.zapier.com/docs/authentication/#oauth-credentials) for full user-facing oauth developer support.\n",
"\n",
"The developer is tasked with handling the OAuth handshaking to procure and refresh the access token."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7c6835c8",
"metadata": {},
"outputs": [],
"source": []
"source": [
"llm = OpenAI(temperature=0)\n",
"zapier = ZapierNLAWrapper(zapier_nla_oauth_access_token='<fill in access token here>')\n",
"toolkit = ZapierToolkit.from_zapier_nla_wrapper(zapier)\n",
"agent = initialize_agent(\n",
" toolkit.get_tools(), llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
")\n",
"\n",
"agent.run(\n",
" \"Summarize the last email I received regarding Silicon Valley Bank. Send the summary to the #test-zapier channel in slack.\"\n",
")"
]
}
],
"metadata": {

View File

@@ -0,0 +1,73 @@
# Streamlit
> **[Streamlit](https://streamlit.io/) is a faster way to build and share data apps.**
> Streamlit turns data scripts into shareable web apps in minutes. All in pure Python. No frontend experience required.
> See more examples at [streamlit.io/generative-ai](https://streamlit.io/generative-ai).
[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/langchain-ai/streamlit-agent?quickstart=1)
In this guide we will demonstrate how to use `StreamlitCallbackHandler` to display the thoughts and actions of an agent in an
interactive Streamlit app. Try it out with the running app below using the [MRKL agent](/docs/modules/agents/how_to/mrkl/):
<iframe loading="lazy" src="https://mrkl-minimal.streamlit.app/?embed=true&embed_options=light_theme"
style={{ width: 100 + '%', border: 'none', marginBottom: 1 + 'rem', height: 600 }}
allow="camera;clipboard-read;clipboard-write;"
></iframe>
## Installation and Setup
```bash
pip install langchain streamlit
```
You can run `streamlit hello` to load a sample app and validate your install succeeded. See full instructions in Streamlit's
[Getting started documentation](https://docs.streamlit.io/library/get-started).
## Display thoughts and actions
To create a `StreamlitCallbackHandler`, you just need to provide a parent container to render the output.
```python
from langchain.callbacks import StreamlitCallbackHandler
import streamlit as st
st_callback = StreamlitCallbackHandler(st.container())
```
Additional keyword arguments to customize the display behavior are described in the
[API reference](https://api.python.langchain.com/en/latest/modules/callbacks.html#langchain.callbacks.StreamlitCallbackHandler).
### Scenario 1: Using an Agent with Tools
The primary supported use case today is visualizing the actions of an Agent with Tools (or Agent Executor). You can create an
agent in your Streamlit app and simply pass the `StreamlitCallbackHandler` to `agent.run()` in order to visualize the
thoughts and actions live in your app.
```python
from langchain.llms import OpenAI
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.callbacks import StreamlitCallbackHandler
import streamlit as st
llm = OpenAI(temperature=0, streaming=True)
tools = load_tools(["ddg-search"])
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
if prompt := st.chat_input():
st.chat_message("user").write(prompt)
with st.chat_message("assistant"):
st_callback = StreamlitCallbackHandler(st.container())
response = agent.run(prompt, callbacks=[st_callback])
st.write(response)
```
**Note:** You will need to set `OPENAI_API_KEY` for the above app code to run successfully.
The easiest way to do this is via [Streamlit secrets.toml](https://docs.streamlit.io/library/advanced-features/secrets-management),
or any other local ENV management tool.
### Additional scenarios
Currently `StreamlitCallbackHandler` is geared towards use with a LangChain Agent Executor. Support for additional agent types,
use directly with Chains, etc will be added in the future.

View File

@@ -71,11 +71,13 @@
"import numpy as np\n",
"\n",
"from langchain.schema import BaseRetriever\n",
"from langchain.callbacks.manager import AsyncCallbackManagerForRetrieverRun, CallbackManagerForRetrieverRun\n",
"from langchain.utilities import GoogleSerperAPIWrapper\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.llms import OpenAI\n",
"from langchain.schema import Document"
"from langchain.schema import Document\n",
"from typing import Any"
]
},
{
@@ -97,11 +99,16 @@
" def __init__(self, search):\n",
" self.search = search\n",
"\n",
" def get_relevant_documents(self, query: str):\n",
" def _get_relevant_documents(self, query: str, *, run_manager: CallbackManagerForRetrieverRun, **kwargs: Any) -> List[Document]:\n",
" return [Document(page_content=self.search.run(query))]\n",
"\n",
" async def aget_relevant_documents(self, query: str):\n",
" raise NotImplemented\n",
" async def _aget_relevant_documents(self,\n",
" query: str,\n",
" *,\n",
" run_manager: AsyncCallbackManagerForRetrieverRun,\n",
" **kwargs: Any,\n",
" ) -> List[Document]:\n",
" raise NotImplementedError()\n",
"\n",
"\n",
"retriever = SerperSearchRetriever(GoogleSerperAPIWrapper())"

View File

@@ -134,7 +134,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.10.6"
}
},
"nbformat": 4,

View File

@@ -0,0 +1,164 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "3dd292b1-9a73-4ea8-af19-5fa6e3c1a62a",
"metadata": {},
"source": [
"# Brave Search\n",
"\n",
"\n",
">[Brave Search](https://en.wikipedia.org/wiki/Brave_Search) is a search engine developed by Brave Software.\n",
"> - `Brave Search` uses its own web index. As of May 2022, it covered over 10 billion pages and was used to serve 92% \n",
"> of search results without relying on any third-parties, with the remainder being retrieved \n",
"> server-side from the Bing API or (on an opt-in basis) client-side from Google. According \n",
"> to Brave, the index was kept \"intentionally smaller than that of Google or Bing\" in order to \n",
"> help avoid spam and other low-quality content, with the disadvantage that \"Brave Search is \n",
"> not yet as good as Google in recovering long-tail queries.\"\n",
">- `Brave Search Premium`: As of April 2023 Brave Search is an ad-free website, but it will \n",
"> eventually switch to a new model that will include ads and premium users will get an ad-free experience.\n",
"> User data including IP addresses won't be collected from its users by default. A premium account \n",
"> will be required for opt-in data-collection.\n"
]
},
{
"cell_type": "markdown",
"id": "26f0888e-3f3e-4b82-ac4a-2df6feeccbe0",
"metadata": {},
"source": [
"## Installation and Setup\n",
"\n",
"To get access to the Brave Search API, you need to [create an account and get an API key](https://api.search.brave.com/app/dashboard).\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "d7d7be09-58bd-47d7-bf1b-33964564f777",
"metadata": {},
"outputs": [],
"source": [
"api_key = \"...\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3ac92df-6ff0-4dbb-b32b-a7dc140c48ef",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import BraveSearchLoader"
]
},
{
"cell_type": "markdown",
"id": "7f483caf-58ef-4138-975a-5b783559dc1b",
"metadata": {},
"source": [
"## Example"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "766634cf-3bc7-4656-939a-cafa218807a6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader = BraveSearchLoader(query=\"obama middle name\", api_key=api_key, search_kwargs={\"count\": 3})\n",
"docs = loader.load()\n",
"len(docs)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "f1fcc9f1-cbdc-46b3-89d3-80311d557dc6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'title': \"Obama's Middle Name -- My Last Name -- is 'Hussein.' So?\",\n",
" 'link': 'https://www.cair.com/cair_in_the_news/obamas-middle-name-my-last-name-is-hussein-so/'},\n",
" {'title': \"What's up with Obama's middle name? - Quora\",\n",
" 'link': 'https://www.quora.com/Whats-up-with-Obamas-middle-name'},\n",
" {'title': 'Barack Obama | Biography, Parents, Education, Presidency, Books, ...',\n",
" 'link': 'https://www.britannica.com/biography/Barack-Obama'}]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[doc.metadata for doc in docs]"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "601bfd77-03d3-468e-843f-2523d5e215bd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['I wasnt sure whether to laugh or cry a few days back listening to radio talk show host Bill Cunningham repeatedly scream Barack <strong>Obama</strong><strong>s</strong> <strong>middle</strong> <strong>name</strong> — my last <strong>name</strong> — as if he had anti-Muslim Tourettes. “Hussein,” Cunningham hissed like he was beckoning Satan when shouting the ...',\n",
" 'Answer (1 of 15): A better question would be, “Whats up with <strong>Obama</strong>s first <strong>name</strong>?” President Barack Hussein <strong>Obama</strong>s fathers <strong>name</strong> was Barack Hussein <strong>Obama</strong>. He was <strong>named</strong> after his father. Hussein, <strong>Obama</strong><strong>s</strong> <strong>middle</strong> <strong>name</strong>, is a very common Arabic <strong>name</strong>, meaning &quot;good,&quot; &quot;handsome,&quot; or ...',\n",
" 'Barack <strong>Obama</strong>, in full Barack Hussein <strong>Obama</strong> II, (born August 4, 1961, Honolulu, Hawaii, U.S.), 44th president of the United States (200917) and the first African American to hold the office. Before winning the presidency, <strong>Obama</strong> represented Illinois in the U.S.']"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[doc.page_content for doc in docs]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "74a6ba54-9e48-4bac-ab9b-03eabd19eb81",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -32,7 +32,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 1,
"id": "40cd9806",
"metadata": {
"tags": []
@@ -44,7 +44,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 2,
"id": "2d20b852",
"metadata": {
"tags": []
@@ -56,7 +56,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"id": "579fa702",
"metadata": {
"tags": []
@@ -68,7 +68,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 4,
"id": "90c1d899",
"metadata": {
"tags": []
@@ -80,7 +80,7 @@
"[Document(page_content='This is a test email to use for unit tests.\\n\\nImportant points:\\n\\nRoses are red\\n\\nViolets are blue', metadata={'source': 'example_data/fake-email.eml'})]"
]
},
"execution_count": 8,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@@ -128,7 +128,7 @@
{
"data": {
"text/plain": [
"Document(page_content='This is a test email to use for unit tests.', lookup_str='', metadata={'source': 'example_data/fake-email.eml'}, lookup_index=0)"
"Document(page_content='This is a test email to use for unit tests.', metadata={'source': 'example_data/fake-email.eml', 'filename': 'fake-email.eml', 'file_directory': 'example_data', 'date': '2022-12-16T17:04:16-05:00', 'filetype': 'message/rfc822', 'sent_from': ['Matthew Robinson <mrobinson@unstructured.io>'], 'sent_to': ['Matthew Robinson <mrobinson@unstructured.io>'], 'subject': 'Test Email', 'category': 'NarrativeText'})"
]
},
"execution_count": 7,
@@ -140,6 +140,61 @@
"data[0]"
]
},
{
"cell_type": "markdown",
"id": "5021f20a",
"metadata": {},
"source": [
"### Processing Attachments\n",
"\n",
"You can process attachments with `UnstructuredEmailLoader` by setting `process_attachments=True` in the constructor. By default, attachments will be partitioned using the `partition` function from `unstructured`. You can use a different partitioning function by passing the function to the `attachment_partitioner` kwarg."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "6539f166",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredEmailLoader(\n",
" \"example_data/fake-email.eml\",\n",
" mode=\"elements\",\n",
" process_attachments=True,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "aebead38",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "ddeb60f4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='This is a test email to use for unit tests.', metadata={'source': 'example_data/fake-email.eml', 'filename': 'fake-email.eml', 'file_directory': 'example_data', 'date': '2022-12-16T17:04:16-05:00', 'filetype': 'message/rfc822', 'sent_from': ['Matthew Robinson <mrobinson@unstructured.io>'], 'sent_to': ['Matthew Robinson <mrobinson@unstructured.io>'], 'subject': 'Test Email', 'category': 'NarrativeText'})"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data[0]"
]
},
{
"cell_type": "markdown",
"id": "6a074515",
@@ -234,7 +289,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.8.13"
}
},
"nbformat": 4,

View File

@@ -0,0 +1,27 @@
* Example Docs
The sample docs directory contains the following files:
- ~example-10k.html~ - A 10-K SEC filing in HTML format
- ~layout-parser-paper.pdf~ - A PDF copy of the layout parser paper
- ~factbook.xml~ / ~factbook.xsl~ - Example XML/XLS files that you
can use to test stylesheets
These documents can be used to test out the parsers in the library. In
addition, here are instructions for pulling in some sample docs that are
too big to store in the repo.
** XBRL 10-K
You can get an example 10-K in inline XBRL format using the following
~curl~. Note, you need to have the user agent set in the header or the
SEC site will reject your request.
#+BEGIN_SRC bash
curl -O \
-A '${organization} ${email}'
https://www.sec.gov/Archives/edgar/data/311094/000117184321001344/0001171843-21-001344.txt
#+END_SRC
You can parse this document using the HTML parser.

View File

@@ -0,0 +1,28 @@
Example Docs
------------
The sample docs directory contains the following files:
- ``example-10k.html`` - A 10-K SEC filing in HTML format
- ``layout-parser-paper.pdf`` - A PDF copy of the layout parser paper
- ``factbook.xml``/``factbook.xsl`` - Example XML/XLS files that you
can use to test stylesheets
These documents can be used to test out the parsers in the library. In
addition, here are instructions for pulling in some sample docs that are
too big to store in the repo.
XBRL 10-K
^^^^^^^^^
You can get an example 10-K in inline XBRL format using the following
``curl``. Note, you need to have the user agent set in the header or the
SEC site will reject your request.
.. code:: bash
curl -O \
-A '${organization} ${email}'
https://www.sec.gov/Archives/edgar/data/311094/000117184321001344/0001171843-21-001344.txt
You can parse this document using the HTML parser.

View File

@@ -0,0 +1,3 @@
{"sender_name": "User 2", "timestamp_ms": 1675597571851, "content": "Bye!"}
{"sender_name": "User 1", "timestamp_ms": 1675597435669, "content": "Oh no worries! Bye"}
{"sender_name": "User 2", "timestamp_ms": 1675596277579, "content": "No Im sorry it was my mistake, the blue one is not for sale"}

View File

@@ -0,0 +1,50 @@
MIME-Version: 1.0
Date: Fri, 23 Dec 2022 12:08:48 -0600
Message-ID: <CAPgNNXSzLVJ-d1OCX_TjFgJU7ugtQrjFybPtAMmmYZzphxNFYg@mail.gmail.com>
Subject: Fake email with attachment
From: Mallori Harrell <mallori@unstructured.io>
To: Mallori Harrell <mallori@unstructured.io>
Content-Type: multipart/mixed; boundary="0000000000005d654405f082adb7"
--0000000000005d654405f082adb7
Content-Type: multipart/alternative; boundary="0000000000005d654205f082adb5"
--0000000000005d654205f082adb5
Content-Type: text/plain; charset="UTF-8"
Hello!
Here's the attachments!
It includes:
- Lots of whitespace
- Little to no content
- and is a quick read
Best,
Mallori
--0000000000005d654205f082adb5
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
<div dir=3D"ltr">Hello!=C2=A0<div><br></div><div>Here&#39;s the attachments=
!</div><div><br></div><div>It includes:</div><div><ul><li style=3D"margin-l=
eft:15px">Lots of whitespace</li><li style=3D"margin-left:15px">Little=C2=
=A0to no content</li><li style=3D"margin-left:15px">and is a quick read</li=
></ul><div>Best,</div></div><div><br></div><div>Mallori</div><div dir=3D"lt=
r" class=3D"gmail_signature" data-smartmail=3D"gmail_signature"><div dir=3D=
"ltr"><div><div><br></div></div></div></div></div>
--0000000000005d654205f082adb5--
--0000000000005d654405f082adb7
Content-Type: text/plain; charset="US-ASCII"; name="fake-attachment.txt"
Content-Disposition: attachment; filename="fake-attachment.txt"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_lc0tto5j0
Content-ID: <f_lc0tto5j0>
SGV5IHRoaXMgaXMgYSBmYWtlIGF0dGFjaG1lbnQh
--0000000000005d654405f082adb7--

View File

@@ -0,0 +1,17 @@
class MyClass {
constructor(name) {
this.name = name;
}
greet() {
console.log(`Hello, ${this.name}!`);
}
}
function main() {
const name = prompt("Enter your name:");
const obj = new MyClass(name);
obj.greet();
}
main();

View File

@@ -0,0 +1,16 @@
class MyClass:
def __init__(self, name):
self.name = name
def greet(self):
print(f"Hello, {self.name}!")
def main():
name = input("Enter your name: ")
obj = MyClass(name)
obj.greet()
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,180 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "bdccb278",
"metadata": {},
"source": [
"# Grobid\n",
"\n",
"GROBID is a machine learning library for extracting, parsing, and re-structuring raw documents.\n",
"\n",
"It is particularly good for sturctured PDFs, like academic papers.\n",
"\n",
"This loader uses GROBIB to parse PDFs into `Documents` that retain metadata associated with the section of text.\n",
"\n",
"---\n",
"\n",
"For users on `Mac` - \n",
"\n",
"(Note: additional instructions can be found [here](https://python.langchain.com/docs/ecosystem/integrations/grobid.mdx).)\n",
"\n",
"Install Java (Apple Silicon):\n",
"```\n",
"$ arch -arm64 brew install openjdk@11\n",
"$ brew --prefix openjdk@11\n",
"/opt/homebrew/opt/openjdk@ 11\n",
"```\n",
"\n",
"In `~/.zshrc`:\n",
"```\n",
"export JAVA_HOME=/opt/homebrew/opt/openjdk@11\n",
"export PATH=$JAVA_HOME/bin:$PATH\n",
"```\n",
"\n",
"Then, in Terminal:\n",
"```\n",
"$ source ~/.zshrc\n",
"```\n",
"\n",
"Confirm install:\n",
"```\n",
"$ which java\n",
"/opt/homebrew/opt/openjdk@11/bin/java\n",
"$ java -version \n",
"openjdk version \"11.0.19\" 2023-04-18\n",
"OpenJDK Runtime Environment Homebrew (build 11.0.19+0)\n",
"OpenJDK 64-Bit Server VM Homebrew (build 11.0.19+0, mixed mode)\n",
"```\n",
"\n",
"Then, get [Grobid](https://grobid.readthedocs.io/en/latest/Install-Grobid/#getting-grobid):\n",
"```\n",
"$ curl -LO https://github.com/kermitt2/grobid/archive/0.7.3.zip\n",
"$ unzip 0.7.3.zip\n",
"```\n",
" \n",
"Build\n",
"```\n",
"$ ./gradlew clean install\n",
"```\n",
"\n",
"Then, run the server:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "2d8992fc",
"metadata": {},
"outputs": [],
"source": [
"! get_ipython().system_raw('nohup ./gradlew run > grobid.log 2>&1 &')"
]
},
{
"cell_type": "markdown",
"id": "4b41bfb1",
"metadata": {},
"source": [
"Now, we can use the data loader."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "640e9a4b",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders.parsers import GrobidParser\n",
"from langchain.document_loaders.generic import GenericLoader"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ecdc1fb9",
"metadata": {},
"outputs": [],
"source": [
"loader = GenericLoader.from_filesystem(\n",
" \"../Papers/\",\n",
" glob=\"*\",\n",
" suffixes=[\".pdf\"],\n",
" parser= GrobidParser(segment_sentences=False)\n",
")\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "efe9e356",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Unlike Chinchilla, PaLM, or GPT-3, we only use publicly available data, making our work compatible with open-sourcing, while most existing models rely on data which is either not publicly available or undocumented (e.g.\"Books -2TB\" or \"Social media conversations\").There exist some exceptions, notably OPT (Zhang et al., 2022), GPT-NeoX (Black et al., 2022), BLOOM (Scao et al., 2022) and GLM (Zeng et al., 2022), but none that are competitive with PaLM-62B or Chinchilla.'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[3].page_content"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "5be03d17",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'text': 'Unlike Chinchilla, PaLM, or GPT-3, we only use publicly available data, making our work compatible with open-sourcing, while most existing models rely on data which is either not publicly available or undocumented (e.g.\"Books -2TB\" or \"Social media conversations\").There exist some exceptions, notably OPT (Zhang et al., 2022), GPT-NeoX (Black et al., 2022), BLOOM (Scao et al., 2022) and GLM (Zeng et al., 2022), but none that are competitive with PaLM-62B or Chinchilla.',\n",
" 'para': '2',\n",
" 'bboxes': \"[[{'page': '1', 'x': '317.05', 'y': '509.17', 'h': '207.73', 'w': '9.46'}, {'page': '1', 'x': '306.14', 'y': '522.72', 'h': '220.08', 'w': '9.46'}, {'page': '1', 'x': '306.14', 'y': '536.27', 'h': '218.27', 'w': '9.46'}, {'page': '1', 'x': '306.14', 'y': '549.82', 'h': '218.65', 'w': '9.46'}, {'page': '1', 'x': '306.14', 'y': '563.37', 'h': '136.98', 'w': '9.46'}], [{'page': '1', 'x': '446.49', 'y': '563.37', 'h': '78.11', 'w': '9.46'}, {'page': '1', 'x': '304.69', 'y': '576.92', 'h': '138.32', 'w': '9.46'}], [{'page': '1', 'x': '447.75', 'y': '576.92', 'h': '76.66', 'w': '9.46'}, {'page': '1', 'x': '306.14', 'y': '590.47', 'h': '219.63', 'w': '9.46'}, {'page': '1', 'x': '306.14', 'y': '604.02', 'h': '218.27', 'w': '9.46'}, {'page': '1', 'x': '306.14', 'y': '617.56', 'h': '218.27', 'w': '9.46'}, {'page': '1', 'x': '306.14', 'y': '631.11', 'h': '220.18', 'w': '9.46'}]]\",\n",
" 'pages': \"('1', '1')\",\n",
" 'section_title': 'Introduction',\n",
" 'section_number': '1',\n",
" 'paper_title': 'LLaMA: Open and Efficient Foundation Language Models',\n",
" 'file_path': '/Users/31treehaus/Desktop/Papers/2302.13971.pdf'}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[3].metadata"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,103 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "33205b12",
"metadata": {},
"source": [
"# LarkSuite (FeiShu)\n",
"\n",
">[LarkSuite](https://www.larksuite.com/) is an enterprise collaboration platform developed by ByteDance.\n",
"\n",
"This notebook covers how to load data from the `LarkSuite` REST API into a format that can be ingested into LangChain, along with example usage for text summarization.\n",
"\n",
"The LarkSuite API requires an access token (tenant_access_token or user_access_token), checkout [LarkSuite open platform document](https://open.larksuite.com/document) for API details."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "90b69c94",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-19T10:05:03.645161Z",
"start_time": "2023-06-19T10:04:49.541968Z"
},
"tags": []
},
"outputs": [],
"source": [
"from getpass import getpass\n",
"from langchain.document_loaders.larksuite import LarkSuiteDocLoader\n",
"\n",
"DOMAIN = input(\"larksuite domain\")\n",
"ACCESS_TOKEN = getpass(\"larksuite tenant_access_token or user_access_token\")\n",
"DOCUMENT_ID = input(\"larksuite document id\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "13deb0f5",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-19T10:05:36.016495Z",
"start_time": "2023-06-19T10:05:35.360884Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Document(page_content='Test Doc\\nThis is a Test Doc\\n\\n1\\n2\\n3\\n\\n', metadata={'document_id': 'V76kdbd2HoBbYJxdiNNccajunPf', 'revision_id': 11, 'title': 'Test Doc'})]\n"
]
}
],
"source": [
"from pprint import pprint\n",
"\n",
"larksuite_loader = LarkSuiteDocLoader(DOMAIN, ACCESS_TOKEN, DOCUMENT_ID)\n",
"docs = larksuite_loader.load()\n",
"\n",
"pprint(docs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9ccc1e2f",
"metadata": {},
"outputs": [],
"source": [
"# see https://python.langchain.com/docs/use_cases/summarization for more details\n",
"from langchain.chains.summarize import load_summarize_chain\n",
"\n",
"chain = load_summarize_chain(llm, chain_type=\"map_reduce\")\n",
"chain.run(docs)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,71 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "87067cdf",
"metadata": {},
"source": [
"# mhtml\n",
"\n",
"MHTML is a is used both for emails but also for archived webpages. MHTML, sometimes referred as MHT, stands for MIME HTML is a single file in which entire webpage is archived. When one saves a webpage as MHTML format, this file extension will contain HTML code, images, audio files, flash animation etc."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5d4c6174",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import MHTMLLoader"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "12dcebc8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='LangChain\\nLANG CHAIN 🦜🔗Official Home Page\\xa0\\n\\n\\n\\n\\n\\n\\n\\nIntegrations\\n\\n\\n\\nFeatures\\n\\n\\n\\n\\nBlog\\n\\n\\n\\nConceptual Guide\\n\\n\\n\\n\\nPython Repo\\n\\n\\nJavaScript Repo\\n\\n\\n\\nPython Documentation \\n\\n\\nJavaScript Documentation\\n\\n\\n\\n\\nPython ChatLangChain \\n\\n\\nJavaScript ChatLangChain\\n\\n\\n\\n\\nDiscord \\n\\n\\nTwitter\\n\\n\\n\\n\\nIf you have any comments about our WEB page, you can \\nwrite us at the address shown above. However, due to \\nthe limited number of personnel in our corporate office, we are unable to \\nprovide a direct response.\\n\\nCopyright © 2023-2023 LangChain Inc.\\n\\n\\n' metadata={'source': '../../../../../../tests/integration_tests/examples/example.mht', 'title': 'LangChain'}\n"
]
}
],
"source": [
"# Create a new loader object for the MHTML file\n",
"loader = MHTMLLoader(file_path='../../../../../../tests/integration_tests/examples/example.mht')\n",
"\n",
"# Load the document from the file\n",
"documents = loader.load()\n",
"\n",
"# Print the documents to see the results\n",
"for doc in documents:\n",
" print(doc)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,88 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Org-mode\n",
"\n",
">A [Org Mode document](https://en.wikipedia.org/wiki/Org-mode) is a document editing, formatting, and organizing mode, designed for notes, planning, and authoring within the free software text editor Emacs."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `UnstructuredOrgModeLoader`\n",
"\n",
"You can load data from Org-mode files with `UnstructuredOrgModeLoader` using the following workflow."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import UnstructuredOrgModeLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredOrgModeLoader(\n",
" file_path=\"example_data/README.org\", mode=\"elements\"\n",
")\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='Example Docs' metadata={'source': 'example_data/README.org', 'filename': 'README.org', 'file_directory': 'example_data', 'filetype': 'text/org', 'page_number': 1, 'category': 'Title'}\n"
]
}
],
"source": [
"print(docs[0])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -1,6 +1,7 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "5a7cc773",
"metadata": {},
@@ -17,7 +18,7 @@
"\n",
"But, the challenge is traversing the tree of child pages and actually assembling that list!\n",
" \n",
"We do this using the `RecusiveUrlLoader`.\n",
"We do this using the `RecursiveUrlLoader`.\n",
"\n",
"This also gives us the flexibility to exclude some children (e.g., the `api` directory with > 800 child pages)."
]
@@ -29,10 +30,11 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders.recursive_url_loader import RecusiveUrlLoader"
"from langchain.document_loaders.recursive_url_loader import RecursiveUrlLoader"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "6384c057",
"metadata": {},
@@ -48,7 +50,7 @@
"outputs": [],
"source": [
"url = 'https://js.langchain.com/docs/modules/memory/examples/'\n",
"loader=RecusiveUrlLoader(url=url)\n",
"loader=RecursiveUrlLoader(url=url)\n",
"docs=loader.load()"
]
},
@@ -119,6 +121,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "40fc13ef",
"metadata": {},
@@ -137,7 +140,7 @@
"source": [
"url = 'https://js.langchain.com/docs/'\n",
"exclude_dirs=['https://js.langchain.com/docs/api/']\n",
"loader=RecusiveUrlLoader(url=url,exclude_dirs=exclude_dirs)\n",
"loader=RecursiveUrlLoader(url=url,exclude_dirs=exclude_dirs)\n",
"docs=loader.load()"
]
},

View File

@@ -0,0 +1,88 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# RST\n",
"\n",
">A [reStructured Text (RST)](https://en.wikipedia.org/wiki/ReStructuredText) file is a file format for textual data used primarily in the Python programming language community for technical documentation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `UnstructuredRSTLoader`\n",
"\n",
"You can load data from RST files with `UnstructuredRSTLoader` using the following workflow."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import UnstructuredRSTLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredRSTLoader(\n",
" file_path=\"example_data/README.rst\", mode=\"elements\"\n",
")\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='Example Docs' metadata={'source': 'example_data/README.rst', 'filename': 'README.rst', 'file_directory': 'example_data', 'filetype': 'text/x-rst', 'page_number': 1, 'category': 'Title'}\n"
]
}
],
"source": [
"print(docs[0])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,419 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "213a38a2",
"metadata": {},
"source": [
"# Source Code\n",
"\n",
"This notebook covers how to load source code files using a special approach with language parsing: each top-level function and class in the code is loaded into separate documents. Any remaining code top-level code outside the already loaded functions and classes will be loaded into a seperate document.\n",
"\n",
"This approach can potentially improve the accuracy of QA models over source code. Currently, the supported languages for code parsing are Python and JavaScript. The language used for parsing can be configured, along with the minimum number of lines required to activate the splitting based on syntax."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7fa47b2e",
"metadata": {},
"outputs": [],
"source": [
"! pip install esprima"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "beb55c2f",
"metadata": {},
"outputs": [],
"source": [
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"from pprint import pprint\n",
"from langchain.text_splitter import Language\n",
"from langchain.document_loaders.generic import GenericLoader\n",
"from langchain.document_loaders.parsers import LanguageParser"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "64056e07",
"metadata": {},
"outputs": [],
"source": [
"loader = GenericLoader.from_filesystem(\n",
" \"./example_data/source_code\",\n",
" glob=\"*\",\n",
" suffixes=[\".py\", \".js\"],\n",
" parser=LanguageParser()\n",
")\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "8af79bd7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(docs)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "85edf3fc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'content_type': 'functions_classes',\n",
" 'language': <Language.PYTHON: 'python'>,\n",
" 'source': 'example_data/source_code/example.py'}\n",
"{'content_type': 'functions_classes',\n",
" 'language': <Language.PYTHON: 'python'>,\n",
" 'source': 'example_data/source_code/example.py'}\n",
"{'content_type': 'simplified_code',\n",
" 'language': <Language.PYTHON: 'python'>,\n",
" 'source': 'example_data/source_code/example.py'}\n",
"{'content_type': 'functions_classes',\n",
" 'language': <Language.JS: 'js'>,\n",
" 'source': 'example_data/source_code/example.js'}\n",
"{'content_type': 'functions_classes',\n",
" 'language': <Language.JS: 'js'>,\n",
" 'source': 'example_data/source_code/example.js'}\n",
"{'content_type': 'simplified_code',\n",
" 'language': <Language.JS: 'js'>,\n",
" 'source': 'example_data/source_code/example.js'}\n"
]
}
],
"source": [
"for document in docs:\n",
" pprint(document.metadata)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "f44e3e37",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"class MyClass:\n",
" def __init__(self, name):\n",
" self.name = name\n",
"\n",
" def greet(self):\n",
" print(f\"Hello, {self.name}!\")\n",
"\n",
"--8<--\n",
"\n",
"def main():\n",
" name = input(\"Enter your name: \")\n",
" obj = MyClass(name)\n",
" obj.greet()\n",
"\n",
"--8<--\n",
"\n",
"# Code for: class MyClass:\n",
"\n",
"\n",
"# Code for: def main():\n",
"\n",
"\n",
"if __name__ == \"__main__\":\n",
" main()\n",
"\n",
"--8<--\n",
"\n",
"class MyClass {\n",
" constructor(name) {\n",
" this.name = name;\n",
" }\n",
"\n",
" greet() {\n",
" console.log(`Hello, ${this.name}!`);\n",
" }\n",
"}\n",
"\n",
"--8<--\n",
"\n",
"function main() {\n",
" const name = prompt(\"Enter your name:\");\n",
" const obj = new MyClass(name);\n",
" obj.greet();\n",
"}\n",
"\n",
"--8<--\n",
"\n",
"// Code for: class MyClass {\n",
"\n",
"// Code for: function main() {\n",
"\n",
"main();\n"
]
}
],
"source": [
"print(\"\\n\\n--8<--\\n\\n\".join([document.page_content for document in docs]))"
]
},
{
"cell_type": "markdown",
"id": "69aad0ed",
"metadata": {},
"source": [
"The parser can be disabled for small files. \n",
"\n",
"The parameter `parser_threshold` indicates the minimum number of lines that the source code file must have to be segmented using the parser."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "ae024794",
"metadata": {},
"outputs": [],
"source": [
"loader = GenericLoader.from_filesystem(\n",
" \"./example_data/source_code\",\n",
" glob=\"*\",\n",
" suffixes=[\".py\"],\n",
" parser=LanguageParser(language=Language.PYTHON, parser_threshold=1000)\n",
")\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "5d3b372a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(docs)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "89e546ad",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"class MyClass:\n",
" def __init__(self, name):\n",
" self.name = name\n",
"\n",
" def greet(self):\n",
" print(f\"Hello, {self.name}!\")\n",
"\n",
"\n",
"def main():\n",
" name = input(\"Enter your name: \")\n",
" obj = MyClass(name)\n",
" obj.greet()\n",
"\n",
"\n",
"if __name__ == \"__main__\":\n",
" main()\n",
"\n"
]
}
],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "c9c71e61",
"metadata": {},
"source": [
"## Splitting\n",
"\n",
"Additional splitting could be needed for those functions, classes, or scripts that are too big."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "adbaa79f",
"metadata": {},
"outputs": [],
"source": [
"loader = GenericLoader.from_filesystem(\n",
" \"./example_data/source_code\",\n",
" glob=\"*\",\n",
" suffixes=[\".js\"],\n",
" parser=LanguageParser(language=Language.JS)\n",
")\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "c44c0d3f",
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import (\n",
" RecursiveCharacterTextSplitter,\n",
" Language,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "b1e0053d",
"metadata": {},
"outputs": [],
"source": [
"js_splitter = RecursiveCharacterTextSplitter.from_language(\n",
" language=Language.JS, chunk_size=60, chunk_overlap=0\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "7dbe6188",
"metadata": {},
"outputs": [],
"source": [
"result = js_splitter.split_documents(docs)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "8a80d089",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"7"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(result)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "000a6011",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"class MyClass {\n",
" constructor(name) {\n",
" this.name = name;\n",
"\n",
"--8<--\n",
"\n",
"}\n",
"\n",
"--8<--\n",
"\n",
"greet() {\n",
" console.log(`Hello, ${this.name}!`);\n",
" }\n",
"}\n",
"\n",
"--8<--\n",
"\n",
"function main() {\n",
" const name = prompt(\"Enter your name:\");\n",
"\n",
"--8<--\n",
"\n",
"const obj = new MyClass(name);\n",
" obj.greet();\n",
"}\n",
"\n",
"--8<--\n",
"\n",
"// Code for: class MyClass {\n",
"\n",
"// Code for: function main() {\n",
"\n",
"--8<--\n",
"\n",
"main();\n"
]
}
],
"source": [
"print(\"\\n\\n--8<--\\n\\n\".join([document.page_content for document in result]))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,116 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a634365e",
"metadata": {},
"source": [
"# Tencent COS Directory\n",
"\n",
"This covers how to load document objects from a `Tencent COS Directory`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "85e97267",
"metadata": {},
"outputs": [],
"source": [
"#! pip install cos-python-sdk-v5"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2f0cd6a5",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.document_loaders import TencentCOSDirectoryLoader\n",
"from qcloud_cos import CosConfig"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "321cc7f1",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"conf = CosConfig(\n",
" Region=\"your cos region\",\n",
" SecretId=\"your cos secret_id\",\n",
" SecretKey=\"your cos secret_key\",\n",
" )\n",
"loader = TencentCOSDirectoryLoader(conf=conf, bucket=\"you_cos_bucket\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c50d2c7",
"metadata": {},
"outputs": [],
"source": [
"loader.load()"
]
},
{
"cell_type": "markdown",
"id": "0690c40a",
"metadata": {},
"source": [
"## Specifying a prefix\n",
"You can also specify a prefix for more finegrained control over what files to load."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "72d44781",
"metadata": {},
"outputs": [],
"source": [
"loader = TencentCOSDirectoryLoader(conf=conf, bucket=\"you_cos_bucket\", prefix=\"fake\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2d3c32db",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,91 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a634365e",
"metadata": {},
"source": [
"# Tencent COS File\n",
"\n",
"This covers how to load document object from a `Tencent COS File`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "85e97267",
"metadata": {},
"outputs": [],
"source": [
"#! pip install cos-python-sdk-v5"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2f0cd6a5",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.document_loaders import TencentCOSFileLoader\n",
"from qcloud_cos import CosConfig"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "321cc7f1",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"conf = CosConfig(\n",
" Region=\"your cos region\",\n",
" SecretId=\"your cos secret_id\",\n",
" SecretKey=\"your cos secret_key\",\n",
" )\n",
"loader = TencentCOSFileLoader(conf=conf, bucket=\"you_cos_bucket\", key=\"fake.docx\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c50d2c7",
"metadata": {},
"outputs": [],
"source": [
"loader.load()"
]
},
{
"cell_type": "markdown",
"id": "0690c40a",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -226,7 +226,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "8de9ef16",
"metadata": {},
@@ -303,7 +302,7 @@
"source": [
"## Unstructured API\n",
"\n",
"If you want to get up and running with less set up, you can simply run `pip install unstructured` and use `UnstructuredAPIFileLoader` or `UnstructuredAPIFileIOLoader`. That will process your document using the hosted Unstructured API. Note that currently (as of 11 May 2023) the Unstructured API is open, but it will soon require an API. The [Unstructured documentation](https://unstructured-io.github.io/) page will have instructions on how to generate an API key once theyre available. Check out the instructions [here](https://github.com/Unstructured-IO/unstructured-api#dizzy-instructions-for-using-the-docker-image) if youd like to self-host the Unstructured API or run it locally."
"If you want to get up and running with less set up, you can simply run `pip install unstructured` and use `UnstructuredAPIFileLoader` or `UnstructuredAPIFileIOLoader`. That will process your document using the hosted Unstructured API. You can generate a free Unstructured API key [here](https://www.unstructured.io/api-key/). The [Unstructured documentation](https://unstructured-io.github.io/) page will have instructions on how to generate an API key once theyre available. Check out the instructions [here](https://github.com/Unstructured-IO/unstructured-api#dizzy-instructions-for-using-the-docker-image) if youd like to self-host the Unstructured API or run it locally."
]
},
{

View File

@@ -224,13 +224,33 @@
"docs"
]
},
{
"cell_type": "markdown",
"source": [
"## Using proxies\n",
"\n",
"Sometimes you might need to use proxies to get around IP blocks. You can pass in a dictionary of proxies to the loader (and `requests` underneath) to use them."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"id": "1dd8ab23",
"metadata": {},
"outputs": [],
"source": []
"source": [
"loader = WebBaseLoader(\n",
" \"https://www.walmart.com/search?q=parrots\", proxies={\n",
" \"http\": \"http://{username}:{password}:@proxy.service.com:6666/\",\n",
" \"https\": \"https://{username}:{password}:@proxy.service.com:6666/\"\n",
" }\n",
")\n",
"docs = loader.load()\n"
],
"metadata": {
"collapsed": false
}
}
],
"metadata": {

View File

@@ -0,0 +1,214 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "8cc82b48",
"metadata": {},
"source": [
"# MultiQueryRetriever\n",
"\n",
"Distance-based vector database retrieval embeds (represents) queries in high-dimensional space and finds similar embedded documents based on \"distance\". But, retrieval may produce difference results with subtle changes in query wording or if the embeddings do not capture the semantics of the data well. Prompt engineering / tuning is sometimes done to manually address these problems, but can be tedious.\n",
"\n",
"The `MultiQueryRetriever` automates the process of prompt tuning by using an LLM to generate multiple queries from different perspectives for a given user input query. For each query, it retrieves a set of relevant documents and takes the unique union across all queries to get a larger set of potentially relevant documents. By generating multiple perspectives on the same question, the `MultiQueryRetriever` might be able to overcome some of the limitations of the distance-based retrieval and get a richer set of results."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c2f3f5f2",
"metadata": {},
"outputs": [],
"source": [
"# Build a sample vectorDB\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.document_loaders import PyPDFLoader\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"\n",
"# Load PDF\n",
"path=\"path-to-files\"\n",
"loaders = [\n",
" PyPDFLoader(path+\"docs/cs229_lectures/MachineLearning-Lecture01.pdf\"),\n",
" PyPDFLoader(path+\"docs/cs229_lectures/MachineLearning-Lecture02.pdf\"),\n",
" PyPDFLoader(path+\"docs/cs229_lectures/MachineLearning-Lecture03.pdf\")\n",
"]\n",
"docs = []\n",
"for loader in loaders:\n",
" docs.extend(loader.load())\n",
" \n",
"# Split\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1500,chunk_overlap = 150)\n",
"splits = text_splitter.split_documents(docs)\n",
"\n",
"# VectorDB\n",
"embedding = OpenAIEmbeddings()\n",
"vectordb = Chroma.from_documents(documents=splits,embedding=embedding)"
]
},
{
"cell_type": "markdown",
"id": "cca8f56c",
"metadata": {},
"source": [
"`Simple usage`\n",
"\n",
"Specify the LLM to use for query generation, and the retriver will do the rest."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "edbca101",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.retrievers.multi_query import MultiQueryRetriever\n",
"question=\"What does the course say about regression?\"\n",
"num_queries=3\n",
"llm = ChatOpenAI(temperature=0)\n",
"retriever_from_llm = MultiQueryRetriever.from_llm(retriever=vectordb.as_retriever(),llm=llm)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "e5203612",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:root:Generated queries: [\"1. What is the course's perspective on regression?\", '2. How does the course discuss regression?', '3. What information does the course provide about regression?']\n"
]
},
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"unique_docs = retriever_from_llm.get_relevant_documents(question=\"What does the course say about regression?\")\n",
"len(unique_docs)"
]
},
{
"cell_type": "markdown",
"id": "c54a282f",
"metadata": {},
"source": [
"`Supplying your own prompt`\n",
"\n",
"You can also supply a prompt along with an output parser to split the results into a list of queries."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d9afb0ca",
"metadata": {},
"outputs": [],
"source": [
"from typing import List\n",
"from langchain import LLMChain\n",
"from pydantic import BaseModel, Field\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.output_parsers import PydanticOutputParser\n",
"\n",
"# Output parser will split the LLM result into a list of queries\n",
"class LineList(BaseModel):\n",
" # \"lines\" is the key (attribute name) of the parsed output\n",
" lines: List[str] = Field(description=\"Lines of text\")\n",
"\n",
"class LineListOutputParser(PydanticOutputParser):\n",
" def __init__(self) -> None:\n",
" super().__init__(pydantic_object=LineList)\n",
" def parse(self, text: str) -> LineList:\n",
" lines = text.strip().split(\"\\n\")\n",
" return LineList(lines=lines)\n",
"\n",
"output_parser = LineListOutputParser()\n",
" \n",
"QUERY_PROMPT = PromptTemplate(\n",
" input_variables=[\"question\"],\n",
" template=\"\"\"You are an AI language model assistant. Your task is to generate five \n",
" different versions of the given user question to retrieve relevant documents from a vector \n",
" database. By generating multiple perspectives on the user question, your goal is to help\n",
" the user overcome some of the limitations of the distance-based similarity search. \n",
" Provide these alternative questions seperated by newlines.\n",
" Original question: {question}\"\"\",\n",
")\n",
"llm = ChatOpenAI(temperature=0)\n",
"\n",
"# Chain\n",
"llm_chain = LLMChain(llm=llm,prompt=QUERY_PROMPT,output_parser=output_parser)\n",
" \n",
"# Other inputs\n",
"question=\"What does the course say about regression?\""
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6660d7ee",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:root:Generated queries: [\"1. What is the course's perspective on regression?\", '2. Can you provide information on regression as discussed in the course?', '3. How does the course cover the topic of regression?', \"4. What are the course's teachings on regression?\", '5. In relation to the course, what is mentioned about regression?']\n"
]
},
{
"data": {
"text/plain": [
"8"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Run\n",
"retriever = MultiQueryRetriever(retriever=vectordb.as_retriever(), \n",
" llm_chain=llm_chain,\n",
" parser_key=\"lines\") # \"lines\" is the key (attribute name) of the parsed output\n",
"\n",
"# Results\n",
"unique_docs = retriever.get_relevant_documents(question=\"What does the course say about regression?\")\n",
"len(unique_docs)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -58,6 +58,7 @@
"from langchain.memory.chat_message_histories import ZepChatMessageHistory\n",
"from langchain.schema import HumanMessage, AIMessage\n",
"from uuid import uuid4\n",
"import getpass\n",
"\n",
"# Set this to your Zep server URL\n",
"ZEP_API_URL = \"http://localhost:8000\""
@@ -75,6 +76,25 @@
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"# Provide your Zep API key. Note that this is optional. See https://docs.getzep.com/deployment/auth\n",
"\n",
"zep_api_key = getpass.getpass()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-25T15:03:29.118416Z",
@@ -93,12 +113,13 @@
"zep_chat_history = ZepChatMessageHistory(\n",
" session_id=session_id,\n",
" url=ZEP_API_URL,\n",
" api_key=zep_api_key\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-25T15:03:30.271181Z",
@@ -172,7 +193,7 @@
"]\n",
"\n",
"for msg in test_history:\n",
" zep_chat_history.append(\n",
" zep_chat_history.add_message(\n",
" HumanMessage(content=msg[\"content\"])\n",
" if msg[\"role\"] == \"human\"\n",
" else AIMessage(content=msg[\"content\"])\n",
@@ -192,7 +213,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-25T15:03:32.979155Z",
@@ -207,14 +228,14 @@
{
"data": {
"text/plain": [
"[Document(page_content='Who was Octavia Butler?', metadata={'score': 0.7759001673780126, 'uuid': '3a82a02f-056e-4c6a-b960-67ebdf3b2b93', 'created_at': '2023-05-25T15:03:30.2041Z', 'role': 'human', 'token_count': 8}),\n",
" Document(page_content=\"Octavia Butler's contemporaries included Ursula K. Le Guin, Samuel R. Delany, and Joanna Russ.\", metadata={'score': 0.7602262941130749, 'uuid': 'a2fc9c21-0897-46c8-bef7-6f5c0f71b04a', 'created_at': '2023-05-25T15:03:30.248065Z', 'role': 'ai', 'token_count': 27}),\n",
" Document(page_content='Who were her contemporaries?', metadata={'score': 0.757553366415519, 'uuid': '41f9c41a-a205-41e1-b48b-a0a4cd943fc8', 'created_at': '2023-05-25T15:03:30.243995Z', 'role': 'human', 'token_count': 8}),\n",
" Document(page_content='Octavia Estelle Butler (June 22, 1947 February 24, 2006) was an American science fiction author.', metadata={'score': 0.7546211059317948, 'uuid': '34678311-0098-4f1a-8fd4-5615ac692deb', 'created_at': '2023-05-25T15:03:30.231427Z', 'role': 'ai', 'token_count': 31}),\n",
" Document(page_content='Which books of hers were made into movies?', metadata={'score': 0.7496714959247069, 'uuid': '18046c3a-9666-4d3e-b4f0-43d1394732b7', 'created_at': '2023-05-25T15:03:30.236837Z', 'role': 'human', 'token_count': 11})]"
"[Document(page_content='Parable of the Sower is a science fiction novel by Octavia Butler, published in 1993. It follows the story of Lauren Olamina, a young woman living in a dystopian future where society has collapsed due to environmental disasters, poverty, and violence.', metadata={'score': 0.8897116216176073, 'uuid': 'db60ff57-f259-4ec4-8a81-178ed4c6e54f', 'created_at': '2023-06-26T23:40:22.816214Z', 'role': 'ai', 'metadata': {'system': {'entities': [{'Label': 'GPE', 'Matches': [{'End': 20, 'Start': 15, 'Text': 'Sower'}], 'Name': 'Sower'}, {'Label': 'PERSON', 'Matches': [{'End': 65, 'Start': 51, 'Text': 'Octavia Butler'}], 'Name': 'Octavia Butler'}, {'Label': 'DATE', 'Matches': [{'End': 84, 'Start': 80, 'Text': '1993'}], 'Name': '1993'}, {'Label': 'PERSON', 'Matches': [{'End': 124, 'Start': 110, 'Text': 'Lauren Olamina'}], 'Name': 'Lauren Olamina'}]}}, 'token_count': 56}),\n",
" Document(page_content=\"Write a short synopsis of Butler's book, Parable of the Sower. What is it about?\", metadata={'score': 0.8856661080361157, 'uuid': 'f1a5981a-8f6d-4168-a548-6e9c32f35fa1', 'created_at': '2023-06-26T23:40:22.809621Z', 'role': 'human', 'metadata': {'system': {'entities': [{'Label': 'ORG', 'Matches': [{'End': 32, 'Start': 26, 'Text': 'Butler'}], 'Name': 'Butler'}, {'Label': 'WORK_OF_ART', 'Matches': [{'End': 61, 'Start': 41, 'Text': 'Parable of the Sower'}], 'Name': 'Parable of the Sower'}]}}, 'token_count': 23}),\n",
" Document(page_content='Who was Octavia Butler?', metadata={'score': 0.7757595298492976, 'uuid': '361d0043-1009-4e13-a7f0-8aea8b1ee869', 'created_at': '2023-06-26T23:40:22.709886Z', 'role': 'human', 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 22, 'Start': 8, 'Text': 'Octavia Butler'}], 'Name': 'Octavia Butler'}], 'intent': 'The subject wants to know about the identity or background of an individual named Octavia Butler.'}}, 'token_count': 8}),\n",
" Document(page_content=\"Octavia Butler's contemporaries included Ursula K. Le Guin, Samuel R. Delany, and Joanna Russ.\", metadata={'score': 0.7601242516059306, 'uuid': '56c45e8a-0f65-45f0-bc46-d9e65164b563', 'created_at': '2023-06-26T23:40:22.778836Z', 'role': 'ai', 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 16, 'Start': 0, 'Text': \"Octavia Butler's\"}], 'Name': \"Octavia Butler's\"}, {'Label': 'ORG', 'Matches': [{'End': 58, 'Start': 41, 'Text': 'Ursula K. Le Guin'}], 'Name': 'Ursula K. Le Guin'}, {'Label': 'PERSON', 'Matches': [{'End': 76, 'Start': 60, 'Text': 'Samuel R. Delany'}], 'Name': 'Samuel R. Delany'}, {'Label': 'PERSON', 'Matches': [{'End': 93, 'Start': 82, 'Text': 'Joanna Russ'}], 'Name': 'Joanna Russ'}], 'intent': \"The subject is providing information about Octavia Butler's contemporaries.\"}}, 'token_count': 27}),\n",
" Document(page_content='You might want to read Ursula K. Le Guin or Joanna Russ.', metadata={'score': 0.7594731095320668, 'uuid': '6951f2fd-dfa4-4e05-9380-f322ef8f72f8', 'created_at': '2023-06-26T23:40:22.80464Z', 'role': 'ai', 'metadata': {'system': {'entities': [{'Label': 'ORG', 'Matches': [{'End': 40, 'Start': 23, 'Text': 'Ursula K. Le Guin'}], 'Name': 'Ursula K. Le Guin'}, {'Label': 'PERSON', 'Matches': [{'End': 55, 'Start': 44, 'Text': 'Joanna Russ'}], 'Name': 'Joanna Russ'}]}}, 'token_count': 18})]"
]
},
"execution_count": 4,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -226,6 +247,7 @@
" session_id=session_id, # Ensure that you provide the session_id when instantiating the Retriever\n",
" url=ZEP_API_URL,\n",
" top_k=5,\n",
" api_key=zep_api_key\n",
")\n",
"\n",
"await zep_retriever.aget_relevant_documents(\"Who wrote Parable of the Sower?\")"
@@ -240,7 +262,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-25T15:03:34.713354Z",
@@ -255,14 +277,14 @@
{
"data": {
"text/plain": [
"[Document(page_content='Parable of the Sower is a science fiction novel by Octavia Butler, published in 1993. It follows the story of Lauren Olamina, a young woman living in a dystopian future where society has collapsed due to environmental disasters, poverty, and violence.', metadata={'score': 0.8897321402776546, 'uuid': '1c09603a-52c1-40d7-9d69-29f26256029c', 'created_at': '2023-05-25T15:03:30.268257Z', 'role': 'ai', 'token_count': 56}),\n",
" Document(page_content=\"Write a short synopsis of Butler's book, Parable of the Sower. What is it about?\", metadata={'score': 0.8857628682610436, 'uuid': 'f6706e8c-6c91-452f-8c1b-9559fd924657', 'created_at': '2023-05-25T15:03:30.265302Z', 'role': 'human', 'token_count': 23}),\n",
" Document(page_content='Who was Octavia Butler?', metadata={'score': 0.7759670375149477, 'uuid': '3a82a02f-056e-4c6a-b960-67ebdf3b2b93', 'created_at': '2023-05-25T15:03:30.2041Z', 'role': 'human', 'token_count': 8}),\n",
" Document(page_content=\"Octavia Butler's contemporaries included Ursula K. Le Guin, Samuel R. Delany, and Joanna Russ.\", metadata={'score': 0.7602854653476563, 'uuid': 'a2fc9c21-0897-46c8-bef7-6f5c0f71b04a', 'created_at': '2023-05-25T15:03:30.248065Z', 'role': 'ai', 'token_count': 27}),\n",
" Document(page_content='You might want to read Ursula K. Le Guin or Joanna Russ.', metadata={'score': 0.7595293992240313, 'uuid': 'f22f2498-6118-4c74-8718-aa89ccd7e3d6', 'created_at': '2023-05-25T15:03:30.261198Z', 'role': 'ai', 'token_count': 18})]"
"[Document(page_content='Parable of the Sower is a science fiction novel by Octavia Butler, published in 1993. It follows the story of Lauren Olamina, a young woman living in a dystopian future where society has collapsed due to environmental disasters, poverty, and violence.', metadata={'score': 0.889661105796371, 'uuid': 'db60ff57-f259-4ec4-8a81-178ed4c6e54f', 'created_at': '2023-06-26T23:40:22.816214Z', 'role': 'ai', 'metadata': {'system': {'entities': [{'Label': 'GPE', 'Matches': [{'End': 20, 'Start': 15, 'Text': 'Sower'}], 'Name': 'Sower'}, {'Label': 'PERSON', 'Matches': [{'End': 65, 'Start': 51, 'Text': 'Octavia Butler'}], 'Name': 'Octavia Butler'}, {'Label': 'DATE', 'Matches': [{'End': 84, 'Start': 80, 'Text': '1993'}], 'Name': '1993'}, {'Label': 'PERSON', 'Matches': [{'End': 124, 'Start': 110, 'Text': 'Lauren Olamina'}], 'Name': 'Lauren Olamina'}]}}, 'token_count': 56}),\n",
" Document(page_content=\"Write a short synopsis of Butler's book, Parable of the Sower. What is it about?\", metadata={'score': 0.885754241595424, 'uuid': 'f1a5981a-8f6d-4168-a548-6e9c32f35fa1', 'created_at': '2023-06-26T23:40:22.809621Z', 'role': 'human', 'metadata': {'system': {'entities': [{'Label': 'ORG', 'Matches': [{'End': 32, 'Start': 26, 'Text': 'Butler'}], 'Name': 'Butler'}, {'Label': 'WORK_OF_ART', 'Matches': [{'End': 61, 'Start': 41, 'Text': 'Parable of the Sower'}], 'Name': 'Parable of the Sower'}]}}, 'token_count': 23}),\n",
" Document(page_content='Who was Octavia Butler?', metadata={'score': 0.7758688965570713, 'uuid': '361d0043-1009-4e13-a7f0-8aea8b1ee869', 'created_at': '2023-06-26T23:40:22.709886Z', 'role': 'human', 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 22, 'Start': 8, 'Text': 'Octavia Butler'}], 'Name': 'Octavia Butler'}], 'intent': 'The subject wants to know about the identity or background of an individual named Octavia Butler.'}}, 'token_count': 8}),\n",
" Document(page_content=\"Octavia Butler's contemporaries included Ursula K. Le Guin, Samuel R. Delany, and Joanna Russ.\", metadata={'score': 0.7602672137411663, 'uuid': '56c45e8a-0f65-45f0-bc46-d9e65164b563', 'created_at': '2023-06-26T23:40:22.778836Z', 'role': 'ai', 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 16, 'Start': 0, 'Text': \"Octavia Butler's\"}], 'Name': \"Octavia Butler's\"}, {'Label': 'ORG', 'Matches': [{'End': 58, 'Start': 41, 'Text': 'Ursula K. Le Guin'}], 'Name': 'Ursula K. Le Guin'}, {'Label': 'PERSON', 'Matches': [{'End': 76, 'Start': 60, 'Text': 'Samuel R. Delany'}], 'Name': 'Samuel R. Delany'}, {'Label': 'PERSON', 'Matches': [{'End': 93, 'Start': 82, 'Text': 'Joanna Russ'}], 'Name': 'Joanna Russ'}], 'intent': \"The subject is providing information about Octavia Butler's contemporaries.\"}}, 'token_count': 27}),\n",
" Document(page_content='You might want to read Ursula K. Le Guin or Joanna Russ.', metadata={'score': 0.7596040989115522, 'uuid': '6951f2fd-dfa4-4e05-9380-f322ef8f72f8', 'created_at': '2023-06-26T23:40:22.80464Z', 'role': 'ai', 'metadata': {'system': {'entities': [{'Label': 'ORG', 'Matches': [{'End': 40, 'Start': 23, 'Text': 'Ursula K. Le Guin'}], 'Name': 'Ursula K. Le Guin'}, {'Label': 'PERSON', 'Matches': [{'End': 55, 'Start': 44, 'Text': 'Joanna Russ'}], 'Name': 'Joanna Russ'}]}}, 'token_count': 18})]"
]
},
"execution_count": 5,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
@@ -304,7 +326,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.11.4"
}
},
"nbformat": 4,

View File

@@ -0,0 +1,126 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Spacy Embedding\n",
"\n",
"### Loading the Spacy embedding class to generate and query embeddings"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Import the necessary classes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"from langchain.embeddings.spacy_embeddings import SpacyEmbeddings\n",
"\n",
"\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Initialize SpacyEmbeddings.This will load the Spacy model into memory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"embedder = SpacyEmbeddings()\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Define some example texts . These could be any documents that you want to analyze - for example, news articles, social media posts, or product reviews."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"texts = [\n",
" \"The quick brown fox jumps over the lazy dog.\",\n",
" \"Pack my box with five dozen liquor jugs.\",\n",
" \"How vexingly quick daft zebras jump!\",\n",
" \"Bright vixens jump; dozy fowl quack.\"\n",
"]\n",
"\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Generate and print embeddings for the texts . The SpacyEmbeddings class generates an embedding for each document, which is a numerical representation of the document's content. These embeddings can be used for various natural language processing tasks, such as document similarity comparison or text classification."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"embeddings = embedder.embed_documents(texts)\n",
"for i, embedding in enumerate(embeddings):\n",
" print(f\"Embedding for document {i+1}: {embedding}\")\n",
"\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Generate and print an embedding for a single piece of text. You can also generate an embedding for a single piece of text, such as a search query. This can be useful for tasks like information retrieval, where you want to find documents that are similar to a given query."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"query = \"Quick foxes and lazy dogs.\"\n",
"query_embedding = embedder.embed_query(query)\n",
"print(f\"Embedding for query: {query_embedding}\")"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

Some files were not shown because too many files have changed in this diff Show More