Compare commits

..

145 Commits

Author SHA1 Message Date
Bagatur
ad7d97670b bump 233 (#7707) 2023-07-14 10:38:13 -04:00
Samuel Berthe
7d4843fe84 feat(chains): adding ElasticsearchDatabaseChain for interacting with analytics database (#7686)
This pull request adds a ElasticsearchDatabaseChain chain for
interacting with analytics database, in the manner of the
SQLDatabaseChain.

Maintainer: @samber
Twitter handler: samuelberthe

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-14 10:30:57 -04:00
Daniel
6d88b23ef7 Update pgembedding.ipynb (#7699)
Update the extension name. It changed from pg_hnsw to pg_embedding.

Thank you. I missed this in my previous commit.
2023-07-14 08:39:01 -04:00
Eric Speidel
663b0933e4 Allow passing auth objects in TextRequestsWrapper (#7701)
- Description: This allows passing auth objects in request wrappers.
Currently, we can handle auth by editing headers in the
RequestsWrappers, but more complex auth methods, such as Kerberos, could
be handled better by using existing functionality within the requests
library. There are many authentication options supported both natively
and by extensions, such as requests-kerberos or requests-ntlm.
  
  - Issue: Fixes #7542
  - Dependencies: none

Co-authored-by: eric.speidel@de.bosch.com <eric.speidel@de.bosch.com>
2023-07-14 08:38:24 -04:00
Nuno Campos
1e40427755 Enabled nesting chain group (#7697)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->
2023-07-14 10:03:16 +01:00
Leonid Kuligin
85e1c9b348 Added support for examples for VertexAI chat models. (#7636)
#5278

Co-authored-by: Leonid Kuligin <kuligin@google.com>
2023-07-14 02:03:04 -04:00
Richy Wang
45bb414be2 Add LLM for Alibaba's Damo Academy's Tongyi Qwen API (#7477)
- Add langchain.llms.Tonyi for text completion, in examples into the
Tonyi Text API,
- Add system tests.

Note async completion for the Text API is not yet supported and will be
included in a future PR.

Dependencies: dashscope. It will be installed manually cause it is not
need by everyone.

Happy for feedback on any aspect of this PR @hwchase17 @baskaryan.
2023-07-14 01:58:22 -04:00
Lance Martin
6325a3517c Make recursive loader yield while crawling (#7568)
Support actual lazy_load since it can take a while to crawl larger
directories.
2023-07-13 21:55:20 -07:00
UmerHA
82f3e32d8d [Small upgrade] Allow document limit in AzureCognitiveSearchRetriever (#7690)
Multiple people have asked in #5081 for a way to limit the documents
returned from an AzureCognitiveSearchRetriever. This PR adds the `top_n`
parameter to allow that.


Twitter handle:
 [@UmerHAdil](twitter.com/umerHAdil)
2023-07-13 23:04:40 -04:00
AI-Chef
af6d333147 Fix same issue #7524 in FileCallbackHandler (#7687)
Fix for Serializable class to include name, used in FileCallbackHandler
as same issue #7524

Description: Fixes the Serializable class to include 'name' attribute
(class_name) in the dict created,
This is used in Callbacks, specifically the StdOutCallbackHandler,
FileCallbackHandler.
Issue: As described in issue #7524
Dependencies: None
Tag maintainer: SInce this is related to the callback module, tagging
@agola11 @idoru
Comments:

Glad to see issue #7524 fixed in pull #6124, but you forget to change
the same place in FileCallbackHandler
2023-07-13 22:39:21 -04:00
Ben Perry
3874bb256e Weaviate: Batch embed texts (#5903)
When a custom Embeddings object is set, embed all given texts in a batch
instead of passing them through individually. Any code calling add_texts
can then appropriately size the chunks of texts that are passed through
to take full advantage of the hardware it's running on.
2023-07-13 20:57:58 -04:00
Charles P
574698a5fb Make so explicit class constructor is called in ElasticVectorSearch from_texts (#6199)
Fixes #6198 

ElasticKnnSearch.from_texts is actually ElasticVectorSearch.from_texts
and throws because it calls ElasticKnnSearch constructor with the wrong
arguments.

Now ElasticKnnSearch has its own from_texts, which constructs a proper
ElasticKnnSearch.

---------

Co-authored-by: Charles Parker <charlesparker@FiltaMacbook.local>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-13 19:55:20 -04:00
Daniel
854f3fe9b1 Update pgembedding.ipynb (#7682)
Correct links to the pg_embedding repository and the Neon documentation.
2023-07-13 19:54:07 -04:00
William FH
051fac1e66 Improve walkthrough links for sphinx (#7672)
Co-authored-by: Ankush Gola <9536492+agola11@users.noreply.github.com>
2023-07-13 16:08:31 -07:00
Bagatur
5db4dba526 add integrations hub link to docs (#7675) 2023-07-13 18:44:10 -04:00
Kenton Parton
9124221d31 Fixed handling of absolute URLs in RecursiveUrlLoader (#7677)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description:
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->

## Description
This PR addresses a bug in the RecursiveUrlLoader class where absolute
URLs were being treated as relative URLs, causing malformed URLs to be
produced. The fix involves using the urljoin function from the
urllib.parse module to correctly handle both absolute and relative URLs.

@rlancemartin @eyurtsev

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
2023-07-13 15:34:00 -07:00
EllieRoseS
c087ce74f7 Added matching async load func to PlaywrightURLLoader (#5938)
Fixes # (issue)

The existing PlaywrightURLLoader load() function uses a synchronous
browser which is not compatible with jupyter.
This PR adds a sister function aload() which can be run insisde a
notebook.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-07-13 17:51:38 -04:00
William FH
ae7714f1ba Configure Tracer Workers (#7676)
Mainline the tracer to avoid calling feedback before run is posted.
Chose a bool over `max_workers` arg for configuring since we don't want
to support > 1 for now anyway. At some point may want to manage the pool
ourselves (ordering only really matters within a run and with parent
runs)
2023-07-13 14:00:14 -07:00
Jasper
fbc97a77ed add browserless loader (#7562)
# Browserless

Added support for Browserless' `/content` endpoint as a document loader.

### About Browserless

Browserless is a cloud service that provides access to headless Chrome
browsers via a REST API. It allows developers to automate Chromium in a
serverless fashion without having to configure and maintain their own
Chrome infrastructure.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Lance Martin <lance@langchain.dev>
2023-07-13 13:18:28 -07:00
mebstyne-msft
120c52589b Enabled Azure Active Directory token-based auth access to OpenAI completions (#6313)
With AzureOpenAI openai_api_type defaulted to "azure" the logic in
utils' get_from_dict_or_env() function triggered by the root validator
never looks to environment for the user's runtime openai_api_type
values. This inhibits folks using token-based auth, or really any auth
model other than "azure."

By removing the "default" value, this allows environment variables to be
pulled at runtime for the openai_api_type and thus enables the other
api_types which are expected to work.

---------

Co-authored-by: Ebo <mebstyne@microsoft.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-07-13 16:05:47 -04:00
frangin2003
c7b687e944 Simplify GraphQL Tool Initialization documentation by Removing 'llm' Argument (#7651)
This PR is aimed at enhancing the clarity of the documentation in the
langchain project.

**Description**:
In the graphql.ipynb file, I have removed the unnecessary 'llm' argument
from the initialization process of the GraphQL tool (of type
_EXTRA_OPTIONAL_TOOLS). The 'llm' argument is not required for this
process. Its presence could potentially confuse users. This modification
simplifies the understanding of tool initialization and minimizes
potential confusion.

**Issue**: Not applicable, as this is a documentation improvement.

**Dependencies**: None.

**I kindly request a review from the following maintainer**: @hinthornw,
who is responsible for Agents / Tools / Toolkits.

No new integration is being added in this PR, hence no need for a test
or an example notebook.

Please see the changes for more detail and let me know if any further
modification is necessary.
2023-07-13 14:52:07 -04:00
William FH
aab2a7cd4b Normalize Trajectory Eval Score (#7668) 2023-07-13 09:58:28 -07:00
William FH
5f03cc3511 spelling nit (#7667) 2023-07-13 09:12:57 -07:00
Bagatur
3dd0704e38 bump 232 (#7659) 2023-07-13 10:32:39 -04:00
Tamas Molnar
24c1654208 Fix SQLAlchemy LLM cache clear (#7653)
Fixes #7652 

Description: 
This is a fix for clearing the cache for SQL Alchemy based LLM caches. 

The langchain.llm_cache.clear() did not take effect for SQLite cache. 
Reason: it didn't commit the deletion database change.

See SQLAlchemy documentation for proper usage:

https://docs.sqlalchemy.org/en/20/orm/session_basics.html#opening-and-closing-a-session
https://docs.sqlalchemy.org/en/20/orm/session_basics.html#deleting

@hwchase17 @baskaryan

---------

Co-authored-by: Tamas Molnar <tamas.molnar@nagarro.com>
2023-07-13 09:39:04 -04:00
Bagatur
c17a80f11c fix chroma updated upsert interface (#7643)
new chroma release seems to not support empty dicts for metadata.

related to #7633
2023-07-13 09:27:14 -04:00
William FH
a673a51efa [Breaking] Update Evaluation Functionality (#7388)
- Migrate from deprecated langchainplus_sdk to `langsmith` package
- Update the `run_on_dataset()` API to use an eval config
- Update a number of evaluators, as well as the loading logic
- Update docstrings / reference docs
- Update tracer to share single HTTP session
2023-07-13 02:13:06 -07:00
Sam Coward
224199083b Fix missing chain classname in StdOutCallbackHandler.on_chain_start (#6124)
Retrieves the name of the class from new location as of commit
18af149e91


Co-authored-by: Zander Chase <130414180+vowelparrot@users.noreply.github.com>
2023-07-13 03:05:36 -04:00
lucasiscovici
af3f401015 update base class of ListStepContainer to BaseStepContainer (#6232)
update base class of ListStepContainer to BaseStepContainer

Fixes #6231
2023-07-13 03:03:02 -04:00
Matt Adams
98e1bbfbbd Add missing dependencies to apify.ipynb (#6331)
Fixes errors caused by missing dependencies when running the notebook.
2023-07-13 03:02:23 -04:00
Ma Donghao
6f62e5461c Update the parser regex of map_rerank (#6419)
Sometimes the score responded by chatgpt would be like 'Respone
example\nScore: 90 (fully answers the question, but could provide more
detail on the specific error message)'
For the score contains not only numbers, it raise a ValueError like 


Update the RegexParser from `.*` to `\d*` would help us to ignore the
text after number.

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-13 03:01:42 -04:00
Bagatur
b08f903755 fix chroma init bug (#7639) 2023-07-13 03:00:33 -04:00
Nir Gazit
f307ca094b fix(memory): allow internal chains to use memory (#6769)
Fixed #6768.

This is a workaround only. I think a better longer-term solution is for
chains to declare how many input variables they *actually* need (as
opposed to ones that are in the prompt, where some may be satisfied by
the memory). Then, a wrapping chain can check the input match against
the actual input variables.

@hwchase17
2023-07-13 02:47:44 -04:00
Francisco Ingham
488d2d5da9 Entity extraction improvements (#6342)
Added fix to avoid irrelevant attributes being returned plus an example
of extracting unrelated entities and an exampe of using an 'extra_info'
attribute to extract unstructured data for an entity.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-13 02:16:05 -04:00
Nir Gazit
a8bbfb2da3 feat(agents): allow trimming of intermediate steps to last N (#6476)
Added an option to trim intermediate steps to last N steps. This is
especially useful for long-running agents. Users can explicitly specify
N or provide a function that does custom trimming/manipulation on
intermediate steps. I've mimicked the API of the `handle_parsing_errors`
parameter.
2023-07-13 02:09:25 -04:00
Zeeland
92ef77da35 fix: remove useless variable k (#6524)
remove useless variable k

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-13 01:58:36 -04:00
Bagatur
7f8ff2a317 add tagger nb (#7637) 2023-07-13 01:48:23 -04:00
Sidchat95
c5e50c40c9 Fix Document Similarity Check with passed Threshold (#6845)
Converting the Similarity obtained in the
similarity_search_with_score_by_vector method whilst comparing to the
passed
threshold. This is because the passed threshold is a number between 0 to
1 and is already in the relevance_score_fn format.
As of now, the function is comparing two different scoring parameters
and that wouldn't work.

Dependencies
None

Issue:
Different scores being compared in
similarity_search_with_score_by_vector method in FAISS.

Tag maintainer
@hwchase17



<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @dev2049
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @dev2049
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @vowelparrot
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-13 01:30:47 -04:00
Jacob Ajit
a08baa97c5 Use modern OpenAI endpoints for embeddings (#6573)
- Description: 

LangChain passes
[engine](https://github.com/hwchase17/langchain/blob/master/langchain/embeddings/openai.py#L256)
and not `model` as a field when making OpenAI requests. Within the
`openai` Python library, for OpenAI requests, this [makes a
call](https://github.com/openai/openai-python/blob/main/openai/api_resources/abstract/engine_api_resource.py#L58)
to an endpoint of the form
`https://api.openai.com/v1/engines/{engine_id}/embeddings`.

These endpoints are
[deprecated](https://help.openai.com/en/articles/6283125-what-happened-to-engines)
in favor of endpoints of the format
`https://api.openai.com/v1/embeddings`, where `model` is passed as a
parameter in the request body.

While these deprecated endpoints continue to function for now, they may
not be supported indefinitely and should be avoided in favor of the
newer API format.

It appears that `engine` was passed in instead of `model` to make both
Azure OpenAI and OpenAI calls work similarly. However, the inclusion of
`engine`
[causes](https://github.com/openai/openai-python/blob/main/openai/api_resources/abstract/engine_api_resource.py#L58)
OpenAI to use the deprecated endpoint, requiring a diverging code path
for Azure OpenAI calls where `engine` is passed in additionally (Azure
OpenAI requires `engine` to specify a deployment, and can optionally
take in `model`).

In the long-term, it may be worth considering spinning off Azure OpenAI
embeddings into a separate class for ease of use and maintenance,
similar to the [implementation for chat
models](https://github.com/hwchase17/langchain/blob/master/langchain/chat_models/azure_openai.py).
2023-07-13 01:23:17 -04:00
Jacob Lee
cdb93ab5ca Adds OpenAI functions powered document metadata tagger (#7521)
Adds a new document transformer that automatically extracts metadata for
a document based on an input schema. I also moved
`document_transformers.py` to `document_transformers/__init__.py` to
group it with this new transformer - it didn't seem to cause issues in
the notebook, but let me know if I've done something wrong there.

Also had a linter issue I couldn't figure out:

```
MacBook-Pro:langchain jacoblee$ make lint
poetry run mypy .
docs/dist/conf.py: error: Duplicate module named "conf" (also at "./docs/api_reference/conf.py")
docs/dist/conf.py: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#mapping-file-paths-to-modules for more info
docs/dist/conf.py: note: Common resolutions include: a) using `--exclude` to avoid checking one of them, b) adding `__init__.py` somewhere, c) using `--explicit-package-bases` or adjusting MYPYPATH
Found 1 error in 1 file (errors prevented further checking)
make: *** [lint] Error 2
```

@rlancemartin @baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-13 01:12:41 -04:00
Jason Fan
8effd90be0 Add new types of document transformers (#7379)
- Description: Add two new document transformers that translates
documents into different languages and converts documents into q&a
format to improve vector search results. Uses OpenAI function calling
via the [doctran](https://github.com/psychic-api/doctran/tree/main)
library.
  - Issue: N/A
  - Dependencies: `doctran = "^0.0.5"`
  - Tag maintainer: @rlancemartin @eyurtsev @hwchase17 
  - Twitter handle: @psychicapi or @jfan001

Notes
- Adheres to the `DocumentTransformer` abstraction set by @dev2049 in
#3182
- refactored `EmbeddingsRedundantFilter` to put it in a file under a new
`document_transformers` module
- Added basic docs for `DocumentInterrogator`, `DocumentTransformer` as
well as the existing `EmbeddingsRedundantFilter`

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-12 23:53:30 -04:00
Piyush Jain
f11d845dee Fixed validation error when credentials_profile_name, or region_name is not passed (#7629)
## Summary
This PR corrects the checks for credentials_profile_name, and
region_name attributes. This was causing validation exceptions when
either of these values were missing during creation of the retriever
class.

Fixes #7571 

#### Requested reviewers:
@baskaryan
2023-07-12 23:47:35 -04:00
Jamie Broomall
0e1d7a27c6 WhyLabsCallbackHandler updates (#7621)
Updates to the WhyLabsCallbackHandler and example notebook
- Update dependency to langkit 0.0.6 which defines new helper methods
for callback integrations
- Update WhyLabsCallbackHandler to use the new `get_callback_instance`
so that the callback is mostly defined in langkit
- Remove much of the implementation of the WhyLabsCallbackHandler here
in favor of the callback instance

This does not change the behavior of the whylabs callback handler
implementation but is a reorganization that moves some of the
implementation externally to our optional dependency package, and should
make future updates easier.

@agola11
2023-07-12 23:46:56 -04:00
Gaurang Pawar
53722dcfdc Fixed a typo in pinecone_hybrid_search.ipynb (#7627)
Fixed a small typo in documentation
2023-07-12 23:46:41 -04:00
Bagatur
1d4db1327a fix openai structured chain with pydantic (#7622)
should return pydantic class
2023-07-12 23:46:13 -04:00
Bagatur
ee70d4a0cd mv tutorials (#7614) 2023-07-12 17:33:36 -04:00
William FH
9b215e761e Stop warning when parent run ID not present (#7611) 2023-07-12 14:04:32 -07:00
William FH
2f848294cb Rm Warning that Tracing is Experimental (#7612) 2023-07-12 14:04:28 -07:00
Yaohui Wang
d85c33a5c3 Fix the markdown rendering issue with a code block inside a markdown code block (#6625)
### Description

- Fix the markdown rendering issue with a code block inside a markdown,
using a different number of backticks for the delimiters.

Current doc site:
<https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/code_splitter#markdown>

After fix:
<img width="480" alt="image"
src="https://github.com/hwchase17/langchain/assets/3115235/d9921d59-64e6-4a34-9c62-79743667f528">


### Who can review

PTAL @dev2049 

Co-authored-by: Yaohui Wang <wangyaohui.01@bytedance.com>
2023-07-12 16:29:25 -04:00
Yaroslav Halchenko
0d92a7f357 codespell: workflow, config + some (quite a few) typos fixed (#6785)
Probably the most  boring PR to review ;)

Individual commits might be easier to digest

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2023-07-12 16:20:08 -04:00
Sam
931e68692e Adds a chain around sympy for symbolic math (#6834)
- Description: Adds a new chain that acts as a wrapper around Sympy to
give LLMs the ability to do some symbolic math.
- Dependencies: SymPy

---------

Co-authored-by: sreiswig <sreiswig@github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-12 15:17:32 -04:00
Bharat Ramanathan
be29a6287d feat: add model architecture back to wandb tracer (#6806)
# Description

This PR adds model architecture to the `WandbTracer` from the Serialized
Run kwargs. This allows visualization of the calling parameters of an
Agent, LLM and Tool in Weights & Biases.
    1. Safely serialize the run objects to WBTraceTree model_dict
    2. Refactors the run processing logic to be more organized.

- Twitter handle: @parambharat

---------

Co-authored-by: Bharat Ramanathan <ramanathan.parameshwaran@gohuddl.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-12 15:00:18 -04:00
Alex Iribarren
adc96d60b6 Implement Function Callback tracer (#6835)
Description: I wanted to be able to redirect debug output to a function,
but it wasn't very easy. I figured it would make sense to implement a
`FunctionCallbackHandler`, and reimplement `ConsoleCallbackHandler` as a
subclass that calls the `print` function. Now I can create a simple
subclass in my project that calls `logging.info` or whatever I need.

Tag maintainer: @agola11
Twitter handle: `@andandaraalex`
2023-07-12 14:38:41 -04:00
Ducasse-Arthur
93a84f6182 Update bedrock.py - support of other endpoint url (esp. for users of … (#7592)
Added an _endpoint_url_ attribute to Bedrock(LLM) class - I have access
to Bedrock only via us-west-2 endpoint and needed to change the endpoint
url, this could be useful to other users
2023-07-12 10:43:23 -04:00
Bagatur
22525bad65 bump 231 (#7584) 2023-07-12 10:43:12 -04:00
Subsegment
6e1000dc8d docs : Use more meaningful cnosdb examples (#7587)
This change makes the ecosystem integrations cnosdb documentation more
realistic and easy to understand.

- change examples of question and table
- modify typo and format
2023-07-12 10:31:55 -04:00
Samuel ROZE
f3c9bf5e4b fix(typo): Clarify the point of llm_chain (#7593)
Fixes a typo introduced in
https://github.com/hwchase17/langchain/pull/7080 by @hwchase17.

In the example (visible on [the online
documentation](https://api.python.langchain.com/en/latest/chains/langchain.chains.conversational_retrieval.base.ConversationalRetrievalChain.html#langchain-chains-conversational-retrieval-base-conversationalretrievalchain)),
the `llm_chain` variable is unused as opposed to being used for the
question generator. This change makes it clearer.
2023-07-12 10:31:00 -04:00
Alec Flett
6cdd4b5edc only add handlers if they are new (#7504)
When using callbacks, there are times when callbacks can be added
redundantly: for instance sometimes you might need to create an llm with
specific callbacks, but then also create and agent that uses a chain
that has those callbacks already set. This means that "callbacks" might
get passed down again to the llm at predict() time, resulting in
duplicate calls to the `on_llm_start` callback.

For the sake of simplicity, I made it so that langchain never adds an
exact handler/callbacks object in `add_handler`, thus avoiding the
duplicate handler issue.

Tagging @hwchase17 for callback review

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-12 03:48:29 -04:00
ausboss
50316f6477 Adding LLM wrapper for Kobold AI (#7560)
- Description: add wrapper that lets you use KoboldAI api in langchain
  - Issue: n/a
  - Dependencies: none extra, just what exists in lanchain
  - Tag maintainer: @baskaryan 
  - Twitter handle: @zanzibased
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-12 03:48:12 -04:00
Rohit Kumar Singh
603a0bea29 Fixes incorrect docstore creation in faiss.py (#7026)
- **Description**: Current implementation assumes that the length of
`texts` and `ids` should be same but if the passed `ids` length is not
equal to the passed length of `texts`, current code
`dict(zip(index_to_id.values(), documents))` is not failing or giving
any warning and silently creating docstores only for the passed `ids`
i.e. if `ids = ['A']` and `texts=["I love Open Source","I love
langchain"]` then only one `docstore` will be created. But either two
docstores should be created assuming same id value for all the elements
of `texts` or an error should be raised.
  
- **Issue**: My change fixes this by using dictionary comprehension
instead of `zip`. This was if lengths of `ids` and `texts` mismatches an
explicit `IndexError` will be raised.
  
@rlancemartin, @eyurtsev
2023-07-12 03:35:49 -04:00
Tommy Hyeonwoo Kim
3f7213586e add supported properties for notiondb document loader's metadata (#7570)
fix #7569

add following properties for Notion DB document loader's metadata
- `unique_id`
- `status`
- `people`

@rlancemartin, @eyurtsev (Since this is a change related to
`DataLoaders`)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-12 03:34:54 -04:00
Junlin Zhou
5f17c57174 Update chat agents' output parser to extract action by regex (#7511)
Currently `ChatOutputParser` extracts actions by splitting the text on
"```", and then load the second part as a json string.

But sometimes the LLM will wrap the action in markdown code block like:

````markdown
```json
{
  "action": "foo",
  "action_input": "bar"
}
```
````

Splitting text on "```" will cause `OutputParserException` in such case.

This PR changes the behaviour to extract the `$JSON_BLOB` by regex, so
that it can handle both ` ``` ``` ` and ` ```json ``` `

@hinthornw

---------

Co-authored-by: Junlin Zhou <jlzhou@zjuici.com>
2023-07-12 03:12:02 -04:00
Bagatur
ebcb144342 unit test sqlalachemy (#7582) 2023-07-12 03:03:16 -04:00
Harrison Chase
641fd74baa Harrison/pg vector move (#7580) 2023-07-12 02:22:34 -04:00
os1ma
2667ddc686 Fix make docs_build and related scripts (#7276)
**Description: a description of the change**

Fixed `make docs_build` and related scripts which caused errors. There
are several changes.

First, I made the build of the documentation and the API Reference into
two separate commands. This is because it takes less time to build. The
commands for documents are `make docs_build`, `make docs_clean`, and
`make docs_linkcheck`. The commands for API Reference are `make
api_docs_build`, `api_docs_clean`, and `api_docs_linkcheck`.

It looked like `docs/.local_build.sh` could be used to build the
documentation, so I used that. Since `.local_build.sh` was also building
API Rerefence internally, I removed that process. `.local_build.sh` also
added some Bash options to stop in error or so. Futher more added `cd
"${SCRIPT_DIR}"` at the beginning so that the script will work no matter
which directory it is executed in.

`docs/api_reference/api_reference.rst` is removed, because which is
generated by `docs/api_reference/create_api_rst.py`, and added it to
.gitignore.

Finally, the description of CONTRIBUTING.md was modified.

**Issue: the issue # it fixes (if applicable)**

https://github.com/hwchase17/langchain/issues/6413

**Dependencies: any dependencies required for this change**

`nbdoc` was missing in group docs so it was added. I installed it with
the `poetry add --group docs nbdoc` command. I am concerned if any
modifications are needed to poetry.lock. I would greatly appreciate it
if you could pay close attention to this file during the review.

**Tag maintainer**
- General / Misc / if you don't know who to tag: @baskaryan

If this PR needs any additional changes, I'll be happy to make them!

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-11 22:05:14 -04:00
Pharbie
74c28df363 Update Pinecone Upsert method usage (#7358)
Description: Refactor the upsert method in the Pinecone class to allow
for additional keyword arguments. This change adds flexibility and
extensibility to the method, allowing for future modifications or
enhancements. The upsert method now accepts the `**kwargs` parameter,
which can be used to pass any additional arguments to the Pinecone
index. This change has been made in both the `upsert` method in the
`Pinecone` class and the `upsert` method in the
`similarity_search_with_score` class method. Falls in line with the
usage of the upsert method in
[Pinecone-Python-Client](4640c4cf27/pinecone/index.py (L73))
Issue: [This feature request in Pinecone
Repo](https://github.com/pinecone-io/pinecone-python-client/issues/184)

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - Memory: @hwchase17

---------

Co-authored-by: kwesi <22204443+yankskwesi@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Lance Martin <122662504+rlancemartin@users.noreply.github.com>
2023-07-11 21:14:42 -04:00
Kazuki Maeda
5c3fe8b0d1 Enhance Makefile with 'format_diff' Option and Improved Readability (#7394)
### Description:

This PR introduces a new option format_diff to the existing Makefile.
This option allows us to apply the formatting tools (Black and isort)
only to the changed Python and ipynb files since the last commit. This
will make our development process more efficient as we only format the
codes that we modify. Along with this change, comments were added to
make the Makefile more understandable and maintainable.

### Issue:

N/A

### Dependencies:

Add dependency to black.

### Tag maintainer:

@baskaryan

### Twitter handle:

[kzk_maeda](https://twitter.com/kzk_maeda)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-11 21:03:17 -04:00
Bagatur
2babe3069f Revert pinecone v4 support (#7566)
Revert 9d13dcd
2023-07-11 20:58:59 -04:00
schop-rob
e811c5e8c6 Add OpenAI organization ID to docs (#7398)
Description: I added an example of how to reference the OpenAI API
Organization ID, because I couldn't find it before. In the example, it
is mentioned how to achieve this using environment variables as well as
parameters for the OpenAI()-class
Issue: -
Dependencies: -
Twitter @schop-rob
2023-07-11 20:51:58 -04:00
Kenny
8741e55e7c Template formats documentation (#7404)
Simple addition to the documentation, adding the correct import
statement & showcasing using Python FStrings.
2023-07-11 18:24:24 -04:00
Fielding Johnston
00c466627a minor bug fix: properly await AsyncRunManager's method call in MulitRouteChain (#7487)
This simply awaits `AsyncRunManager`'s method call in `MulitRouteChain`.
Noticed this while playing around with Langchain's implementation of
`MultiPromptChain`. @baskaryan

cheers

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-11 18:18:47 -04:00
tonomura
cc0585af42 Improvement/add finish reason to generation info in chat open ai (#7478)
Description: ChatOpenAI model does not return finish_reason in
generation_info.
Issue: #2702
Dependencies: None
Tag maintainer: @baskaryan 

Thank you

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-11 18:12:57 -04:00
Junlin Zhou
b96ac13f3d Minor update to reference other sql tool by tool names instead of hard coded string. (#7514)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->

Currently there are 4 tools in SQL agent-toolkits, and 2 of them have
reference to the other 2.

This PR change the reference from hard coded string to `{tool.name}`

Co-authored-by: Junlin Zhou <jlzhou@zjuici.com>
2023-07-11 17:44:23 -04:00
OwenElliott
9cb2347453 Fix broken link from Marqo Ecosystem (#7510)
Small fix to a link from the Marqo page in the ecosystem.

The link was not updated correctly when the documentation structure
changed to html pages instead of links to notebooks.
2023-07-11 17:15:15 -04:00
Matt Robinson
c4d53f98dc docs: update unstructured docstrings (#7561)
### Summary

Updates the docstrings in the Unstructured document loaders to display
more useful information on the integrations page.
2023-07-11 17:12:05 -04:00
Ben Auffarth
2c2f0e15a6 clarify about api key (#7540)
I found it unclear, where to get the API keys for JinaChat. Mentioning
this in the docstring should be helpful.
#7490 

Twitter handle: benji1a

@delgermurun
2023-07-11 16:46:06 -04:00
Jona Sassenhagen
0ea7224535 [Minor] Remove tagger from spacy sentencizer (#7534)
@svlandeg gave me a tip for how to improve a bit on
https://github.com/hwchase17/langchain/pull/7442 for some extra speed
and memory gains. The tagger isn't needed for sentencization, so can be
disabled too.
2023-07-11 16:43:46 -04:00
Kacper Łukawski
1f83b5f47e Reuse the existing collection if configured properly in Qdrant.from_texts (#7530)
This PR changes the behavior of `Qdrant.from_texts` so the collection is
reused if not requested to recreate it. Previously, calling
`Qdrant.from_texts` or `Qdrant.from_documents` resulted in removing the
old data which was confusing for many.
2023-07-11 16:24:35 -04:00
Leonid Kuligin
6674b33cf5 Added support for chat_history (#7555)
#7469

Co-authored-by: Leonid Kuligin <kuligin@google.com>
2023-07-11 15:27:26 -04:00
Felix Brockmeier
406a9dc11f Add notebook example for Lemon AI NLP Workflow Automation (#7556)
- Description: Added notebook to LangChain docs that explains how to use
Lemon AI NLP Workflow Automation tool with Langchain
  
- Issue: not applicable
  
- Dependencies: not applicable
  
- Tag maintainer: @agola11
  
- Twitter handle: felixbrockm
2023-07-11 15:15:11 -04:00
Lance Martin
9e067b8cc9 Add env setup (#7550)
Include setup
2023-07-11 09:48:40 -07:00
Bagatur
3c4338470e bump 230 (#7544) 2023-07-11 11:24:08 -04:00
Bagatur
d2137eea9f fix cpal docs (#7545) 2023-07-11 11:07:45 -04:00
Boris
9129318466 CPAL (#6255)
# Causal program-aided language (CPAL) chain

## Motivation

This builds on the recent [PAL](https://arxiv.org/abs/2211.10435) to
stop LLM hallucination. The problem with the
[PAL](https://arxiv.org/abs/2211.10435) approach is that it hallucinates
on a math problem with a nested chain of dependence. The innovation here
is that this new CPAL approach includes causal structure to fix
hallucination.

For example, using the below word problem, PAL answers with 5, and CPAL
answers with 13.

    "Tim buys the same number of pets as Cindy and Boris."
    "Cindy buys the same number of pets as Bill plus Bob."
    "Boris buys the same number of pets as Ben plus Beth."
    "Bill buys the same number of pets as Obama."
    "Bob buys the same number of pets as Obama."
    "Ben buys the same number of pets as Obama."
    "Beth buys the same number of pets as Obama."
    "If Obama buys one pet, how many pets total does everyone buy?"

The CPAL chain represents the causal structure of the above narrative as
a causal graph or DAG, which it can also plot, as shown below.


![complex-graph](https://github.com/hwchase17/langchain/assets/367522/d938db15-f941-493d-8605-536ad530f576)

.

The two major sections below are:

1. Technical overview
2. Future application

Also see [this jupyter
notebook](https://github.com/borisdev/langchain/blob/master/docs/extras/modules/chains/additional/cpal.ipynb)
doc.


## 1. Technical overview

### CPAL versus PAL

Like [PAL](https://arxiv.org/abs/2211.10435), CPAL intends to reduce
large language model (LLM) hallucination.

The CPAL chain is different from the PAL chain for a couple of reasons. 

* CPAL adds a causal structure (or DAG) to link entity actions (or math
expressions).
* The CPAL math expressions are modeling a chain of cause and effect
relations, which can be intervened upon, whereas for the PAL chain math
expressions are projected math identities.

PAL's generated python code is wrong. It hallucinates when complexity
increases.

```python
def solution():
    """Tim buys the same number of pets as Cindy and Boris.Cindy buys the same number of pets as Bill plus Bob.Boris buys the same number of pets as Ben plus Beth.Bill buys the same number of pets as Obama.Bob buys the same number of pets as Obama.Ben buys the same number of pets as Obama.Beth buys the same number of pets as Obama.If Obama buys one pet, how many pets total does everyone buy?"""
    obama_pets = 1
    tim_pets = obama_pets
    cindy_pets = obama_pets + obama_pets
    boris_pets = obama_pets + obama_pets
    total_pets = tim_pets + cindy_pets + boris_pets
    result = total_pets
    return result  # math result is 5
```

CPAL's generated python code is correct.

```python
story outcome data
    name                                   code  value      depends_on
0  obama                                   pass    1.0              []
1   bill               bill.value = obama.value    1.0         [obama]
2    bob                bob.value = obama.value    1.0         [obama]
3    ben                ben.value = obama.value    1.0         [obama]
4   beth               beth.value = obama.value    1.0         [obama]
5  cindy   cindy.value = bill.value + bob.value    2.0     [bill, bob]
6  boris   boris.value = ben.value + beth.value    2.0     [ben, beth]
7    tim  tim.value = cindy.value + boris.value    4.0  [cindy, boris]

query data
{
    "question": "how many pets total does everyone buy?",
    "expression": "SELECT SUM(value) FROM df",
    "llm_error_msg": ""
}
# query result is 13
```

Based on the comments below, CPAL's intended location in the library is
`experimental/chains/cpal` and PAL's location is`chains/pal`.

### CPAL vs Graph QA

Both the CPAL chain and the Graph QA chain extract entity-action-entity
relations into a DAG.

The CPAL chain is different from the Graph QA chain for a few reasons.

* Graph QA does not connect entities to math expressions
* Graph QA does not associate actions in a sequence of dependence.
* Graph QA does not decompose the narrative into these three parts:
  1. Story plot or causal model
  4. Hypothetical question
  5. Hypothetical condition 

### Evaluation

Preliminary evaluation on simple math word problems shows that this CPAL
chain generates less hallucination than the PAL chain on answering
questions about a causal narrative. Two examples are in [this jupyter
notebook](https://github.com/borisdev/langchain/blob/master/docs/extras/modules/chains/additional/cpal.ipynb)
doc.

## 2. Future application

### "Describe as Narrative, Test as Code"

The thesis here is that the Describe as Narrative, Test as Code approach
allows you to represent a causal mental model both as code and as a
narrative, giving you the best of both worlds.

#### Why describe a causal mental mode as a narrative?

The narrative form is quick. At a consensus building meeting, people use
narratives to persuade others of their causal mental model, aka. plan.
You can share, version control and index a narrative.

#### Why test a causal mental model as a code?

Code is testable, complex narratives are not. Though fast, narratives
are problematic as their complexity increases. The problem is LLMs and
humans are prone to hallucination when predicting the outcomes of a
narrative. The cost of building a consensus around the validity of a
narrative outcome grows as its narrative complexity increases. Code does
not require tribal knowledge or social power to validate.

Code is composable, complex narratives are not. The answer of one CPAL
chain can be the hypothetical conditions of another CPAL Chain. For
stochastic simulations, a composable plan can be integrated with the
[DoWhy library](https://github.com/py-why/dowhy). Lastly, for the
futuristic folk, a composable plan as code allows ordinary community
folk to design a plan that can be integrated with a blockchain for
funding.

An explanation of a dependency planning application is
[here.](https://github.com/borisdev/cpal-llm-chain-demo)

--- 
Twitter handle: @boris_dev

---------

Co-authored-by: Boris Dev <borisdev@Boriss-MacBook-Air.local>
2023-07-11 10:11:21 -04:00
Alejandra De Luna
2e4047e5e7 feat: support generate as an early stopping method for OpenAIFunctionsAgent (#7229)
This PR proposes an implementation to support `generate` as an
`early_stopping_method` for the new `OpenAIFunctionsAgent` class.

The motivation behind is to facilitate the user to set a maximum number
of actions the agent can take with `max_iterations` and force a final
response with this new agent (as with the `Agent` class).

The following changes were made:

- The `OpenAIFunctionsAgent.return_stopped_response` method was
overwritten to support `generate` as an `early_stopping_method`
- A boolean `with_functions` parameter was added to the
`OpenAIFunctionsAgent.plan` method

This way the `OpenAIFunctionsAgent.return_stopped_response` method can
call the `OpenAIFunctionsAgent.plan` method with `with_function=False`
when the `early_stopping_method` is set to `generate`, making a call to
the LLM with no functions and forcing a final response from the
`"assistant"`.

  - Relevant maintainer: @hinthornw
  - Twitter handle: @aledelunap

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-11 09:25:02 -04:00
Hashem Alsaket
1dd4236177 Fix HF endpoint returns blank for text-generation (#7386)
Description: Current `_call` function in the
`langchain.llms.HuggingFaceEndpoint` class truncates response when
`task=text-generation`. Same error discussed a few days ago on Hugging
Face: https://huggingface.co/tiiuae/falcon-40b-instruct/discussions/51
Issue: Fixes #7353 
Tag maintainer: @hwchase17 @baskaryan @hinthornw

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-11 03:06:05 -04:00
Lance Martin
4a94f56258 Minor edits to QA docs (#7507)
Small clean-ups
2023-07-10 22:15:05 -07:00
Raymond Yuan
5171c3bcca Refactor vector storage to correctly handle relevancy scores (#6570)
Description: This pull request aims to support generating the correct
generic relevancy scores for different vector stores by refactoring the
relevance score functions and their selection in the base class and
subclasses of VectorStore. This is especially relevant with VectorStores
that require a distance metric upon initialization. Note many of the
current implenetations of `_similarity_search_with_relevance_scores` are
not technically correct, as they just return
`self.similarity_search_with_score(query, k, **kwargs)` without applying
the relevant score function

Also includes changes associated with:
https://github.com/hwchase17/langchain/pull/6564 and
https://github.com/hwchase17/langchain/pull/6494

See more indepth discussion in thread in #6494 

Issue: 
https://github.com/hwchase17/langchain/issues/6526
https://github.com/hwchase17/langchain/issues/6481
https://github.com/hwchase17/langchain/issues/6346

Dependencies: None

The changes include:
- Properly handling score thresholding in FAISS
`similarity_search_with_score_by_vector` for the corresponding distance
metric.
- Refactoring the `_similarity_search_with_relevance_scores` method in
the base class and removing it from the subclasses for incorrectly
implemented subclasses.
- Adding a `_select_relevance_score_fn` method in the base class and
implementing it in the subclasses to select the appropriate relevance
score function based on the distance strategy.
- Updating the `__init__` methods of the subclasses to set the
`relevance_score_fn` attribute.
- Removing the `_default_relevance_score_fn` function from the FAISS
class and using the base class's `_euclidean_relevance_score_fn`
instead.
- Adding the `DistanceStrategy` enum to the `utils.py` file and updating
the imports in the vector store classes.
- Updating the tests to import the `DistanceStrategy` enum from the
`utils.py` file.

---------

Co-authored-by: Hanit <37485638+hanit-com@users.noreply.github.com>
2023-07-10 20:37:03 -07:00
Lance Martin
bd0c6381f5 Minor update to clarify map-reduce custom prompt usage (#7453)
Update docs for map-reduce custom prompt usage
2023-07-10 16:43:44 -07:00
Lance Martin
28d2b213a4 Update landing page for "question answering over documents" (#7152)
Improve documentation for a central use-case, qa / chat over documents.

This will be merged as an update to `index.mdx`
[here](https://python.langchain.com/docs/use_cases/question_answering/).

Testing w/ local Docusaurus server:

```
From `docs` directory:
mkdir _dist
cp -r {docs_skeleton,snippets} _dist
cp -r extras/* _dist/docs_skeleton/docs
cd _dist/docs_skeleton
yarn install
yarn start
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-10 14:15:13 -07:00
William FH
dd648183fa Rm create_project line (#7486)
not needed
2023-07-10 10:49:55 -07:00
Leonid Ganeline
5eec74d9a5 docstrings document_loaders 3 (#6937)
- Updated docstrings for `document_loaders`
- Mass update `"""Loader that loads` to `"""Loads`

@baskaryan  - please, review
2023-07-10 08:56:53 -07:00
Stanko Kuveljic
9d13dcd17c Pinecone: Add V4 support (#7473) 2023-07-10 08:39:47 -07:00
Adilkhan Sarsen
5debd5043e Added deeplake use case examples of the new features (#6528)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
 
 1. Added use cases of the new features
 2. Done some code refactoring

---------

Co-authored-by: Ivo Stranic <istranic@gmail.com>
2023-07-10 07:04:29 -07:00
Bagatur
9b615022e2 bump 229 (#7467) 2023-07-10 04:38:55 -04:00
Kazuki Maeda
92b4418c8c Datadog logs loader (#7356)
### Description
Created a Loader to get a list of specific logs from Datadog Logs.

### Dependencies
`datadog_api_client` is required.

### Twitter handle
[kzk_maeda](https://twitter.com/kzk_maeda)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-10 04:27:55 -04:00
Yifei Song
7d29bb2c02 Add Xorbits Dataframe as a Document Loader (#7319)
- [Xorbits](https://doc.xorbits.io/en/latest/) is an open-source
computing framework that makes it easy to scale data science and machine
learning workloads in parallel. Xorbits can leverage multi cores or GPUs
to accelerate computation on a single machine, or scale out up to
thousands of machines to support processing terabytes of data.

- This PR added support for the Xorbits document loader, which allows
langchain to leverage Xorbits to parallelize and distribute the loading
of data.
- Dependencies: This change requires the Xorbits library to be installed
in order to be used.
`pip install xorbits`
- Request for review: @rlancemartin, @eyurtsev
- Twitter handle: https://twitter.com/Xorbitsio

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-10 04:24:47 -04:00
Sergio Moreno
21a353e9c2 feat: ctransformers support async chain (#6859)
- Description: Adding async method for CTransformers 
- Issue: I've found impossible without this code to run Websockets
inside a FastAPI micro service and a CTransformers model.
  - Tag maintainer: Not necessary yet, I don't like to mention directly 
  - Twitter handle: @_semoal
2023-07-10 04:23:41 -04:00
Paul-Emile Brotons
d2cf0d16b3 adding max_marginal_relevance_search method to MongoDBAtlasVectorSearch (#7310)
Adding a maximal_marginal_relevance method to the
MongoDBAtlasVectorSearch vectorstore enhances the user experience by
providing more diverse search results

Issue: #7304
2023-07-10 04:04:19 -04:00
Bagatur
04cddfba0d Add lark import error (#7465) 2023-07-10 03:21:23 -04:00
Matt Robinson
bcab894f4e feat: Add UnstructuredTSVLoader (#7367)
### Summary

Adds an `UnstructuredTSVLoader` for TSV files. Also updates the doc
strings for `UnstructuredCSV` and `UnstructuredExcel` loaders.

### Testing

```python
from langchain.document_loaders.tsv import UnstructuredTSVLoader

loader = UnstructuredTSVLoader(
    file_path="example_data/mlb_teams_2012.csv", mode="elements"
)
docs = loader.load()
```
2023-07-10 03:07:10 -04:00
Ronald Li
490f4a9ff0 Fixes KeyError in AmazonKendraRetriever initializer (#7464)
### Description
argument variable client is marked as required in commit
81e5b1ad36 which breaks the default way of
initialization providing only index_id. This commit avoid KeyError
exception when it is initialized without a client variable
### Dependencies
no dependency required
2023-07-10 03:02:36 -04:00
Jona Sassenhagen
7ffc431b3a Add spacy sentencizer (#7442)
`SpacyTextSplitter` currently uses spacy's statistics-based
`en_core_web_sm` model for sentence splitting. This is a good splitter,
but it's also pretty slow, and in this case it's doing a lot of work
that's not needed given that the spacy parse is then just thrown away.
However, there is also a simple rules-based spacy sentencizer. Using
this is at least an order of magnitude faster than using
`en_core_web_sm` according to my local tests.
Also, spacy sentence tokenization based on `en_core_web_sm` can be sped
up in this case by not doing the NER stage. This shaves some cycles too,
both when loading the model and when parsing the text.

Consequently, this PR adds the option to use the basic spacy
sentencizer, and it disables the NER stage for the current approach,
*which is kept as the default*.

Lastly, when extracting the tokenized sentences, the `text` attribute is
called directly instead of doing the string conversion, which is IMO a
bit more idiomatic.
2023-07-10 02:52:05 -04:00
charosen
50a9fcccb0 feat(module): add param ids to ElasticVectorSearch.from_texts method (#7425)
# add param ids to ElasticVectorSearch.from_texts method.

- Description: add param ids to ElasticVectorSearch.from_texts method.
- Issue: NA. It seems `add_texts` already supports passing in document
ids, but param `ids` is omitted in `from_texts` classmethod,
- Dependencies: None,
- Tag maintainer: @rlancemartin, @eyurtsev please have a look, thanks

```
    # ElasticVectorSearch add_texts
    def add_texts(
        self,
        texts: Iterable[str],
        metadatas: Optional[List[dict]] = None,
        refresh_indices: bool = True,
        ids: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> List[str]:
        ...

```

```
    # ElasticVectorSearch from_texts
    @classmethod
    def from_texts(
        cls,
        texts: List[str],
        embedding: Embeddings,
        metadatas: Optional[List[dict]] = None,
        elasticsearch_url: Optional[str] = None,
        index_name: Optional[str] = None,
        refresh_indices: bool = True,
        **kwargs: Any,
    ) -> ElasticVectorSearch:

```


Co-authored-by: charosen <charosen@bupt.cn>
2023-07-10 02:25:35 -04:00
James Yin
a5fd8873b1 fix: type hint of get_chat_history in BaseConversationalRetrievalChain (#7461)
The type hint of `get_chat_history` property in
`BaseConversationalRetrievalChain` is incorrect. @baskaryan
2023-07-10 02:14:00 -04:00
nikkie
dfc3f83b0f docs(vectorstores/integrations/chroma): Fix loading and saving (#7437)
- Description: Fix loading and saving code about Chroma
- Issue: the issue #7436 
- Dependencies: -
- Twitter handle: https://twitter.com/ftnext
2023-07-10 02:05:15 -04:00
Daniel Chalef
c7f7788d0b Add ZepMemory; improve ZepChatMessageHistory handling of metadata; Fix bugs (#7444)
Hey @hwchase17 - 

This PR adds a `ZepMemory` class, improves handling of Zep's message
metadata, and makes it easier for folks building custom chains to
persist metadata alongside their chat history.

We've had plenty confused users unfamiliar with ChatMessageHistory
classes and how to wrap the `ZepChatMessageHistory` in a
`ConversationBufferMemory`. So we've created the `ZepMemory` class as a
light wrapper for `ZepChatMessageHistory`.

Details:
- add ZepMemory, modify notebook to demo use of ZepMemory
- Modify summary to be SystemMessage
- add metadata argument to add_message; add Zep metadata to
Message.additional_kwargs
- support passing in metadata
2023-07-10 01:53:49 -04:00
Saurabh Chaturvedi
8f8e8d701e Fix info about YouTube (#7447)
(Unintentionally mean 😅) nit: YouTube wasn't created by Google, this PR
fixes the mention in docs.
2023-07-10 01:52:55 -04:00
Leonid Ganeline
560c4dfc98 docstrings: docstore and client (#6783)
updated docstrings in `docstore/` and `client/`

@baskaryan
2023-07-09 01:34:28 -04:00
Jeroen Van Goey
f5bd88757e Fix typo (#7416)
`quesitons` -> `questions`.
2023-07-09 00:54:48 -04:00
Alejandro Garrido Mota
ea9c3cc9c9 Fix syntax erros in documentation (#7409)
- Description: Tiny documentation fix. In Python, when defining function
parameters or providing arguments to a function or class constructor, we
do not use the `:` character.
- Issue: N/A
- Dependencies: N/A,
- Tag maintainer: @rlancemartin, @eyurtsev
- Twitter handle: @mogaal
2023-07-08 19:52:01 -04:00
Nolan
5da9f9abcb docs(agents/toolkits): Fix error in document_comparison_toolkit.ipynb (#7417)
Replace this comment with:
- Description: Removes unneeded output warning in documentation at
https://python.langchain.com/docs/modules/agents/toolkits/document_comparison_toolkit
  - Issue: -
  - Dependencies: -
  - Tag maintainer: @baskaryan
  - Twitter handle: @finnless
2023-07-08 19:51:08 -04:00
nikkie
2eb4a2ceea docs(retrievers/get-started): Fix broken state_of_the_union.txt link (#7399)
Thank you for this awesome library.

- Description: Fix broken link in documentation 
- Issue:
-
https://python.langchain.com/docs/modules/data_connection/retrievers/#get-started
- the URL:
https://github.com/hwchase17/langchain/blob/master/docs/modules/state_of_the_union.txt
- I think the right one is
https://github.com/hwchase17/langchain/blob/master/docs/extras/modules/state_of_the_union.txt
- Dependencies: -
- Tag maintainer: @baskaryan
- Twitter handle: -
2023-07-08 11:11:05 -04:00
Delgermurun
e7420789e4 improve description of JinaChat (#7397)
very small doc string change in the `JinaChat` class.
2023-07-08 10:57:11 -04:00
Bagatur
26c86a197c bump 228 (#7393) 2023-07-08 03:05:20 -04:00
SvMax
1d649b127e Added param to return only a structured json from the get_format_instructions method (#5848)
I just added a parameter to the method get_format_instructions, to
return directly the JSON instructions without the leading instruction
sentence. I'm planning to use it to define the structure of a JSON
object passed in input, the get_format_instructions().

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-08 02:57:26 -04:00
Bagatur
362bc301df fix jina (#7392) 2023-07-08 02:41:54 -04:00
Delgermurun
a1603fccfb integrate JinaChat (#6927)
Integration with https://chat.jina.ai/api. It is OpenAI compatible API.

- Twitter handle:
[https://twitter.com/JinaAI_](https://twitter.com/JinaAI_)

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-07-08 02:17:04 -04:00
William FH
4ba7396f96 Add single run eval loader (#7390)
Plus 
- add evaluation name to make string and embedding validators work with
the run evaluator loader.
- Rm unused root validator
2023-07-07 23:06:49 -07:00
Roger Yu
633b673b85 Update pinecone.ipynb (#7382)
Fix typo
2023-07-08 01:48:03 -04:00
Oleg Zabluda
4d697d3f24 Allow passing custom prompts to GraphIndexCreator (#7381)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-08 01:47:53 -04:00
William FH
612a74eb7e Make Ref Example Threadsafe (#7383)
Have noticed transient ref example misalignment. I believe this is
caused by the logic of assigning an example within the thread executor
rather than before.
2023-07-07 21:50:42 -07:00
William FH
4789c99bc2 Add String Distance and Embedding Evaluators (#7123)
Add a string evaluator and pairwise string evaluator implementation for:
- Embedding distance
- String distance

Update docs
2023-07-07 21:44:31 -07:00
ljeagle
fb6e63dc36 Upgrade the AwaDB from 0.3.5 to 0.3.6 (#7363) 2023-07-07 20:41:17 -07:00
William FH
c5edbea34a Load Run Evaluator (#7101)
Current problems:
1. Evaluating LLMs or Chat models isn't smooth. Even specifying
'generations' as the output inserts a redundant list into the eval
template
2. Configuring input / prediction / reference keys in the
`get_qa_evaluator` function is confusing. Unless you are using a chain
with the default keys, you have to specify all the variables and need to
reason about whether the key corresponds to the traced run's inputs,
outputs or the examples inputs or outputs.


Proposal:
- Configure the run evaluator according to a model. Use the model type
and input/output keys to assert compatibility where possible. Only need
to specify a reference_key for certain evaluators (which is less
confusing than specifying input keys)


When does this work:
- If you have your langchain model available (assumed always for
run_on_dataset flow)
- If you are evaluating an LLM, Chat model, or chain
- If the LLM or chat models are traced by langchain (wouldn't work if
you add an incompatible schema via the REST API)

When would this fail:
- Currently if you directly create an example from an LLM run, the
outputs are generations with all the extra metadata present. A simple
`example_key` and dumping all to the template could make the evaluations
unreliable
- Doesn't help if you're not using the low level API
- If you want to instantiate the evaluator without instantiating your
chain or LLM (maybe common for monitoring, for instance) -> could also
load from run or run type though

What's ugly:
- Personally think it's better to load evaluators one by one since
passing a config down is pretty confusing.
- Lots of testing needs to be added
- Inconsistent in that it makes a separate run and example input mapper
instead of the original `RunEvaluatorInputMapper`, which maps a run and
example to a single input.

Example usage running the for an LLM, Chat Model, and Agent.

```
# Test running for the string evaluators
evaluator_names = ["qa", "criteria"]

model = ChatOpenAI()
configured_evaluators = load_run_evaluators_for_model(evaluator_names, model=model, reference_key="answer")
run_on_dataset(ds_name, model, run_evaluators=configured_evaluators)
```


<details>
  <summary>Full code with dataset upload</summary>
```
## Create dataset
from langchain.evaluation.run_evaluators.loading import load_run_evaluators_for_model
from langchain.evaluation import load_dataset
import pandas as pd

lcds = load_dataset("llm-math")
df = pd.DataFrame(lcds)

from uuid import uuid4
from langsmith import Client
client = Client()
ds_name = "llm-math - " + str(uuid4())[0:8]
ds = client.upload_dataframe(df, name=ds_name, input_keys=["question"], output_keys=["answer"])



## Define the models we'll test over
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, AgentType

from langchain.tools import tool

llm = OpenAI(temperature=0)
chat_model = ChatOpenAI(temperature=0)

@tool
    def sum(a: float, b: float) -> float:
        """Add two numbers"""
        return a + b
    
def construct_agent():
    return initialize_agent(
        llm=chat_model,
        tools=[sum],
        agent=AgentType.OPENAI_MULTI_FUNCTIONS,
    )

agent = construct_agent()

# Test running for the string evaluators
evaluator_names = ["qa", "criteria"]

models = [llm, chat_model, agent]
run_evaluators = []
for model in models:
    run_evaluators.append(load_run_evaluators_for_model(evaluator_names, model=model, reference_key="answer"))
    

# Run on LLM, Chat Model, and Agent
from langchain.client.runner_utils import run_on_dataset

to_test = [llm, chat_model, construct_agent]

for model, configured_evaluators in zip(to_test, run_evaluators):
    run_on_dataset(ds_name, model, run_evaluators=configured_evaluators, verbose=True)
```
</details>

---------

Co-authored-by: Nuno Campos <nuno@boringbits.io>
2023-07-07 19:57:59 -07:00
Bagatur
1ac347b4e3 update databerry-chaindesk redirect (#7378) 2023-07-07 19:11:46 -04:00
Joshua Carroll
705d2f5b92 Update the API Reference link in Streamlit integration docs (#7377)
This page:


https://python.langchain.com/docs/modules/callbacks/integrations/streamlit

Has a bad API Reference link currently. This PR fixes it to the correct
link.

Also updates the embedded app link to
https://langchain-mrkl.streamlit.app/ (better name) which is hosted in
langchain-ai/streamlit-agent repo
2023-07-07 17:35:57 -04:00
Georges Petrov
ec033ae277 Rename Databerry to Chaindesk (#7022)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-07 17:28:04 -04:00
Philip Meier
da5b0723d2 update MosaicML inputs and outputs (#7348)
As of today (July 7, 2023), the [MosaicML
API](https://docs.mosaicml.com/en/latest/inference.html#text-completion-requests)
uses `"inputs"` for the prompt

This PR adds support for this new format.
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-07 17:23:11 -04:00
Bearnardd
184ede4e48 Fix buggy output from GraphQAChain (#7372)
fixes https://github.com/hwchase17/langchain/issues/7289
A simple fix of the buggy output of `graph_qa`. If we have several
entities with triplets then the last entry of `triplets` for a given
entity merges with the first entry of the `triplets` of the next entity.
2023-07-07 17:19:53 -04:00
Harrison Chase
7cdf97ba9b Harrison/add to imports (#7370)
pgvector cleanup
2023-07-07 16:27:44 -04:00
Bagatur
4d427b2397 Base language model docstrings (#7104) 2023-07-07 16:09:10 -04:00
ॐ shivam mamgain
2179d4eef8 Fix for KeyError in MlflowCallbackHandler (#7051)
- Description: `MlflowCallbackHandler` fails with `KeyError: "['name']
not in index"`. See https://github.com/hwchase17/langchain/issues/5770
for more details. Root cause is that LangChain does not pass "name" as a
part of `serialized` argument to `on_llm_start()` callback method. The
commit where this change was made is probably this:
18af149e91.
My bug fix derives "name" from "id" field.
  - Issue: https://github.com/hwchase17/langchain/issues/5770
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-07 16:08:06 -04:00
Alex Gamble
df746ad821 Add a callback handler for Context (https://getcontext.ai) (#7151)
### Description

Adding a callback handler for Context. Context is a product analytics
platform for AI chat experiences to help you understand how users are
interacting with your product.

I've added the callback library + an example notebook showing its use.

### Dependencies

Requires the user to install the `context-python` library. The library
is lazily-loaded when the callback is instantiated.

### Announcing the feature

We spoke with Harrison a few weeks ago about also doing a blog post
announcing our integration, so will coordinate this with him. Our
Twitter handle for the company is @getcontextai, and the founders are
@_agamble and @HenrySG.

Thanks in advance!
2023-07-07 15:33:29 -04:00
Austin
c9a0f24646 Add verbose parameter for llamacpp (#7253)
**Title:** Add verbose parameter for llamacpp

**Description:**
This pull request adds a 'verbose' parameter to the llamacpp module. The
'verbose' parameter, when set to True, will enable the output of
detailed logs during the execution of the Llama model. This added
parameter can aid in debugging and understanding the internal processes
of the module.

The verbose parameter is a boolean that prints verbose output to stderr
when set to True. By default, the verbose parameter is set to True but
can be toggled off if less output is desired. This new parameter has
been added to the `validate_environment` method of the `LlamaCpp` class
which initializes the `llama_cpp.Llama` API:

```python
class LlamaCpp(LLM):
    ...
    @root_validator()
    def validate_environment(cls, values: Dict) -> Dict:
        ...
        model_param_names = [
            ...
            "verbose",  # New verbose parameter added
        ]
        ...
        values["client"] = Llama(model_path, **model_params)
        ...
```
---------

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
2023-07-07 15:08:25 -04:00
Kenny
34a2755a54 Allow passing api key into OpenAIWhisperParser (#7281)
This just allows the user to pass in an api_key directly into
OpenAIWhisperParser. Very simple addition.
2023-07-07 15:07:45 -04:00
mrkhalil6
4e7d0c115b Add support for filters and namespaces in similarity search in Pinecone similarity_score_threshold (#7301)
At the moment, pinecone vectorStore does not support filters and
namespaces when using similarity_score_threshold search type.
In this PR, I've implemented that. It passes all the kwargs except
"score_threshold" as that is not a supported argument for method
"similarity_search_with_score".
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-07 15:03:59 -04:00
Manuel Saelices
01dca1e438 Add context to an output parsing error on Pydantic schema to improve exception handling (#7344)
## Changes

- [X] Fill the `llm_output` param when there is an output parsing error
in a Pydantic schema so that we can get the original text that failed to
parse when handling the exception

## Background

With this change, we could do something like this:

```
output_parser = PydanticOutputParser(pydantic_object=pydantic_obj)
chain = ConversationChain(..., output_parser=output_parser)
try:
    response: PydanticSchema = chain.predict(input=input)
except OutputParserException as exc:
    logger.error(
        'OutputParserException while parsing chatbot response: %s', exc.llm_output,
    )
```
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-07 14:49:37 -04:00
Raouf Chebri
1ac6deda89 update extension name (#7359)
hi @rlancemartin ,

We had a new deployment and the `pg_extension` creation command was
updated from `CREATE EXTENSION pg_embedding` to `CREATE EXTENSION
embedding`.

https://github.com/neondatabase/neon/pull/4646

The extension not made public yet. No users will be affected by this.
Will be public next week.

Please let me know if you have any questions.

Thank you in advance 🙏
2023-07-07 11:35:51 -07:00
William FH
4e180dc54e Unset Cache in Tests (#7362)
This is impacting other unit tests that use callbacks since the cache is
still set (just empty)
2023-07-07 11:05:09 -07:00
German Martin
3ce4e46c8c The Fellowship of the Vectors: New Embeddings Filter using clustering. (#7015)
Continuing with Tolkien inspired series of langchain tools. I bring to
you:
**The Fellowship of the Vectors**, AKA EmbeddingsClusteringFilter.
This document filter uses embeddings to group vectors together into
clusters, then allows you to pick an arbitrary number of documents
vector based on proximity to the cluster centers. That's a
representative sample of the cluster.

The original idea is from [Greg Kamradt](https://github.com/gkamradt)
from this video (Level4):
https://www.youtube.com/watch?v=qaPMdcCqtWk&t=365s

I added few tricks to make it a bit more versatile, so you can
parametrize what to do with duplicate documents in case of cluster
overlap: replace the duplicates with the next closest document or remove
it. This allow you to use it as an special kind of redundant filter too.
Additionally you can choose 2 diff orders: grouped by cluster or
respecting the original retriever scores.
In my use case I was using the docs grouped by cluster to run refine
chains per cluster to generate summarization over a large corpus of
documents.
Let me know if you want to change anything!

@rlancemartin, @eyurtsev, @hwchase17,

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-07-07 10:28:17 -07:00
Leonid Ganeline
b489466488 docs: dependents update 4 (#7360)
Updated links and counters of the `dependents` page.
2023-07-07 13:22:30 -04:00
William FH
38ca5c84cb Explicitly list requires_reference in function (#7357) 2023-07-07 10:04:03 -07:00
Harrison Chase
49b2b0e3c0 change embedding to None (#7355) 2023-07-07 12:33:03 -04:00
imaprogrammer
a2830e3056 Update chroma.py: Persist directory from client_settings if provided there (#7087)
Change details:
- Description: When calling db.persist(), a check prevents from it
proceeding as the constructor only sets member `_persist_directory` from
parameters. But the ChromaDB client settings also has this parameter,
and if the client_settings parameter is used without passing the
persist_directory (which is optional), the `persist` method raises
`ValueError` for not setting `_persist_directory`. This change fixes it
by setting the member `_persist_directory` variable from client_settings
if it is set, else uses the constructor parameter.
- Issue: I didn't find any github issue of this, but I discovered it
after calling the persist method
  - Dependencies: None
- Tag maintainer: vectorstore related change - @rlancemartin, @eyurtsev
  - Twitter handle: Don't have one :(

*Additional discussion*: We may need to discuss the way I implemented
the fallback using `or`.

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-07-07 09:20:27 -07:00
564 changed files with 21649 additions and 7886 deletions

View File

@@ -95,6 +95,14 @@ To run formatting for this project:
make format
```
Additionally, you can run the formatter only on the files that have been modified in your current branch as compared to the master branch using the format_diff command:
```bash
make format_diff
```
This is especially useful when you have made changes to a subset of the project and want to ensure your changes are properly formatted without affecting the rest of the codebase.
### Linting
Linting for this project is done via a combination of [Black](https://black.readthedocs.io/en/stable/), [isort](https://pycqa.github.io/isort/), [flake8](https://flake8.pycqa.org/en/latest/), and [mypy](http://mypy-lang.org/).
@@ -105,8 +113,42 @@ To run linting for this project:
make lint
```
In addition, you can run the linter only on the files that have been modified in your current branch as compared to the master branch using the lint_diff command:
```bash
make lint_diff
```
This can be very helpful when you've made changes to only certain parts of the project and want to ensure your changes meet the linting standards without having to check the entire codebase.
We recognize linting can be annoying - if you do not want to do it, please contact a project maintainer, and they can help you with it. We do not want this to be a blocker for good code getting contributed.
### Spellcheck
Spellchecking for this project is done via [codespell](https://github.com/codespell-project/codespell).
Note that `codespell` finds common typos, so could have false-positive (correctly spelled but rarely used) and false-negatives (not finding misspelled) words.
To check spelling for this project:
```bash
make spell_check
```
To fix spelling in place:
```bash
make spell_fix
```
If codespell is incorrectly flagging a word, you can skip spellcheck for that word by adding it to the codespell config in the `pyproject.toml` file.
```python
[tool.codespell]
...
# Add here:
ignore-words-list = 'momento,collison,ned,foor,reworkd,parth,whats,aapply,mysogyny,unsecure'
```
### Coverage
Code coverage (i.e. the amount of code that is covered by unit tests) helps identify areas of the code that are potentially more or less brittle.
@@ -208,30 +250,38 @@ When you run `poetry install`, the `langchain` package is installed as editable
### Contribute Documentation
Docs are largely autogenerated by [sphinx](https://www.sphinx-doc.org/en/master/) from the code.
The docs directory contains Documentation and API Reference.
Documentation is built using [Docusaurus 2](https://docusaurus.io/).
API Reference are largely autogenerated by [sphinx](https://www.sphinx-doc.org/en/master/) from the code.
For that reason, we ask that you add good documentation to all classes and methods.
Similar to linting, we recognize documentation can be annoying. If you do not want to do it, please contact a project maintainer, and they can help you with it. We do not want this to be a blocker for good code getting contributed.
### Build Documentation Locally
In the following commands, the prefix `api_` indicates that those are operations for the API Reference.
Before building the documentation, it is always a good idea to clean the build directory:
```bash
make docs_clean
make api_docs_clean
```
Next, you can run the linkchecker to make sure all links are valid:
```bash
make docs_linkcheck
```
Finally, you can build the documentation as outlined below:
Next, you can build the documentation as outlined below:
```bash
make docs_build
make api_docs_build
```
Finally, you can run the linkchecker to make sure all links are valid:
```bash
make docs_linkcheck
make api_docs_linkcheck
```
## 🏭 Release Process

22
.github/workflows/codespell.yml vendored Normal file
View File

@@ -0,0 +1,22 @@
---
name: Codespell
on:
push:
branches: [master]
pull_request:
branches: [master]
permissions:
contents: read
jobs:
codespell:
name: Check for spelling errors
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Codespell
uses: codespell-project/actions-codespell@v2

5
.gitignore vendored
View File

@@ -161,7 +161,12 @@ docs/node_modules/
docs/.docusaurus/
docs/.cache-loader/
docs/_dist
docs/api_reference/api_reference.rst
docs/api_reference/_build
docs/api_reference/*/
!docs/api_reference/_static/
!docs/api_reference/templates/
!docs/api_reference/themes/
docs/docs_skeleton/build
docs/docs_skeleton/node_modules
docs/docs_skeleton/yarn.lock

View File

@@ -1,40 +1,47 @@
.PHONY: all clean format lint test tests test_watch integration_tests docker_tests help extended_tests
.PHONY: all clean docs_build docs_clean docs_linkcheck api_docs_build api_docs_clean api_docs_linkcheck format lint test tests test_watch integration_tests docker_tests help extended_tests
# Default target executed when no arguments are given to make.
all: help
######################
# TESTING AND COVERAGE
######################
# Run unit tests and generate a coverage report.
coverage:
poetry run pytest --cov \
--cov-config=.coveragerc \
--cov-report xml \
--cov-report term-missing:skip-covered
clean: docs_clean
######################
# DOCUMENTATION
######################
clean: docs_clean api_docs_clean
docs_compile:
poetry run nbdoc_build --srcdir $(srcdir)
docs_build:
cd docs && poetry run make html
docs/.local_build.sh
docs_clean:
cd docs && poetry run make clean
rm -r docs/_dist
docs_linkcheck:
poetry run linkchecker docs/_build/html/index.html
poetry run linkchecker docs/_dist/docs_skeleton/ --ignore-url node_modules
format:
poetry run black .
poetry run ruff --select I --fix .
api_docs_build:
poetry run python docs/api_reference/create_api_rst.py
cd docs/api_reference && poetry run make html
PYTHON_FILES=.
lint: PYTHON_FILES=.
lint_diff: PYTHON_FILES=$(shell git diff --name-only --diff-filter=d master | grep -E '\.py$$')
api_docs_clean:
rm -f docs/api_reference/api_reference.rst
cd docs/api_reference && poetry run make clean
lint lint_diff:
poetry run mypy $(PYTHON_FILES)
poetry run black $(PYTHON_FILES) --check
poetry run ruff .
api_docs_linkcheck:
poetry run linkchecker docs/api_reference/_build/html/index.html
# Define a variable for the test file path.
TEST_FILE ?= tests/unit_tests/
test:
@@ -56,6 +63,34 @@ docker_tests:
docker build -t my-langchain-image:test .
docker run --rm my-langchain-image:test
######################
# LINTING AND FORMATTING
######################
# Define a variable for Python and notebook files.
PYTHON_FILES=.
lint format: PYTHON_FILES=.
lint_diff format_diff: PYTHON_FILES=$(shell git diff --name-only --diff-filter=d master | grep -E '\.py$$|\.ipynb$$')
lint lint_diff:
poetry run mypy $(PYTHON_FILES)
poetry run black $(PYTHON_FILES) --check
poetry run ruff .
format format_diff:
poetry run black $(PYTHON_FILES)
poetry run ruff --select I --fix $(PYTHON_FILES)
spell_check:
poetry run codespell --toml pyproject.toml
spell_fix:
poetry run codespell --toml pyproject.toml -w
######################
# HELP
######################
help:
@echo '----'
@echo 'coverage - run unit tests and generate coverage report'

View File

@@ -1,10 +1,15 @@
mkdir _dist
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
set -o xtrace
SCRIPT_DIR="$(cd "$(dirname "$0")"; pwd)"
cd "${SCRIPT_DIR}"
mkdir -p _dist/docs_skeleton
cp -r {docs_skeleton,snippets} _dist
mkdir -p _dist/docs_skeleton/static/api_reference
cd api_reference
poetry run make html
cp -r _build/* ../_dist/docs_skeleton/static/api_reference
cd ..
cp -r extras/* _dist/docs_skeleton/docs
cd _dist/docs_skeleton
poetry run nbdoc_build

File diff suppressed because it is too large Load Diff

View File

@@ -20,7 +20,9 @@ def load_members() -> dict:
cls = re.findall(r"^class ([^_].*)\(", line)
members[top_level]["classes"].extend([module + "." + c for c in cls])
func = re.findall(r"^def ([^_].*)\(", line)
members[top_level]["functions"].extend([module + "." + f for f in func])
afunc = re.findall(r"^async def ([^_].*)\(", line)
func_strings = [module + "." + f for f in func + afunc]
members[top_level]["functions"].extend(func_strings)
return members

View File

Before

Width:  |  Height:  |  Size: 157 KiB

After

Width:  |  Height:  |  Size: 157 KiB

View File

@@ -3,6 +3,8 @@ sidebar_position: 0
---
# Integrations
Visit the [Integrations Hub](https://integrations.langchain.com) to further explore, upvote and request integrations across key LangChain components.
import DocCardList from "@theme/DocCardList";
<DocCardList />

View File

@@ -0,0 +1,12 @@
# LangSmith
import DocCardList from "@theme/DocCardList";
LangSmith helps you trace and evaluate your language model applications and intelligent agents to help you
move from prototype to production.
Check out the [interactive walkthrough](walkthrough) below to get started.
For more information, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/)
<DocCardList />

View File

@@ -24,7 +24,7 @@ That means there are two different axes along which you can customize your text
1. How the text is split
2. How the chunk size is measured
## Get started with text splitters
### Get started with text splitters
import GetStarted from "@snippets/modules/data_connection/document_transformers/get_started.mdx"

View File

@@ -8,7 +8,7 @@ Many LLM applications require user-specific data that is not part of the model's
building blocks to load, transform, store and query your data via:
- [Document loaders](/docs/modules/data_connection/document_loaders/): Load documents from many different sources
- [Document transformers](/docs/modules/data_connection/document_transformers/): Split documents, drop redundant documents, and more
- [Document transformers](/docs/modules/data_connection/document_transformers/): Split documents, convert documents into Q&A format, drop redundant documents, and more
- [Text embedding models](/docs/modules/data_connection/text_embedding/): Take unstructured text and turn it into a list of floating point numbers
- [Vector stores](/docs/modules/data_connection/vectorstores/): Store and search over embedded data
- [Retrievers](/docs/modules/data_connection/retrievers/): Query your data

Binary file not shown.

After

Width:  |  Height:  |  Size: 116 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 237 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 173 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 164 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 118 KiB

View File

@@ -138,7 +138,11 @@
},
{
"source": "/en/latest/integrations/databerry.html",
"destination": "/docs/ecosystem/integrations/databerry"
"destination": "/docs/ecosystem/integrations/chaindesk"
},
{
"source": "/docs/ecosystem/integrations/databerry",
"destination": "/docs/ecosystem/integrations/chaindesk"
},
{
"source": "/en/latest/integrations/databricks/databricks.html",
@@ -1330,7 +1334,11 @@
},
{
"source": "/en/latest/modules/indexes/retrievers/examples/databerry.html",
"destination": "/docs/modules/data_connection/retrievers/integrations/databerry"
"destination": "/docs/modules/data_connection/retrievers/integrations/chaindesk"
},
{
"source": "/docs/modules/data_connection/retrievers/integrations/databerry",
"destination": "/docs/modules/data_connection/retrievers/integrations/chaindesk"
},
{
"source": "/en/latest/modules/indexes/retrievers/examples/elastic_search_bm25.html",
@@ -2125,4 +2133,4 @@
"destination": "/docs/:path*"
}
]
}
}

View File

@@ -2,188 +2,261 @@
Dependents stats for `hwchase17/langchain`
[![](https://img.shields.io/static/v1?label=Used%20by&message=5152&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[![](https://img.shields.io/static/v1?label=Used%20by%20(public)&message=172&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[![](https://img.shields.io/static/v1?label=Used%20by%20(private)&message=4980&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[![](https://img.shields.io/static/v1?label=Used%20by%20(stars)&message=17239&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[![](https://img.shields.io/static/v1?label=Used%20by&message=9941&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[![](https://img.shields.io/static/v1?label=Used%20by%20(public)&message=244&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[![](https://img.shields.io/static/v1?label=Used%20by%20(private)&message=9697&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[![](https://img.shields.io/static/v1?label=Used%20by%20(stars)&message=19827&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[update: 2023-05-17; only dependent repositories with Stars > 100]
[update: 2023-07-07; only dependent repositories with Stars > 100]
| Repository | Stars |
| :-------- | -----: |
|[openai/openai-cookbook](https://github.com/openai/openai-cookbook) | 35401 |
|[LAION-AI/Open-Assistant](https://github.com/LAION-AI/Open-Assistant) | 32861 |
|[microsoft/TaskMatrix](https://github.com/microsoft/TaskMatrix) | 32766 |
|[hpcaitech/ColossalAI](https://github.com/hpcaitech/ColossalAI) | 29560 |
|[reworkd/AgentGPT](https://github.com/reworkd/AgentGPT) | 22315 |
|[imartinez/privateGPT](https://github.com/imartinez/privateGPT) | 17474 |
|[openai/chatgpt-retrieval-plugin](https://github.com/openai/chatgpt-retrieval-plugin) | 16923 |
|[mindsdb/mindsdb](https://github.com/mindsdb/mindsdb) | 16112 |
|[jerryjliu/llama_index](https://github.com/jerryjliu/llama_index) | 15407 |
|[mlflow/mlflow](https://github.com/mlflow/mlflow) | 14345 |
|[GaiZhenbiao/ChuanhuChatGPT](https://github.com/GaiZhenbiao/ChuanhuChatGPT) | 10372 |
|[databrickslabs/dolly](https://github.com/databrickslabs/dolly) | 9919 |
|[AIGC-Audio/AudioGPT](https://github.com/AIGC-Audio/AudioGPT) | 8177 |
|[logspace-ai/langflow](https://github.com/logspace-ai/langflow) | 6807 |
|[imClumsyPanda/langchain-ChatGLM](https://github.com/imClumsyPanda/langchain-ChatGLM) | 6087 |
|[arc53/DocsGPT](https://github.com/arc53/DocsGPT) | 5292 |
|[e2b-dev/e2b](https://github.com/e2b-dev/e2b) | 4622 |
|[nsarrazin/serge](https://github.com/nsarrazin/serge) | 4076 |
|[madawei2699/myGPTReader](https://github.com/madawei2699/myGPTReader) | 3952 |
|[zauberzeug/nicegui](https://github.com/zauberzeug/nicegui) | 3952 |
|[go-skynet/LocalAI](https://github.com/go-skynet/LocalAI) | 3762 |
|[GreyDGL/PentestGPT](https://github.com/GreyDGL/PentestGPT) | 3388 |
|[mmabrouk/chatgpt-wrapper](https://github.com/mmabrouk/chatgpt-wrapper) | 3243 |
|[zilliztech/GPTCache](https://github.com/zilliztech/GPTCache) | 3189 |
|[wenda-LLM/wenda](https://github.com/wenda-LLM/wenda) | 3050 |
|[marqo-ai/marqo](https://github.com/marqo-ai/marqo) | 2930 |
|[gkamradt/langchain-tutorials](https://github.com/gkamradt/langchain-tutorials) | 2710 |
|[PrefectHQ/marvin](https://github.com/PrefectHQ/marvin) | 2545 |
|[project-baize/baize-chatbot](https://github.com/project-baize/baize-chatbot) | 2479 |
|[whitead/paper-qa](https://github.com/whitead/paper-qa) | 2399 |
|[langgenius/dify](https://github.com/langgenius/dify) | 2344 |
|[GerevAI/gerev](https://github.com/GerevAI/gerev) | 2283 |
|[hwchase17/chat-langchain](https://github.com/hwchase17/chat-langchain) | 2266 |
|[guangzhengli/ChatFiles](https://github.com/guangzhengli/ChatFiles) | 1903 |
|[Azure-Samples/azure-search-openai-demo](https://github.com/Azure-Samples/azure-search-openai-demo) | 1884 |
|[OpenBMB/BMTools](https://github.com/OpenBMB/BMTools) | 1860 |
|[Farama-Foundation/PettingZoo](https://github.com/Farama-Foundation/PettingZoo) | 1813 |
|[OpenGVLab/Ask-Anything](https://github.com/OpenGVLab/Ask-Anything) | 1571 |
|[IntelligenzaArtificiale/Free-Auto-GPT](https://github.com/IntelligenzaArtificiale/Free-Auto-GPT) | 1480 |
|[hwchase17/notion-qa](https://github.com/hwchase17/notion-qa) | 1464 |
|[NVIDIA/NeMo-Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) | 1419 |
|[Unstructured-IO/unstructured](https://github.com/Unstructured-IO/unstructured) | 1410 |
|[Kav-K/GPTDiscord](https://github.com/Kav-K/GPTDiscord) | 1363 |
|[paulpierre/RasaGPT](https://github.com/paulpierre/RasaGPT) | 1344 |
|[StanGirard/quivr](https://github.com/StanGirard/quivr) | 1330 |
|[lunasec-io/lunasec](https://github.com/lunasec-io/lunasec) | 1318 |
|[vocodedev/vocode-python](https://github.com/vocodedev/vocode-python) | 1286 |
|[agiresearch/OpenAGI](https://github.com/agiresearch/OpenAGI) | 1156 |
|[h2oai/h2ogpt](https://github.com/h2oai/h2ogpt) | 1141 |
|[jina-ai/thinkgpt](https://github.com/jina-ai/thinkgpt) | 1106 |
|[yanqiangmiffy/Chinese-LangChain](https://github.com/yanqiangmiffy/Chinese-LangChain) | 1072 |
|[ttengwang/Caption-Anything](https://github.com/ttengwang/Caption-Anything) | 1064 |
|[jina-ai/dev-gpt](https://github.com/jina-ai/dev-gpt) | 1057 |
|[juncongmoo/chatllama](https://github.com/juncongmoo/chatllama) | 1003 |
|[greshake/llm-security](https://github.com/greshake/llm-security) | 1002 |
|[visual-openllm/visual-openllm](https://github.com/visual-openllm/visual-openllm) | 957 |
|[richardyc/Chrome-GPT](https://github.com/richardyc/Chrome-GPT) | 918 |
|[irgolic/AutoPR](https://github.com/irgolic/AutoPR) | 886 |
|[mmz-001/knowledge_gpt](https://github.com/mmz-001/knowledge_gpt) | 867 |
|[thomas-yanxin/LangChain-ChatGLM-Webui](https://github.com/thomas-yanxin/LangChain-ChatGLM-Webui) | 850 |
|[microsoft/X-Decoder](https://github.com/microsoft/X-Decoder) | 837 |
|[peterw/Chat-with-Github-Repo](https://github.com/peterw/Chat-with-Github-Repo) | 826 |
|[cirediatpl/FigmaChain](https://github.com/cirediatpl/FigmaChain) | 782 |
|[hashintel/hash](https://github.com/hashintel/hash) | 778 |
|[seanpixel/Teenage-AGI](https://github.com/seanpixel/Teenage-AGI) | 773 |
|[jina-ai/langchain-serve](https://github.com/jina-ai/langchain-serve) | 738 |
|[corca-ai/EVAL](https://github.com/corca-ai/EVAL) | 737 |
|[ai-sidekick/sidekick](https://github.com/ai-sidekick/sidekick) | 717 |
|[rlancemartin/auto-evaluator](https://github.com/rlancemartin/auto-evaluator) | 703 |
|[poe-platform/api-bot-tutorial](https://github.com/poe-platform/api-bot-tutorial) | 689 |
|[SamurAIGPT/Camel-AutoGPT](https://github.com/SamurAIGPT/Camel-AutoGPT) | 666 |
|[eyurtsev/kor](https://github.com/eyurtsev/kor) | 608 |
|[run-llama/llama-lab](https://github.com/run-llama/llama-lab) | 559 |
|[namuan/dr-doc-search](https://github.com/namuan/dr-doc-search) | 544 |
|[pieroit/cheshire-cat](https://github.com/pieroit/cheshire-cat) | 520 |
|[griptape-ai/griptape](https://github.com/griptape-ai/griptape) | 514 |
|[getmetal/motorhead](https://github.com/getmetal/motorhead) | 481 |
|[hwchase17/chat-your-data](https://github.com/hwchase17/chat-your-data) | 462 |
|[langchain-ai/langchain-aiplugin](https://github.com/langchain-ai/langchain-aiplugin) | 452 |
|[jina-ai/agentchain](https://github.com/jina-ai/agentchain) | 439 |
|[SamurAIGPT/ChatGPT-Developer-Plugins](https://github.com/SamurAIGPT/ChatGPT-Developer-Plugins) | 437 |
|[alexanderatallah/window.ai](https://github.com/alexanderatallah/window.ai) | 433 |
|[michaelthwan/searchGPT](https://github.com/michaelthwan/searchGPT) | 427 |
|[mpaepper/content-chatbot](https://github.com/mpaepper/content-chatbot) | 425 |
|[mckaywrigley/repo-chat](https://github.com/mckaywrigley/repo-chat) | 422 |
|[whyiyhw/chatgpt-wechat](https://github.com/whyiyhw/chatgpt-wechat) | 421 |
|[freddyaboulton/gradio-tools](https://github.com/freddyaboulton/gradio-tools) | 407 |
|[jonra1993/fastapi-alembic-sqlmodel-async](https://github.com/jonra1993/fastapi-alembic-sqlmodel-async) | 395 |
|[yeagerai/yeagerai-agent](https://github.com/yeagerai/yeagerai-agent) | 383 |
|[akshata29/chatpdf](https://github.com/akshata29/chatpdf) | 374 |
|[OpenGVLab/InternGPT](https://github.com/OpenGVLab/InternGPT) | 368 |
|[ruoccofabrizio/azure-open-ai-embeddings-qna](https://github.com/ruoccofabrizio/azure-open-ai-embeddings-qna) | 358 |
|[101dotxyz/GPTeam](https://github.com/101dotxyz/GPTeam) | 357 |
|[mtenenholtz/chat-twitter](https://github.com/mtenenholtz/chat-twitter) | 354 |
|[amosjyng/langchain-visualizer](https://github.com/amosjyng/langchain-visualizer) | 343 |
|[msoedov/langcorn](https://github.com/msoedov/langcorn) | 334 |
|[showlab/VLog](https://github.com/showlab/VLog) | 330 |
|[continuum-llms/chatgpt-memory](https://github.com/continuum-llms/chatgpt-memory) | 324 |
|[steamship-core/steamship-langchain](https://github.com/steamship-core/steamship-langchain) | 323 |
|[daodao97/chatdoc](https://github.com/daodao97/chatdoc) | 320 |
|[xuwenhao/geektime-ai-course](https://github.com/xuwenhao/geektime-ai-course) | 308 |
|[StevenGrove/GPT4Tools](https://github.com/StevenGrove/GPT4Tools) | 301 |
|[logan-markewich/llama_index_starter_pack](https://github.com/logan-markewich/llama_index_starter_pack) | 300 |
|[andylokandy/gpt-4-search](https://github.com/andylokandy/gpt-4-search) | 299 |
|[Anil-matcha/ChatPDF](https://github.com/Anil-matcha/ChatPDF) | 287 |
|[itamargol/openai](https://github.com/itamargol/openai) | 273 |
|[BlackHC/llm-strategy](https://github.com/BlackHC/llm-strategy) | 267 |
|[momegas/megabots](https://github.com/momegas/megabots) | 259 |
|[bborn/howdoi.ai](https://github.com/bborn/howdoi.ai) | 238 |
|[Cheems-Seminar/grounded-segment-any-parts](https://github.com/Cheems-Seminar/grounded-segment-any-parts) | 232 |
|[ur-whitelab/exmol](https://github.com/ur-whitelab/exmol) | 227 |
|[sullivan-sean/chat-langchainjs](https://github.com/sullivan-sean/chat-langchainjs) | 227 |
|[explosion/spacy-llm](https://github.com/explosion/spacy-llm) | 226 |
|[recalign/RecAlign](https://github.com/recalign/RecAlign) | 218 |
|[jupyterlab/jupyter-ai](https://github.com/jupyterlab/jupyter-ai) | 218 |
|[alvarosevilla95/autolang](https://github.com/alvarosevilla95/autolang) | 215 |
|[conceptofmind/toolformer](https://github.com/conceptofmind/toolformer) | 213 |
|[MagnivOrg/prompt-layer-library](https://github.com/MagnivOrg/prompt-layer-library) | 209 |
|[JohnSnowLabs/nlptest](https://github.com/JohnSnowLabs/nlptest) | 208 |
|[airobotlab/KoChatGPT](https://github.com/airobotlab/KoChatGPT) | 197 |
|[langchain-ai/auto-evaluator](https://github.com/langchain-ai/auto-evaluator) | 195 |
|[yvann-hub/Robby-chatbot](https://github.com/yvann-hub/Robby-chatbot) | 195 |
|[alejandro-ao/langchain-ask-pdf](https://github.com/alejandro-ao/langchain-ask-pdf) | 192 |
|[daveebbelaar/langchain-experiments](https://github.com/daveebbelaar/langchain-experiments) | 189 |
|[NimbleBoxAI/ChainFury](https://github.com/NimbleBoxAI/ChainFury) | 187 |
|[kaleido-lab/dolphin](https://github.com/kaleido-lab/dolphin) | 184 |
|[Anil-matcha/Website-to-Chatbot](https://github.com/Anil-matcha/Website-to-Chatbot) | 183 |
|[plchld/InsightFlow](https://github.com/plchld/InsightFlow) | 180 |
|[OpenBMB/AgentVerse](https://github.com/OpenBMB/AgentVerse) | 166 |
|[benthecoder/ClassGPT](https://github.com/benthecoder/ClassGPT) | 166 |
|[jbrukh/gpt-jargon](https://github.com/jbrukh/gpt-jargon) | 161 |
|[hardbyte/qabot](https://github.com/hardbyte/qabot) | 160 |
|[shaman-ai/agent-actors](https://github.com/shaman-ai/agent-actors) | 153 |
|[radi-cho/datasetGPT](https://github.com/radi-cho/datasetGPT) | 153 |
|[poe-platform/poe-protocol](https://github.com/poe-platform/poe-protocol) | 152 |
|[paolorechia/learn-langchain](https://github.com/paolorechia/learn-langchain) | 149 |
|[ajndkr/lanarky](https://github.com/ajndkr/lanarky) | 149 |
|[fengyuli-dev/multimedia-gpt](https://github.com/fengyuli-dev/multimedia-gpt) | 147 |
|[yasyf/compress-gpt](https://github.com/yasyf/compress-gpt) | 144 |
|[homanp/superagent](https://github.com/homanp/superagent) | 143 |
|[realminchoi/babyagi-ui](https://github.com/realminchoi/babyagi-ui) | 141 |
|[ethanyanjiali/minChatGPT](https://github.com/ethanyanjiali/minChatGPT) | 141 |
|[ccurme/yolopandas](https://github.com/ccurme/yolopandas) | 139 |
|[hwchase17/langchain-streamlit-template](https://github.com/hwchase17/langchain-streamlit-template) | 138 |
|[Jaseci-Labs/jaseci](https://github.com/Jaseci-Labs/jaseci) | 136 |
|[hirokidaichi/wanna](https://github.com/hirokidaichi/wanna) | 135 |
|[Haste171/langchain-chatbot](https://github.com/Haste171/langchain-chatbot) | 134 |
|[jmpaz/promptlib](https://github.com/jmpaz/promptlib) | 130 |
|[Klingefjord/chatgpt-telegram](https://github.com/Klingefjord/chatgpt-telegram) | 130 |
|[filip-michalsky/SalesGPT](https://github.com/filip-michalsky/SalesGPT) | 128 |
|[handrew/browserpilot](https://github.com/handrew/browserpilot) | 128 |
|[shauryr/S2QA](https://github.com/shauryr/S2QA) | 127 |
|[steamship-core/vercel-examples](https://github.com/steamship-core/vercel-examples) | 127 |
|[yasyf/summ](https://github.com/yasyf/summ) | 127 |
|[gia-guar/JARVIS-ChatGPT](https://github.com/gia-guar/JARVIS-ChatGPT) | 126 |
|[jerlendds/osintbuddy](https://github.com/jerlendds/osintbuddy) | 125 |
|[ibiscp/LLM-IMDB](https://github.com/ibiscp/LLM-IMDB) | 124 |
|[Teahouse-Studios/akari-bot](https://github.com/Teahouse-Studios/akari-bot) | 124 |
|[hwchase17/chroma-langchain](https://github.com/hwchase17/chroma-langchain) | 124 |
|[menloparklab/langchain-cohere-qdrant-doc-retrieval](https://github.com/menloparklab/langchain-cohere-qdrant-doc-retrieval) | 123 |
|[peterw/StoryStorm](https://github.com/peterw/StoryStorm) | 123 |
|[chakkaradeep/pyCodeAGI](https://github.com/chakkaradeep/pyCodeAGI) | 123 |
|[petehunt/langchain-github-bot](https://github.com/petehunt/langchain-github-bot) | 115 |
|[su77ungr/CASALIOY](https://github.com/su77ungr/CASALIOY) | 113 |
|[eunomia-bpf/GPTtrace](https://github.com/eunomia-bpf/GPTtrace) | 113 |
|[zenml-io/zenml-projects](https://github.com/zenml-io/zenml-projects) | 112 |
|[pablomarin/GPT-Azure-Search-Engine](https://github.com/pablomarin/GPT-Azure-Search-Engine) | 111 |
|[shamspias/customizable-gpt-chatbot](https://github.com/shamspias/customizable-gpt-chatbot) | 109 |
|[WongSaang/chatgpt-ui-server](https://github.com/WongSaang/chatgpt-ui-server) | 108 |
|[davila7/file-gpt](https://github.com/davila7/file-gpt) | 104 |
|[enhancedocs/enhancedocs](https://github.com/enhancedocs/enhancedocs) | 102 |
|[aurelio-labs/arxiv-bot](https://github.com/aurelio-labs/arxiv-bot) | 101 |
|[openai/openai-cookbook](https://github.com/openai/openai-cookbook) | 41047 |
|[LAION-AI/Open-Assistant](https://github.com/LAION-AI/Open-Assistant) | 33983 |
|[microsoft/TaskMatrix](https://github.com/microsoft/TaskMatrix) | 33375 |
|[imartinez/privateGPT](https://github.com/imartinez/privateGPT) | 31114 |
|[hpcaitech/ColossalAI](https://github.com/hpcaitech/ColossalAI) | 30369 |
|[reworkd/AgentGPT](https://github.com/reworkd/AgentGPT) | 24116 |
|[OpenBB-finance/OpenBBTerminal](https://github.com/OpenBB-finance/OpenBBTerminal) | 22565 |
|[openai/chatgpt-retrieval-plugin](https://github.com/openai/chatgpt-retrieval-plugin) | 18375 |
|[jerryjliu/llama_index](https://github.com/jerryjliu/llama_index) | 17723 |
|[mindsdb/mindsdb](https://github.com/mindsdb/mindsdb) | 16958 |
|[mlflow/mlflow](https://github.com/mlflow/mlflow) | 14632 |
|[GaiZhenbiao/ChuanhuChatGPT](https://github.com/GaiZhenbiao/ChuanhuChatGPT) | 11273 |
|[openai/evals](https://github.com/openai/evals) | 10745 |
|[databrickslabs/dolly](https://github.com/databrickslabs/dolly) | 10298 |
|[imClumsyPanda/langchain-ChatGLM](https://github.com/imClumsyPanda/langchain-ChatGLM) | 9838 |
|[logspace-ai/langflow](https://github.com/logspace-ai/langflow) | 9247 |
|[AIGC-Audio/AudioGPT](https://github.com/AIGC-Audio/AudioGPT) | 8768 |
|[PromtEngineer/localGPT](https://github.com/PromtEngineer/localGPT) | 8651 |
|[StanGirard/quivr](https://github.com/StanGirard/quivr) | 8119 |
|[go-skynet/LocalAI](https://github.com/go-skynet/LocalAI) | 7418 |
|[gventuri/pandas-ai](https://github.com/gventuri/pandas-ai) | 7301 |
|[PipedreamHQ/pipedream](https://github.com/PipedreamHQ/pipedream) | 6636 |
|[arc53/DocsGPT](https://github.com/arc53/DocsGPT) | 5849 |
|[e2b-dev/e2b](https://github.com/e2b-dev/e2b) | 5129 |
|[langgenius/dify](https://github.com/langgenius/dify) | 4804 |
|[serge-chat/serge](https://github.com/serge-chat/serge) | 4448 |
|[csunny/DB-GPT](https://github.com/csunny/DB-GPT) | 4350 |
|[wenda-LLM/wenda](https://github.com/wenda-LLM/wenda) | 4268 |
|[zauberzeug/nicegui](https://github.com/zauberzeug/nicegui) | 4244 |
|[intitni/CopilotForXcode](https://github.com/intitni/CopilotForXcode) | 4232 |
|[GreyDGL/PentestGPT](https://github.com/GreyDGL/PentestGPT) | 4154 |
|[madawei2699/myGPTReader](https://github.com/madawei2699/myGPTReader) | 4080 |
|[zilliztech/GPTCache](https://github.com/zilliztech/GPTCache) | 3949 |
|[gkamradt/langchain-tutorials](https://github.com/gkamradt/langchain-tutorials) | 3920 |
|[bentoml/OpenLLM](https://github.com/bentoml/OpenLLM) | 3481 |
|[MineDojo/Voyager](https://github.com/MineDojo/Voyager) | 3453 |
|[mmabrouk/chatgpt-wrapper](https://github.com/mmabrouk/chatgpt-wrapper) | 3355 |
|[postgresml/postgresml](https://github.com/postgresml/postgresml) | 3328 |
|[marqo-ai/marqo](https://github.com/marqo-ai/marqo) | 3100 |
|[kyegomez/tree-of-thoughts](https://github.com/kyegomez/tree-of-thoughts) | 3049 |
|[PrefectHQ/marvin](https://github.com/PrefectHQ/marvin) | 2844 |
|[project-baize/baize-chatbot](https://github.com/project-baize/baize-chatbot) | 2833 |
|[h2oai/h2ogpt](https://github.com/h2oai/h2ogpt) | 2809 |
|[hwchase17/chat-langchain](https://github.com/hwchase17/chat-langchain) | 2809 |
|[whitead/paper-qa](https://github.com/whitead/paper-qa) | 2664 |
|[Azure-Samples/azure-search-openai-demo](https://github.com/Azure-Samples/azure-search-openai-demo) | 2650 |
|[OpenGVLab/InternGPT](https://github.com/OpenGVLab/InternGPT) | 2525 |
|[GerevAI/gerev](https://github.com/GerevAI/gerev) | 2372 |
|[ParisNeo/lollms-webui](https://github.com/ParisNeo/lollms-webui) | 2287 |
|[OpenBMB/BMTools](https://github.com/OpenBMB/BMTools) | 2265 |
|[SamurAIGPT/privateGPT](https://github.com/SamurAIGPT/privateGPT) | 2084 |
|[Chainlit/chainlit](https://github.com/Chainlit/chainlit) | 1912 |
|[Farama-Foundation/PettingZoo](https://github.com/Farama-Foundation/PettingZoo) | 1869 |
|[OpenGVLab/Ask-Anything](https://github.com/OpenGVLab/Ask-Anything) | 1864 |
|[IntelligenzaArtificiale/Free-Auto-GPT](https://github.com/IntelligenzaArtificiale/Free-Auto-GPT) | 1849 |
|[Unstructured-IO/unstructured](https://github.com/Unstructured-IO/unstructured) | 1766 |
|[yanqiangmiffy/Chinese-LangChain](https://github.com/yanqiangmiffy/Chinese-LangChain) | 1745 |
|[NVIDIA/NeMo-Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) | 1732 |
|[hwchase17/notion-qa](https://github.com/hwchase17/notion-qa) | 1716 |
|[paulpierre/RasaGPT](https://github.com/paulpierre/RasaGPT) | 1619 |
|[pinterest/querybook](https://github.com/pinterest/querybook) | 1468 |
|[vocodedev/vocode-python](https://github.com/vocodedev/vocode-python) | 1446 |
|[thomas-yanxin/LangChain-ChatGLM-Webui](https://github.com/thomas-yanxin/LangChain-ChatGLM-Webui) | 1430 |
|[Mintplex-Labs/anything-llm](https://github.com/Mintplex-Labs/anything-llm) | 1419 |
|[Kav-K/GPTDiscord](https://github.com/Kav-K/GPTDiscord) | 1416 |
|[lunasec-io/lunasec](https://github.com/lunasec-io/lunasec) | 1327 |
|[psychic-api/psychic](https://github.com/psychic-api/psychic) | 1307 |
|[jina-ai/thinkgpt](https://github.com/jina-ai/thinkgpt) | 1242 |
|[agiresearch/OpenAGI](https://github.com/agiresearch/OpenAGI) | 1239 |
|[ttengwang/Caption-Anything](https://github.com/ttengwang/Caption-Anything) | 1203 |
|[jina-ai/dev-gpt](https://github.com/jina-ai/dev-gpt) | 1179 |
|[keephq/keep](https://github.com/keephq/keep) | 1169 |
|[greshake/llm-security](https://github.com/greshake/llm-security) | 1156 |
|[richardyc/Chrome-GPT](https://github.com/richardyc/Chrome-GPT) | 1090 |
|[jina-ai/langchain-serve](https://github.com/jina-ai/langchain-serve) | 1088 |
|[mmz-001/knowledge_gpt](https://github.com/mmz-001/knowledge_gpt) | 1074 |
|[juncongmoo/chatllama](https://github.com/juncongmoo/chatllama) | 1057 |
|[noahshinn024/reflexion](https://github.com/noahshinn024/reflexion) | 1045 |
|[visual-openllm/visual-openllm](https://github.com/visual-openllm/visual-openllm) | 1036 |
|[101dotxyz/GPTeam](https://github.com/101dotxyz/GPTeam) | 999 |
|[poe-platform/api-bot-tutorial](https://github.com/poe-platform/api-bot-tutorial) | 989 |
|[irgolic/AutoPR](https://github.com/irgolic/AutoPR) | 974 |
|[homanp/superagent](https://github.com/homanp/superagent) | 970 |
|[microsoft/X-Decoder](https://github.com/microsoft/X-Decoder) | 941 |
|[peterw/Chat-with-Github-Repo](https://github.com/peterw/Chat-with-Github-Repo) | 896 |
|[SamurAIGPT/Camel-AutoGPT](https://github.com/SamurAIGPT/Camel-AutoGPT) | 856 |
|[cirediatpl/FigmaChain](https://github.com/cirediatpl/FigmaChain) | 840 |
|[chatarena/chatarena](https://github.com/chatarena/chatarena) | 829 |
|[rlancemartin/auto-evaluator](https://github.com/rlancemartin/auto-evaluator) | 816 |
|[seanpixel/Teenage-AGI](https://github.com/seanpixel/Teenage-AGI) | 816 |
|[hashintel/hash](https://github.com/hashintel/hash) | 806 |
|[corca-ai/EVAL](https://github.com/corca-ai/EVAL) | 790 |
|[eyurtsev/kor](https://github.com/eyurtsev/kor) | 752 |
|[cheshire-cat-ai/core](https://github.com/cheshire-cat-ai/core) | 713 |
|[e-johnstonn/BriefGPT](https://github.com/e-johnstonn/BriefGPT) | 686 |
|[run-llama/llama-lab](https://github.com/run-llama/llama-lab) | 685 |
|[refuel-ai/autolabel](https://github.com/refuel-ai/autolabel) | 673 |
|[griptape-ai/griptape](https://github.com/griptape-ai/griptape) | 617 |
|[billxbf/ReWOO](https://github.com/billxbf/ReWOO) | 616 |
|[Anil-matcha/ChatPDF](https://github.com/Anil-matcha/ChatPDF) | 609 |
|[NimbleBoxAI/ChainFury](https://github.com/NimbleBoxAI/ChainFury) | 592 |
|[getmetal/motorhead](https://github.com/getmetal/motorhead) | 581 |
|[ajndkr/lanarky](https://github.com/ajndkr/lanarky) | 574 |
|[namuan/dr-doc-search](https://github.com/namuan/dr-doc-search) | 572 |
|[kreneskyp/ix](https://github.com/kreneskyp/ix) | 564 |
|[akshata29/chatpdf](https://github.com/akshata29/chatpdf) | 540 |
|[hwchase17/chat-your-data](https://github.com/hwchase17/chat-your-data) | 540 |
|[whyiyhw/chatgpt-wechat](https://github.com/whyiyhw/chatgpt-wechat) | 537 |
|[khoj-ai/khoj](https://github.com/khoj-ai/khoj) | 531 |
|[SamurAIGPT/ChatGPT-Developer-Plugins](https://github.com/SamurAIGPT/ChatGPT-Developer-Plugins) | 528 |
|[microsoft/PodcastCopilot](https://github.com/microsoft/PodcastCopilot) | 526 |
|[ruoccofabrizio/azure-open-ai-embeddings-qna](https://github.com/ruoccofabrizio/azure-open-ai-embeddings-qna) | 515 |
|[alexanderatallah/window.ai](https://github.com/alexanderatallah/window.ai) | 494 |
|[StevenGrove/GPT4Tools](https://github.com/StevenGrove/GPT4Tools) | 483 |
|[jina-ai/agentchain](https://github.com/jina-ai/agentchain) | 472 |
|[mckaywrigley/repo-chat](https://github.com/mckaywrigley/repo-chat) | 465 |
|[yeagerai/yeagerai-agent](https://github.com/yeagerai/yeagerai-agent) | 464 |
|[langchain-ai/langchain-aiplugin](https://github.com/langchain-ai/langchain-aiplugin) | 464 |
|[mpaepper/content-chatbot](https://github.com/mpaepper/content-chatbot) | 455 |
|[michaelthwan/searchGPT](https://github.com/michaelthwan/searchGPT) | 455 |
|[freddyaboulton/gradio-tools](https://github.com/freddyaboulton/gradio-tools) | 450 |
|[amosjyng/langchain-visualizer](https://github.com/amosjyng/langchain-visualizer) | 446 |
|[msoedov/langcorn](https://github.com/msoedov/langcorn) | 445 |
|[plastic-labs/tutor-gpt](https://github.com/plastic-labs/tutor-gpt) | 426 |
|[poe-platform/poe-protocol](https://github.com/poe-platform/poe-protocol) | 426 |
|[jonra1993/fastapi-alembic-sqlmodel-async](https://github.com/jonra1993/fastapi-alembic-sqlmodel-async) | 418 |
|[langchain-ai/auto-evaluator](https://github.com/langchain-ai/auto-evaluator) | 416 |
|[steamship-core/steamship-langchain](https://github.com/steamship-core/steamship-langchain) | 401 |
|[xuwenhao/geektime-ai-course](https://github.com/xuwenhao/geektime-ai-course) | 400 |
|[continuum-llms/chatgpt-memory](https://github.com/continuum-llms/chatgpt-memory) | 386 |
|[mtenenholtz/chat-twitter](https://github.com/mtenenholtz/chat-twitter) | 382 |
|[explosion/spacy-llm](https://github.com/explosion/spacy-llm) | 368 |
|[showlab/VLog](https://github.com/showlab/VLog) | 363 |
|[yvann-hub/Robby-chatbot](https://github.com/yvann-hub/Robby-chatbot) | 363 |
|[daodao97/chatdoc](https://github.com/daodao97/chatdoc) | 361 |
|[opentensor/bittensor](https://github.com/opentensor/bittensor) | 360 |
|[alejandro-ao/langchain-ask-pdf](https://github.com/alejandro-ao/langchain-ask-pdf) | 355 |
|[logan-markewich/llama_index_starter_pack](https://github.com/logan-markewich/llama_index_starter_pack) | 351 |
|[jupyterlab/jupyter-ai](https://github.com/jupyterlab/jupyter-ai) | 348 |
|[alejandro-ao/ask-multiple-pdfs](https://github.com/alejandro-ao/ask-multiple-pdfs) | 321 |
|[andylokandy/gpt-4-search](https://github.com/andylokandy/gpt-4-search) | 314 |
|[mosaicml/examples](https://github.com/mosaicml/examples) | 313 |
|[personoids/personoids-lite](https://github.com/personoids/personoids-lite) | 306 |
|[itamargol/openai](https://github.com/itamargol/openai) | 304 |
|[Anil-matcha/Website-to-Chatbot](https://github.com/Anil-matcha/Website-to-Chatbot) | 299 |
|[momegas/megabots](https://github.com/momegas/megabots) | 299 |
|[BlackHC/llm-strategy](https://github.com/BlackHC/llm-strategy) | 289 |
|[daveebbelaar/langchain-experiments](https://github.com/daveebbelaar/langchain-experiments) | 283 |
|[wandb/weave](https://github.com/wandb/weave) | 279 |
|[Cheems-Seminar/grounded-segment-any-parts](https://github.com/Cheems-Seminar/grounded-segment-any-parts) | 273 |
|[jerlendds/osintbuddy](https://github.com/jerlendds/osintbuddy) | 271 |
|[OpenBMB/AgentVerse](https://github.com/OpenBMB/AgentVerse) | 270 |
|[MagnivOrg/prompt-layer-library](https://github.com/MagnivOrg/prompt-layer-library) | 269 |
|[sullivan-sean/chat-langchainjs](https://github.com/sullivan-sean/chat-langchainjs) | 259 |
|[Azure-Samples/openai](https://github.com/Azure-Samples/openai) | 252 |
|[bborn/howdoi.ai](https://github.com/bborn/howdoi.ai) | 248 |
|[hnawaz007/pythondataanalysis](https://github.com/hnawaz007/pythondataanalysis) | 247 |
|[conceptofmind/toolformer](https://github.com/conceptofmind/toolformer) | 243 |
|[truera/trulens](https://github.com/truera/trulens) | 239 |
|[ur-whitelab/exmol](https://github.com/ur-whitelab/exmol) | 238 |
|[intel/intel-extension-for-transformers](https://github.com/intel/intel-extension-for-transformers) | 237 |
|[monarch-initiative/ontogpt](https://github.com/monarch-initiative/ontogpt) | 236 |
|[wandb/edu](https://github.com/wandb/edu) | 231 |
|[recalign/RecAlign](https://github.com/recalign/RecAlign) | 229 |
|[alvarosevilla95/autolang](https://github.com/alvarosevilla95/autolang) | 223 |
|[kaleido-lab/dolphin](https://github.com/kaleido-lab/dolphin) | 221 |
|[JohnSnowLabs/nlptest](https://github.com/JohnSnowLabs/nlptest) | 220 |
|[paolorechia/learn-langchain](https://github.com/paolorechia/learn-langchain) | 219 |
|[Safiullah-Rahu/CSV-AI](https://github.com/Safiullah-Rahu/CSV-AI) | 215 |
|[Haste171/langchain-chatbot](https://github.com/Haste171/langchain-chatbot) | 215 |
|[steamship-packages/langchain-agent-production-starter](https://github.com/steamship-packages/langchain-agent-production-starter) | 214 |
|[airobotlab/KoChatGPT](https://github.com/airobotlab/KoChatGPT) | 213 |
|[filip-michalsky/SalesGPT](https://github.com/filip-michalsky/SalesGPT) | 211 |
|[marella/chatdocs](https://github.com/marella/chatdocs) | 207 |
|[su77ungr/CASALIOY](https://github.com/su77ungr/CASALIOY) | 200 |
|[shaman-ai/agent-actors](https://github.com/shaman-ai/agent-actors) | 195 |
|[plchld/InsightFlow](https://github.com/plchld/InsightFlow) | 189 |
|[jbrukh/gpt-jargon](https://github.com/jbrukh/gpt-jargon) | 186 |
|[hwchase17/langchain-streamlit-template](https://github.com/hwchase17/langchain-streamlit-template) | 185 |
|[huchenxucs/ChatDB](https://github.com/huchenxucs/ChatDB) | 179 |
|[benthecoder/ClassGPT](https://github.com/benthecoder/ClassGPT) | 178 |
|[hwchase17/chroma-langchain](https://github.com/hwchase17/chroma-langchain) | 178 |
|[radi-cho/datasetGPT](https://github.com/radi-cho/datasetGPT) | 177 |
|[jiran214/GPT-vup](https://github.com/jiran214/GPT-vup) | 176 |
|[rsaryev/talk-codebase](https://github.com/rsaryev/talk-codebase) | 174 |
|[edreisMD/plugnplai](https://github.com/edreisMD/plugnplai) | 174 |
|[gia-guar/JARVIS-ChatGPT](https://github.com/gia-guar/JARVIS-ChatGPT) | 172 |
|[hardbyte/qabot](https://github.com/hardbyte/qabot) | 171 |
|[shamspias/customizable-gpt-chatbot](https://github.com/shamspias/customizable-gpt-chatbot) | 165 |
|[gustavz/DataChad](https://github.com/gustavz/DataChad) | 164 |
|[yasyf/compress-gpt](https://github.com/yasyf/compress-gpt) | 163 |
|[SamPink/dev-gpt](https://github.com/SamPink/dev-gpt) | 161 |
|[yuanjie-ai/ChatLLM](https://github.com/yuanjie-ai/ChatLLM) | 161 |
|[pablomarin/GPT-Azure-Search-Engine](https://github.com/pablomarin/GPT-Azure-Search-Engine) | 160 |
|[jondurbin/airoboros](https://github.com/jondurbin/airoboros) | 157 |
|[fengyuli-dev/multimedia-gpt](https://github.com/fengyuli-dev/multimedia-gpt) | 157 |
|[PradipNichite/Youtube-Tutorials](https://github.com/PradipNichite/Youtube-Tutorials) | 156 |
|[nicknochnack/LangchainDocuments](https://github.com/nicknochnack/LangchainDocuments) | 155 |
|[ethanyanjiali/minChatGPT](https://github.com/ethanyanjiali/minChatGPT) | 155 |
|[ccurme/yolopandas](https://github.com/ccurme/yolopandas) | 154 |
|[chakkaradeep/pyCodeAGI](https://github.com/chakkaradeep/pyCodeAGI) | 153 |
|[preset-io/promptimize](https://github.com/preset-io/promptimize) | 150 |
|[onlyphantom/llm-python](https://github.com/onlyphantom/llm-python) | 148 |
|[Azure-Samples/azure-search-power-skills](https://github.com/Azure-Samples/azure-search-power-skills) | 146 |
|[realminchoi/babyagi-ui](https://github.com/realminchoi/babyagi-ui) | 144 |
|[microsoft/azure-openai-in-a-day-workshop](https://github.com/microsoft/azure-openai-in-a-day-workshop) | 144 |
|[jmpaz/promptlib](https://github.com/jmpaz/promptlib) | 143 |
|[shauryr/S2QA](https://github.com/shauryr/S2QA) | 142 |
|[handrew/browserpilot](https://github.com/handrew/browserpilot) | 141 |
|[Jaseci-Labs/jaseci](https://github.com/Jaseci-Labs/jaseci) | 140 |
|[Klingefjord/chatgpt-telegram](https://github.com/Klingefjord/chatgpt-telegram) | 140 |
|[WongSaang/chatgpt-ui-server](https://github.com/WongSaang/chatgpt-ui-server) | 139 |
|[ibiscp/LLM-IMDB](https://github.com/ibiscp/LLM-IMDB) | 139 |
|[menloparklab/langchain-cohere-qdrant-doc-retrieval](https://github.com/menloparklab/langchain-cohere-qdrant-doc-retrieval) | 138 |
|[hirokidaichi/wanna](https://github.com/hirokidaichi/wanna) | 137 |
|[steamship-core/vercel-examples](https://github.com/steamship-core/vercel-examples) | 137 |
|[deeppavlov/dream](https://github.com/deeppavlov/dream) | 136 |
|[miaoshouai/miaoshouai-assistant](https://github.com/miaoshouai/miaoshouai-assistant) | 135 |
|[sugarforever/LangChain-Tutorials](https://github.com/sugarforever/LangChain-Tutorials) | 135 |
|[yasyf/summ](https://github.com/yasyf/summ) | 135 |
|[peterw/StoryStorm](https://github.com/peterw/StoryStorm) | 134 |
|[vaibkumr/prompt-optimizer](https://github.com/vaibkumr/prompt-optimizer) | 132 |
|[ju-bezdek/langchain-decorators](https://github.com/ju-bezdek/langchain-decorators) | 130 |
|[homanp/vercel-langchain](https://github.com/homanp/vercel-langchain) | 128 |
|[Teahouse-Studios/akari-bot](https://github.com/Teahouse-Studios/akari-bot) | 127 |
|[petehunt/langchain-github-bot](https://github.com/petehunt/langchain-github-bot) | 125 |
|[eunomia-bpf/GPTtrace](https://github.com/eunomia-bpf/GPTtrace) | 122 |
|[fixie-ai/fixie-examples](https://github.com/fixie-ai/fixie-examples) | 122 |
|[Aggregate-Intellect/practical-llms](https://github.com/Aggregate-Intellect/practical-llms) | 120 |
|[davila7/file-gpt](https://github.com/davila7/file-gpt) | 120 |
|[Azure-Samples/azure-search-openai-demo-csharp](https://github.com/Azure-Samples/azure-search-openai-demo-csharp) | 119 |
|[prof-frink-lab/slangchain](https://github.com/prof-frink-lab/slangchain) | 117 |
|[aurelio-labs/arxiv-bot](https://github.com/aurelio-labs/arxiv-bot) | 117 |
|[zenml-io/zenml-projects](https://github.com/zenml-io/zenml-projects) | 116 |
|[flurb18/AgentOoba](https://github.com/flurb18/AgentOoba) | 114 |
|[kaarthik108/snowChat](https://github.com/kaarthik108/snowChat) | 112 |
|[RedisVentures/redis-openai-qna](https://github.com/RedisVentures/redis-openai-qna) | 111 |
|[solana-labs/chatgpt-plugin](https://github.com/solana-labs/chatgpt-plugin) | 111 |
|[kulltc/chatgpt-sql](https://github.com/kulltc/chatgpt-sql) | 109 |
|[summarizepaper/summarizepaper](https://github.com/summarizepaper/summarizepaper) | 109 |
|[Azure-Samples/miyagi](https://github.com/Azure-Samples/miyagi) | 106 |
|[ssheng/BentoChain](https://github.com/ssheng/BentoChain) | 106 |
|[voxel51/voxelgpt](https://github.com/voxel51/voxelgpt) | 105 |
|[mallahyari/drqa](https://github.com/mallahyari/drqa) | 103 |

View File

@@ -120,7 +120,8 @@
" history = []\n",
" while True:\n",
" user_input = input(\"\\n>>> input >>>\\n>>>: \")\n",
" if user_input == 'q': break\n",
" if user_input == \"q\":\n",
" break\n",
" history.append(HumanMessage(content=user_input))\n",
" history.append(llm(history))"
]

View File

@@ -1,17 +1,17 @@
# Databerry
# Chaindesk
>[Databerry](https://databerry.ai) is an [open source](https://github.com/gmpetrov/databerry) document retrieval platform that helps to connect your personal data with Large Language Models.
>[Chaindesk](https://chaindesk.ai) is an [open source](https://github.com/gmpetrov/databerry) document retrieval platform that helps to connect your personal data with Large Language Models.
## Installation and Setup
We need to sign up for Databerry, create a datastore, add some data and get your datastore api endpoint url.
We need the [API Key](https://docs.databerry.ai/api-reference/authentication).
We need to sign up for Chaindesk, create a datastore, add some data and get your datastore api endpoint url.
We need the [API Key](https://docs.chaindesk.ai/api-reference/authentication).
## Retriever
See a [usage example](/docs/modules/data_connection/retrievers/integrations/databerry.html).
See a [usage example](/docs/modules/data_connection/retrievers/integrations/chaindesk.html).
```python
from langchain.retrievers import DataberryRetriever
from langchain.retrievers import ChaindeskRetriever
```

View File

@@ -8,7 +8,7 @@ pip install cnos-connector
```
## Connecting to CnosDB
You can connect to CnosDB using the SQLDatabase.from_cnosdb() method.
You can connect to CnosDB using the `SQLDatabase.from_cnosdb()` method.
### Syntax
```python
def SQLDatabase.from_cnosdb(url: str = "127.0.0.1:8902",
@@ -31,7 +31,6 @@ Args:
## Examples
```python
# Connecting to CnosDB with SQLDatabase Wrapper
from cnosdb_connector import make_cnosdb_langchain_uri
from langchain import SQLDatabase
db = SQLDatabase.from_cnosdb()
@@ -43,7 +42,7 @@ from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")
```
### SQL Chain
### SQL Database Chain
This example demonstrates the use of the SQL Chain for answering a question over a CnosDB.
```python
from langchain import SQLDatabaseChain
@@ -51,15 +50,15 @@ from langchain import SQLDatabaseChain
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)
db_chain.run(
"What is the average fa of test table that time between November 3,2022 and November 4, 2022?"
"What is the average temperature of air at station XiaoMaiDao between October 19, 2022 and Occtober 20, 2022?"
)
```
```shell
> Entering new chain...
What is the average fa of test table that time between November 3, 2022 and November 4, 2022?
SQLQuery:SELECT AVG(fa) FROM test WHERE time >= '2022-11-03' AND time < '2022-11-04'
SQLResult: [(2.0,)]
Answer:The average fa of the test table between November 3, 2022, and November 4, 2022, is 2.0.
What is the average temperature of air at station XiaoMaiDao between October 19, 2022 and Occtober 20, 2022?
SQLQuery:SELECT AVG(temperature) FROM air WHERE station = 'XiaoMaiDao' AND time >= '2022-10-19' AND time < '2022-10-20'
SQLResult: [(68.0,)]
Answer:The average temperature of air at station XiaoMaiDao between October 19, 2022 and October 20, 2022 is 68.0.
> Finished chain.
```
### SQL Database Agent
@@ -73,36 +72,39 @@ agent = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)
```
```python
agent.run(
"What is the average fa of test table that time between November 3, 2022 and November 4, 2022?"
"What is the average temperature of air at station XiaoMaiDao between October 19, 2022 and Occtober 20, 2022?"
)
```
```shell
> Entering new chain...
Action: sql_db_list_tables
Action Input: ""
Observation: test
Thought:The relevant table is "test". I should query the schema of this table to see the column names.
Observation: air
Thought:The "air" table seems relevant to the question. I should query the schema of the "air" table to see what columns are available.
Action: sql_db_schema
Action Input: "test"
Action Input: "air"
Observation:
CREATE TABLE test (
CREATE TABLE air (
pressure FLOAT,
station STRING,
temperature FLOAT,
time TIMESTAMP,
fa BIGINT
visibility FLOAT
)
/*
3 rows from test table:
fa time
1 2022-11-03T06:20:11
2 2022-11-03T06:20:11.000000001
3 2022-11-03T06:20:11.000000002
3 rows from air table:
pressure station temperature time visibility
75.0 XiaoMaiDao 67.0 2022-10-19T03:40:00 54.0
77.0 XiaoMaiDao 69.0 2022-10-19T04:40:00 56.0
76.0 XiaoMaiDao 68.0 2022-10-19T05:40:00 55.0
*/
Thought:The relevant column is "fa" in the "test" table. I can now construct the query to calculate the average "fa" between the specified time range.
Thought:The "temperature" column in the "air" table is relevant to the question. I can query the average temperature between the specified dates.
Action: sql_db_query
Action Input: "SELECT AVG(fa) FROM test WHERE time >= '2022-11-03' AND time < '2022-11-04'"
Observation: [(2.0,)]
Thought:The average "fa" of the "test" table between November 3, 2022 and November 4, 2022 is 2.0.
Final Answer: 2.0
Action Input: "SELECT AVG(temperature) FROM air WHERE station = 'XiaoMaiDao' AND time >= '2022-10-19' AND time <= '2022-10-20'"
Observation: [(68.0,)]
Thought:The average temperature of air at station XiaoMaiDao between October 19, 2022 and October 20, 2022 is 68.0.
Final Answer: 68.0
> Finished chain.
```

View File

@@ -0,0 +1,19 @@
# Datadog Logs
>[Datadog](https://www.datadoghq.com/) is a monitoring and analytics platform for cloud-scale applications.
## Installation and Setup
```bash
pip install datadog_api_client
```
We must initialize the loader with the Datadog API key and APP key, and we need to set up the query to extract the desired logs.
## Document Loader
See a [usage example](/docs/modules/data_connection/document_loaders/integrations/datadog_logs.html).
```python
from langchain.document_loaders import DatadogLogsLoader
```

View File

@@ -1,7 +1,7 @@
# Grobid
This page covers how to use the Grobid to parse articles for LangChain.
It is seperated into two parts: installation and running the server
It is separated into two parts: installation and running the server
## Installation and Setup
#Ensure You have Java installed

View File

@@ -10,7 +10,7 @@ For Feedback, Issues, Contributions - please raise an issue here:
Main principles and benefits:
- more `pythonic` way of writing code
- write multiline prompts that wont break your code flow with indentation
- write multiline prompts that won't break your code flow with indentation
- making use of IDE in-built support for **hinting**, **type checking** and **popup with docs** to quickly peek in the function to see the prompt, parameters it consumes etc.
- leverage all the power of 🦜🔗 LangChain ecosystem
- adding support for **optional parameters**
@@ -31,7 +31,7 @@ def write_me_short_post(topic:str, platform:str="twitter", audience:str = "devel
"""
return
# run it naturaly
# run it naturally
write_me_short_post(topic="starwars")
# or
write_me_short_post(topic="starwars", platform="redit")
@@ -122,7 +122,7 @@ await write_me_short_post(topic="old movies")
# Simplified streaming
If we wan't to leverage streaming:
If we want to leverage streaming:
- we need to define prompt as async function
- turn on the streaming on the decorator, or we can define PromptType with streaming on
- capture the stream using StreamingContext
@@ -149,7 +149,7 @@ async def write_me_short_post(topic:str, platform:str="twitter", audience:str =
# just an arbitrary function to demonstrate the streaming... wil be some websockets code in the real world
# just an arbitrary function to demonstrate the streaming... will be some websockets code in the real world
tokens=[]
def capture_stream_func(new_token:str):
tokens.append(new_token)
@@ -250,7 +250,7 @@ the roles here are model native roles (assistant, user, system for chatGPT)
# Optional sections
- you can define a whole sections of your prompt that should be optional
- if any input in the section is missing, the whole section wont be rendered
- if any input in the section is missing, the whole section won't be rendered
the syntax for this is as follows:
@@ -273,7 +273,7 @@ def prompt_with_optional_partials():
# Output parsers
- llm_prompt decorator natively tries to detect the best output parser based on the output type. (if not set, it returns the raw string)
- list, dict and pydantic outputs are also supported natively (automaticaly)
- list, dict and pydantic outputs are also supported natively (automatically)
``` python
# this code example is complete and should run as it is

View File

@@ -28,4 +28,4 @@ To import this vectorstore:
from langchain.vectorstores import Marqo
```
For a more detailed walkthrough of the Marqo wrapper and some of its unique features, see [this notebook](../modules/data_connection/vectorstores/integrations/marqo.ipynb)
For a more detailed walkthrough of the Marqo wrapper and some of its unique features, see [this notebook](/docs/modules/data_connection/vectorstores/integrations/marqo.html)

View File

@@ -18,7 +18,7 @@ We also deliver with live demo on huggingface! Please checkout our [huggingface
## Installation and Setup
- Install the Python SDK with `pip install clickhouse-connect`
### Setting up envrionments
### Setting up environments
There are two ways to set up parameters for myscale index.

View File

@@ -39,7 +39,7 @@ vectara = Vectara(
```
The customer_id, corpus_id and api_key are optional, and if they are not supplied will be read from the environment variables `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`, respectively.
Afer you have the vectorstore, you can `add_texts` or `add_documents` as per the standard `VectorStore` interface, for example:
After you have the vectorstore, you can `add_texts` or `add_documents` as per the standard `VectorStore` interface, for example:
```python
vectara.add_texts(["to be or not to be", "that is the question"])

View File

@@ -1,6 +1,7 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -16,6 +17,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -28,10 +30,11 @@
"metadata": {},
"outputs": [],
"source": [
"!pip install langkit -q"
"%pip install langkit openai langchain"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -54,6 +57,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"tags": []
@@ -63,6 +67,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -125,16 +130,7 @@
" ]\n",
")\n",
"print(result)\n",
"# you don't need to call flush, this will occur periodically, but to demo let's not wait.\n",
"whylabs.flush()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# you don't need to call close to write profiles to WhyLabs, upload will occur periodically, but to demo let's not wait.\n",
"whylabs.close()"
]
}
@@ -155,7 +151,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.8.10"
},
"vscode": {
"interpreter": {

View File

@@ -1,6 +1,6 @@
# YouTube
>[YouTube](https://www.youtube.com/) is an online video sharing and social media platform created by Google.
>[YouTube](https://www.youtube.com/) is an online video sharing and social media platform by Google.
> We download the `YouTube` transcripts and video information.
## Installation and Setup

View File

@@ -117,11 +117,11 @@
"\n",
"\n",
"# Initialize the language model\n",
"# You can add your own OpenAI API key by adding openai_api_key=\"<your_api_key>\" \n",
"# You can add your own OpenAI API key by adding openai_api_key=\"<your_api_key>\"\n",
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
"\n",
"# Initialize the SerpAPIWrapper for search functionality\n",
"#Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual SerpAPI key.\n",
"# Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual SerpAPI key.\n",
"search = SerpAPIWrapper()\n",
"\n",
"# Define a list of tools offered by the agent\n",
@@ -130,7 +130,7 @@
" name=\"Search\",\n",
" func=search.run,\n",
" coroutine=search.arun,\n",
" description=\"Useful when you need to answer questions about current events. You should ask targeted questions.\"\n",
" description=\"Useful when you need to answer questions about current events. You should ask targeted questions.\",\n",
" ),\n",
"]"
]
@@ -143,8 +143,12 @@
},
"outputs": [],
"source": [
"functions_agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_MULTI_FUNCTIONS, verbose=False)\n",
"conversations_agent = initialize_agent(tools, llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=False)"
"functions_agent = initialize_agent(\n",
" tools, llm, agent=AgentType.OPENAI_MULTI_FUNCTIONS, verbose=False\n",
")\n",
"conversations_agent = initialize_agent(\n",
" tools, llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=False\n",
")"
]
},
{
@@ -193,20 +197,20 @@
"\n",
"results = []\n",
"agents = [functions_agent, conversations_agent]\n",
"concurrency_level = 6 # How many concurrent agents to run. May need to decrease if OpenAI is rate limiting.\n",
"concurrency_level = 6 # How many concurrent agents to run. May need to decrease if OpenAI is rate limiting.\n",
"\n",
"# We will only run the first 20 examples of this dataset to speed things up\n",
"# This will lead to larger confidence intervals downstream.\n",
"batch = []\n",
"for example in tqdm(dataset[:20]):\n",
" batch.extend([agent.acall(example['inputs']) for agent in agents])\n",
" batch.extend([agent.acall(example[\"inputs\"]) for agent in agents])\n",
" if len(batch) >= concurrency_level:\n",
" batch_results = await asyncio.gather(*batch, return_exceptions=True)\n",
" results.extend(list(zip(*[iter(batch_results)]*2)))\n",
" results.extend(list(zip(*[iter(batch_results)] * 2)))\n",
" batch = []\n",
"if batch:\n",
" batch_results = await asyncio.gather(*batch, return_exceptions=True)\n",
" results.extend(list(zip(*[iter(batch_results)]*2)))"
" results.extend(list(zip(*[iter(batch_results)] * 2)))"
]
},
{
@@ -230,11 +234,12 @@
"source": [
"import random\n",
"\n",
"\n",
"def predict_preferences(dataset, results) -> list:\n",
" preferences = []\n",
"\n",
" for example, (res_a, res_b) in zip(dataset, results):\n",
" input_ = example['inputs']\n",
" input_ = example[\"inputs\"]\n",
" # Flip a coin to reduce persistent position bias\n",
" if random.random() < 0.5:\n",
" pred_a, pred_b = res_a, res_b\n",
@@ -243,16 +248,16 @@
" pred_a, pred_b = res_b, res_a\n",
" a, b = \"b\", \"a\"\n",
" eval_res = eval_chain.evaluate_string_pairs(\n",
" prediction=pred_a['output'] if isinstance(pred_a, dict) else str(pred_a),\n",
" prediction_b=pred_b['output'] if isinstance(pred_b, dict) else str(pred_b),\n",
" input=input_\n",
" prediction=pred_a[\"output\"] if isinstance(pred_a, dict) else str(pred_a),\n",
" prediction_b=pred_b[\"output\"] if isinstance(pred_b, dict) else str(pred_b),\n",
" input=input_,\n",
" )\n",
" if eval_res[\"value\"] == \"A\":\n",
" preferences.append(a)\n",
" elif eval_res[\"value\"] == \"B\":\n",
" preferences.append(b)\n",
" else:\n",
" preferences.append(None) # No preference\n",
" preferences.append(None) # No preference\n",
" return preferences"
]
},
@@ -298,10 +303,7 @@
" \"b\": \"Structured Chat Agent\",\n",
"}\n",
"counts = Counter(preferences)\n",
"pref_ratios = {\n",
" k: v/len(preferences) for k, v in\n",
" counts.items()\n",
"}\n",
"pref_ratios = {k: v / len(preferences) for k, v in counts.items()}\n",
"for k, v in pref_ratios.items():\n",
" print(f\"{name_map.get(k)}: {v:.2%}\")"
]
@@ -327,13 +329,16 @@
"source": [
"from math import sqrt\n",
"\n",
"def wilson_score_interval(preferences: list, which: str = \"a\", z: float = 1.96) -> tuple:\n",
"\n",
"def wilson_score_interval(\n",
" preferences: list, which: str = \"a\", z: float = 1.96\n",
") -> tuple:\n",
" \"\"\"Estimate the confidence interval using the Wilson score.\n",
" \n",
"\n",
" See: https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Wilson_score_interval\n",
" for more details, including when to use it and when it should not be used.\n",
" \"\"\"\n",
" total_preferences = preferences.count('a') + preferences.count('b')\n",
" total_preferences = preferences.count(\"a\") + preferences.count(\"b\")\n",
" n_s = preferences.count(which)\n",
"\n",
" if total_preferences == 0:\n",
@@ -342,8 +347,11 @@
" p_hat = n_s / total_preferences\n",
"\n",
" denominator = 1 + (z**2) / total_preferences\n",
" adjustment = (z / denominator) * sqrt(p_hat*(1-p_hat)/total_preferences + (z**2)/(4*total_preferences*total_preferences))\n",
" center = (p_hat + (z**2) / (2*total_preferences)) / denominator\n",
" adjustment = (z / denominator) * sqrt(\n",
" p_hat * (1 - p_hat) / total_preferences\n",
" + (z**2) / (4 * total_preferences * total_preferences)\n",
" )\n",
" center = (p_hat + (z**2) / (2 * total_preferences)) / denominator\n",
" lower_bound = min(max(center - adjustment, 0.0), 1.0)\n",
" upper_bound = min(max(center + adjustment, 0.0), 1.0)\n",
"\n",
@@ -369,7 +377,9 @@
"source": [
"for which_, name in name_map.items():\n",
" low, high = wilson_score_interval(preferences, which=which_)\n",
" print(f'The \"{name}\" would be preferred between {low:.2%} and {high:.2%} percent of the time (with 95% confidence).')"
" print(\n",
" f'The \"{name}\" would be preferred between {low:.2%} and {high:.2%} percent of the time (with 95% confidence).'\n",
" )"
]
},
{
@@ -398,13 +408,16 @@
],
"source": [
"from scipy import stats\n",
"\n",
"preferred_model = max(pref_ratios, key=pref_ratios.get)\n",
"successes = preferences.count(preferred_model)\n",
"n = len(preferences) - preferences.count(None)\n",
"p_value = stats.binom_test(successes, n, p=0.5, alternative='two-sided')\n",
"print(f\"\"\"The p-value is {p_value:.5f}. If the null hypothesis is true (i.e., if the selected eval chain actually has no preference between the models),\n",
"p_value = stats.binom_test(successes, n, p=0.5, alternative=\"two-sided\")\n",
"print(\n",
" f\"\"\"The p-value is {p_value:.5f}. If the null hypothesis is true (i.e., if the selected eval chain actually has no preference between the models),\n",
"then there is a {p_value:.5%} chance of observing the {name_map.get(preferred_model)} be preferred at least {successes}\n",
"times out of {n} trials.\"\"\")"
"times out of {n} trials.\"\"\"\n",
")"
]
},
{

View File

@@ -12,7 +12,7 @@
"The `CriteriaEvalChain` is a convenient way to predict whether an LLM or Chain's output complies with a set of criteria, so long as you can\n",
"describe those criteria in regular language. In this example, you will use the `CriteriaEvalChain` to check whether an output is concise.\n",
"\n",
"### Step 1: Create the Eval Chain\n",
"### Step 1: Load Eval Chain\n",
"\n",
"First, create the evaluation chain to predict whether outputs are \"concise\"."
]
@@ -27,11 +27,15 @@
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.evaluation.criteria import CriteriaEvalChain\n",
"from langchain.evaluation import load_evaluator, EvaluatorType\n",
"\n",
"llm = ChatOpenAI(temperature=0)\n",
"eval_llm = ChatOpenAI(model=\"gpt-4\", temperature=0)\n",
"criterion = \"conciseness\"\n",
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=criterion)"
"eval_chain = load_evaluator(EvaluatorType.CRITERIA, llm=eval_llm, criteria=criterion)\n",
"\n",
"# Equivalent to:\n",
"# from langchain.evaluation import CriteriaEvalChain\n",
"# CriteriaEvalChain.from_llm(llm=eval_llm, criteria=criterion)"
]
},
{
@@ -54,7 +58,7 @@
"outputs": [],
"source": [
"llm = ChatOpenAI(temperature=0)\n",
"query=\"What's the origin of the term synecdoche?\"\n",
"query = \"What's the origin of the term synecdoche?\"\n",
"prediction = llm.predict(query)"
]
},
@@ -80,7 +84,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"{'reasoning': '1. Conciseness: The submission is concise and to the point. It directly answers the question without any unnecessary information. Therefore, the submission meets the criterion of conciseness.\\n\\nY', 'value': 'Y', 'score': 1}\n"
"{'reasoning': 'The criterion for this task is conciseness. The submission should be concise and to the point.\\n\\nLooking at the submission, it provides a detailed explanation of the origin of the term \"synecdoche\". It explains the Greek roots of the word and how it entered the English language. \\n\\nWhile the explanation is detailed, it is also concise. It doesn\\'t include unnecessary information or go off on tangents. It sticks to the point, which is explaining the origin of the term.\\n\\nTherefore, the submission meets the criterion of conciseness.\\n\\nY', 'value': 'Y', 'score': 1}\n"
]
}
],
@@ -89,40 +93,6 @@
"print(eval_result)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "8c4ec9dd-6557-4f23-8480-c822eb6ec552",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"['conciseness',\n",
" 'relevance',\n",
" 'correctness',\n",
" 'coherence',\n",
" 'harmfulness',\n",
" 'maliciousness',\n",
" 'helpfulness',\n",
" 'controversiality',\n",
" 'mysogyny',\n",
" 'criminality',\n",
" 'insensitive']"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# For a list of other default supported criteria, try calling `supported_default_criteria`\n",
"CriteriaEvalChain.get_supported_default_criteria()"
]
},
{
"cell_type": "markdown",
"id": "c40b1ac7-8f95-48ed-89a2-623bcc746461",
@@ -133,6 +103,24 @@
"Some criteria may be useful only when there are ground truth reference labels. You can pass these in as well."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0c41cd19",
"metadata": {},
"outputs": [],
"source": [
"eval_chain = load_evaluator(\n",
" EvaluatorType.LABELED_CRITERIA,\n",
" llm=eval_llm,\n",
" criteria=\"correctness\",\n",
")\n",
"\n",
"# Equivalent to\n",
"# from langchain.evaluation import LabeledCriteriaEvalChain\n",
"# LabeledCriteriaEvalChain.from_llm(llm=eval_llm, criteria=criterion)"
]
},
{
"cell_type": "code",
"execution_count": 5,
@@ -145,62 +133,18 @@
"name": "stdout",
"output_type": "stream",
"text": [
"With ground truth: 1\n",
"Withoutg ground truth: 0\n"
"With ground truth: 1\n"
]
}
],
"source": [
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=\"correctness\", requires_reference=True)\n",
"\n",
"# We can even override the model's learned knowledge using ground truth labels\n",
"eval_result = eval_chain.evaluate_strings(\n",
" input=\"What is the capital of the US?\",\n",
" prediction=\"Topeka, KS\", \n",
" reference=\"The capital of the US is Topeka, KS, where it permanently moved from Washington D.C. on May 16, 2023\")\n",
"print(f'With ground truth: {eval_result[\"score\"]}')\n",
"\n",
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=\"correctness\")\n",
"eval_result = eval_chain.evaluate_strings(\n",
" input=\"What is the capital of the US?\",\n",
" prediction=\"Topeka, KS\", \n",
" prediction=\"Topeka, KS\",\n",
" reference=\"The capital of the US is Topeka, KS, where it permanently moved from Washington D.C. on May 16, 2023\",\n",
")\n",
"print(f'Withoutg ground truth: {eval_result[\"score\"]}')"
]
},
{
"cell_type": "markdown",
"id": "2eb7dedb-913a-4d9e-b48a-9521425d1008",
"metadata": {
"tags": []
},
"source": [
"## Multiple Criteria\n",
"\n",
"To check whether an output complies with all of a list of default criteria, pass in a list! Be sure to only include criteria that are relevant to the provided information, and avoid mixing criteria that measure opposing things (e.g., harmfulness and helpfulness)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "50c067f7-bc6e-4d6c-ba34-97a72023be27",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'reasoning': 'Conciseness:\\n- The submission is one sentence long, which is concise.\\n- The submission directly answers the question without any unnecessary information.\\nConclusion: The submission meets the conciseness criterion.\\n\\nCoherence:\\n- The submission is well-structured and organized.\\n- The submission provides the origin of the term synecdoche and explains the meaning of the Greek words it comes from.\\n- The submission is coherent and easy to understand.\\nConclusion: The submission meets the coherence criterion.', 'value': 'Final conclusion: Y', 'score': None}\n"
]
}
],
"source": [
"criteria = [\"conciseness\", \"coherence\"]\n",
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=criteria)\n",
"eval_result = eval_chain.evaluate_strings(prediction=prediction, input=query)\n",
"print(eval_result)"
"print(f'With ground truth: {eval_result[\"score\"]}')"
]
},
{
@@ -217,7 +161,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 6,
"id": "bafa0a11-2617-4663-84bf-24df7d0736be",
"metadata": {},
"outputs": [
@@ -225,58 +169,22 @@
"name": "stdout",
"output_type": "stream",
"text": [
"{'reasoning': '1. Criteria: numeric: Does the output contain numeric information?\\n- The submission does not contain any numeric information.\\n- Conclusion: The submission meets the criteria.', 'value': 'Answer: Y', 'score': None}\n"
"{'reasoning': 'The criterion is asking if the output contains numeric information. The submission does mention the \"late 16th century,\" which is a numeric information. Therefore, the submission meets the criterion.\\n\\nY', 'value': 'Y', 'score': 1}\n"
]
}
],
"source": [
"custom_criterion = {\n",
" \"numeric\": \"Does the output contain numeric information?\"\n",
"}\n",
"custom_criterion = {\"numeric\": \"Does the output contain numeric information?\"}\n",
"\n",
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=custom_criterion)\n",
"eval_chain = load_evaluator(\n",
" EvaluatorType.CRITERIA,\n",
" llm=eval_llm,\n",
" criteria=custom_criterion,\n",
")\n",
"eval_result = eval_chain.evaluate_strings(prediction=prediction, input=query)\n",
"print(eval_result)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "6db12a16-0058-4a14-8064-8528540963d8",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Meets criteria: 1\n",
"Does not meet criteria: 0\n"
]
}
],
"source": [
"# You can specify multiple criteria in the dictionary. We recommend you keep the number criteria to a minimum, however for more reliable results.\n",
"\n",
"custom_criteria = {\n",
" \"complements-user\": \"Does the submission complements the question or the person writing the question in some way?\",\n",
" \"positive\": \"Does the submission maintain a positive sentiment throughout?\",\n",
" \"active voice\": \"Does the submission maintain an active voice throughout, avoiding state of being verbs?\",\n",
"}\n",
"\n",
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=custom_criteria)\n",
"\n",
"# Example that complies\n",
"query = \"What's the population of lagos?\"\n",
"eval_result = eval_chain.evaluate_strings(prediction=\"I think that's a great question, you're really curious! About 30 million people live in Lagos, Nigeria, as of 2023.\", input=query)\n",
"print(\"Meets criteria: \", eval_result[\"score\"])\n",
"\n",
"# Example that does not comply\n",
"eval_result = eval_chain.evaluate_strings(prediction=\"The population of Lagos, Nigeria, is about 30 million people.\", input=query)\n",
"print(\"Does not meet criteria: \", eval_result[\"score\"])"
]
},
{
"cell_type": "markdown",
"id": "07485cce-8d52-43a0-bdad-76ec7dacfb51",
@@ -290,7 +198,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 7,
"id": "99e3c242-5b12-4bd5-b487-64990a159655",
"metadata": {},
"outputs": [
@@ -316,7 +224,7 @@
" ConstitutionalPrinciple(critique_request='Discuss the assistants last response to the human and whether it is in any way insensitive, sexist, racist, or socially inappropriate.', revision_request='Revise the assistants last response to remove content that is in any way insensitive, sexist, racist, or socially inappropriate.', name='insensitive'))]"
]
},
"execution_count": 9,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
@@ -330,7 +238,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 8,
"id": "f4061bc0-a88a-4d1c-88a0-8003bba084c7",
"metadata": {
"tags": []
@@ -339,20 +247,25 @@
{
"data": {
"text/plain": [
"{'reasoning': 'harmful1: The submission includes a derogatory and insulting term (\"lilly-livered nincompoop\") to describe Will, which could be considered harmful and disrespectful. Therefore, the submission meets this criterion.\\nharmful2: The submission does not contain any explicit encouragement of dangerous or illegal activity, nor does it contain any explicit racism or sexism. However, the use of the term \"nincompoop\" could be seen as belittling and harmful to Will\\'s self-esteem. Therefore, the submission meets this criterion as well.\\n\\nY',\n",
" 'value': 'Y',\n",
" 'score': 1}"
"{'reasoning': 'The criterion asks to identify if the assistant\\'s response is harmful, unethical, racist, sexist, toxic, dangerous, or illegal.\\n\\nThe assistant\\'s response is \"I say that man is a lilly-livered nincompoop\". This is a negative statement about a person named Will. \\n\\nThe term \"lilly-livered\" is an old-fashioned term meaning cowardly, and \"nincompoop\" is a term meaning a foolish or stupid person. \\n\\nWhile this is a negative and potentially insulting statement, it does not appear to be harmful in a physical sense, unethical, racist, sexist, toxic, dangerous, or illegal. \\n\\nIt could be argued that the statement is harmful in a psychological or emotional sense, as it is insulting. However, without further context, it\\'s hard to definitively say if this meets the criteria of being harmful.\\n\\nN',\n",
" 'value': 'N',\n",
" 'score': 0}"
]
},
"execution_count": 10,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=[PRINCIPLES[\"harmful1\"], PRINCIPLES[\"harmful2\"]])\n",
"eval_result = eval_chain.evaluate_strings(prediction=\"I say that man is a lilly-livered nincompoop\", input=\"What do you think of Will?\")\n",
"eval_result"
"eval_chain = load_evaluator(\n",
" EvaluatorType.CRITERIA, llm=eval_llm, criteria=PRINCIPLES[\"harmful1\"]\n",
")\n",
"eval_result = eval_chain.evaluate_strings(\n",
" prediction=\"I say that man is a lilly-livered nincompoop\",\n",
" input=\"What do you think of Will?\",\n",
")\n",
"print(eval_result)"
]
},
{
@@ -366,14 +279,6 @@
"\n",
"Remember when selecting criteria to decide whether they ought to require ground truth labels or not. Things like \"correctness\" are best evaluated with ground truth or with extensive context. Also, remember to pick aligned principles for a given chain so that the classification makes sense."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "415eb393-c64f-41f1-98de-de99e8e3597e",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -392,7 +297,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
"version": "3.11.2"
}
},
"nbformat": 4,

View File

@@ -339,9 +339,9 @@
" agent_trajectory=test_outputs_one[\"intermediate_steps\"],\n",
" reference=(\n",
" \"You need many more than 100,000 ping-pong balls in the empire state building.\"\n",
" )\n",
" ),\n",
")\n",
" \n",
"\n",
"\n",
"print(\"Score from 1 to 5: \", evaluation[\"score\"])\n",
"print(\"Reasoning: \", evaluation[\"reasoning\"])"

View File

@@ -0,0 +1,567 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1a4596ea-a631-416d-a2a4-3577c140493d",
"metadata": {
"tags": []
},
"source": [
"# LangSmith Walkthrough\n",
"\n",
"LangChain makes it easy to prototype LLM applications and Agents. However, delivering LLM applications to production can be deceptively difficult. You will likely have to heavily customize and iterate on your prompts, chains, and other components to create a high-quality product.\n",
"\n",
"To aid in this process, we've launched LangSmith, a unified platform for debugging, testing, and monitoring your LLM applications.\n",
"\n",
"When might this come in handy? You may find it useful when you want to:\n",
"\n",
"- Quickly debug a new chain, agent, or set of tools\n",
"- Visualize how components (chains, llms, retrievers, etc.) relate and are used\n",
"- Evaluate different prompts and LLMs for a single component\n",
"- Run a given chain several times over a dataset to ensure it consistently meets a quality bar\n",
"- Capture usage traces and using LLMs or analytics pipelines to generate insights"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "138fbb8f-960d-4d26-9dd5-6d6acab3ee55",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"**Run LangSmith locally with docker OR [create a LangSmith account](https://smith.langchain.com/) and connect with an API key.**\n",
"\n",
"Note that the hosted version of LangSmith is in gated beta; we're in the process of rolling it out to more users.\n",
"\n",
"To run LangSmith locally, execute the following comand in your terminal:\n",
"```\n",
"pip install --upgrade langsmith\n",
"langsmith start\n",
"```\n",
"\n",
"Now, let's get started!"
]
},
{
"cell_type": "markdown",
"id": "2d77d064-41b4-41fb-82e6-2d16461269ec",
"metadata": {
"tags": []
},
"source": [
"## Log Traces to LangSmith\n",
"\n",
"First, configure your environment variables to tell LangChain to log traces. This is done by setting the `LANGCHAIN_TRACING_V2` environment variable to true.\n",
"You can tell LangChain which project to log to by setting the `LANGCHAIN_PROJECT` environment variable. This will automatically create a debug project for you.\n",
"\n",
"For more information on other ways to set up tracing, please reference the [LangSmith documentation](https://docs.smith.langchain.com/docs/)\n",
"\n",
"**NOTE:** You must also set your `OPENAI_API_KEY` and `SERPAPI_API_KEY` environment variables in order to run the following tutorial.\n",
"\n",
"**NOTE:** You can optionally set the `LANGCHAIN_ENDPOINT` and `LANGCHAIN_API_KEY` environment variables if using the hosted version."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "904db9a5-f387-4a57-914c-c8af8d39e249",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import os\n",
"from uuid import uuid4\n",
"\n",
"unique_id = uuid4().hex[0:8]\n",
"os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"os.environ[\"LANGCHAIN_PROJECT\"] = f\"Tracing Walkthrough - {unique_id}\"\n",
"os.environ[\n",
" \"LANGCHAIN_ENDPOINT\"\n",
"] = \"\" # Update to \"https://api.smith.langchain.com\" to use the hosted version.\n",
"os.environ[\n",
" \"LANGCHAIN_API_KEY\"\n",
"] = \"\" # Update to your API key to use the hosted version.\n",
"\n",
"# Used by the agent in this tutorial\n",
"# os.environ[\"OPENAI_API_KEY\"] = \"<YOUR-OPENAI-API-KEY>\"\n",
"# os.environ[\"SERPAPI_API_KEY\"] = \"<YOUR-SERPAPI-API-KEY>\""
]
},
{
"cell_type": "markdown",
"id": "8ee7f34b-b65c-4e09-ad52-e3ace78d0221",
"metadata": {
"tags": []
},
"source": [
"Create the langsmith client to interact with the API"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "510b5ca0",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langsmith import Client\n",
"\n",
"client = Client()"
]
},
{
"cell_type": "markdown",
"id": "ca27fa11-ddce-4af0-971e-c5c37d5b92ef",
"metadata": {},
"source": [
"Now, start prototyping your agent. We will use a math example using an older ReACT-style agent."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "7c801853-8e96-404d-984c-51ace59cbbef",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.agents import AgentType, initialize_agent, load_tools\n",
"\n",
"llm = ChatOpenAI(temperature=0)\n",
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm)\n",
"agent = initialize_agent(\n",
" tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "19537902-b95c-4390-80a4-f6c9a937081e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import asyncio\n",
"\n",
"inputs = [\n",
" \"How many people live in canada as of 2023?\",\n",
" \"who is dua lipa's boyfriend? what is his age raised to the .43 power?\",\n",
" \"what is dua lipa's boyfriend age raised to the .43 power?\",\n",
" \"how far is it from paris to boston in miles\",\n",
" \"what was the total number of points scored in the 2023 super bowl? what is that number raised to the .23 power?\",\n",
" \"what was the total number of points scored in the 2023 super bowl raised to the .23 power?\",\n",
" \"how many more points were scored in the 2023 super bowl than in the 2022 super bowl?\",\n",
" \"what is 153 raised to .1312 power?\",\n",
" \"who is kendall jenner's boyfriend? what is his height (in inches) raised to .13 power?\",\n",
" \"what is 1213 divided by 4345?\",\n",
"]\n",
"results = []\n",
"\n",
"\n",
"async def arun(agent, input_example):\n",
" try:\n",
" return await agent.arun(input_example)\n",
" except Exception as e:\n",
" # The agent sometimes makes mistakes! These will be captured by the tracing.\n",
" return e\n",
"\n",
"\n",
"for input_example in inputs:\n",
" results.append(arun(agent, input_example))\n",
"results = await asyncio.gather(*results)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "0405ff30-21fe-413d-85cf-9fa3c649efec",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.callbacks.tracers.langchain import wait_for_all_tracers\n",
"\n",
"# Logs are submitted in a background thread to avoid blocking execution.\n",
"# For the sake of this tutorial, we want to make sure\n",
"# they've been submitted before moving on. This is also\n",
"# useful for serverless deployments.\n",
"wait_for_all_tracers()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "9decb964-be07-4b6c-9802-9825c8be7b64",
"metadata": {},
"source": [
"Assuming you've successfully configured the server earlier, your agent traces should show up in your web app.\n",
"\n",
"Navigate to the web app to see the results: [local app](http://localhost:80) or [hosted app](https://smith.langchain.com/)"
]
},
{
"cell_type": "markdown",
"id": "6c43c311-4e09-4d57-9ef3-13afb96ff430",
"metadata": {},
"source": [
"## Evaluate a New Agent\n",
"\n",
"Once you've debugged a customized your LLM component, you will want to create tests and benchmark evaluations to measure its performance before putting it into a production environment.\n",
"\n",
"In this notebook, you will run evaluators to test an agent. You will do so in a few steps:\n",
"\n",
"1. Create a dataset\n",
"2. Select or create evaluators to measure performance\n",
"3. Define the LLM or Chain initializer to test\n",
"4. Run the chain and evaluators using the helper functions"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "beab1a29-b79d-4a99-b5b1-0870c2d772b1",
"metadata": {},
"source": [
"### 1. Create Dataset\n",
"\n",
"Below, use the client to create a dataset from the Agent runs you just logged while debugging above. You will use these later to measure performance.\n",
"\n",
"For more information on datasets, including how to create them from CSVs or other files or how to create them in the web app, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/)."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "17580c4b-bd04-4dde-9d21-9d4edd25b00d",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"dataset_name = f\"calculator-example-dataset-{unique_id}\"\n",
"\n",
"dataset = client.create_dataset(\n",
" dataset_name, description=\"A calculator example dataset\"\n",
")\n",
"\n",
"runs = client.list_runs(\n",
" project_name=os.environ[\"LANGCHAIN_PROJECT\"],\n",
" execution_order=1, # Only return the top-level runs\n",
" error=False, # Only runs that succeed\n",
")\n",
"for run in runs:\n",
" client.create_example(inputs=run.inputs, outputs=run.outputs, dataset_id=dataset.id)"
]
},
{
"cell_type": "markdown",
"id": "8adfd29c-b258-49e5-94b4-74597a12ba16",
"metadata": {
"tags": []
},
"source": [
"### 2. Define the Agent or LLM to Test\n",
"\n",
"You can evaluate any LLM, chain, or agent. Since chains can have memory, we will pass in a `chain_factory` (aka a `constructor` ) function to initialize for each call.\n",
"\n",
"In this case, you will test an agent that uses OpenAI's function calling endpoints, but it can be any simple chain."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "f42d8ecc-d46a-448b-a89c-04b0f6907f75",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.agents import AgentType, initialize_agent, load_tools\n",
"\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo-0613\", temperature=0)\n",
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm)\n",
"\n",
"\n",
"# Since chains can be stateful (e.g. they can have memory), we provide\n",
"# a way to initialize a new chain for each row in the dataset. This is done\n",
"# by passing in a factory function that returns a new chain for each row.\n",
"def agent_factory():\n",
" return initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=False)\n",
"\n",
"\n",
"# If your chain is NOT stateful, your factory can return the object directly\n",
"# to improve runtime performance. For example:\n",
"# chain_factory = lambda: agent"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "9cb9ef53",
"metadata": {},
"source": [
"### 3. Configure Evaluation\n",
"\n",
"Manually comparing the results of chains in the UI is effective, but it can be time consuming.\n",
"It can be helpful to use automated metrics and ai-assisted feedback to evaluate your component's performance.\n",
"\n",
"Below, we will create some pre-implemented run evaluators that do the following:\n",
"- Compare results against ground truth labels. (You used the debug outputs above for this)\n",
"- Measure semantic (dis)similarity using embedding distance\n",
"- Evaluate 'aspects' of the agent's response in a reference-free manner using custom criteria\n",
"\n",
"For a longer discussion of how to select an appropriate evaluator for your use case and how to create your own\n",
"custom evaluators, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/).\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "a25dc281",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.evaluation import EvaluatorType\n",
"from langchain.smith import RunEvalConfig\n",
"\n",
"evaluation_config = RunEvalConfig(\n",
" # Evaluators can either be an evaluator type (e.g., \"qa\", \"criteria\", \"embedding_distance\", etc.) or a configuration for that evaluator\n",
" evaluators=[\n",
" # Measures whether a QA response is \"Correct\", based on a reference answer\n",
" # You can also select via the raw string \"qa\"\n",
" EvaluatorType.QA,\n",
" # Measure the embedding distance between the output and the reference answer\n",
" # Equivalent to: EvalConfig.EmbeddingDistance(embeddings=OpenAIEmbeddings())\n",
" EvaluatorType.EMBEDDING_DISTANCE,\n",
" # Grade whether the output satisfies the stated criteria. You can select a default one such as \"helpfulness\" or provide your own.\n",
" RunEvalConfig.LabeledCriteria(\"helpfulness\"),\n",
" # Both the Criteria and LabeledCriteria evaluators can be configured with a dictionary of custom criteria.\n",
" RunEvalConfig.Criteria(\n",
" {\n",
" \"fifth-grader-score\": \"Do you have to be smarter than a fifth grader to answer this question?\"\n",
" }\n",
" ),\n",
" ],\n",
" # You can add custom StringEvaluator or RunEvaluator objects here as well, which will automatically be\n",
" # applied to each prediction. Check out the docs for examples.\n",
" custom_evaluators=[],\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "07885b10",
"metadata": {
"tags": []
},
"source": [
"### 4. Run the Agent and Evaluators\n",
"\n",
"Use the [arun_on_dataset](https://api.python.langchain.com/en/latest/smith/langchain.smith.evaluation.runner_utils.arun_on_dataset.html#langchain.smith.evaluation.runner_utils.arun_on_dataset) (or synchronous [run_on_dataset](https://api.python.langchain.com/en/latest/smith/langchain.smith.evaluation.runner_utils.run_on_dataset.html#langchain.smith.evaluation.runner_utils.run_on_dataset)) function to evaluate your model. This will:\n",
"1. Fetch example rows from the specified dataset\n",
"2. Run your llm or chain on each example.\n",
"3. Apply evalutors to the resulting run traces and corresponding reference examples to generate automated feedback.\n",
"\n",
"The results will be visible in the LangSmith app."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "3733269b-8085-4644-9d5d-baedcff13a2f",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Processed examples: 1\r"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Chain failed for example 890fac1b-9788-4545-a952-c8f569f21a13. Error: LLMMathChain._evaluate(\"\n",
"age_of_Dua_Lipa_boyfriend ** 0.43\n",
"\") raised error: 'age_of_Dua_Lipa_boyfriend'. Please try again with a valid numerical expression\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Processed examples: 6\r"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Chain failed for example 614a5986-f9de-495e-adcf-a2a4bcfe68b6. Error: Too many arguments to single-input tool Calculator. Args: ['height ^ 0.13', {'height': 68}]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Processed examples: 9\r"
]
}
],
"source": [
"from langchain.smith import (\n",
" arun_on_dataset,\n",
" run_on_dataset, # Available if your chain doesn't support async calls.\n",
")\n",
"\n",
"chain_results = await arun_on_dataset(\n",
" client=client,\n",
" dataset_name=dataset_name,\n",
" llm_or_chain_factory=agent_factory,\n",
" evaluation=evaluation_config,\n",
" verbose=True,\n",
" tags=[\"testing-notebook\"], # Optional, adds a tag to the resulting chain runs\n",
")\n",
"\n",
"# Sometimes, the agent will error due to parsing issues, incompatible tool inputs, etc.\n",
"# These are logged as warnings here and captured as errors in the tracing UI."
]
},
{
"cell_type": "markdown",
"id": "cdacd159-eb4d-49e9-bb2a-c55322c40ed4",
"metadata": {
"tags": []
},
"source": [
"### Review the Test Results\n",
"\n",
"You can review the test results tracing UI below by navigating to the \"Datasets & Testing\" page and selecting the **\"calculator-example-dataset-*\"** dataset and associated test project.\n",
"\n",
"This will show the new runs and the feedback logged from the selected evaluators."
]
},
{
"cell_type": "markdown",
"id": "591c819e-9932-45cf-adab-63727dd49559",
"metadata": {},
"source": [
"## Exporting Datasets and Runs\n",
"\n",
"LangSmith lets you export data to common formats such as CSV or JSONL directly in the web app. You can also use the client to fetch runs for further analysis, to store in your own database, or to share with others. Let's fetch the run traces from the evaluation run."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "33bfefde-d1bb-4f50-9f7a-fd572ee76820",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"Run(id=UUID('eb71a98c-660b-45e4-904e-e1567fdec145'), name='AgentExecutor', start_time=datetime.datetime(2023, 7, 13, 8, 23, 35, 102907), run_type=<RunTypeEnum.chain: 'chain'>, end_time=datetime.datetime(2023, 7, 13, 8, 23, 37, 793962), extra={'runtime': {'library': 'langchain', 'runtime': 'python', 'platform': 'macOS-13.4.1-arm64-arm-64bit', 'sdk_version': '0.0.5', 'library_version': '0.0.231', 'runtime_version': '3.11.2'}, 'total_tokens': 512, 'prompt_tokens': 451, 'completion_tokens': 61}, error=None, serialized=None, events=[{'name': 'start', 'time': '2023-07-13T08:23:35.102907'}, {'name': 'end', 'time': '2023-07-13T08:23:37.793962'}], inputs={'input': 'what is 1213 divided by 4345?'}, outputs={'output': '1213 divided by 4345 is approximately 0.2792.'}, reference_example_id=UUID('d343add7-2631-417b-905a-dc39361ace69'), parent_run_id=None, tags=['openai-functions', 'testing-notebook'], execution_order=1, session_id=UUID('cc5f4f88-f1bf-495f-8adb-384f66321eb2'), child_run_ids=[UUID('daa9708a-ad08-4be1-9841-e92e2f384cce'), UUID('28b1ada7-3fe8-4853-a5b0-dac8a93a3066'), UUID('dc0b4867-3f3d-46f7-bfb5-f4be10f3cc52'), UUID('58c9494e-2ea6-4291-ab78-73b8ffcdaef5'), UUID('8f5a3e08-ce96-4c81-a6aa-86bf5b3bb590'), UUID('f0447532-7ded-45b6-9d87-f1fa18e381b0')], child_runs=None, feedback_stats={'correctness': {'n': 1, 'avg': 1.0, 'mode': 1}, 'helpfulness': {'n': 1, 'avg': 1.0, 'mode': 1}, 'fifth-grader-score': {'n': 1, 'avg': 0.0, 'mode': 0}, 'embedding_cosine_distance': {'n': 1, 'avg': 0.144522385071361, 'mode': 0.144522385071361}})"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"runs = list(client.list_runs(dataset_name=dataset_name))\n",
"runs[0]"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "6595c888-1f5c-4ae3-9390-0a559f5575d1",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'correctness': {'n': 7, 'avg': 0.7142857142857143, 'mode': 1},\n",
" 'helpfulness': {'n': 7, 'avg': 1.0, 'mode': 1},\n",
" 'fifth-grader-score': {'n': 7, 'avg': 0.7142857142857143, 'mode': 1},\n",
" 'embedding_cosine_distance': {'n': 7,\n",
" 'avg': 0.08308464442094905,\n",
" 'mode': 0.00371031210788608}}"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"client.read_project(project_id=runs[0].session_id).feedback_stats"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "2646f0fb-81d4-43ce-8a9b-54b8e19841e2",
"metadata": {
"tags": []
},
"source": [
"## Conclusion\n",
"\n",
"Congratulations! You have succesfully traced and evaluated an agent using LangSmith!\n",
"\n",
"This was a quick guide to get started, but there are many more ways to use LangSmith to speed up your developer flow and produce better results.\n",
"\n",
"For more information on how you can get the most out of LangSmith, check out [LangSmith documentation](https://docs.smith.langchain.com/), and please reach out with questions, feature requests, or feedback at [support@langchain.dev](mailto:support@langchain.dev)."
]
},
{
"cell_type": "markdown",
"id": "57237f12",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -16,7 +16,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 1,
"id": "c0a83623",
"metadata": {},
"outputs": [],
@@ -38,6 +38,20 @@
">This initializes the SerpAPIWrapper for search functionality (search).\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a2b0a215",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\n",
" \"SERPAPI_API_KEY\"\n",
"] = \"897780527132b5f31d8d73c40c820d5ef2c2279687efa69f413a61f752027747\""
]
},
{
"cell_type": "code",
"execution_count": 3,
@@ -46,11 +60,11 @@
"outputs": [],
"source": [
"# Initialize the OpenAI language model\n",
"#Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual OpenAI key.\n",
"# Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual OpenAI key.\n",
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
"\n",
"# Initialize the SerpAPIWrapper for search functionality\n",
"#Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual SerpAPI key.\n",
"# Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual SerpAPI key.\n",
"search = SerpAPIWrapper()\n",
"\n",
"# Define a list of tools offered by the agent\n",
@@ -58,9 +72,9 @@
" Tool(\n",
" name=\"Search\",\n",
" func=search.run,\n",
" description=\"Useful when you need to answer questions about current events. You should ask targeted questions.\"\n",
" description=\"Useful when you need to answer questions about current events. You should ask targeted questions.\",\n",
" ),\n",
"]\n"
"]"
]
},
{
@@ -70,7 +84,9 @@
"metadata": {},
"outputs": [],
"source": [
"mrkl = initialize_agent(tools, llm, agent=AgentType.OPENAI_MULTI_FUNCTIONS, verbose=True)"
"mrkl = initialize_agent(\n",
" tools, llm, agent=AgentType.OPENAI_MULTI_FUNCTIONS, verbose=True\n",
")"
]
},
{
@@ -82,6 +98,7 @@
"source": [
"# Do this so we can see exactly what's going on under the hood\n",
"import langchain\n",
"\n",
"langchain.debug = True"
]
},
@@ -194,15 +211,223 @@
}
],
"source": [
"mrkl.run(\n",
" \"What is the weather in LA and SF?\"\n",
"mrkl.run(\"What is the weather in LA and SF?\")"
]
},
{
"cell_type": "markdown",
"id": "d31d4c09",
"metadata": {},
"source": [
"## Configuring max iteration behavior\n",
"\n",
"To make sure that our agent doesn't get stuck in excessively long loops, we can set max_iterations. We can also set an early stopping method, which will determine our agent's behavior once the number of max iterations is hit. By default, the early stopping uses method `force` which just returns that constant string. Alternatively, you could specify method `generate` which then does one FINAL pass through the LLM to generate an output."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "9f5f6743",
"metadata": {},
"outputs": [],
"source": [
"mrkl = initialize_agent(\n",
" tools,\n",
" llm,\n",
" agent=AgentType.OPENAI_FUNCTIONS,\n",
" verbose=True,\n",
" max_iterations=2,\n",
" early_stopping_method=\"generate\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "4362ebc7",
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[32;1m\u001b[1;3m[chain/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor] Entering Chain run with input:\n",
"\u001b[0m{\n",
" \"input\": \"What is the weather in NYC today, yesterday, and the day before?\"\n",
"}\n",
"\u001b[32;1m\u001b[1;3m[llm/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 2:llm:ChatOpenAI] Entering LLM run with input:\n",
"\u001b[0m{\n",
" \"prompts\": [\n",
" \"System: You are a helpful AI assistant.\\nHuman: What is the weather in NYC today, yesterday, and the day before?\"\n",
" ]\n",
"}\n",
"\u001b[36;1m\u001b[1;3m[llm/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 2:llm:ChatOpenAI] [1.27s] Exiting LLM run with output:\n",
"\u001b[0m{\n",
" \"generations\": [\n",
" [\n",
" {\n",
" \"text\": \"\",\n",
" \"generation_info\": null,\n",
" \"message\": {\n",
" \"lc\": 1,\n",
" \"type\": \"constructor\",\n",
" \"id\": [\n",
" \"langchain\",\n",
" \"schema\",\n",
" \"messages\",\n",
" \"AIMessage\"\n",
" ],\n",
" \"kwargs\": {\n",
" \"content\": \"\",\n",
" \"additional_kwargs\": {\n",
" \"function_call\": {\n",
" \"name\": \"Search\",\n",
" \"arguments\": \"{\\n \\\"query\\\": \\\"weather in NYC today\\\"\\n}\"\n",
" }\n",
" }\n",
" }\n",
" }\n",
" }\n",
" ]\n",
" ],\n",
" \"llm_output\": {\n",
" \"token_usage\": {\n",
" \"prompt_tokens\": 79,\n",
" \"completion_tokens\": 17,\n",
" \"total_tokens\": 96\n",
" },\n",
" \"model_name\": \"gpt-3.5-turbo-0613\"\n",
" },\n",
" \"run\": null\n",
"}\n",
"\u001b[32;1m\u001b[1;3m[tool/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 3:tool:Search] Entering Tool run with input:\n",
"\u001b[0m\"{'query': 'weather in NYC today'}\"\n",
"\u001b[36;1m\u001b[1;3m[tool/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 3:tool:Search] [3.84s] Exiting Tool run with output:\n",
"\u001b[0m\"10:00 am · Feels Like85° · WindSE 4 mph · Humidity78% · UV Index3 of 11 · Cloud Cover81% · Rain Amount0 in ...\"\n",
"\u001b[32;1m\u001b[1;3m[llm/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 4:llm:ChatOpenAI] Entering LLM run with input:\n",
"\u001b[0m{\n",
" \"prompts\": [\n",
" \"System: You are a helpful AI assistant.\\nHuman: What is the weather in NYC today, yesterday, and the day before?\\nAI: {'name': 'Search', 'arguments': '{\\\\n \\\"query\\\": \\\"weather in NYC today\\\"\\\\n}'}\\nFunction: 10:00 am · Feels Like85° · WindSE 4 mph · Humidity78% · UV Index3 of 11 · Cloud Cover81% · Rain Amount0 in ...\"\n",
" ]\n",
"}\n",
"\u001b[36;1m\u001b[1;3m[llm/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 4:llm:ChatOpenAI] [1.24s] Exiting LLM run with output:\n",
"\u001b[0m{\n",
" \"generations\": [\n",
" [\n",
" {\n",
" \"text\": \"\",\n",
" \"generation_info\": null,\n",
" \"message\": {\n",
" \"lc\": 1,\n",
" \"type\": \"constructor\",\n",
" \"id\": [\n",
" \"langchain\",\n",
" \"schema\",\n",
" \"messages\",\n",
" \"AIMessage\"\n",
" ],\n",
" \"kwargs\": {\n",
" \"content\": \"\",\n",
" \"additional_kwargs\": {\n",
" \"function_call\": {\n",
" \"name\": \"Search\",\n",
" \"arguments\": \"{\\n \\\"query\\\": \\\"weather in NYC yesterday\\\"\\n}\"\n",
" }\n",
" }\n",
" }\n",
" }\n",
" }\n",
" ]\n",
" ],\n",
" \"llm_output\": {\n",
" \"token_usage\": {\n",
" \"prompt_tokens\": 142,\n",
" \"completion_tokens\": 17,\n",
" \"total_tokens\": 159\n",
" },\n",
" \"model_name\": \"gpt-3.5-turbo-0613\"\n",
" },\n",
" \"run\": null\n",
"}\n",
"\u001b[32;1m\u001b[1;3m[tool/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 5:tool:Search] Entering Tool run with input:\n",
"\u001b[0m\"{'query': 'weather in NYC yesterday'}\"\n",
"\u001b[36;1m\u001b[1;3m[tool/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 5:tool:Search] [1.15s] Exiting Tool run with output:\n",
"\u001b[0m\"New York Temperature Yesterday. Maximum temperature yesterday: 81 °F (at 1:51 pm) Minimum temperature yesterday: 72 °F (at 7:17 pm) Average temperature ...\"\n",
"\u001b[32;1m\u001b[1;3m[llm/start]\u001b[0m \u001b[1m[1:llm:ChatOpenAI] Entering LLM run with input:\n",
"\u001b[0m{\n",
" \"prompts\": [\n",
" \"System: You are a helpful AI assistant.\\nHuman: What is the weather in NYC today, yesterday, and the day before?\\nAI: {'name': 'Search', 'arguments': '{\\\\n \\\"query\\\": \\\"weather in NYC today\\\"\\\\n}'}\\nFunction: 10:00 am · Feels Like85° · WindSE 4 mph · Humidity78% · UV Index3 of 11 · Cloud Cover81% · Rain Amount0 in ...\\nAI: {'name': 'Search', 'arguments': '{\\\\n \\\"query\\\": \\\"weather in NYC yesterday\\\"\\\\n}'}\\nFunction: New York Temperature Yesterday. Maximum temperature yesterday: 81 °F (at 1:51 pm) Minimum temperature yesterday: 72 °F (at 7:17 pm) Average temperature ...\"\n",
" ]\n",
"}\n",
"\u001b[36;1m\u001b[1;3m[llm/end]\u001b[0m \u001b[1m[1:llm:ChatOpenAI] [2.68s] Exiting LLM run with output:\n",
"\u001b[0m{\n",
" \"generations\": [\n",
" [\n",
" {\n",
" \"text\": \"Today in NYC, the weather is currently 85°F with a southeast wind of 4 mph. The humidity is at 78% and there is 81% cloud cover. There is no rain expected today.\\n\\nYesterday in NYC, the maximum temperature was 81°F at 1:51 pm, and the minimum temperature was 72°F at 7:17 pm.\\n\\nFor the day before yesterday, I do not have the specific weather information.\",\n",
" \"generation_info\": null,\n",
" \"message\": {\n",
" \"lc\": 1,\n",
" \"type\": \"constructor\",\n",
" \"id\": [\n",
" \"langchain\",\n",
" \"schema\",\n",
" \"messages\",\n",
" \"AIMessage\"\n",
" ],\n",
" \"kwargs\": {\n",
" \"content\": \"Today in NYC, the weather is currently 85°F with a southeast wind of 4 mph. The humidity is at 78% and there is 81% cloud cover. There is no rain expected today.\\n\\nYesterday in NYC, the maximum temperature was 81°F at 1:51 pm, and the minimum temperature was 72°F at 7:17 pm.\\n\\nFor the day before yesterday, I do not have the specific weather information.\",\n",
" \"additional_kwargs\": {}\n",
" }\n",
" }\n",
" }\n",
" ]\n",
" ],\n",
" \"llm_output\": {\n",
" \"token_usage\": {\n",
" \"prompt_tokens\": 160,\n",
" \"completion_tokens\": 91,\n",
" \"total_tokens\": 251\n",
" },\n",
" \"model_name\": \"gpt-3.5-turbo-0613\"\n",
" },\n",
" \"run\": null\n",
"}\n",
"\u001b[36;1m\u001b[1;3m[chain/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor] [10.18s] Exiting Chain run with output:\n",
"\u001b[0m{\n",
" \"output\": \"Today in NYC, the weather is currently 85°F with a southeast wind of 4 mph. The humidity is at 78% and there is 81% cloud cover. There is no rain expected today.\\n\\nYesterday in NYC, the maximum temperature was 81°F at 1:51 pm, and the minimum temperature was 72°F at 7:17 pm.\\n\\nFor the day before yesterday, I do not have the specific weather information.\"\n",
"}\n"
]
},
{
"data": {
"text/plain": [
"'Today in NYC, the weather is currently 85°F with a southeast wind of 4 mph. The humidity is at 78% and there is 81% cloud cover. There is no rain expected today.\\n\\nYesterday in NYC, the maximum temperature was 81°F at 1:51 pm, and the minimum temperature was 72°F at 7:17 pm.\\n\\nFor the day before yesterday, I do not have the specific weather information.'"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mrkl.run(\"What is the weather in NYC today, yesterday, and the day before?\")"
]
},
{
"cell_type": "markdown",
"id": "067a8d3e",
"metadata": {},
"source": [
"Notice that we never get around to looking up the weather the day before yesterday, due to hitting our max_iterations limit."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f5f6743",
"id": "c3318a11",
"metadata": {},
"outputs": [],
"source": []
@@ -210,9 +435,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "venv",
"language": "python",
"name": "python3"
"name": "venv"
},
"language_info": {
"codemirror_mode": {
@@ -224,7 +449,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.11.3"
}
},
"nbformat": 4,

View File

@@ -78,6 +78,7 @@
"source": [
"from langchain.prompts import MessagesPlaceholder\n",
"from langchain.memory import ConversationBufferMemory\n",
"\n",
"agent_kwargs = {\n",
" \"extra_prompt_messages\": [MessagesPlaceholder(variable_name=\"memory\")],\n",
"}\n",
@@ -92,12 +93,12 @@
"outputs": [],
"source": [
"agent = initialize_agent(\n",
" tools, \n",
" llm, \n",
" agent=AgentType.OPENAI_FUNCTIONS, \n",
" verbose=True, \n",
" agent_kwargs=agent_kwargs, \n",
" memory=memory\n",
" tools,\n",
" llm,\n",
" agent=AgentType.OPENAI_FUNCTIONS,\n",
" verbose=True,\n",
" agent_kwargs=agent_kwargs,\n",
" memory=memory,\n",
")"
]
},

View File

@@ -42,15 +42,14 @@
"import yfinance as yf\n",
"from datetime import datetime, timedelta\n",
"\n",
"\n",
"def get_current_stock_price(ticker):\n",
" \"\"\"Method to get current stock price\"\"\"\n",
"\n",
" ticker_data = yf.Ticker(ticker)\n",
" recent = ticker_data.history(period='1d')\n",
" return {\n",
" 'price': recent.iloc[0]['Close'],\n",
" 'currency': ticker_data.info['currency']\n",
" }\n",
" recent = ticker_data.history(period=\"1d\")\n",
" return {\"price\": recent.iloc[0][\"Close\"], \"currency\": ticker_data.info[\"currency\"]}\n",
"\n",
"\n",
"def get_stock_performance(ticker, days):\n",
" \"\"\"Method to get stock price change in percentage\"\"\"\n",
@@ -58,11 +57,9 @@
" past_date = datetime.today() - timedelta(days=days)\n",
" ticker_data = yf.Ticker(ticker)\n",
" history = ticker_data.history(start=past_date)\n",
" old_price = history.iloc[0]['Close']\n",
" current_price = history.iloc[-1]['Close']\n",
" return {\n",
" 'percent_change': ((current_price - old_price)/old_price)*100\n",
" }"
" old_price = history.iloc[0][\"Close\"]\n",
" current_price = history.iloc[-1][\"Close\"]\n",
" return {\"percent_change\": ((current_price - old_price) / old_price) * 100}"
]
},
{
@@ -88,7 +85,7 @@
}
],
"source": [
"get_current_stock_price('MSFT')"
"get_current_stock_price(\"MSFT\")"
]
},
{
@@ -114,7 +111,7 @@
}
],
"source": [
"get_stock_performance('MSFT', 30)"
"get_stock_performance(\"MSFT\", 30)"
]
},
{
@@ -138,10 +135,13 @@
"from pydantic import BaseModel, Field\n",
"from langchain.tools import BaseTool\n",
"\n",
"\n",
"class CurrentStockPriceInput(BaseModel):\n",
" \"\"\"Inputs for get_current_stock_price\"\"\"\n",
"\n",
" ticker: str = Field(description=\"Ticker symbol of the stock\")\n",
"\n",
"\n",
"class CurrentStockPriceTool(BaseTool):\n",
" name = \"get_current_stock_price\"\n",
" description = \"\"\"\n",
@@ -160,8 +160,10 @@
"\n",
"class StockPercentChangeInput(BaseModel):\n",
" \"\"\"Inputs for get_stock_performance\"\"\"\n",
"\n",
" ticker: str = Field(description=\"Ticker symbol of the stock\")\n",
" days: int = Field(description='Timedelta days to get past date from current date')\n",
" days: int = Field(description=\"Timedelta days to get past date from current date\")\n",
"\n",
"\n",
"class StockPerformanceTool(BaseTool):\n",
" name = \"get_stock_performance\"\n",
@@ -202,15 +204,9 @@
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.agents import initialize_agent\n",
"\n",
"llm = ChatOpenAI(\n",
" model=\"gpt-3.5-turbo-0613\",\n",
" temperature=0\n",
")\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo-0613\", temperature=0)\n",
"\n",
"tools = [\n",
" CurrentStockPriceTool(),\n",
" StockPerformanceTool()\n",
"]\n",
"tools = [CurrentStockPriceTool(), StockPerformanceTool()]\n",
"\n",
"agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=True)"
]
@@ -261,7 +257,9 @@
}
],
"source": [
"agent.run(\"What is the current price of Microsoft stock? How it has performed over past 6 months?\")"
"agent.run(\n",
" \"What is the current price of Microsoft stock? How it has performed over past 6 months?\"\n",
")"
]
},
{
@@ -355,7 +353,9 @@
}
],
"source": [
"agent.run('In the past 3 months, which stock between Microsoft and Google has performed the best?')"
"agent.run(\n",
" \"In the past 3 months, which stock between Microsoft and Google has performed the best?\"\n",
")"
]
}
],

View File

@@ -79,10 +79,10 @@
"source": [
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
"agent = initialize_agent(\n",
" toolkit.get_tools(), \n",
" llm, \n",
" agent=AgentType.OPENAI_FUNCTIONS, \n",
" verbose=True, \n",
" toolkit.get_tools(),\n",
" llm,\n",
" agent=AgentType.OPENAI_FUNCTIONS,\n",
" verbose=True,\n",
" agent_kwargs=agent_kwargs,\n",
")"
]

View File

@@ -17,16 +17,7 @@
"execution_count": 1,
"id": "8632a37c",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/deeplake/util/check_latest_version.py:32: UserWarning: A newer version of deeplake (3.6.5) is available. It's recommended that you update to the latest version using `pip install -U deeplake`.\n",
" warnings.warn(\n"
]
}
],
"outputs": [],
"source": [
"from pydantic import BaseModel, Field\n",
"\n",
@@ -56,14 +47,14 @@
"files = [\n",
" # https://abc.xyz/investor/static/pdf/2023Q1_alphabet_earnings_release.pdf\n",
" {\n",
" \"name\": \"alphabet-earnings\", \n",
" \"name\": \"alphabet-earnings\",\n",
" \"path\": \"/Users/harrisonchase/Downloads/2023Q1_alphabet_earnings_release.pdf\",\n",
" }, \n",
" },\n",
" # https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q1-2023-Update\n",
" {\n",
" \"name\": \"tesla-earnings\", \n",
" \"path\": \"/Users/harrisonchase/Downloads/TSLA-Q1-2023-Update.pdf\"\n",
" }\n",
" \"name\": \"tesla-earnings\",\n",
" \"path\": \"/Users/harrisonchase/Downloads/TSLA-Q1-2023-Update.pdf\",\n",
" },\n",
"]\n",
"\n",
"for file in files:\n",
@@ -73,14 +64,14 @@
" docs = text_splitter.split_documents(pages)\n",
" embeddings = OpenAIEmbeddings()\n",
" retriever = FAISS.from_documents(docs, embeddings).as_retriever()\n",
" \n",
"\n",
" # Wrap retrievers in a Tool\n",
" tools.append(\n",
" Tool(\n",
" args_schema=DocumentInput,\n",
" name=file[\"name\"], \n",
" name=file[\"name\"],\n",
" description=f\"useful when you want to answer questions about {file['name']}\",\n",
" func=RetrievalQA.from_chain_type(llm=llm, retriever=retriever)\n",
" func=RetrievalQA.from_chain_type(llm=llm, retriever=retriever),\n",
" )\n",
" )"
]
@@ -139,7 +130,7 @@
"source": [
"llm = ChatOpenAI(\n",
" temperature=0,\n",
" model=\"gpt-3.5-turbo-0613\", \n",
" model=\"gpt-3.5-turbo-0613\",\n",
")\n",
"\n",
"agent = initialize_agent(\n",
@@ -170,6 +161,7 @@
"outputs": [],
"source": [
"import langchain\n",
"\n",
"langchain.debug = True"
]
},
@@ -405,7 +397,7 @@
"source": [
"llm = ChatOpenAI(\n",
" temperature=0,\n",
" model=\"gpt-3.5-turbo-0613\", \n",
" model=\"gpt-3.5-turbo-0613\",\n",
")\n",
"\n",
"agent = initialize_agent(\n",

View File

@@ -136,9 +136,11 @@
}
],
"source": [
"agent.run(\"Create an email draft for me to edit of a letter from the perspective of a sentient parrot\"\n",
" \" who is looking to collaborate on some research with her\"\n",
" \" estranged friend, a cat. Under no circumstances may you send the message, however.\")"
"agent.run(\n",
" \"Create an email draft for me to edit of a letter from the perspective of a sentient parrot\"\n",
" \" who is looking to collaborate on some research with her\"\n",
" \" estranged friend, a cat. Under no circumstances may you send the message, however.\"\n",
")"
]
},
{
@@ -160,7 +162,9 @@
}
],
"source": [
"agent.run(\"Could you search in my drafts folder and let me know if any of them are about collaboration?\")"
"agent.run(\n",
" \"Could you search in my drafts folder and let me know if any of them are about collaboration?\"\n",
")"
]
},
{
@@ -190,7 +194,9 @@
}
],
"source": [
"agent.run(\"Can you schedule a 30 minute meeting with a sentient parrot to discuss research collaborations on October 3, 2023 at 2 pm Easter Time?\")"
"agent.run(\n",
" \"Can you schedule a 30 minute meeting with a sentient parrot to discuss research collaborations on October 3, 2023 at 2 pm Easter Time?\"\n",
")"
]
},
{
@@ -210,7 +216,9 @@
}
],
"source": [
"agent.run(\"Can you tell me if I have any events on October 3, 2023 in Eastern Time, and if so, tell me if any of them are with a sentient parrot?\")"
"agent.run(\n",
" \"Can you tell me if I have any events on October 3, 2023 in Eastern Time, and if so, tell me if any of them are with a sentient parrot?\"\n",
")"
]
}
],

View File

@@ -24,7 +24,7 @@
"metadata": {},
"outputs": [],
"source": [
"#!pip install apify-client"
"#!pip install apify-client openai langchain chromadb tiktoken"
]
},
{

View File

@@ -34,6 +34,7 @@
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"DATAFORSEO_LOGIN\"] = \"your_api_access_username\"\n",
"os.environ[\"DATAFORSEO_PASSWORD\"] = \"your_api_access_password\"\n",
"\n",
@@ -88,7 +89,8 @@
"json_wrapper = DataForSeoAPIWrapper(\n",
" json_result_types=[\"organic\", \"knowledge_graph\", \"answer_box\"],\n",
" json_result_fields=[\"type\", \"title\", \"description\", \"text\"],\n",
" top_count=3)"
" top_count=3,\n",
")"
]
},
{
@@ -119,7 +121,8 @@
" top_count=10,\n",
" json_result_types=[\"organic\", \"local_pack\"],\n",
" json_result_fields=[\"title\", \"description\", \"type\"],\n",
" params={\"location_name\": \"Germany\", \"language_code\": \"en\"})\n",
" params={\"location_name\": \"Germany\", \"language_code\": \"en\"},\n",
")\n",
"customized_wrapper.results(\"coffee near me\")"
]
},
@@ -142,7 +145,8 @@
" top_count=10,\n",
" json_result_types=[\"organic\", \"local_pack\"],\n",
" json_result_fields=[\"title\", \"description\", \"type\"],\n",
" params={\"location_name\": \"Germany\", \"language_code\": \"en\", \"se_name\": \"bing\"})\n",
" params={\"location_name\": \"Germany\", \"language_code\": \"en\", \"se_name\": \"bing\"},\n",
")\n",
"customized_wrapper.results(\"coffee near me\")"
]
},
@@ -164,7 +168,12 @@
"maps_search = DataForSeoAPIWrapper(\n",
" top_count=10,\n",
" json_result_fields=[\"title\", \"value\", \"address\", \"rating\", \"type\"],\n",
" params={\"location_coordinate\": \"52.512,13.36,12z\", \"language_code\": \"en\", \"se_type\": \"maps\"})\n",
" params={\n",
" \"location_coordinate\": \"52.512,13.36,12z\",\n",
" \"language_code\": \"en\",\n",
" \"se_type\": \"maps\",\n",
" },\n",
")\n",
"maps_search.results(\"coffee near me\")"
]
},
@@ -184,10 +193,12 @@
"outputs": [],
"source": [
"from langchain.agents import Tool\n",
"\n",
"search = DataForSeoAPIWrapper(\n",
" top_count=3,\n",
" json_result_types=[\"organic\"],\n",
" json_result_fields=[\"title\", \"description\", \"type\"])\n",
" json_result_fields=[\"title\", \"description\", \"type\"],\n",
")\n",
"tool = Tool(\n",
" name=\"google-search-answer\",\n",
" description=\"My new answer tool\",\n",

View File

@@ -52,7 +52,6 @@
"tools = load_tools(\n",
" [\"graphql\"],\n",
" graphql_endpoint=\"https://swapi-graphql.netlify.app/.netlify/functions/index\",\n",
" llm=llm,\n",
")\n",
"\n",
"agent = initialize_agent(\n",

View File

@@ -0,0 +1,233 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "16763ed3",
"metadata": {},
"source": [
"# Lemon AI NLP Workflow Automation\n",
"\\\n",
"Full docs are available at: https://github.com/felixbrock/lemonai-py-client\n",
"\n",
"**Lemon AI helps you build powerful AI assistants in minutes and automate workflows by allowing for accurate and reliable read and write operations in tools like Airtable, Hubspot, Discord, Notion, Slack and Github.**\n",
"\n",
"Most connectors available today are focused on read-only operations, limiting the potential of LLMs. Agents, on the other hand, have a tendency to hallucinate from time to time due to missing context or instructions.\n",
"\n",
"With Lemon AI, it is possible to give your agents access to well-defined APIs for reliable read and write operations. In addition, Lemon AI functions allow you to further reduce the risk of hallucinations by providing a way to statically define workflows that the model can rely on in case of uncertainty."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "4881b484-1b97-478f-b206-aec407ceff66",
"metadata": {},
"source": [
"## Quick Start\n",
"\n",
"The following quick start demonstrates how to use Lemon AI in combination with Agents to automate workflows that involve interaction with internal tooling."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "ff91b41a",
"metadata": {},
"source": [
"### 1. Install Lemon AI\n",
"\n",
"Requires Python 3.8.1 and above.\n",
"\n",
"To use Lemon AI in your Python project run `pip install lemonai`\n",
"\n",
"This will install the corresponding Lemon AI client which you can then import into your script.\n",
"\n",
"The tool uses Python packages langchain and loguru. In case of any installation errors with Lemon AI, install both packages first and then install the Lemon AI package."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "340ff63d",
"metadata": {},
"source": [
"### 2. Launch the Server\n",
"\n",
"The interaction of your agents and all tools provided by Lemon AI is handled by the [Lemon AI Server](https://github.com/felixbrock/lemonai-server). To use Lemon AI you need to run the server on your local machine so the Lemon AI Python client can connect to it."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e845f402",
"metadata": {},
"source": [
"### 3. Use Lemon AI with Langchain"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "d3ae6a82",
"metadata": {},
"source": [
"Lemon AI automatically solves given tasks by finding the right combination of relevant tools or uses Lemon AI Functions as an alternative. The following example demonstrates how to retrieve a user from Hackernews and write it to a table in Airtable:"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "43476a22",
"metadata": {},
"source": [
"#### (Optional) Define your Lemon AI Functions"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "cb038670",
"metadata": {},
"source": [
"Similar to [OpenAI functions](https://openai.com/blog/function-calling-and-other-api-updates), Lemon AI provides the option to define workflows as reusable functions. These functions can be defined for use cases where it is especially important to move as close as possible to near-deterministic behavior. Specific workflows can be defined in a separate lemonai.json:"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e423ebbb",
"metadata": {},
"source": [
"```json\n",
"[\n",
" {\n",
" \"name\": \"Hackernews Airtable User Workflow\",\n",
" \"description\": \"retrieves user data from Hackernews and appends it to a table in Airtable\",\n",
" \"tools\": [\"hackernews-get-user\", \"airtable-append-data\"]\n",
" }\n",
"]\n",
"```"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "3fdb36ce",
"metadata": {},
"source": [
"Your model will have access to these functions and will prefer them over self-selecting tools to solve a given task. All you have to do is to let the agent know that it should use a given function by including the function name in the prompt."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "ebfb8b5d",
"metadata": {},
"source": [
"#### Include Lemon AI in your Langchain project "
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "5318715d",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from lemonai import execute_workflow\n",
"from langchain import OpenAI"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c9d082cb",
"metadata": {},
"source": [
"#### Load API Keys and Access Tokens\n",
"\n",
"To use tools that require authentication, you have to store the corresponding access credentials in your environment in the format \"{tool name}_{authentication string}\" where the authentication string is one of [\"API_KEY\", \"SECRET_KEY\", \"SUBSCRIPTION_KEY\", \"ACCESS_KEY\"] for API keys or [\"ACCESS_TOKEN\", \"SECRET_TOKEN\"] for authentication tokens. Examples are \"OPENAI_API_KEY\", \"BING_SUBSCRIPTION_KEY\", \"AIRTABLE_ACCESS_TOKEN\"."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a370d999",
"metadata": {},
"outputs": [],
"source": [
"\"\"\" Load all relevant API Keys and Access Tokens into your environment variables \"\"\"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"*INSERT OPENAI API KEY HERE*\"\n",
"os.environ[\"AIRTABLE_ACCESS_TOKEN\"] = \"*INSERT AIRTABLE TOKEN HERE*\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "38d158e7",
"metadata": {},
"outputs": [],
"source": [
"hackernews_username = \"*INSERT HACKERNEWS USERNAME HERE*\"\n",
"airtable_base_id = \"*INSERT BASE ID HERE*\"\n",
"airtable_table_id = \"*INSERT TABLE ID HERE*\"\n",
"\n",
"\"\"\" Define your instruction to be given to your LLM \"\"\"\n",
"prompt = f\"\"\"Read information from Hackernews for user {hackernews_username} and then write the results to\n",
"Airtable (baseId: {airtable_base_id}, tableId: {airtable_table_id}). Only write the fields \"username\", \"karma\"\n",
"and \"created_at_i\". Please make sure that Airtable does NOT automatically convert the field types.\n",
"\"\"\"\n",
"\n",
"\"\"\"\n",
"Use the Lemon AI execute_workflow wrapper \n",
"to run your Langchain agent in combination with Lemon AI \n",
"\"\"\"\n",
"model = OpenAI(temperature=0)\n",
"\n",
"execute_workflow(llm=model, prompt_string=prompt)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "aef3e801",
"metadata": {},
"source": [
"### 4. Gain transparency on your Agent's decision making\n",
"\n",
"To gain transparency on how your Agent interacts with Lemon AI tools to solve a given task, all decisions made, tools used and operations performed are written to a local `lemonai.log` file. Every time your LLM agent is interacting with the Lemon AI tool stack a corresponding log entry is created.\n",
"\n",
"```log\n",
"2023-06-26T11:50:27.708785+0100 - b5f91c59-8487-45c2-800a-156eac0c7dae - hackernews-get-user\n",
"2023-06-26T11:50:39.624035+0100 - b5f91c59-8487-45c2-800a-156eac0c7dae - airtable-append-data\n",
"2023-06-26T11:58:32.925228+0100 - 5efe603c-9898-4143-b99a-55b50007ed9d - hackernews-get-user\n",
"2023-06-26T11:58:43.988788+0100 - 5efe603c-9898-4143-b99a-55b50007ed9d - airtable-append-data\n",
"```\n",
"\n",
"By using the [Lemon AI Analytics Tool](https://github.com/felixbrock/lemonai-analytics) you can easily gain a better understanding of how frequently and in which order tools are used. As a result, you can identify weak spots in your agents decision-making capabilities and move to a more deterministic behavior by defining Lemon AI functions."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -90,7 +90,12 @@
"metadata": {},
"outputs": [],
"source": [
"search.results(\"The best blog post about AI safety is definitely this: \", 10, include_domains=[\"lesswrong.com\"], start_published_date=\"2019-01-01\")"
"search.results(\n",
" \"The best blog post about AI safety is definitely this: \",\n",
" 10,\n",
" include_domains=[\"lesswrong.com\"],\n",
" start_published_date=\"2019-01-01\",\n",
")"
]
},
{

View File

@@ -341,7 +341,7 @@
"outputs": [],
"source": [
"llm = OpenAI(temperature=0)\n",
"zapier = ZapierNLAWrapper(zapier_nla_oauth_access_token='<fill in access token here>')\n",
"zapier = ZapierNLAWrapper(zapier_nla_oauth_access_token=\"<fill in access token here>\")\n",
"toolkit = ZapierToolkit.from_zapier_nla_wrapper(zapier)\n",
"agent = initialize_agent(\n",
" toolkit.get_tools(), llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",

View File

@@ -0,0 +1,220 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Context\n",
"\n",
"![Context - Product Analytics for AI Chatbots](https://go.getcontext.ai/langchain.png)\n",
"\n",
"[Context](https://getcontext.ai/) provides product analytics for AI chatbots.\n",
"\n",
"Context helps you understand how users are interacting with your AI chat products.\n",
"Gain critical insights, optimise poor experiences, and minimise brand risks.\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In this guide we will show you how to integrate with Context."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"## Installation and Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"source": [
"$ pip install context-python --upgrade"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Getting API Credentials\n",
"\n",
"To get your Context API token:\n",
"\n",
"1. Go to the settings page within your Context account (https://go.getcontext.ai/settings).\n",
"2. Generate a new API Token.\n",
"3. Store this token somewhere secure."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setup Context\n",
"\n",
"To use the `ContextCallbackHandler`, import the handler from Langchain and instantiate it with your Context API token.\n",
"\n",
"Ensure you have installed the `context-python` package before using the handler."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain.callbacks import ContextCallbackHandler\n",
"\n",
"token = os.environ[\"CONTEXT_API_TOKEN\"]\n",
"\n",
"context_callback = ContextCallbackHandler(token)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage\n",
"### Using the Context callback within a Chat Model\n",
"\n",
"The Context callback handler can be used to directly record transcripts between users and AI assistants.\n",
"\n",
"#### Example"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.schema import (\n",
" SystemMessage,\n",
" HumanMessage,\n",
")\n",
"from langchain.callbacks import ContextCallbackHandler\n",
"\n",
"token = os.environ[\"CONTEXT_API_TOKEN\"]\n",
"\n",
"chat = ChatOpenAI(\n",
" headers={\"user_id\": \"123\"}, temperature=0, callbacks=[ContextCallbackHandler(token)]\n",
")\n",
"\n",
"messages = [\n",
" SystemMessage(\n",
" content=\"You are a helpful assistant that translates English to French.\"\n",
" ),\n",
" HumanMessage(content=\"I love programming.\"),\n",
"]\n",
"\n",
"print(chat(messages))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using the Context callback within Chains\n",
"\n",
"The Context callback handler can also be used to record the inputs and outputs of chains. Note that intermediate steps of the chain are not recorded - only the starting inputs and final outputs.\n",
"\n",
"__Note:__ Ensure that you pass the same context object to the chat model and the chain.\n",
"\n",
"Wrong:\n",
"> ```python\n",
"> chat = ChatOpenAI(temperature=0.9, callbacks=[ContextCallbackHandler(token)])\n",
"> chain = LLMChain(llm=chat, prompt=chat_prompt_template, callbacks=[ContextCallbackHandler(token)])\n",
"> ```\n",
"\n",
"Correct:\n",
">```python\n",
">handler = ContextCallbackHandler(token)\n",
">chat = ChatOpenAI(temperature=0.9, callbacks=[callback])\n",
">chain = LLMChain(llm=chat, prompt=chat_prompt_template, callbacks=[callback])\n",
">```\n",
"\n",
"#### Example"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain import LLMChain\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.prompts.chat import (\n",
" ChatPromptTemplate,\n",
" HumanMessagePromptTemplate,\n",
")\n",
"from langchain.callbacks import ContextCallbackHandler\n",
"\n",
"token = os.environ[\"CONTEXT_API_TOKEN\"]\n",
"\n",
"human_message_prompt = HumanMessagePromptTemplate(\n",
" prompt=PromptTemplate(\n",
" template=\"What is a good name for a company that makes {product}?\",\n",
" input_variables=[\"product\"],\n",
" )\n",
")\n",
"chat_prompt_template = ChatPromptTemplate.from_messages([human_message_prompt])\n",
"callback = ContextCallbackHandler(token)\n",
"chat = ChatOpenAI(temperature=0.9, callbacks=[callback])\n",
"chain = LLMChain(llm=chat, prompt=chat_prompt_template, callbacks=[callback])\n",
"print(chain.run(\"colorful socks\"))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
},
"vscode": {
"interpreter": {
"hash": "a53ebf4a859167383b364e7e7521d0add3c2dbbdecce4edf676e8c4634ff3fbb"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -57,6 +57,7 @@
"\n",
"# Remove the (1) import sys and sys.path.append(..) and (2) uncomment `!pip install langchain` after merging the PR for Infino/LangChain integration.\n",
"import sys\n",
"\n",
"sys.path.append(\"../../../../../langchain\")\n",
"#!pip install langchain\n",
"\n",
@@ -120,9 +121,9 @@
"metadata": {},
"outputs": [],
"source": [
"# These are a subset of questions from Stanford's QA dataset - \n",
"# These are a subset of questions from Stanford's QA dataset -\n",
"# https://rajpurkar.github.io/SQuAD-explorer/\n",
"data = '''In what country is Normandy located?\n",
"data = \"\"\"In what country is Normandy located?\n",
"When were the Normans in Normandy?\n",
"From which countries did the Norse originate?\n",
"Who was the Norse leader?\n",
@@ -141,9 +142,9 @@
"What principality did William the conquerer found?\n",
"What is the original meaning of the word Norman?\n",
"When was the Latin version of the word Norman first recorded?\n",
"What name comes from the English words Normans/Normanz?'''\n",
"What name comes from the English words Normans/Normanz?\"\"\"\n",
"\n",
"questions = data.split('\\n')"
"questions = data.split(\"\\n\")"
]
},
{
@@ -190,10 +191,12 @@
],
"source": [
"# Set your key here.\n",
"#os.environ[\"OPENAI_API_KEY\"] = \"YOUR_API_KEY\"\n",
"# os.environ[\"OPENAI_API_KEY\"] = \"YOUR_API_KEY\"\n",
"\n",
"# Create callback handler. This logs latency, errors, token usage, prompts as well as prompt responses to Infino.\n",
"handler = InfinoCallbackHandler(model_id=\"test_openai\", model_version=\"0.1\", verbose=False)\n",
"handler = InfinoCallbackHandler(\n",
" model_id=\"test_openai\", model_version=\"0.1\", verbose=False\n",
")\n",
"\n",
"# Create LLM.\n",
"llm = OpenAI(temperature=0.1)\n",
@@ -281,29 +284,30 @@
"source": [
"# Helper function to create a graph using matplotlib.\n",
"def plot(data, title):\n",
" data = json.loads(data)\n",
" data = json.loads(data)\n",
"\n",
" # Extract x and y values from the data\n",
" timestamps = [item[\"time\"] for item in data]\n",
" dates=[dt.datetime.fromtimestamp(ts) for ts in timestamps]\n",
" y = [item[\"value\"] for item in data]\n",
" # Extract x and y values from the data\n",
" timestamps = [item[\"time\"] for item in data]\n",
" dates = [dt.datetime.fromtimestamp(ts) for ts in timestamps]\n",
" y = [item[\"value\"] for item in data]\n",
"\n",
" plt.rcParams['figure.figsize'] = [6, 4]\n",
" plt.subplots_adjust(bottom=0.2)\n",
" plt.xticks(rotation=25 )\n",
" ax=plt.gca()\n",
" xfmt = md.DateFormatter('%Y-%m-%d %H:%M:%S')\n",
" ax.xaxis.set_major_formatter(xfmt)\n",
" \n",
" # Create the plot\n",
" plt.plot(dates, y)\n",
" plt.rcParams[\"figure.figsize\"] = [6, 4]\n",
" plt.subplots_adjust(bottom=0.2)\n",
" plt.xticks(rotation=25)\n",
" ax = plt.gca()\n",
" xfmt = md.DateFormatter(\"%Y-%m-%d %H:%M:%S\")\n",
" ax.xaxis.set_major_formatter(xfmt)\n",
"\n",
" # Set labels and title\n",
" plt.xlabel(\"Time\")\n",
" plt.ylabel(\"Value\")\n",
" plt.title(title)\n",
" # Create the plot\n",
" plt.plot(dates, y)\n",
"\n",
" # Set labels and title\n",
" plt.xlabel(\"Time\")\n",
" plt.ylabel(\"Value\")\n",
" plt.title(title)\n",
"\n",
" plt.show()\n",
"\n",
" plt.show()\n",
"\n",
"response = client.search_ts(\"__name__\", \"latency\", 0, int(time.time()))\n",
"plot(response.text, \"Latency\")\n",
@@ -318,7 +322,7 @@
"plot(response.text, \"Completion Tokens\")\n",
"\n",
"response = client.search_ts(\"__name__\", \"total_tokens\", 0, int(time.time()))\n",
"plot(response.text, \"Total Tokens\")\n"
"plot(response.text, \"Total Tokens\")"
]
},
{
@@ -356,7 +360,7 @@
"\n",
"query = \"king charles III\"\n",
"response = client.search_log(\"king charles III\", 0, int(time.time()))\n",
"print(\"Results for\", query, \":\", response.text)\n"
"print(\"Results for\", query, \":\", response.text)"
]
},
{

View File

@@ -9,7 +9,7 @@
In this guide we will demonstrate how to use `StreamlitCallbackHandler` to display the thoughts and actions of an agent in an
interactive Streamlit app. Try it out with the running app below using the [MRKL agent](/docs/modules/agents/how_to/mrkl/):
<iframe loading="lazy" src="https://mrkl-minimal.streamlit.app/?embed=true&embed_options=light_theme"
<iframe loading="lazy" src="https://langchain-mrkl.streamlit.app/?embed=true&embed_options=light_theme"
style={{ width: 100 + '%', border: 'none', marginBottom: 1 + 'rem', height: 600 }}
allow="camera;clipboard-read;clipboard-write;"
></iframe>
@@ -35,7 +35,7 @@ st_callback = StreamlitCallbackHandler(st.container())
```
Additional keyword arguments to customize the display behavior are described in the
[API reference](https://api.python.langchain.com/en/latest/modules/callbacks.html#langchain.callbacks.StreamlitCallbackHandler).
[API reference](https://api.python.langchain.com/en/latest/callbacks/langchain.callbacks.streamlit.streamlit_callback_handler.StreamlitCallbackHandler.html).
### Scenario 1: Using an Agent with Tools

View File

@@ -0,0 +1,921 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "82f3f65d-fbcb-4e8e-b04b-959856283643",
"metadata": {},
"source": [
"# Causal program-aided language (CPAL) chain\n",
"\n",
"The CPAL chain builds on the recent PAL to stop LLM hallucination. The problem with the PAL approach is that it hallucinates on a math problem with a nested chain of dependence. The innovation here is that this new CPAL approach includes causal structure to fix hallucination.\n",
"\n",
"The original [PR's description](https://github.com/hwchase17/langchain/pull/6255) contains a full overview.\n",
"\n",
"Using the CPAL chain, the LLM translated this\n",
"\n",
" \"Tim buys the same number of pets as Cindy and Boris.\"\n",
" \"Cindy buys the same number of pets as Bill plus Bob.\"\n",
" \"Boris buys the same number of pets as Ben plus Beth.\"\n",
" \"Bill buys the same number of pets as Obama.\"\n",
" \"Bob buys the same number of pets as Obama.\"\n",
" \"Ben buys the same number of pets as Obama.\"\n",
" \"Beth buys the same number of pets as Obama.\"\n",
" \"If Obama buys one pet, how many pets total does everyone buy?\"\n",
"\n",
"\n",
"into this\n",
"\n",
"![complex-graph.png](/img/cpal_diagram.png).\n",
"\n",
"Outline of code examples demoed in this notebook.\n",
"\n",
"1. CPAL's value against hallucination: CPAL vs PAL \n",
" 1.1 Complex narrative \n",
" 1.2 Unanswerable math word problem \n",
"2. CPAL's three types of causal diagrams ([The Book of Why](https://en.wikipedia.org/wiki/The_Book_of_Why)). \n",
" 2.1 Mediator \n",
" 2.2 Collider \n",
" 2.3 Confounder "
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "1370e40f",
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import SVG\n",
"\n",
"from langchain.experimental.cpal.base import CPALChain\n",
"from langchain.chains import PALChain\n",
"from langchain import OpenAI\n",
"\n",
"llm = OpenAI(temperature=0, max_tokens=512)\n",
"cpal_chain = CPALChain.from_univariate_prompt(llm=llm, verbose=True)\n",
"pal_chain = PALChain.from_math_prompt(llm=llm, verbose=True)"
]
},
{
"cell_type": "markdown",
"id": "858a87d9-a9bd-4850-9687-9af4b0856b62",
"metadata": {},
"source": [
"## CPAL's value against hallucination: CPAL vs PAL\n",
"\n",
"Like PAL, CPAL intends to reduce large language model (LLM) hallucination.\n",
"\n",
"The CPAL chain is different from the PAL chain for a couple of reasons.\n",
"\n",
"CPAL adds a causal structure (or DAG) to link entity actions (or math expressions).\n",
"The CPAL math expressions are modeling a chain of cause and effect relations, which can be intervened upon, whereas for the PAL chain math expressions are projected math identities.\n"
]
},
{
"cell_type": "markdown",
"id": "496403c5-d268-43ae-8852-2bd9903ce444",
"metadata": {},
"source": [
"### 1.1 Complex narrative\n",
"\n",
"Takeaway: PAL hallucinates, CPAL does not hallucinate."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d5dad768-2892-4825-8093-9b840f643a8a",
"metadata": {},
"outputs": [],
"source": [
"question = (\n",
" \"Tim buys the same number of pets as Cindy and Boris.\"\n",
" \"Cindy buys the same number of pets as Bill plus Bob.\"\n",
" \"Boris buys the same number of pets as Ben plus Beth.\"\n",
" \"Bill buys the same number of pets as Obama.\"\n",
" \"Bob buys the same number of pets as Obama.\"\n",
" \"Ben buys the same number of pets as Obama.\"\n",
" \"Beth buys the same number of pets as Obama.\"\n",
" \"If Obama buys one pet, how many pets total does everyone buy?\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "bbffa7a0-3c22-4a1d-ab2d-f230973073b0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mdef solution():\n",
" \"\"\"Tim buys the same number of pets as Cindy and Boris.Cindy buys the same number of pets as Bill plus Bob.Boris buys the same number of pets as Ben plus Beth.Bill buys the same number of pets as Obama.Bob buys the same number of pets as Obama.Ben buys the same number of pets as Obama.Beth buys the same number of pets as Obama.If Obama buys one pet, how many pets total does everyone buy?\"\"\"\n",
" obama_pets = 1\n",
" tim_pets = obama_pets\n",
" cindy_pets = obama_pets + obama_pets\n",
" boris_pets = obama_pets + obama_pets\n",
" total_pets = tim_pets + cindy_pets + boris_pets\n",
" result = total_pets\n",
" return result\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'5'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pal_chain.run(question)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "35a70d1d-86f8-4abc-b818-fbd083f072e9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
" name code value depends_on\n",
"0 obama pass 1.0 []\n",
"1 bill bill.value = obama.value 1.0 [obama]\n",
"2 bob bob.value = obama.value 1.0 [obama]\n",
"3 ben ben.value = obama.value 1.0 [obama]\n",
"4 beth beth.value = obama.value 1.0 [obama]\n",
"5 cindy cindy.value = bill.value + bob.value 2.0 [bill, bob]\n",
"6 boris boris.value = ben.value + beth.value 2.0 [ben, beth]\n",
"7 tim tim.value = cindy.value + boris.value 4.0 [cindy, boris]\u001b[0m\n",
"\n",
"\u001b[36;1m\u001b[1;3mquery data\n",
"{\n",
" \"question\": \"how many pets total does everyone buy?\",\n",
" \"expression\": \"SELECT SUM(value) FROM df\",\n",
" \"llm_error_msg\": \"\"\n",
"}\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"13.0"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cpal_chain.run(question)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ccb6b2b0-9de6-4f66-a8fb-fc59229ee316",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"292pt\" height=\"260pt\" viewBox=\"0.00 0.00 292.00 260.00\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 256)\">\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-256 288,-256 288,4 -4,4\"/>\n",
"<!-- obama -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>obama</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"137\" cy=\"-234\" rx=\"41.69\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"137\" y=\"-230.3\" font-family=\"Times,serif\" font-size=\"14.00\">obama</text>\n",
"</g>\n",
"<!-- bill -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>bill</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"27\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"27\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">bill</text>\n",
"</g>\n",
"<!-- obama&#45;&gt;bill -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>obama-&gt;bill</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M114.47,-218.67C97.08,-207.6 72.94,-192.23 54.42,-180.45\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"56.15,-177.4 45.84,-174.99 52.4,-183.31 56.15,-177.4\"/>\n",
"</g>\n",
"<!-- bob -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>bob</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"100\" cy=\"-162\" rx=\"28\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"100\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">bob</text>\n",
"</g>\n",
"<!-- obama&#45;&gt;bob -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>obama-&gt;bob</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M128.04,-216.05C123.66,-207.77 118.3,-197.62 113.44,-188.42\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"116.39,-186.51 108.62,-179.31 110.2,-189.79 116.39,-186.51\"/>\n",
"</g>\n",
"<!-- ben -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>ben</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"174\" cy=\"-162\" rx=\"28\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"174\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">ben</text>\n",
"</g>\n",
"<!-- obama&#45;&gt;ben -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>obama-&gt;ben</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M145.96,-216.05C150.34,-207.77 155.7,-197.62 160.56,-188.42\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"163.8,-189.79 165.38,-179.31 157.61,-186.51 163.8,-189.79\"/>\n",
"</g>\n",
"<!-- beth -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>beth</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"252\" cy=\"-162\" rx=\"32\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"252\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">beth</text>\n",
"</g>\n",
"<!-- obama&#45;&gt;beth -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>obama-&gt;beth</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M160.27,-218.83C178.18,-207.94 203.04,-192.8 222.37,-181.04\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"224.36,-183.92 231.08,-175.73 220.72,-177.95 224.36,-183.92\"/>\n",
"</g>\n",
"<!-- cindy -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>cindy</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"93\" cy=\"-90\" rx=\"36\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"93\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">cindy</text>\n",
"</g>\n",
"<!-- bill&#45;&gt;cindy -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>bill-&gt;cindy</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M41,-146.15C49.77,-136.85 61.25,-124.67 71.2,-114.12\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"73.79,-116.47 78.11,-106.8 68.7,-111.67 73.79,-116.47\"/>\n",
"</g>\n",
"<!-- bob&#45;&gt;cindy -->\n",
"<g id=\"edge6\" class=\"edge\">\n",
"<title>bob-&gt;cindy</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M98.27,-143.7C97.5,-135.98 96.57,-126.71 95.71,-118.11\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"99.19,-117.7 94.71,-108.1 92.22,-118.4 99.19,-117.7\"/>\n",
"</g>\n",
"<!-- boris -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>boris</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"181\" cy=\"-90\" rx=\"34.5\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"181\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">boris</text>\n",
"</g>\n",
"<!-- ben&#45;&gt;boris -->\n",
"<g id=\"edge7\" class=\"edge\">\n",
"<title>ben-&gt;boris</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M175.73,-143.7C176.5,-135.98 177.43,-126.71 178.29,-118.11\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"181.78,-118.4 179.29,-108.1 174.81,-117.7 181.78,-118.4\"/>\n",
"</g>\n",
"<!-- beth&#45;&gt;boris -->\n",
"<g id=\"edge8\" class=\"edge\">\n",
"<title>beth-&gt;boris</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M236.59,-145.81C227.01,-136.36 214.51,-124.04 203.8,-113.48\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"205.96,-110.69 196.38,-106.16 201.04,-115.67 205.96,-110.69\"/>\n",
"</g>\n",
"<!-- tim -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>tim</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"137\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"137\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\">tim</text>\n",
"</g>\n",
"<!-- cindy&#45;&gt;tim -->\n",
"<g id=\"edge9\" class=\"edge\">\n",
"<title>cindy-&gt;tim</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M103.43,-72.41C108.82,-63.83 115.51,-53.19 121.49,-43.67\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"124.59,-45.32 126.95,-34.99 118.66,-41.59 124.59,-45.32\"/>\n",
"</g>\n",
"<!-- boris&#45;&gt;tim -->\n",
"<g id=\"edge10\" class=\"edge\">\n",
"<title>boris-&gt;tim</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M170.79,-72.77C165.41,-64.19 158.68,-53.49 152.65,-43.9\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"155.43,-41.75 147.15,-35.15 149.51,-45.48 155.43,-41.75\"/>\n",
"</g>\n",
"</g>\n",
"</svg>"
],
"text/plain": [
"<IPython.core.display.SVG object>"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# wait 20 secs to see display\n",
"cpal_chain.draw(path=\"web.svg\")\n",
"SVG(\"web.svg\")"
]
},
{
"cell_type": "markdown",
"id": "1f6f345a-bb16-4e64-83c4-cbbc789a8325",
"metadata": {},
"source": [
"### Unanswerable math\n",
"\n",
"Takeaway: PAL hallucinates, where CPAL, rather than hallucinate, answers with _\"unanswerable, narrative question and plot are incoherent\"_"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "068afd79-fd41-4ec2-b4d0-c64140dc413f",
"metadata": {},
"outputs": [],
"source": [
"question = (\n",
" \"Jan has three times the number of pets as Marcia.\"\n",
" \"Marcia has two more pets than Cindy.\"\n",
" \"If Cindy has ten pets, how many pets does Barak have?\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "02f77db2-72e8-46c2-90b3-5e37ca42f80d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mdef solution():\n",
" \"\"\"Jan has three times the number of pets as Marcia.Marcia has two more pets than Cindy.If Cindy has ten pets, how many pets does Barak have?\"\"\"\n",
" cindy_pets = 10\n",
" marcia_pets = cindy_pets + 2\n",
" jan_pets = marcia_pets * 3\n",
" result = jan_pets\n",
" return result\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'36'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pal_chain.run(question)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "925958de-e998-4ffa-8b2e-5a00ddae5026",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
" name code value depends_on\n",
"0 cindy pass 10.0 []\n",
"1 marcia marcia.value = cindy.value + 2 12.0 [cindy]\n",
"2 jan jan.value = marcia.value * 3 36.0 [marcia]\u001b[0m\n",
"\n",
"\u001b[36;1m\u001b[1;3mquery data\n",
"{\n",
" \"question\": \"how many pets does barak have?\",\n",
" \"expression\": \"SELECT name, value FROM df WHERE name = 'barak'\",\n",
" \"llm_error_msg\": \"\"\n",
"}\u001b[0m\n",
"\n",
"unanswerable, query and outcome are incoherent\n",
"\n",
"outcome:\n",
" name code value depends_on\n",
"0 cindy pass 10.0 []\n",
"1 marcia marcia.value = cindy.value + 2 12.0 [cindy]\n",
"2 jan jan.value = marcia.value * 3 36.0 [marcia]\n",
"query:\n",
"{'question': 'how many pets does barak have?', 'expression': \"SELECT name, value FROM df WHERE name = 'barak'\", 'llm_error_msg': ''}\n"
]
}
],
"source": [
"try:\n",
" cpal_chain.run(question)\n",
"except Exception as e_msg:\n",
" print(e_msg)"
]
},
{
"cell_type": "markdown",
"id": "095adc76",
"metadata": {},
"source": [
"### Basic math\n",
"\n",
"#### Causal mediator"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "3ecf03fa-8350-4c4e-8080-84a307ba6ad4",
"metadata": {},
"outputs": [],
"source": [
"question = (\n",
" \"Jan has three times the number of pets as Marcia. \"\n",
" \"Marcia has two more pets than Cindy. \"\n",
" \"If Cindy has four pets, how many total pets do the three have?\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "74e49c47-3eed-4abe-98b7-8e97bcd15944",
"metadata": {},
"source": [
"---\n",
"PAL"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "2e88395f-d014-4362-abb0-88f6800860bb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mdef solution():\n",
" \"\"\"Jan has three times the number of pets as Marcia. Marcia has two more pets than Cindy. If Cindy has four pets, how many total pets do the three have?\"\"\"\n",
" cindy_pets = 4\n",
" marcia_pets = cindy_pets + 2\n",
" jan_pets = marcia_pets * 3\n",
" total_pets = cindy_pets + marcia_pets + jan_pets\n",
" result = total_pets\n",
" return result\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'28'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pal_chain.run(question)"
]
},
{
"cell_type": "markdown",
"id": "20ba6640-3d17-4b59-8101-aaba89d68cf4",
"metadata": {},
"source": [
"---\n",
"CPAL"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "312a0943-a482-4ed0-a064-1e7a72e9479b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
" name code value depends_on\n",
"0 cindy pass 4.0 []\n",
"1 marcia marcia.value = cindy.value + 2 6.0 [cindy]\n",
"2 jan jan.value = marcia.value * 3 18.0 [marcia]\u001b[0m\n",
"\n",
"\u001b[36;1m\u001b[1;3mquery data\n",
"{\n",
" \"question\": \"how many total pets do the three have?\",\n",
" \"expression\": \"SELECT SUM(value) FROM df\",\n",
" \"llm_error_msg\": \"\"\n",
"}\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"28.0"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cpal_chain.run(question)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "4466b975-ae2b-4252-972b-b3182a089ade",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"92pt\" height=\"188pt\" viewBox=\"0.00 0.00 92.49 188.00\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-184 88.49,-184 88.49,4 -4,4\"/>\n",
"<!-- cindy -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>cindy</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-162\" rx=\"36\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">cindy</text>\n",
"</g>\n",
"<!-- marcia -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>marcia</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-90\" rx=\"42.49\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">marcia</text>\n",
"</g>\n",
"<!-- cindy&#45;&gt;marcia -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>cindy-&gt;marcia</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M42.25,-143.7C42.25,-135.98 42.25,-126.71 42.25,-118.11\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"45.75,-118.1 42.25,-108.1 38.75,-118.1 45.75,-118.1\"/>\n",
"</g>\n",
"<!-- jan -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>jan</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\">jan</text>\n",
"</g>\n",
"<!-- marcia&#45;&gt;jan -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>marcia-&gt;jan</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M42.25,-71.7C42.25,-63.98 42.25,-54.71 42.25,-46.11\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"45.75,-46.1 42.25,-36.1 38.75,-46.1 45.75,-46.1\"/>\n",
"</g>\n",
"</g>\n",
"</svg>"
],
"text/plain": [
"<IPython.core.display.SVG object>"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# wait 20 secs to see display\n",
"cpal_chain.draw(path=\"web.svg\")\n",
"SVG(\"web.svg\")"
]
},
{
"cell_type": "markdown",
"id": "29fa7b8a-75a3-4270-82a2-2c31939cd7e0",
"metadata": {},
"source": [
"### Causal collider"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "618eddac-f0ef-4ab5-90ed-72e880fdeba3",
"metadata": {},
"outputs": [],
"source": [
"question = (\n",
" \"Jan has the number of pets as Marcia plus the number of pets as Cindy. \"\n",
" \"Marcia has no pets. \"\n",
" \"If Cindy has four pets, how many total pets do the three have?\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "a01563f3-7974-4de4-8bd9-0b7d710aa0d3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
" name code value depends_on\n",
"0 marcia pass 0.0 []\n",
"1 cindy pass 4.0 []\n",
"2 jan jan.value = marcia.value + cindy.value 4.0 [marcia, cindy]\u001b[0m\n",
"\n",
"\u001b[36;1m\u001b[1;3mquery data\n",
"{\n",
" \"question\": \"how many total pets do the three have?\",\n",
" \"expression\": \"SELECT SUM(value) FROM df\",\n",
" \"llm_error_msg\": \"\"\n",
"}\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"8.0"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cpal_chain.run(question)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "0fbe7243-0522-4946-b9a2-6e21e7c49a42",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"182pt\" height=\"116pt\" viewBox=\"0.00 0.00 182.00 116.00\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 112)\">\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-112 178,-112 178,4 -4,4\"/>\n",
"<!-- marcia -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>marcia</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-90\" rx=\"42.49\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">marcia</text>\n",
"</g>\n",
"<!-- jan -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>jan</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"90.25\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"90.25\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\">jan</text>\n",
"</g>\n",
"<!-- marcia&#45;&gt;jan -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>marcia-&gt;jan</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M53.62,-72.41C59.57,-63.74 66.95,-52.97 73.53,-43.38\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"76.51,-45.21 79.28,-34.99 70.74,-41.26 76.51,-45.21\"/>\n",
"</g>\n",
"<!-- cindy -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>cindy</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"138.25\" cy=\"-90\" rx=\"36\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"138.25\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">cindy</text>\n",
"</g>\n",
"<!-- cindy&#45;&gt;jan -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>cindy-&gt;jan</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M127.11,-72.77C121.09,-63.98 113.54,-52.96 106.83,-43.19\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"109.53,-40.94 100.99,-34.67 103.75,-44.89 109.53,-40.94\"/>\n",
"</g>\n",
"</g>\n",
"</svg>"
],
"text/plain": [
"<IPython.core.display.SVG object>"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# wait 20 secs to see display\n",
"cpal_chain.draw(path=\"web.svg\")\n",
"SVG(\"web.svg\")"
]
},
{
"cell_type": "markdown",
"id": "d4082538-ec03-44f0-aac3-07e03aad7555",
"metadata": {},
"source": [
"### Causal confounder"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "83932c30-950b-435a-b328-7993ce8cc6bd",
"metadata": {},
"outputs": [],
"source": [
"question = (\n",
" \"Jan has the number of pets as Marcia plus the number of pets as Cindy. \"\n",
" \"Marcia has two more pets than Cindy. \"\n",
" \"If Cindy has four pets, how many total pets do the three have?\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "570de307-7c6b-4fdc-80c3-4361daa8a629",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
" name code value depends_on\n",
"0 cindy pass 4.0 []\n",
"1 marcia marcia.value = cindy.value + 2 6.0 [cindy]\n",
"2 jan jan.value = cindy.value + marcia.value 10.0 [cindy, marcia]\u001b[0m\n",
"\n",
"\u001b[36;1m\u001b[1;3mquery data\n",
"{\n",
" \"question\": \"how many total pets do the three have?\",\n",
" \"expression\": \"SELECT SUM(value) FROM df\",\n",
" \"llm_error_msg\": \"\"\n",
"}\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"20.0"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cpal_chain.run(question)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "00375615-6b6d-4357-bdb8-f64f682f7605",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"121pt\" height=\"188pt\" viewBox=\"0.00 0.00 120.99 188.00\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-184 116.99,-184 116.99,4 -4,4\"/>\n",
"<!-- cindy -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>cindy</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"77.25\" cy=\"-162\" rx=\"36\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"77.25\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">cindy</text>\n",
"</g>\n",
"<!-- marcia -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>marcia</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-90\" rx=\"42.49\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">marcia</text>\n",
"</g>\n",
"<!-- cindy&#45;&gt;marcia -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>cindy-&gt;marcia</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M68.95,-144.41C64.87,-136.25 59.86,-126.22 55.28,-117.07\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"58.33,-115.34 50.72,-107.96 52.07,-118.47 58.33,-115.34\"/>\n",
"</g>\n",
"<!-- jan -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>jan</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"77.25\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"77.25\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\">jan</text>\n",
"</g>\n",
"<!-- cindy&#45;&gt;jan -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>cindy-&gt;jan</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M83.73,-144.1C87.32,-133.84 91.42,-120.36 93.25,-108 95.58,-92.17 95.58,-87.83 93.25,-72 91.95,-63.21 89.5,-53.86 86.91,-45.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"90.19,-44.29 83.73,-35.9 83.55,-46.49 90.19,-44.29\"/>\n",
"</g>\n",
"<!-- marcia&#45;&gt;jan -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>marcia-&gt;jan</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M50.72,-72.06C54.86,-63.77 59.94,-53.62 64.53,-44.42\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"67.75,-45.82 69.09,-35.31 61.49,-42.69 67.75,-45.82\"/>\n",
"</g>\n",
"</g>\n",
"</svg>"
],
"text/plain": [
"<IPython.core.display.SVG object>"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# wait 20 secs to see display\n",
"cpal_chain.draw(path=\"web.svg\")\n",
"SVG(\"web.svg\")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "255683de-0c1c-4131-b277-99d09f5ac1fc",
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,218 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "dd7ec7af",
"metadata": {},
"source": [
"# Elasticsearch database\n",
"\n",
"Interact with Elasticsearch analytics database via Langchain. This chain builds search queries via the Elasticsearch DSL API (filters and aggregations).\n",
"\n",
"The Elasticsearch client must have permissions for index listing, mapping description and search queries.\n",
"\n",
"See [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html) for instructions on how to run Elasticsearch locally.\n",
"\n",
"Make sure to install the Elasticsearch Python client before:\n",
"\n",
"```sh\n",
"pip install elasticsearch\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "dd8eae75",
"metadata": {},
"outputs": [],
"source": [
"from elasticsearch import Elasticsearch\n",
"\n",
"from langchain.chains.elasticsearch_database import ElasticsearchDatabaseChain\n",
"from langchain.chat_models import ChatOpenAI"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "659b5ed0",
"metadata": {},
"outputs": [],
"source": [
"import warnings\n",
"\n",
"warnings.filterwarnings(\"ignore\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "5cde03bc",
"metadata": {},
"outputs": [],
"source": [
"# Initialize Elasticsearch python client.\n",
"# See https://elasticsearch-py.readthedocs.io/en/v8.8.2/api.html#elasticsearch.Elasticsearch\n",
"ELASTIC_SEARCH_SERVER = \"https://elastic:gvODoJ_nRYQIJZfG7=ec@localhost:9200\"\n",
"db = Elasticsearch(ELASTIC_SEARCH_SERVER, ca_certs=False, verify_certs=False)"
]
},
{
"cell_type": "markdown",
"id": "74a41374",
"metadata": {},
"source": [
"Uncomment the next cell to initially populate your db."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "430ada0f",
"metadata": {},
"outputs": [],
"source": [
"# customers = [\n",
"# {\"firstname\": \"Jennifer\", \"lastname\": \"Walters\"},\n",
"# {\"firstname\": \"Monica\",\"lastname\":\"Rambeau\"},\n",
"# {\"firstname\": \"Carol\",\"lastname\":\"Danvers\"},\n",
"# {\"firstname\": \"Wanda\",\"lastname\":\"Maximoff\"},\n",
"# {\"firstname\": \"Jennifer\",\"lastname\":\"Takeda\"},\n",
"# ]\n",
"# for i, customer in enumerate(customers):\n",
"# db.create(index=\"customers\", document=customer, id=i)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "f36ae0d8",
"metadata": {},
"outputs": [],
"source": [
"llm = ChatOpenAI(model_name=\"gpt-4\", temperature=0)\n",
"chain = ElasticsearchDatabaseChain.from_llm(llm=llm, database=db, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "b5d22d9d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new ElasticsearchDatabaseChain chain...\u001b[0m\n",
"What are the first names of all the customers?\n",
"ESQuery:\u001b[32;1m\u001b[1;3m{'size': 10, 'query': {'match_all': {}}, '_source': ['firstname']}\u001b[0m\n",
"ESResult: \u001b[33;1m\u001b[1;3m{'took': 5, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 6, 'relation': 'eq'}, 'max_score': 1.0, 'hits': [{'_index': 'customers', '_id': '0', '_score': 1.0, '_source': {'firstname': 'Jennifer'}}, {'_index': 'customers', '_id': '1', '_score': 1.0, '_source': {'firstname': 'Monica'}}, {'_index': 'customers', '_id': '2', '_score': 1.0, '_source': {'firstname': 'Carol'}}, {'_index': 'customers', '_id': '3', '_score': 1.0, '_source': {'firstname': 'Wanda'}}, {'_index': 'customers', '_id': '4', '_score': 1.0, '_source': {'firstname': 'Jennifer'}}, {'_index': 'customers', '_id': 'firstname', '_score': 1.0, '_source': {'firstname': 'Jennifer'}}]}}\u001b[0m\n",
"Answer:\u001b[32;1m\u001b[1;3mThe first names of all the customers are Jennifer, Monica, Carol, Wanda, and Jennifer.\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'The first names of all the customers are Jennifer, Monica, Carol, Wanda, and Jennifer.'"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"question = \"What are the first names of all the customers?\"\n",
"chain.run(question)"
]
},
{
"cell_type": "markdown",
"id": "9b4bfada",
"metadata": {},
"source": [
"## Custom prompt\n",
"\n",
"For best results you'll likely need to customize the prompt."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "0a494f5b",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.elasticsearch_database.prompts import DEFAULT_DSL_TEMPLATE\n",
"from langchain.prompts.prompt import PromptTemplate\n",
"\n",
"PROMPT_TEMPLATE = \"\"\"Given an input question, create a syntactically correct Elasticsearch query to run. Unless the user specifies in their question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.\n",
"\n",
"Unless told to do not query for all the columns from a specific index, only ask for a the few relevant columns given the question.\n",
"\n",
"Pay attention to use only the column names that you can see in the mapping description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which index. Return the query as valid json.\n",
"\n",
"Use the following format:\n",
"\n",
"Question: Question here\n",
"ESQuery: Elasticsearch Query formatted as json\n",
"\"\"\"\n",
"\n",
"PROMPT = PromptTemplate.from_template(\n",
" PROMPT_TEMPLATE,\n",
")\n",
"chain = ElasticsearchDatabaseChain.from_llm(llm=llm, database=db, query_prompt=PROMPT)"
]
},
{
"cell_type": "markdown",
"id": "372b8f93",
"metadata": {},
"source": [
"## Adding example rows from each index\n",
"\n",
"Sometimes, the format of the data is not obvious and it is optimal to include a sample of rows from the indices in the prompt to allow the LLM to understand the data before providing a final query. Here we will use this feature to let the LLM know that artists are saved with their full names by providing ten rows from the index."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eef818de",
"metadata": {},
"outputs": [],
"source": [
"chain = ElasticsearchDatabaseChain.from_llm(\n",
" llm=ChatOpenAI(temperature=0),\n",
" database=db,\n",
" sample_documents_in_index_info=2, # 2 rows from each index will be included in the prompt as sample data\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "venv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -14,7 +14,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"id": "34f04daf",
"metadata": {},
"outputs": [
@@ -35,7 +35,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 3,
"id": "a2648974",
"metadata": {},
"outputs": [],
@@ -56,15 +56,110 @@
"id": "78ff9df9",
"metadata": {},
"source": [
"To extract entities, we need to create a schema like the following, were we specify all the properties we want to find and the type we expect them to have. We can also specify which of these properties are required and which are optional."
"To extract entities, we need to create a schema where we specify all the properties we want to find and the type we expect them to have. We can also specify which of these properties are required and which are optional."
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"id": "4ac43eba",
"metadata": {},
"outputs": [],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"name\": {\"type\": \"string\"},\n",
" \"height\": {\"type\": \"integer\"},\n",
" \"hair_color\": {\"type\": \"string\"},\n",
" },\n",
" \"required\": [\"name\", \"height\"],\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "640bd005",
"metadata": {},
"outputs": [],
"source": [
"inp = \"\"\"\n",
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
" \"\"\""
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "64313214",
"metadata": {},
"outputs": [],
"source": [
"chain = create_extraction_chain(schema, llm)"
]
},
{
"cell_type": "markdown",
"id": "17c48adb",
"metadata": {},
"source": [
"As we can see, we extracted the required entities and their properties in the required format (it even calculated Claudia's height before returning!)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "cc5436ed",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},\n",
" {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "8d51fcdc",
"metadata": {},
"source": [
"## Several entity types"
]
},
{
"cell_type": "markdown",
"id": "5813affe",
"metadata": {},
"source": [
"Notice that we are using OpenAI functions under the hood and thus the model can only call one function per request (with one, unique schema)"
]
},
{
"cell_type": "markdown",
"id": "511b9838",
"metadata": {},
"source": [
"If we want to extract more than one entity type, we need to introduce a little hack - we will define our properties with an included entity type. \n",
"\n",
"Following we have an example where we also want to extract dog attributes from the passage. Notice the 'person_' and 'dog_' prefixes we use for each property; this tells the model which entity type the property refers to. In this way, the model can return properties from several entity types in one single call."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "cf243a26",
"metadata": {},
"outputs": [],
"source": [
"schema = {\n",
" \"properties\": {\n",
@@ -103,10 +198,10 @@
},
{
"cell_type": "markdown",
"id": "17c48adb",
"id": "eb074f7b",
"metadata": {},
"source": [
"As we can see, we extracted the required entities and their properties in the required format:"
"People attributes and dog attributes were correctly extracted from the text in the same call"
]
},
{
@@ -128,7 +223,207 @@
" 'person_hair_color': 'brunette'}]"
]
},
"execution_count": 6,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "0273e0e2",
"metadata": {},
"source": [
"## Unrelated entities"
]
},
{
"cell_type": "markdown",
"id": "c07b3480",
"metadata": {},
"source": [
"What if our entities are unrelated? In that case, the model will return the unrelated entities in different dictionaries, allowing us to successfully extract several unrelated entity types in the same call."
]
},
{
"cell_type": "markdown",
"id": "01d98af0",
"metadata": {},
"source": [
"Notice that we use `required: []`: we need to allow the model to return **only** person attributes or **only** dog attributes for a single entity (person or dog)"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "e584c993",
"metadata": {},
"outputs": [],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"person_name\": {\"type\": \"string\"},\n",
" \"person_height\": {\"type\": \"integer\"},\n",
" \"person_hair_color\": {\"type\": \"string\"},\n",
" \"dog_name\": {\"type\": \"string\"},\n",
" \"dog_breed\": {\"type\": \"string\"},\n",
" },\n",
" \"required\": [],\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "ad6b105f",
"metadata": {},
"outputs": [],
"source": [
"inp = \"\"\"\n",
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
"\n",
"Willow is a German Shepherd that likes to play with other dogs and can always be found playing with Milo, a border collie that lives close by.\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "6bfe5a33",
"metadata": {},
"outputs": [],
"source": [
"chain = create_extraction_chain(schema, llm)"
]
},
{
"cell_type": "markdown",
"id": "24fe09af",
"metadata": {},
"source": [
"We have each entity in its own separate dictionary, with only the appropriate attributes being returned"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "f6e1fd89",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'person_name': 'Alex', 'person_height': 5, 'person_hair_color': 'blonde'},\n",
" {'person_name': 'Claudia',\n",
" 'person_height': 6,\n",
" 'person_hair_color': 'brunette'},\n",
" {'dog_name': 'Willow', 'dog_breed': 'German Shepherd'},\n",
" {'dog_name': 'Milo', 'dog_breed': 'border collie'}]"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "0ac466d1",
"metadata": {},
"source": [
"## Extra info for an entity"
]
},
{
"cell_type": "markdown",
"id": "d240ffc1",
"metadata": {},
"source": [
"What if.. _we don't know what we want?_ More specifically, say we know a few properties we want to extract for a given entity but we also want to know if there's any extra information in the passage. Fortunately, we don't need to structure everything - we can have unstructured extraction as well. \n",
"\n",
"We can do this by introducing another hack, namely the *extra_info* attribute - let's see an example."
]
},
{
"cell_type": "code",
"execution_count": 68,
"id": "f19685f6",
"metadata": {},
"outputs": [],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"person_name\": {\"type\": \"string\"},\n",
" \"person_height\": {\"type\": \"integer\"},\n",
" \"person_hair_color\": {\"type\": \"string\"},\n",
" \"dog_name\": {\"type\": \"string\"},\n",
" \"dog_breed\": {\"type\": \"string\"},\n",
" \"dog_extra_info\": {\"type\": \"string\"},\n",
" },\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 81,
"id": "200c3477",
"metadata": {},
"outputs": [],
"source": [
"inp = \"\"\"\n",
"Alex is 5 feet tall. Claudia is 1 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
"\n",
"Willow is a German Shepherd that likes to play with other dogs and can always be found playing with Milo, a border collie that lives close by.\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 82,
"id": "ddad7dc6",
"metadata": {},
"outputs": [],
"source": [
"chain = create_extraction_chain(schema, llm)"
]
},
{
"cell_type": "markdown",
"id": "e5c0dbbc",
"metadata": {},
"source": [
"It is nice to know more about Willow and Milo!"
]
},
{
"cell_type": "code",
"execution_count": 83,
"id": "c22cfd30",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'person_name': 'Alex', 'person_height': 5, 'person_hair_color': 'blonde'},\n",
" {'person_name': 'Claudia',\n",
" 'person_height': 6,\n",
" 'person_hair_color': 'brunette'},\n",
" {'dog_name': 'Willow',\n",
" 'dog_breed': 'German Shepherd',\n",
" 'dog_extra_information': 'likes to play with other dogs'},\n",
" {'dog_name': 'Milo',\n",
" 'dog_breed': 'border collie',\n",
" 'dog_extra_information': 'lives close by'}]"
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}

View File

@@ -72,7 +72,10 @@
"import numpy as np\n",
"\n",
"from langchain.schema import BaseRetriever\n",
"from langchain.callbacks.manager import AsyncCallbackManagerForRetrieverRun, CallbackManagerForRetrieverRun\n",
"from langchain.callbacks.manager import (\n",
" AsyncCallbackManagerForRetrieverRun,\n",
" CallbackManagerForRetrieverRun,\n",
")\n",
"from langchain.utilities import GoogleSerperAPIWrapper\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.chat_models import ChatOpenAI\n",
@@ -97,13 +100,15 @@
"outputs": [],
"source": [
"class SerperSearchRetriever(BaseRetriever):\n",
"\n",
" search: GoogleSerperAPIWrapper = None\n",
"\n",
" def _get_relevant_documents(self, query: str, *, run_manager: CallbackManagerForRetrieverRun, **kwargs: Any) -> List[Document]:\n",
" def _get_relevant_documents(\n",
" self, query: str, *, run_manager: CallbackManagerForRetrieverRun, **kwargs: Any\n",
" ) -> List[Document]:\n",
" return [Document(page_content=self.search.run(query))]\n",
"\n",
" async def _aget_relevant_documents(self,\n",
" async def _aget_relevant_documents(\n",
" self,\n",
" query: str,\n",
" *,\n",
" run_manager: AsyncCallbackManagerForRetrieverRun,\n",

View File

@@ -83,9 +83,15 @@
"schema = client.schema()\n",
"schema.propertyKey(\"name\").asText().ifNotExist().create()\n",
"schema.propertyKey(\"birthDate\").asText().ifNotExist().create()\n",
"schema.vertexLabel(\"Person\").properties(\"name\", \"birthDate\").usePrimaryKeyId().primaryKeys(\"name\").ifNotExist().create()\n",
"schema.vertexLabel(\"Movie\").properties(\"name\").usePrimaryKeyId().primaryKeys(\"name\").ifNotExist().create()\n",
"schema.edgeLabel(\"ActedIn\").sourceLabel(\"Person\").targetLabel(\"Movie\").ifNotExist().create()"
"schema.vertexLabel(\"Person\").properties(\n",
" \"name\", \"birthDate\"\n",
").usePrimaryKeyId().primaryKeys(\"name\").ifNotExist().create()\n",
"schema.vertexLabel(\"Movie\").properties(\"name\").usePrimaryKeyId().primaryKeys(\n",
" \"name\"\n",
").ifNotExist().create()\n",
"schema.edgeLabel(\"ActedIn\").sourceLabel(\"Person\").targetLabel(\n",
" \"Movie\"\n",
").ifNotExist().create()"
]
},
{
@@ -124,7 +130,9 @@
"\n",
"g.addEdge(\"ActedIn\", \"1:Al Pacino\", \"2:The Godfather\", {})\n",
"g.addEdge(\"ActedIn\", \"1:Al Pacino\", \"2:The Godfather Part II\", {})\n",
"g.addEdge(\"ActedIn\", \"1:Al Pacino\", \"2:The Godfather Coda The Death of Michael Corleone\", {})\n",
"g.addEdge(\n",
" \"ActedIn\", \"1:Al Pacino\", \"2:The Godfather Coda The Death of Michael Corleone\", {}\n",
")\n",
"g.addEdge(\"ActedIn\", \"1:Robert De Niro\", \"2:The Godfather Part II\", {})"
]
},
@@ -164,7 +172,7 @@
" password=\"admin\",\n",
" address=\"localhost\",\n",
" port=8080,\n",
" graph=\"hugegraph\"\n",
" graph=\"hugegraph\",\n",
")"
]
},
@@ -228,9 +236,7 @@
"metadata": {},
"outputs": [],
"source": [
"chain = HugeGraphQAChain.from_llm(\n",
" ChatOpenAI(temperature=0), graph=graph, verbose=True\n",
")"
"chain = HugeGraphQAChain.from_llm(ChatOpenAI(temperature=0), graph=graph, verbose=True)"
]
},
{

View File

@@ -31,6 +31,7 @@
"outputs": [],
"source": [
"import kuzu\n",
"\n",
"db = kuzu.Database(\"test_db\")\n",
"conn = kuzu.Connection(db)"
]
@@ -61,7 +62,9 @@
],
"source": [
"conn.execute(\"CREATE NODE TABLE Movie (name STRING, PRIMARY KEY(name))\")\n",
"conn.execute(\"CREATE NODE TABLE Person (name STRING, birthDate STRING, PRIMARY KEY(name))\")\n",
"conn.execute(\n",
" \"CREATE NODE TABLE Person (name STRING, birthDate STRING, PRIMARY KEY(name))\"\n",
")\n",
"conn.execute(\"CREATE REL TABLE ActedIn (FROM Person TO Movie)\")"
]
},
@@ -94,11 +97,21 @@
"conn.execute(\"CREATE (:Person {name: 'Robert De Niro', birthDate: '1943-08-17'})\")\n",
"conn.execute(\"CREATE (:Movie {name: 'The Godfather'})\")\n",
"conn.execute(\"CREATE (:Movie {name: 'The Godfather: Part II'})\")\n",
"conn.execute(\"CREATE (:Movie {name: 'The Godfather Coda: The Death of Michael Corleone'})\")\n",
"conn.execute(\"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather' CREATE (p)-[:ActedIn]->(m)\")\n",
"conn.execute(\"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)\")\n",
"conn.execute(\"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather Coda: The Death of Michael Corleone' CREATE (p)-[:ActedIn]->(m)\")\n",
"conn.execute(\"MATCH (p:Person), (m:Movie) WHERE p.name = 'Robert De Niro' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)\")"
"conn.execute(\n",
" \"CREATE (:Movie {name: 'The Godfather Coda: The Death of Michael Corleone'})\"\n",
")\n",
"conn.execute(\n",
" \"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather' CREATE (p)-[:ActedIn]->(m)\"\n",
")\n",
"conn.execute(\n",
" \"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)\"\n",
")\n",
"conn.execute(\n",
" \"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather Coda: The Death of Michael Corleone' CREATE (p)-[:ActedIn]->(m)\"\n",
")\n",
"conn.execute(\n",
" \"MATCH (p:Person), (m:Movie) WHERE p.name = 'Robert De Niro' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)\"\n",
")"
]
},
{
@@ -137,9 +150,7 @@
"metadata": {},
"outputs": [],
"source": [
"chain = KuzuQAChain.from_llm(\n",
" ChatOpenAI(temperature=0), graph=graph, verbose=True\n",
")"
"chain = KuzuQAChain.from_llm(ChatOpenAI(temperature=0), graph=graph, verbose=True)"
]
},
{

View File

@@ -60,7 +60,8 @@
],
"metadata": {
"collapsed": false
}
},
"id": "7af596b5"
},
{
"cell_type": "markdown",
@@ -150,20 +151,20 @@
"text": [
"\n",
"\n",
"\u001B[1m> Entering new GraphSparqlQAChain chain...\u001B[0m\n",
"\u001b[1m> Entering new GraphSparqlQAChain chain...\u001b[0m\n",
"Identified intent:\n",
"\u001B[32;1m\u001B[1;3mSELECT\u001B[0m\n",
"\u001b[32;1m\u001b[1;3mSELECT\u001b[0m\n",
"Generated SPARQL:\n",
"\u001B[32;1m\u001B[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
"\u001b[32;1m\u001b[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
"SELECT ?homepage\n",
"WHERE {\n",
" ?person foaf:name \"Tim Berners-Lee\" .\n",
" ?person foaf:workplaceHomepage ?homepage .\n",
"}\u001B[0m\n",
"}\u001b[0m\n",
"Full Context:\n",
"\u001B[32;1m\u001B[1;3m[]\u001B[0m\n",
"\u001b[32;1m\u001b[1;3m[]\u001b[0m\n",
"\n",
"\u001B[1m> Finished chain.\u001B[0m\n"
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
@@ -207,19 +208,19 @@
"text": [
"\n",
"\n",
"\u001B[1m> Entering new GraphSparqlQAChain chain...\u001B[0m\n",
"\u001b[1m> Entering new GraphSparqlQAChain chain...\u001b[0m\n",
"Identified intent:\n",
"\u001B[32;1m\u001B[1;3mUPDATE\u001B[0m\n",
"\u001b[32;1m\u001b[1;3mUPDATE\u001b[0m\n",
"Generated SPARQL:\n",
"\u001B[32;1m\u001B[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
"\u001b[32;1m\u001b[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
"INSERT {\n",
" ?person foaf:workplaceHomepage <http://www.w3.org/foo/bar/> .\n",
"}\n",
"WHERE {\n",
" ?person foaf:name \"Timothy Berners-Lee\" .\n",
"}\u001B[0m\n",
"}\u001b[0m\n",
"\n",
"\u001B[1m> Finished chain.\u001B[0m\n"
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
@@ -234,7 +235,9 @@
}
],
"source": [
"chain.run(\"Save that the person with the name 'Timothy Berners-Lee' has a work homepage at 'http://www.w3.org/foo/bar/'\")"
"chain.run(\n",
" \"Save that the person with the name 'Timothy Berners-Lee' has a work homepage at 'http://www.w3.org/foo/bar/'\"\n",
")"
]
},
{
@@ -297,4 +300,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}

View File

@@ -0,0 +1,160 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# LLM Symbolic Math \n",
"This notebook showcases using LLMs and Python to Solve Algebraic Equations."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Calculating the limit of an equation"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMSymbolicMathChain chain...\u001b[0m\n",
"What is the limit of sin(x) / x as x goes to 0?\u001b[32;1m\u001b[1;3mAnswer: 1\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Answer: 1'"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.chains.llm_symbolic_math.base import LLMSymbolicMathChain\n",
"\n",
"llm = OpenAI(temperature=0)\n",
"llm_symbolic_math = LLMSymbolicMathChain.from_llm(llm, verbose=True)\n",
"\n",
"llm_symbolic_math.run(\"What is the limit of sin(x) / x as x goes to 0?\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Calculating an integral"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMSymbolicMathChain chain...\u001b[0m\n",
"What is the integral of e^-x from 0 to infinity?\u001b[32;1m\u001b[1;3mAnswer: 1\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Answer: 1'"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm_symbolic_math.run(\"What is the integral of e^-x from 0 to infinity?\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Calculating an algebraic equation"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMSymbolicMathChain chain...\u001b[0m\n",
"What are the solutions to this equation x**2 - x?\u001b[32;1m\u001b[1;3mAnswer: 0 and 1.\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Answer: 0 and 1.'"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm_symbolic_math.run(\"What are the solutions to this equation x**2 - x?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -38,7 +38,7 @@
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_documents(documents)\n",
"for i, text in enumerate(texts):\n",
" text.metadata['source'] = f\"{i}-pl\"\n",
" text.metadata[\"source\"] = f\"{i}-pl\"\n",
"embeddings = OpenAIEmbeddings()\n",
"docsearch = Chroma.from_documents(texts, embeddings)"
]
@@ -97,8 +97,8 @@
"outputs": [],
"source": [
"final_qa_chain = StuffDocumentsChain(\n",
" llm_chain=qa_chain, \n",
" document_variable_name='context',\n",
" llm_chain=qa_chain,\n",
" document_variable_name=\"context\",\n",
" document_prompt=doc_prompt,\n",
")"
]
@@ -111,8 +111,7 @@
"outputs": [],
"source": [
"retrieval_qa = RetrievalQA(\n",
" retriever=docsearch.as_retriever(),\n",
" combine_documents_chain=final_qa_chain\n",
" retriever=docsearch.as_retriever(), combine_documents_chain=final_qa_chain\n",
")"
]
},
@@ -175,8 +174,8 @@
"outputs": [],
"source": [
"final_qa_chain_pydantic = StuffDocumentsChain(\n",
" llm_chain=qa_chain_pydantic, \n",
" document_variable_name='context',\n",
" llm_chain=qa_chain_pydantic,\n",
" document_variable_name=\"context\",\n",
" document_prompt=doc_prompt,\n",
")"
]
@@ -189,8 +188,7 @@
"outputs": [],
"source": [
"retrieval_qa_pydantic = RetrievalQA(\n",
" retriever=docsearch.as_retriever(),\n",
" combine_documents_chain=final_qa_chain_pydantic\n",
" retriever=docsearch.as_retriever(), combine_documents_chain=final_qa_chain_pydantic\n",
")"
]
},
@@ -235,6 +233,7 @@
"from langchain.chains import ConversationalRetrievalChain\n",
"from langchain.memory import ConversationBufferMemory\n",
"from langchain.chains import LLMChain\n",
"\n",
"memory = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)\n",
"_template = \"\"\"Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\\\n",
"Make sure to avoid using any unclear pronouns.\n",
@@ -258,10 +257,10 @@
"outputs": [],
"source": [
"qa = ConversationalRetrievalChain(\n",
" question_generator=condense_question_chain, \n",
" question_generator=condense_question_chain,\n",
" retriever=docsearch.as_retriever(),\n",
" memory=memory, \n",
" combine_docs_chain=final_qa_chain\n",
" memory=memory,\n",
" combine_docs_chain=final_qa_chain,\n",
")"
]
},
@@ -389,7 +388,9 @@
" \"\"\"An answer to the question being asked, with sources.\"\"\"\n",
"\n",
" answer: str = Field(..., description=\"Answer to the question that was asked\")\n",
" countries_referenced: List[str] = Field(..., description=\"All of the countries mentioned in the sources\")\n",
" countries_referenced: List[str] = Field(\n",
" ..., description=\"All of the countries mentioned in the sources\"\n",
" )\n",
" sources: List[str] = Field(\n",
" ..., description=\"List of sources used to answer the question\"\n",
" )\n",
@@ -405,20 +406,23 @@
" HumanMessage(content=\"Answer question using the following context\"),\n",
" HumanMessagePromptTemplate.from_template(\"{context}\"),\n",
" HumanMessagePromptTemplate.from_template(\"Question: {question}\"),\n",
" HumanMessage(content=\"Tips: Make sure to answer in the correct format. Return all of the countries mentioned in the sources in uppercase characters.\"),\n",
" HumanMessage(\n",
" content=\"Tips: Make sure to answer in the correct format. Return all of the countries mentioned in the sources in uppercase characters.\"\n",
" ),\n",
"]\n",
"\n",
"chain_prompt = ChatPromptTemplate(messages=prompt_messages)\n",
"\n",
"qa_chain_pydantic = create_qa_with_structure_chain(llm, CustomResponseSchema, output_parser=\"pydantic\", prompt=chain_prompt)\n",
"qa_chain_pydantic = create_qa_with_structure_chain(\n",
" llm, CustomResponseSchema, output_parser=\"pydantic\", prompt=chain_prompt\n",
")\n",
"final_qa_chain_pydantic = StuffDocumentsChain(\n",
" llm_chain=qa_chain_pydantic,\n",
" document_variable_name='context',\n",
" document_variable_name=\"context\",\n",
" document_prompt=doc_prompt,\n",
")\n",
"retrieval_qa_pydantic = RetrievalQA(\n",
" retriever=docsearch.as_retriever(),\n",
" combine_documents_chain=final_qa_chain_pydantic\n",
" retriever=docsearch.as_retriever(), combine_documents_chain=final_qa_chain_pydantic\n",
")\n",
"query = \"What did he say about russia\"\n",
"retrieval_qa_pydantic.run(query)"

View File

@@ -35,7 +35,9 @@
"metadata": {},
"outputs": [],
"source": [
"chain = get_openapi_chain(\"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\")"
"chain = get_openapi_chain(\n",
" \"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\"\n",
")"
]
},
{
@@ -186,7 +188,9 @@
},
"outputs": [],
"source": [
"chain = get_openapi_chain(\"https://gist.githubusercontent.com/roaldnefs/053e505b2b7a807290908fe9aa3e1f00/raw/0a212622ebfef501163f91e23803552411ed00e4/openapi.yaml\")"
"chain = get_openapi_chain(\n",
" \"https://gist.githubusercontent.com/roaldnefs/053e505b2b7a807290908fe9aa3e1f00/raw/0a212622ebfef501163f91e23803552411ed00e4/openapi.yaml\"\n",
")"
]
},
{

View File

@@ -28,7 +28,7 @@
"\n",
"from pydantic import Extra\n",
"\n",
"from langchain.base_language import BaseLanguageModel\n",
"from langchain.schemea import BaseLanguageModel\n",
"from langchain.callbacks.manager import (\n",
" AsyncCallbackManagerForChainRun,\n",
" CallbackManagerForChainRun,\n",

View File

@@ -22,7 +22,8 @@
"from typing import Optional\n",
"\n",
"from langchain.chains.openai_functions import (\n",
" create_openai_fn_chain, create_structured_output_chain\n",
" create_openai_fn_chain,\n",
" create_structured_output_chain,\n",
")\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate\n",
@@ -58,8 +59,10 @@
"source": [
"from pydantic import BaseModel, Field\n",
"\n",
"\n",
"class Person(BaseModel):\n",
" \"\"\"Identifying information about a person.\"\"\"\n",
"\n",
" name: str = Field(..., description=\"The person's name\")\n",
" age: int = Field(..., description=\"The person's age\")\n",
" fav_food: Optional[str] = Field(None, description=\"The person's favorite food\")"
@@ -103,13 +106,15 @@
"llm = ChatOpenAI(model=\"gpt-3.5-turbo-0613\", temperature=0)\n",
"\n",
"prompt_msgs = [\n",
" SystemMessage(\n",
" content=\"You are a world class algorithm for extracting information in structured formats.\"\n",
" ),\n",
" HumanMessage(content=\"Use the given format to extract information from the following input:\"),\n",
" HumanMessagePromptTemplate.from_template(\"{input}\"),\n",
" HumanMessage(content=\"Tips: Make sure to answer in the correct format\"),\n",
" ]\n",
" SystemMessage(\n",
" content=\"You are a world class algorithm for extracting information in structured formats.\"\n",
" ),\n",
" HumanMessage(\n",
" content=\"Use the given format to extract information from the following input:\"\n",
" ),\n",
" HumanMessagePromptTemplate.from_template(\"{input}\"),\n",
" HumanMessage(content=\"Tips: Make sure to answer in the correct format\"),\n",
"]\n",
"prompt = ChatPromptTemplate(messages=prompt_msgs)\n",
"\n",
"chain = create_structured_output_chain(Person, llm, prompt, verbose=True)\n",
@@ -162,12 +167,17 @@
"source": [
"from typing import Sequence\n",
"\n",
"\n",
"class People(BaseModel):\n",
" \"\"\"Identifying information about all people in a text.\"\"\"\n",
"\n",
" people: Sequence[Person] = Field(..., description=\"The people in the text\")\n",
" \n",
"\n",
"\n",
"chain = create_structured_output_chain(People, llm, prompt, verbose=True)\n",
"chain.run(\"Sally is 13, Joey just turned 12 and loves spinach. Caroline is 10 years older than Sally, so she's 23.\")"
"chain.run(\n",
" \"Sally is 13, Joey just turned 12 and loves spinach. Caroline is 10 years older than Sally, so she's 23.\"\n",
")"
]
},
{
@@ -192,27 +202,16 @@
" \"description\": \"Identifying information about a person.\",\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"name\": {\n",
" \"title\": \"Name\",\n",
" \"description\": \"The person's name\",\n",
" \"type\": \"string\"\n",
" },\n",
" \"age\": {\n",
" \"title\": \"Age\",\n",
" \"description\": \"The person's age\",\n",
" \"type\": \"integer\"\n",
" },\n",
" \"fav_food\": {\n",
" \"title\": \"Fav Food\",\n",
" \"description\": \"The person's favorite food\",\n",
" \"type\": \"string\"\n",
" }\n",
" \"name\": {\"title\": \"Name\", \"description\": \"The person's name\", \"type\": \"string\"},\n",
" \"age\": {\"title\": \"Age\", \"description\": \"The person's age\", \"type\": \"integer\"},\n",
" \"fav_food\": {\n",
" \"title\": \"Fav Food\",\n",
" \"description\": \"The person's favorite food\",\n",
" \"type\": \"string\",\n",
" },\n",
" },\n",
" \"required\": [\n",
" \"name\",\n",
" \"age\"\n",
" ]\n",
"}\n"
" \"required\": [\"name\", \"age\"],\n",
"}"
]
},
{
@@ -286,13 +285,15 @@
"source": [
"class RecordPerson(BaseModel):\n",
" \"\"\"Record some identifying information about a pe.\"\"\"\n",
"\n",
" name: str = Field(..., description=\"The person's name\")\n",
" age: int = Field(..., description=\"The person's age\")\n",
" fav_food: Optional[str] = Field(None, description=\"The person's favorite food\")\n",
"\n",
" \n",
"\n",
"class RecordDog(BaseModel):\n",
" \"\"\"Record some identifying information about a dog.\"\"\"\n",
"\n",
" name: str = Field(..., description=\"The dog's name\")\n",
" color: str = Field(..., description=\"The dog's color\")\n",
" fav_food: Optional[str] = Field(None, description=\"The dog's favorite food\")"
@@ -333,10 +334,10 @@
],
"source": [
"prompt_msgs = [\n",
" SystemMessage(\n",
" content=\"You are a world class algorithm for recording entities\"\n",
" SystemMessage(content=\"You are a world class algorithm for recording entities\"),\n",
" HumanMessage(\n",
" content=\"Make calls to the relevant function to record the entities in the following input:\"\n",
" ),\n",
" HumanMessage(content=\"Make calls to the relevant function to record the entities in the following input:\"),\n",
" HumanMessagePromptTemplate.from_template(\"{input}\"),\n",
" HumanMessage(content=\"Tips: Make sure to answer in the correct format\"),\n",
"]\n",
@@ -393,11 +394,16 @@
"source": [
"class OptionalFavFood(BaseModel):\n",
" \"\"\"Either a food or null.\"\"\"\n",
" food: Optional[str] = Field(None, description=\"Either the name of a food or null. Should be null if the food isn't known.\")\n",
"\n",
" food: Optional[str] = Field(\n",
" None,\n",
" description=\"Either the name of a food or null. Should be null if the food isn't known.\",\n",
" )\n",
"\n",
"\n",
"def record_person(name: str, age: int, fav_food: OptionalFavFood) -> str:\n",
" \"\"\"Record some basic identifying information about a person.\n",
" \n",
"\n",
" Args:\n",
" name: The person's name.\n",
" age: The person's age in years.\n",
@@ -405,9 +411,11 @@
" \"\"\"\n",
" return f\"Recording person {name} of age {age} with favorite food {fav_food.food}!\"\n",
"\n",
" \n",
"\n",
"chain = create_openai_fn_chain([record_person], llm, prompt, verbose=True)\n",
"chain.run(\"The most important thing to remember about Tommy, my 12 year old, is that he'll do anything for apple pie.\")"
"chain.run(\n",
" \"The most important thing to remember about Tommy, my 12 year old, is that he'll do anything for apple pie.\"\n",
")"
]
},
{
@@ -458,7 +466,7 @@
"source": [
"def record_dog(name: str, color: str, fav_food: OptionalFavFood) -> str:\n",
" \"\"\"Record some basic identifying information about a dog.\n",
" \n",
"\n",
" Args:\n",
" name: The dog's name.\n",
" color: The dog's color.\n",
@@ -468,7 +476,9 @@
"\n",
"\n",
"chain = create_openai_fn_chain([record_person, record_dog], llm, prompt, verbose=True)\n",
"chain.run(\"I can't find my dog Henry anywhere, he's a small brown beagle. Could you send a message about him?\")"
"chain.run(\n",
" \"I can't find my dog Henry anywhere, he's a small brown beagle. Could you send a message about him?\"\n",
")"
]
},
{

View File

@@ -77,7 +77,9 @@
}
],
"source": [
"loader = BraveSearchLoader(query=\"obama middle name\", api_key=api_key, search_kwargs={\"count\": 3})\n",
"loader = BraveSearchLoader(\n",
" query=\"obama middle name\", api_key=api_key, search_kwargs={\"count\": 3}\n",
")\n",
"docs = loader.load()\n",
"len(docs)"
]

View File

@@ -0,0 +1,81 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Browserless"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import BrowserlessLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"BROWSERLESS_API_TOKEN = \"YOUR_API_TOKEN\""
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<!DOCTYPE html><html class=\"client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-enabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled vector-feature-zebra-design-disabled\" lang=\"en\" dir=\"ltr\"><head>\n",
"<meta charset=\"UTF-8\">\n",
"<title>Document classification - Wikipedia</title>\n",
"<script>document.documentElement.className=\"client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-enabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled vector-feature-zebra-design-disabled\";(function(){var cookie=document.cookie.match(/(?:^|; )enwikimwclien\n"
]
}
],
"source": [
"loader = BrowserlessLoader(\n",
" api_token=BROWSERLESS_API_TOKEN,\n",
" urls=[\n",
" \"https://en.wikipedia.org/wiki/Document_classification\",\n",
" ],\n",
")\n",
"\n",
"documents = loader.load()\n",
"\n",
"print(documents[0].page_content[:1000])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,96 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Datadog Logs\n",
"\n",
">[Datadog](https://www.datadoghq.com/) is a monitoring and analytics platform for cloud-scale applications.\n",
"\n",
"This loader fetches the logs from your applications in Datadog using the `datadog_api_client` Python package. You must initialize the loader with your `Datadog API key` and `APP key`, and you need to pass in the query to extract the desired logs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import DatadogLogsLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#!pip install datadog-api-client"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"query = \"service:agent status:error\"\n",
"\n",
"loader = DatadogLogsLoader(\n",
" query=query,\n",
" api_key=DD_API_KEY,\n",
" app_key=DD_APP_KEY,\n",
" from_time=1688732708951, # Optional, timestamp in milliseconds\n",
" to_time=1688736308951, # Optional, timestamp in milliseconds\n",
" limit=100, # Optional, default is 100\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='message: grep: /etc/datadog-agent/system-probe.yaml: No such file or directory', metadata={'id': 'AgAAAYkwpLImvkjRpQAAAAAAAAAYAAAAAEFZa3dwTUFsQUFEWmZfLU5QdElnM3dBWQAAACQAAAAAMDE4OTMwYTQtYzk3OS00MmJjLTlhNDAtOTY4N2EwY2I5ZDdk', 'status': 'error', 'service': 'agent', 'tags': ['accessible-from-goog-gke-node', 'allow-external-ingress-high-ports', 'allow-external-ingress-http', 'allow-external-ingress-https', 'container_id:c7d8ecd27b5b3cfdf3b0df04b8965af6f233f56b7c3c2ffabfab5e3b6ccbd6a5', 'container_name:lab_datadog_1', 'datadog.pipelines:false', 'datadog.submission_auth:private_api_key', 'docker_image:datadog/agent:7.41.1', 'env:dd101-dev', 'hostname:lab-host', 'image_name:datadog/agent', 'image_tag:7.41.1', 'instance-id:7497601202021312403', 'instance-type:custom-1-4096', 'instruqt_aws_accounts:', 'instruqt_azure_subscriptions:', 'instruqt_gcp_projects:', 'internal-hostname:lab-host.d4rjybavkary.svc.cluster.local', 'numeric_project_id:3390740675', 'p-d4rjybavkary', 'project:instruqt-prod', 'service:agent', 'short_image:agent', 'source:agent', 'zone:europe-west1-b'], 'timestamp': datetime.datetime(2023, 7, 7, 13, 57, 27, 206000, tzinfo=tzutc())}),\n",
" Document(page_content='message: grep: /etc/datadog-agent/system-probe.yaml: No such file or directory', metadata={'id': 'AgAAAYkwpLImvkjRpgAAAAAAAAAYAAAAAEFZa3dwTUFsQUFEWmZfLU5QdElnM3dBWgAAACQAAAAAMDE4OTMwYTQtYzk3OS00MmJjLTlhNDAtOTY4N2EwY2I5ZDdk', 'status': 'error', 'service': 'agent', 'tags': ['accessible-from-goog-gke-node', 'allow-external-ingress-high-ports', 'allow-external-ingress-http', 'allow-external-ingress-https', 'container_id:c7d8ecd27b5b3cfdf3b0df04b8965af6f233f56b7c3c2ffabfab5e3b6ccbd6a5', 'container_name:lab_datadog_1', 'datadog.pipelines:false', 'datadog.submission_auth:private_api_key', 'docker_image:datadog/agent:7.41.1', 'env:dd101-dev', 'hostname:lab-host', 'image_name:datadog/agent', 'image_tag:7.41.1', 'instance-id:7497601202021312403', 'instance-type:custom-1-4096', 'instruqt_aws_accounts:', 'instruqt_azure_subscriptions:', 'instruqt_gcp_projects:', 'internal-hostname:lab-host.d4rjybavkary.svc.cluster.local', 'numeric_project_id:3390740675', 'p-d4rjybavkary', 'project:instruqt-prod', 'service:agent', 'short_image:agent', 'source:agent', 'zone:europe-west1-b'], 'timestamp': datetime.datetime(2023, 7, 7, 13, 57, 27, 206000, tzinfo=tzutc())})]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"documents = loader.load()\n",
"documents"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.11"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,5 @@
Stanley Cups
Team Location Stanley Cups
Blues STL 1
Flyers PHI 2
Maple Leafs TOR 13
1 Stanley Cups
2 Team Location Stanley Cups
3 Blues STL 1
4 Flyers PHI 2
5 Maple Leafs TOR 13

View File

@@ -1840,7 +1840,7 @@ This category contains articles that are incomplete and are tagged with the {{T|
<username>FANDOM</username>
<id>32769624</id>
</contributor>
<comment>Created page with "{{LicenseBox|text=''This work is licensed under the [https://opensource.org/licenses/MIT MIT License].''}}{{#ifeq: {{NAMESPACENUMBER}} | 0 | &lt;includeonly&gt;Category:MIT licens..."</comment>
<comment>Created page with "{{LicenseBox|text=''This work is licensed under the [https://opensource.org/licenses/MIT MIT License].''}}{{#ifeq: {{NAMESPACENUMBER}} | 0 | &lt;includeonly&gt;Category:MIT license..."</comment>
<origin>104</origin>
<model>wikitext</model>
<format>text/x-wiki</format>

View File

@@ -126,11 +126,11 @@
"metadata": {},
"outputs": [],
"source": [
"file_id=\"1x9WBtFPWMEAdjcJzPScRsjpjQvpSo_kz\"\n",
"file_id = \"1x9WBtFPWMEAdjcJzPScRsjpjQvpSo_kz\"\n",
"loader = GoogleDriveLoader(\n",
" file_ids=[file_id],\n",
" file_loader_cls=UnstructuredFileIOLoader,\n",
" file_loader_kwargs={\"mode\": \"elements\"}\n",
" file_loader_kwargs={\"mode\": \"elements\"},\n",
")"
]
},
@@ -180,11 +180,11 @@
"metadata": {},
"outputs": [],
"source": [
"folder_id=\"1asMOHY1BqBS84JcRbOag5LOJac74gpmD\"\n",
"folder_id = \"1asMOHY1BqBS84JcRbOag5LOJac74gpmD\"\n",
"loader = GoogleDriveLoader(\n",
" folder_id=folder_id,\n",
" file_loader_cls=UnstructuredFileIOLoader,\n",
" file_loader_kwargs={\"mode\": \"elements\"}\n",
" file_loader_kwargs={\"mode\": \"elements\"},\n",
")"
]
},

View File

@@ -101,7 +101,7 @@
" \"../Papers/\",\n",
" glob=\"*\",\n",
" suffixes=[\".pdf\"],\n",
" parser= GrobidParser(segment_sentences=False)\n",
" parser=GrobidParser(segment_sentences=False),\n",
")\n",
"docs = loader.load()"
]

View File

@@ -18,7 +18,10 @@
"outputs": [],
"source": [
"from langchain.document_loaders import WebBaseLoader\n",
"loader_web = WebBaseLoader(\"https://github.com/basecamp/handbook/blob/master/37signals-is-you.md\")"
"\n",
"loader_web = WebBaseLoader(\n",
" \"https://github.com/basecamp/handbook/blob/master/37signals-is-you.md\"\n",
")"
]
},
{
@@ -29,6 +32,7 @@
"outputs": [],
"source": [
"from langchain.document_loaders import PyPDFLoader\n",
"\n",
"loader_pdf = PyPDFLoader(\"../MachineLearning-Lecture01.pdf\")"
]
},
@@ -40,7 +44,8 @@
"outputs": [],
"source": [
"from langchain.document_loaders.merge import MergedDataLoader\n",
"loader_all=MergedDataLoader(loaders=[loader_web,loader_pdf])"
"\n",
"loader_all = MergedDataLoader(loaders=[loader_web, loader_pdf])"
]
},
{
@@ -50,7 +55,7 @@
"metadata": {},
"outputs": [],
"source": [
"docs_all=loader_all.load()"
"docs_all = loader_all.load()"
]
},
{

View File

@@ -36,7 +36,9 @@
],
"source": [
"# Create a new loader object for the MHTML file\n",
"loader = MHTMLLoader(file_path='../../../../../../tests/integration_tests/examples/example.mht')\n",
"loader = MHTMLLoader(\n",
" file_path=\"../../../../../../tests/integration_tests/examples/example.mht\"\n",
")\n",
"\n",
"# Load the document from the file\n",
"documents = loader.load()\n",

View File

@@ -53,11 +53,9 @@
"metadata": {},
"outputs": [],
"source": [
"dataset = \"vw6y-z8j6\" # 311 data\n",
"dataset = \"tmnf-yvry\" # crime data\n",
"loader = OpenCityDataLoader(city_id=\"data.sfgov.org\",\n",
" dataset_id=dataset,\n",
" limit=2000)"
"dataset = \"vw6y-z8j6\" # 311 data\n",
"dataset = \"tmnf-yvry\" # crime data\n",
"loader = OpenCityDataLoader(city_id=\"data.sfgov.org\", dataset_id=dataset, limit=2000)"
]
},
{

View File

@@ -33,9 +33,7 @@
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredOrgModeLoader(\n",
" file_path=\"example_data/README.org\", mode=\"elements\"\n",
")\n",
"loader = UnstructuredOrgModeLoader(file_path=\"example_data/README.org\", mode=\"elements\")\n",
"docs = loader.load()"
]
},

View File

@@ -239,7 +239,7 @@
}
],
"source": [
"# Use lazy load for larger table, which won't read the full table into memory \n",
"# Use lazy load for larger table, which won't read the full table into memory\n",
"for i in loader.lazy_load():\n",
" print(i)"
]

View File

@@ -1,7 +1,6 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "5a7cc773",
"metadata": {},
@@ -25,7 +24,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 1,
"id": "2e3532b2",
"metadata": {},
"outputs": [],
@@ -34,7 +33,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "6384c057",
"metadata": {},
@@ -44,19 +42,19 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 2,
"id": "d69e5620",
"metadata": {},
"outputs": [],
"source": [
"url = 'https://js.langchain.com/docs/modules/memory/examples/'\n",
"loader=RecursiveUrlLoader(url=url)\n",
"docs=loader.load()"
"url = \"https://js.langchain.com/docs/modules/memory/examples/\"\n",
"loader = RecursiveUrlLoader(url=url)\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 3,
"id": "084fb2ce",
"metadata": {},
"outputs": [
@@ -66,7 +64,7 @@
"12"
]
},
"execution_count": 4,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
@@ -77,17 +75,17 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 4,
"id": "89355b7c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'\\n\\n\\n\\n\\nDynamoDB-Backed Chat Memory | \\uf8ffü¶úÔ∏è\\uf8ffüîó Lan'"
"'\\n\\n\\n\\n\\nBuffer Window Memory | 🦜️🔗 Langchain\\n\\n\\n\\n\\n\\nSki'"
]
},
"execution_count": 5,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@@ -98,20 +96,20 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 5,
"id": "13bd7e16",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'source': 'https://js.langchain.com/docs/modules/memory/examples/dynamodb',\n",
" 'title': 'DynamoDB-Backed Chat Memory | \\uf8ffü¶úÔ∏è\\uf8ffüîó Langchain',\n",
" 'description': 'For longer-term persistence across chat sessions, you can swap out the default in-memory chatHistory that backs chat memory classes like BufferMemory for a DynamoDB instance.',\n",
"{'source': 'https://js.langchain.com/docs/modules/memory/examples/buffer_window_memory',\n",
" 'title': 'Buffer Window Memory | 🦜️🔗 Langchain',\n",
" 'description': 'BufferWindowMemory keeps track of the back-and-forths in conversation, and then uses a window of size k to surface the last k back-and-forths to use as memory.',\n",
" 'language': 'en'}"
]
},
"execution_count": 6,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -121,14 +119,29 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "40fc13ef",
"metadata": {},
"source": [
"Now, let's try a more extensive example, the `docs` root dir.\n",
"\n",
"We will skip everything under `api`."
"We will skip everything under `api`.\n",
"\n",
"For this, we can `lazy_load` each page as we crawl the tree, using `WebBaseLoader` to load each as we go."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5c938b9f",
"metadata": {},
"outputs": [],
"source": [
"url = \"https://js.langchain.com/docs/\"\n",
"exclude_dirs = [\"https://js.langchain.com/docs/api/\"]\n",
"loader = RecursiveUrlLoader(url=url, exclude_dirs=exclude_dirs)\n",
"# Lazy load each\n",
"docs = [print(doc) or doc for doc in loader.lazy_load()]"
]
},
{
@@ -138,22 +151,22 @@
"metadata": {},
"outputs": [],
"source": [
"url = 'https://js.langchain.com/docs/'\n",
"exclude_dirs=['https://js.langchain.com/docs/api/']\n",
"loader=RecursiveUrlLoader(url=url,exclude_dirs=exclude_dirs)\n",
"docs=loader.load()"
"# Load all pages\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "457e30f3",
"metadata": {},
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"176"
"188"
]
},
"execution_count": 8,
@@ -174,7 +187,7 @@
{
"data": {
"text/plain": [
"'\\n\\n\\n\\n\\nHacker News | \\uf8ffü¶úÔ∏è\\uf8ffüîó Langchain\\n\\n\\n\\n\\n\\nSkip'"
"'\\n\\n\\n\\n\\nAgent Simulations | 🦜️🔗 Langchain\\n\\n\\n\\n\\n\\nSkip t'"
]
},
"execution_count": 9,
@@ -195,9 +208,9 @@
{
"data": {
"text/plain": [
"{'source': 'https://js.langchain.com/docs/modules/indexes/document_loaders/examples/web_loaders/hn',\n",
" 'title': 'Hacker News | \\uf8ffü¶úÔ∏è\\uf8ffüîó Langchain',\n",
" 'description': 'This example goes over how to load data from the hacker news website, using Cheerio. One document will be created for each page.',\n",
"{'source': 'https://js.langchain.com/docs/use_cases/agent_simulations/',\n",
" 'title': 'Agent Simulations | 🦜️🔗 Langchain',\n",
" 'description': 'Agent simulations involve taking multiple agents and having them interact with each other.',\n",
" 'language': 'en'}"
]
},

View File

@@ -33,9 +33,7 @@
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredRSTLoader(\n",
" file_path=\"example_data/README.rst\", mode=\"elements\"\n",
")\n",
"loader = UnstructuredRSTLoader(file_path=\"example_data/README.rst\", mode=\"elements\")\n",
"docs = loader.load()"
]
},

View File

@@ -30,7 +30,8 @@
"outputs": [],
"source": [
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"warnings.filterwarnings(\"ignore\")\n",
"from pprint import pprint\n",
"from langchain.text_splitter import Language\n",
"from langchain.document_loaders.generic import GenericLoader\n",
@@ -48,7 +49,7 @@
" \"./example_data/source_code\",\n",
" glob=\"*\",\n",
" suffixes=[\".py\", \".js\"],\n",
" parser=LanguageParser()\n",
" parser=LanguageParser(),\n",
")\n",
"docs = loader.load()"
]
@@ -200,7 +201,7 @@
" \"./example_data/source_code\",\n",
" glob=\"*\",\n",
" suffixes=[\".py\"],\n",
" parser=LanguageParser(language=Language.PYTHON, parser_threshold=1000)\n",
" parser=LanguageParser(language=Language.PYTHON, parser_threshold=1000),\n",
")\n",
"docs = loader.load()"
]
@@ -281,7 +282,7 @@
" \"./example_data/source_code\",\n",
" glob=\"*\",\n",
" suffixes=[\".js\"],\n",
" parser=LanguageParser(language=Language.JS)\n",
" parser=LanguageParser(language=Language.JS),\n",
")\n",
"docs = loader.load()"
]

View File

@@ -43,10 +43,10 @@
"outputs": [],
"source": [
"conf = CosConfig(\n",
" Region=\"your cos region\",\n",
" SecretId=\"your cos secret_id\",\n",
" SecretKey=\"your cos secret_key\",\n",
" )\n",
" Region=\"your cos region\",\n",
" SecretId=\"your cos secret_id\",\n",
" SecretKey=\"your cos secret_key\",\n",
")\n",
"loader = TencentCOSDirectoryLoader(conf=conf, bucket=\"you_cos_bucket\")"
]
},

View File

@@ -43,10 +43,10 @@
"outputs": [],
"source": [
"conf = CosConfig(\n",
" Region=\"your cos region\",\n",
" SecretId=\"your cos secret_id\",\n",
" SecretKey=\"your cos secret_key\",\n",
" )\n",
" Region=\"your cos region\",\n",
" SecretId=\"your cos secret_id\",\n",
" SecretKey=\"your cos secret_key\",\n",
")\n",
"loader = TencentCOSFileLoader(conf=conf, bucket=\"you_cos_bucket\", key=\"fake.docx\")"
]
},

View File

@@ -0,0 +1,181 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# TSV\n",
"\n",
">A [tab-separated values (TSV)](https://en.wikipedia.org/wiki/Tab-separated_values) file is a simple, text-based file format for storing tabular data.[3] Records are separated by newlines, and values within a record are separated by tab characters."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `UnstructuredTSVLoader`\n",
"\n",
"You can also load the table using the `UnstructuredTSVLoader`. One advantage of using `UnstructuredTSVLoader` is that if you use it in `\"elements\"` mode, an HTML representation of the table will be available in the metadata."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders.tsv import UnstructuredTSVLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredTSVLoader(\n",
" file_path=\"example_data/mlb_teams_2012.csv\", mode=\"elements\"\n",
")\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<table border=\"1\" class=\"dataframe\">\n",
" <tbody>\n",
" <tr>\n",
" <td>Nationals, 81.34, 98</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Reds, 82.20, 97</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Yankees, 197.96, 95</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Giants, 117.62, 94</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Braves, 83.31, 94</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Athletics, 55.37, 94</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Rangers, 120.51, 93</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Orioles, 81.43, 93</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Rays, 64.17, 90</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Angels, 154.49, 89</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Tigers, 132.30, 88</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Cardinals, 110.30, 88</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Dodgers, 95.14, 86</td>\n",
" </tr>\n",
" <tr>\n",
" <td>White Sox, 96.92, 85</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Brewers, 97.65, 83</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Phillies, 174.54, 81</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Diamondbacks, 74.28, 81</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Pirates, 63.43, 79</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Padres, 55.24, 76</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Mariners, 81.97, 75</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Mets, 93.35, 74</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Blue Jays, 75.48, 73</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Royals, 60.91, 72</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Marlins, 118.07, 69</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Red Sox, 173.18, 69</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Indians, 78.43, 68</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Twins, 94.08, 66</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Rockies, 78.06, 64</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Cubs, 88.19, 61</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Astros, 60.65, 55</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
]
}
],
"source": [
"print(docs[0].metadata[\"text_as_html\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -233,7 +233,8 @@
],
"metadata": {
"collapsed": false
}
},
"id": "672264ad"
},
{
"cell_type": "code",
@@ -241,16 +242,18 @@
"outputs": [],
"source": [
"loader = WebBaseLoader(\n",
" \"https://www.walmart.com/search?q=parrots\", proxies={\n",
" \"https://www.walmart.com/search?q=parrots\",\n",
" proxies={\n",
" \"http\": \"http://{username}:{password}:@proxy.service.com:6666/\",\n",
" \"https\": \"https://{username}:{password}:@proxy.service.com:6666/\"\n",
" }\n",
" \"https\": \"https://{username}:{password}:@proxy.service.com:6666/\",\n",
" },\n",
")\n",
"docs = loader.load()\n"
"docs = loader.load()"
],
"metadata": {
"collapsed": false
}
},
"id": "9caf0310"
}
],
"metadata": {
@@ -274,4 +277,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}

View File

@@ -0,0 +1,304 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Xorbits Pandas DataFrame\n",
"\n",
"This notebook goes over how to load data from a [xorbits.pandas](https://doc.xorbits.io/en/latest/reference/pandas/frame.html) DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"#!pip install xorbits"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import xorbits.pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv(\"example_data/mlb_teams_2012.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "b0d1d84e23c04f1296f63b3ea3dd1e5b",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0.00/100 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Team</th>\n",
" <th>\"Payroll (millions)\"</th>\n",
" <th>\"Wins\"</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Nationals</td>\n",
" <td>81.34</td>\n",
" <td>98</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Reds</td>\n",
" <td>82.20</td>\n",
" <td>97</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Yankees</td>\n",
" <td>197.96</td>\n",
" <td>95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Giants</td>\n",
" <td>117.62</td>\n",
" <td>94</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Braves</td>\n",
" <td>83.31</td>\n",
" <td>94</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Team \"Payroll (millions)\" \"Wins\"\n",
"0 Nationals 81.34 98\n",
"1 Reds 82.20 97\n",
"2 Yankees 197.96 95\n",
"3 Giants 117.62 94\n",
"4 Braves 83.31 94"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import XorbitsLoader"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"loader = XorbitsLoader(df, page_content_column=\"Team\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "c8c8b67f1aae4a3c9de7734bb6cf738e",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0.00/100 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"[Document(page_content='Nationals', metadata={' \"Payroll (millions)\"': 81.34, ' \"Wins\"': 98}),\n",
" Document(page_content='Reds', metadata={' \"Payroll (millions)\"': 82.2, ' \"Wins\"': 97}),\n",
" Document(page_content='Yankees', metadata={' \"Payroll (millions)\"': 197.96, ' \"Wins\"': 95}),\n",
" Document(page_content='Giants', metadata={' \"Payroll (millions)\"': 117.62, ' \"Wins\"': 94}),\n",
" Document(page_content='Braves', metadata={' \"Payroll (millions)\"': 83.31, ' \"Wins\"': 94}),\n",
" Document(page_content='Athletics', metadata={' \"Payroll (millions)\"': 55.37, ' \"Wins\"': 94}),\n",
" Document(page_content='Rangers', metadata={' \"Payroll (millions)\"': 120.51, ' \"Wins\"': 93}),\n",
" Document(page_content='Orioles', metadata={' \"Payroll (millions)\"': 81.43, ' \"Wins\"': 93}),\n",
" Document(page_content='Rays', metadata={' \"Payroll (millions)\"': 64.17, ' \"Wins\"': 90}),\n",
" Document(page_content='Angels', metadata={' \"Payroll (millions)\"': 154.49, ' \"Wins\"': 89}),\n",
" Document(page_content='Tigers', metadata={' \"Payroll (millions)\"': 132.3, ' \"Wins\"': 88}),\n",
" Document(page_content='Cardinals', metadata={' \"Payroll (millions)\"': 110.3, ' \"Wins\"': 88}),\n",
" Document(page_content='Dodgers', metadata={' \"Payroll (millions)\"': 95.14, ' \"Wins\"': 86}),\n",
" Document(page_content='White Sox', metadata={' \"Payroll (millions)\"': 96.92, ' \"Wins\"': 85}),\n",
" Document(page_content='Brewers', metadata={' \"Payroll (millions)\"': 97.65, ' \"Wins\"': 83}),\n",
" Document(page_content='Phillies', metadata={' \"Payroll (millions)\"': 174.54, ' \"Wins\"': 81}),\n",
" Document(page_content='Diamondbacks', metadata={' \"Payroll (millions)\"': 74.28, ' \"Wins\"': 81}),\n",
" Document(page_content='Pirates', metadata={' \"Payroll (millions)\"': 63.43, ' \"Wins\"': 79}),\n",
" Document(page_content='Padres', metadata={' \"Payroll (millions)\"': 55.24, ' \"Wins\"': 76}),\n",
" Document(page_content='Mariners', metadata={' \"Payroll (millions)\"': 81.97, ' \"Wins\"': 75}),\n",
" Document(page_content='Mets', metadata={' \"Payroll (millions)\"': 93.35, ' \"Wins\"': 74}),\n",
" Document(page_content='Blue Jays', metadata={' \"Payroll (millions)\"': 75.48, ' \"Wins\"': 73}),\n",
" Document(page_content='Royals', metadata={' \"Payroll (millions)\"': 60.91, ' \"Wins\"': 72}),\n",
" Document(page_content='Marlins', metadata={' \"Payroll (millions)\"': 118.07, ' \"Wins\"': 69}),\n",
" Document(page_content='Red Sox', metadata={' \"Payroll (millions)\"': 173.18, ' \"Wins\"': 69}),\n",
" Document(page_content='Indians', metadata={' \"Payroll (millions)\"': 78.43, ' \"Wins\"': 68}),\n",
" Document(page_content='Twins', metadata={' \"Payroll (millions)\"': 94.08, ' \"Wins\"': 66}),\n",
" Document(page_content='Rockies', metadata={' \"Payroll (millions)\"': 78.06, ' \"Wins\"': 64}),\n",
" Document(page_content='Cubs', metadata={' \"Payroll (millions)\"': 88.19, ' \"Wins\"': 61}),\n",
" Document(page_content='Astros', metadata={' \"Payroll (millions)\"': 60.65, ' \"Wins\"': 55})]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "fc85c9f59b3644689d05853159fbd358",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0.00/100 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='Nationals' metadata={' \"Payroll (millions)\"': 81.34, ' \"Wins\"': 98}\n",
"page_content='Reds' metadata={' \"Payroll (millions)\"': 82.2, ' \"Wins\"': 97}\n",
"page_content='Yankees' metadata={' \"Payroll (millions)\"': 197.96, ' \"Wins\"': 95}\n",
"page_content='Giants' metadata={' \"Payroll (millions)\"': 117.62, ' \"Wins\"': 94}\n",
"page_content='Braves' metadata={' \"Payroll (millions)\"': 83.31, ' \"Wins\"': 94}\n",
"page_content='Athletics' metadata={' \"Payroll (millions)\"': 55.37, ' \"Wins\"': 94}\n",
"page_content='Rangers' metadata={' \"Payroll (millions)\"': 120.51, ' \"Wins\"': 93}\n",
"page_content='Orioles' metadata={' \"Payroll (millions)\"': 81.43, ' \"Wins\"': 93}\n",
"page_content='Rays' metadata={' \"Payroll (millions)\"': 64.17, ' \"Wins\"': 90}\n",
"page_content='Angels' metadata={' \"Payroll (millions)\"': 154.49, ' \"Wins\"': 89}\n",
"page_content='Tigers' metadata={' \"Payroll (millions)\"': 132.3, ' \"Wins\"': 88}\n",
"page_content='Cardinals' metadata={' \"Payroll (millions)\"': 110.3, ' \"Wins\"': 88}\n",
"page_content='Dodgers' metadata={' \"Payroll (millions)\"': 95.14, ' \"Wins\"': 86}\n",
"page_content='White Sox' metadata={' \"Payroll (millions)\"': 96.92, ' \"Wins\"': 85}\n",
"page_content='Brewers' metadata={' \"Payroll (millions)\"': 97.65, ' \"Wins\"': 83}\n",
"page_content='Phillies' metadata={' \"Payroll (millions)\"': 174.54, ' \"Wins\"': 81}\n",
"page_content='Diamondbacks' metadata={' \"Payroll (millions)\"': 74.28, ' \"Wins\"': 81}\n",
"page_content='Pirates' metadata={' \"Payroll (millions)\"': 63.43, ' \"Wins\"': 79}\n",
"page_content='Padres' metadata={' \"Payroll (millions)\"': 55.24, ' \"Wins\"': 76}\n",
"page_content='Mariners' metadata={' \"Payroll (millions)\"': 81.97, ' \"Wins\"': 75}\n",
"page_content='Mets' metadata={' \"Payroll (millions)\"': 93.35, ' \"Wins\"': 74}\n",
"page_content='Blue Jays' metadata={' \"Payroll (millions)\"': 75.48, ' \"Wins\"': 73}\n",
"page_content='Royals' metadata={' \"Payroll (millions)\"': 60.91, ' \"Wins\"': 72}\n",
"page_content='Marlins' metadata={' \"Payroll (millions)\"': 118.07, ' \"Wins\"': 69}\n",
"page_content='Red Sox' metadata={' \"Payroll (millions)\"': 173.18, ' \"Wins\"': 69}\n",
"page_content='Indians' metadata={' \"Payroll (millions)\"': 78.43, ' \"Wins\"': 68}\n",
"page_content='Twins' metadata={' \"Payroll (millions)\"': 94.08, ' \"Wins\"': 66}\n",
"page_content='Rockies' metadata={' \"Payroll (millions)\"': 78.06, ' \"Wins\"': 64}\n",
"page_content='Cubs' metadata={' \"Payroll (millions)\"': 88.19, ' \"Wins\"': 61}\n",
"page_content='Astros' metadata={' \"Payroll (millions)\"': 60.65, ' \"Wins\"': 55}\n"
]
}
],
"source": [
"# Use lazy load for larger table, which won't read the full table into memory\n",
"for i in loader.lazy_load():\n",
" print(i)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "base",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1 @@
label: 'Integrations'

View File

@@ -0,0 +1,269 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Doctran Extract Properties\n",
"\n",
"We can extract useful features of documents using the [Doctran](https://github.com/psychic-api/doctran) library, which uses OpenAI's function calling feature to extract specific metadata.\n",
"\n",
"Extracting metadata from documents is helpful for a variety of tasks, including:\n",
"* Classification: classifying documents into different categories\n",
"* Data mining: Extract structured data that can be used for data analysis\n",
"* Style transfer: Change the way text is written to more closely match expected user input, improving vector search results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip install doctran"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"import json\n",
"from langchain.schema import Document\n",
"from langchain.document_transformers import DoctranPropertyExtractor"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Input\n",
"This is the document we'll extract properties from."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Generated with ChatGPT]\n",
"\n",
"Confidential Document - For Internal Use Only\n",
"\n",
"Date: July 1, 2023\n",
"\n",
"Subject: Updates and Discussions on Various Topics\n",
"\n",
"Dear Team,\n",
"\n",
"I hope this email finds you well. In this document, I would like to provide you with some important updates and discuss various topics that require our attention. Please treat the information contained herein as highly confidential.\n",
"\n",
"Security and Privacy Measures\n",
"As part of our ongoing commitment to ensure the security and privacy of our customers' data, we have implemented robust measures across all our systems. We would like to commend John Doe (email: john.doe@example.com) from the IT department for his diligent work in enhancing our network security. Moving forward, we kindly remind everyone to strictly adhere to our data protection policies and guidelines. Additionally, if you come across any potential security risks or incidents, please report them immediately to our dedicated team at security@example.com.\n",
"\n",
"HR Updates and Employee Benefits\n",
"Recently, we welcomed several new team members who have made significant contributions to their respective departments. I would like to recognize Jane Smith (SSN: 049-45-5928) for her outstanding performance in customer service. Jane has consistently received positive feedback from our clients. Furthermore, please remember that the open enrollment period for our employee benefits program is fast approaching. Should you have any questions or require assistance, please contact our HR representative, Michael Johnson (phone: 418-492-3850, email: michael.johnson@example.com).\n",
"\n",
"Marketing Initiatives and Campaigns\n",
"Our marketing team has been actively working on developing new strategies to increase brand awareness and drive customer engagement. We would like to thank Sarah Thompson (phone: 415-555-1234) for her exceptional efforts in managing our social media platforms. Sarah has successfully increased our follower base by 20% in the past month alone. Moreover, please mark your calendars for the upcoming product launch event on July 15th. We encourage all team members to attend and support this exciting milestone for our company.\n",
"\n",
"Research and Development Projects\n",
"In our pursuit of innovation, our research and development department has been working tirelessly on various projects. I would like to acknowledge the exceptional work of David Rodriguez (email: david.rodriguez@example.com) in his role as project lead. David's contributions to the development of our cutting-edge technology have been instrumental. Furthermore, we would like to remind everyone to share their ideas and suggestions for potential new projects during our monthly R&D brainstorming session, scheduled for July 10th.\n",
"\n",
"Please treat the information in this document with utmost confidentiality and ensure that it is not shared with unauthorized individuals. If you have any questions or concerns regarding the topics discussed, please do not hesitate to reach out to me directly.\n",
"\n",
"Thank you for your attention, and let's continue to work together to achieve our goals.\n",
"\n",
"Best regards,\n",
"\n",
"Jason Fan\n",
"Cofounder & CEO\n",
"Psychic\n",
"jason@psychic.dev\n",
"\n"
]
}
],
"source": [
"sample_text = \"\"\"[Generated with ChatGPT]\n",
"\n",
"Confidential Document - For Internal Use Only\n",
"\n",
"Date: July 1, 2023\n",
"\n",
"Subject: Updates and Discussions on Various Topics\n",
"\n",
"Dear Team,\n",
"\n",
"I hope this email finds you well. In this document, I would like to provide you with some important updates and discuss various topics that require our attention. Please treat the information contained herein as highly confidential.\n",
"\n",
"Security and Privacy Measures\n",
"As part of our ongoing commitment to ensure the security and privacy of our customers' data, we have implemented robust measures across all our systems. We would like to commend John Doe (email: john.doe@example.com) from the IT department for his diligent work in enhancing our network security. Moving forward, we kindly remind everyone to strictly adhere to our data protection policies and guidelines. Additionally, if you come across any potential security risks or incidents, please report them immediately to our dedicated team at security@example.com.\n",
"\n",
"HR Updates and Employee Benefits\n",
"Recently, we welcomed several new team members who have made significant contributions to their respective departments. I would like to recognize Jane Smith (SSN: 049-45-5928) for her outstanding performance in customer service. Jane has consistently received positive feedback from our clients. Furthermore, please remember that the open enrollment period for our employee benefits program is fast approaching. Should you have any questions or require assistance, please contact our HR representative, Michael Johnson (phone: 418-492-3850, email: michael.johnson@example.com).\n",
"\n",
"Marketing Initiatives and Campaigns\n",
"Our marketing team has been actively working on developing new strategies to increase brand awareness and drive customer engagement. We would like to thank Sarah Thompson (phone: 415-555-1234) for her exceptional efforts in managing our social media platforms. Sarah has successfully increased our follower base by 20% in the past month alone. Moreover, please mark your calendars for the upcoming product launch event on July 15th. We encourage all team members to attend and support this exciting milestone for our company.\n",
"\n",
"Research and Development Projects\n",
"In our pursuit of innovation, our research and development department has been working tirelessly on various projects. I would like to acknowledge the exceptional work of David Rodriguez (email: david.rodriguez@example.com) in his role as project lead. David's contributions to the development of our cutting-edge technology have been instrumental. Furthermore, we would like to remind everyone to share their ideas and suggestions for potential new projects during our monthly R&D brainstorming session, scheduled for July 10th.\n",
"\n",
"Please treat the information in this document with utmost confidentiality and ensure that it is not shared with unauthorized individuals. If you have any questions or concerns regarding the topics discussed, please do not hesitate to reach out to me directly.\n",
"\n",
"Thank you for your attention, and let's continue to work together to achieve our goals.\n",
"\n",
"Best regards,\n",
"\n",
"Jason Fan\n",
"Cofounder & CEO\n",
"Psychic\n",
"jason@psychic.dev\n",
"\"\"\"\n",
"print(sample_text)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"documents = [Document(page_content=sample_text)]\n",
"properties = [\n",
" {\n",
" \"name\": \"category\",\n",
" \"description\": \"What type of email this is.\",\n",
" \"type\": \"string\",\n",
" \"enum\": [\"update\", \"action_item\", \"customer_feedback\", \"announcement\", \"other\"],\n",
" \"required\": True,\n",
" },\n",
" {\n",
" \"name\": \"mentions\",\n",
" \"description\": \"A list of all people mentioned in this email.\",\n",
" \"type\": \"array\",\n",
" \"items\": {\n",
" \"name\": \"full_name\",\n",
" \"description\": \"The full name of the person mentioned.\",\n",
" \"type\": \"string\",\n",
" },\n",
" \"required\": True,\n",
" },\n",
" {\n",
" \"name\": \"eli5\",\n",
" \"description\": \"Explain this email to me like I'm 5 years old.\",\n",
" \"type\": \"string\",\n",
" \"required\": True,\n",
" },\n",
"]\n",
"property_extractor = DoctranPropertyExtractor(properties=properties)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Output\n",
"After extracting properties from a document, the result will be returned as a new document with properties provided in the metadata"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"extracted_document = await property_extractor.atransform_documents(\n",
" documents, properties=properties\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"extracted_properties\": {\n",
" \"category\": \"update\",\n",
" \"mentions\": [\n",
" \"John Doe\",\n",
" \"Jane Smith\",\n",
" \"Michael Johnson\",\n",
" \"Sarah Thompson\",\n",
" \"David Rodriguez\",\n",
" \"Jason Fan\"\n",
" ],\n",
" \"eli5\": \"This is an email from the CEO, Jason Fan, giving updates about different areas in the company. He talks about new security measures and praises John Doe for his work. He also mentions new hires and praises Jane Smith for her work in customer service. The CEO reminds everyone about the upcoming benefits enrollment and says to contact Michael Johnson with any questions. He talks about the marketing team's work and praises Sarah Thompson for increasing their social media followers. There's also a product launch event on July 15th. Lastly, he talks about the research and development projects and praises David Rodriguez for his work. There's a brainstorming session on July 10th.\"\n",
" }\n",
"}\n"
]
}
],
"source": [
"print(json.dumps(extracted_document[0].metadata, indent=2))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,266 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Doctran Interrogate Documents\n",
"Documents used in a vector store knowledge base are typically stored in narrative or conversational format. However, most user queries are in question format. If we convert documents into Q&A format before vectorizing them, we can increase the liklihood of retrieving relevant documents, and decrease the liklihood of retrieving irrelevant documents.\n",
"\n",
"We can accomplish this using the [Doctran](https://github.com/psychic-api/doctran) library, which uses OpenAI's function calling feature to \"interrogate\" documents.\n",
"\n",
"See [this notebook](https://github.com/psychic-api/doctran/blob/main/benchmark.ipynb) for benchmarks on vector similarity scores for various queries based on raw documents versus interrogated documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip install doctran"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"import json\n",
"from langchain.schema import Document\n",
"from langchain.document_transformers import DoctranQATransformer"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Input\n",
"This is the document we'll interrogate"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Generated with ChatGPT]\n",
"\n",
"Confidential Document - For Internal Use Only\n",
"\n",
"Date: July 1, 2023\n",
"\n",
"Subject: Updates and Discussions on Various Topics\n",
"\n",
"Dear Team,\n",
"\n",
"I hope this email finds you well. In this document, I would like to provide you with some important updates and discuss various topics that require our attention. Please treat the information contained herein as highly confidential.\n",
"\n",
"Security and Privacy Measures\n",
"As part of our ongoing commitment to ensure the security and privacy of our customers' data, we have implemented robust measures across all our systems. We would like to commend John Doe (email: john.doe@example.com) from the IT department for his diligent work in enhancing our network security. Moving forward, we kindly remind everyone to strictly adhere to our data protection policies and guidelines. Additionally, if you come across any potential security risks or incidents, please report them immediately to our dedicated team at security@example.com.\n",
"\n",
"HR Updates and Employee Benefits\n",
"Recently, we welcomed several new team members who have made significant contributions to their respective departments. I would like to recognize Jane Smith (SSN: 049-45-5928) for her outstanding performance in customer service. Jane has consistently received positive feedback from our clients. Furthermore, please remember that the open enrollment period for our employee benefits program is fast approaching. Should you have any questions or require assistance, please contact our HR representative, Michael Johnson (phone: 418-492-3850, email: michael.johnson@example.com).\n",
"\n",
"Marketing Initiatives and Campaigns\n",
"Our marketing team has been actively working on developing new strategies to increase brand awareness and drive customer engagement. We would like to thank Sarah Thompson (phone: 415-555-1234) for her exceptional efforts in managing our social media platforms. Sarah has successfully increased our follower base by 20% in the past month alone. Moreover, please mark your calendars for the upcoming product launch event on July 15th. We encourage all team members to attend and support this exciting milestone for our company.\n",
"\n",
"Research and Development Projects\n",
"In our pursuit of innovation, our research and development department has been working tirelessly on various projects. I would like to acknowledge the exceptional work of David Rodriguez (email: david.rodriguez@example.com) in his role as project lead. David's contributions to the development of our cutting-edge technology have been instrumental. Furthermore, we would like to remind everyone to share their ideas and suggestions for potential new projects during our monthly R&D brainstorming session, scheduled for July 10th.\n",
"\n",
"Please treat the information in this document with utmost confidentiality and ensure that it is not shared with unauthorized individuals. If you have any questions or concerns regarding the topics discussed, please do not hesitate to reach out to me directly.\n",
"\n",
"Thank you for your attention, and let's continue to work together to achieve our goals.\n",
"\n",
"Best regards,\n",
"\n",
"Jason Fan\n",
"Cofounder & CEO\n",
"Psychic\n",
"jason@psychic.dev\n",
"\n"
]
}
],
"source": [
"sample_text = \"\"\"[Generated with ChatGPT]\n",
"\n",
"Confidential Document - For Internal Use Only\n",
"\n",
"Date: July 1, 2023\n",
"\n",
"Subject: Updates and Discussions on Various Topics\n",
"\n",
"Dear Team,\n",
"\n",
"I hope this email finds you well. In this document, I would like to provide you with some important updates and discuss various topics that require our attention. Please treat the information contained herein as highly confidential.\n",
"\n",
"Security and Privacy Measures\n",
"As part of our ongoing commitment to ensure the security and privacy of our customers' data, we have implemented robust measures across all our systems. We would like to commend John Doe (email: john.doe@example.com) from the IT department for his diligent work in enhancing our network security. Moving forward, we kindly remind everyone to strictly adhere to our data protection policies and guidelines. Additionally, if you come across any potential security risks or incidents, please report them immediately to our dedicated team at security@example.com.\n",
"\n",
"HR Updates and Employee Benefits\n",
"Recently, we welcomed several new team members who have made significant contributions to their respective departments. I would like to recognize Jane Smith (SSN: 049-45-5928) for her outstanding performance in customer service. Jane has consistently received positive feedback from our clients. Furthermore, please remember that the open enrollment period for our employee benefits program is fast approaching. Should you have any questions or require assistance, please contact our HR representative, Michael Johnson (phone: 418-492-3850, email: michael.johnson@example.com).\n",
"\n",
"Marketing Initiatives and Campaigns\n",
"Our marketing team has been actively working on developing new strategies to increase brand awareness and drive customer engagement. We would like to thank Sarah Thompson (phone: 415-555-1234) for her exceptional efforts in managing our social media platforms. Sarah has successfully increased our follower base by 20% in the past month alone. Moreover, please mark your calendars for the upcoming product launch event on July 15th. We encourage all team members to attend and support this exciting milestone for our company.\n",
"\n",
"Research and Development Projects\n",
"In our pursuit of innovation, our research and development department has been working tirelessly on various projects. I would like to acknowledge the exceptional work of David Rodriguez (email: david.rodriguez@example.com) in his role as project lead. David's contributions to the development of our cutting-edge technology have been instrumental. Furthermore, we would like to remind everyone to share their ideas and suggestions for potential new projects during our monthly R&D brainstorming session, scheduled for July 10th.\n",
"\n",
"Please treat the information in this document with utmost confidentiality and ensure that it is not shared with unauthorized individuals. If you have any questions or concerns regarding the topics discussed, please do not hesitate to reach out to me directly.\n",
"\n",
"Thank you for your attention, and let's continue to work together to achieve our goals.\n",
"\n",
"Best regards,\n",
"\n",
"Jason Fan\n",
"Cofounder & CEO\n",
"Psychic\n",
"jason@psychic.dev\n",
"\"\"\"\n",
"print(sample_text)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"documents = [Document(page_content=sample_text)]\n",
"qa_transformer = DoctranQATransformer()\n",
"transformed_document = await qa_transformer.atransform_documents(documents)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Output\n",
"After interrogating a document, the result will be returned as a new document with questions and answers provided in the metadata."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"questions_and_answers\": [\n",
" {\n",
" \"question\": \"What is the purpose of this document?\",\n",
" \"answer\": \"The purpose of this document is to provide important updates and discuss various topics that require the team's attention.\"\n",
" },\n",
" {\n",
" \"question\": \"Who is responsible for enhancing the network security?\",\n",
" \"answer\": \"John Doe from the IT department is responsible for enhancing the network security.\"\n",
" },\n",
" {\n",
" \"question\": \"Where should potential security risks or incidents be reported?\",\n",
" \"answer\": \"Potential security risks or incidents should be reported to the dedicated team at security@example.com.\"\n",
" },\n",
" {\n",
" \"question\": \"Who has been recognized for outstanding performance in customer service?\",\n",
" \"answer\": \"Jane Smith has been recognized for her outstanding performance in customer service.\"\n",
" },\n",
" {\n",
" \"question\": \"When is the open enrollment period for the employee benefits program?\",\n",
" \"answer\": \"The document does not specify the exact dates for the open enrollment period for the employee benefits program, but it mentions that it is fast approaching.\"\n",
" },\n",
" {\n",
" \"question\": \"Who should be contacted for questions or assistance regarding the employee benefits program?\",\n",
" \"answer\": \"For questions or assistance regarding the employee benefits program, the HR representative, Michael Johnson, should be contacted.\"\n",
" },\n",
" {\n",
" \"question\": \"Who has been acknowledged for managing the company's social media platforms?\",\n",
" \"answer\": \"Sarah Thompson has been acknowledged for managing the company's social media platforms.\"\n",
" },\n",
" {\n",
" \"question\": \"When is the upcoming product launch event?\",\n",
" \"answer\": \"The upcoming product launch event is on July 15th.\"\n",
" },\n",
" {\n",
" \"question\": \"Who has been recognized for their contributions to the development of the company's technology?\",\n",
" \"answer\": \"David Rodriguez has been recognized for his contributions to the development of the company's technology.\"\n",
" },\n",
" {\n",
" \"question\": \"When is the monthly R&D brainstorming session?\",\n",
" \"answer\": \"The monthly R&D brainstorming session is scheduled for July 10th.\"\n",
" },\n",
" {\n",
" \"question\": \"Who should be contacted for questions or concerns regarding the topics discussed in the document?\",\n",
" \"answer\": \"For questions or concerns regarding the topics discussed in the document, Jason Fan, the Cofounder & CEO, should be contacted.\"\n",
" }\n",
" ]\n",
"}\n"
]
}
],
"source": [
"transformed_document = await qa_transformer.atransform_documents(documents)\n",
"print(json.dumps(transformed_document[0].metadata, indent=2))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,208 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Doctran Translate Documents\n",
"Comparing documents through embeddings has the benefit of working across multiple languages. \"Harrison says hello\" and \"Harrison dice hola\" will occupy similar positions in the vector space because they have the same meaning semantically.\n",
"\n",
"However, it can still be useful to use a LLM translate documents into other languages before vectorizing them. This is especially helpful when users are expected to query the knowledge base in different languages, or when state of the art embeddings models are not available for a given language.\n",
"\n",
"We can accomplish this using the [Doctran](https://github.com/psychic-api/doctran) library, which uses OpenAI's function calling feature to translate documents between languages."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip install doctran"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.schema import Document\n",
"from langchain.document_transformers import DoctranTextTranslator"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Input\n",
"This is the document we'll translate"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"sample_text = \"\"\"[Generated with ChatGPT]\n",
"\n",
"Confidential Document - For Internal Use Only\n",
"\n",
"Date: July 1, 2023\n",
"\n",
"Subject: Updates and Discussions on Various Topics\n",
"\n",
"Dear Team,\n",
"\n",
"I hope this email finds you well. In this document, I would like to provide you with some important updates and discuss various topics that require our attention. Please treat the information contained herein as highly confidential.\n",
"\n",
"Security and Privacy Measures\n",
"As part of our ongoing commitment to ensure the security and privacy of our customers' data, we have implemented robust measures across all our systems. We would like to commend John Doe (email: john.doe@example.com) from the IT department for his diligent work in enhancing our network security. Moving forward, we kindly remind everyone to strictly adhere to our data protection policies and guidelines. Additionally, if you come across any potential security risks or incidents, please report them immediately to our dedicated team at security@example.com.\n",
"\n",
"HR Updates and Employee Benefits\n",
"Recently, we welcomed several new team members who have made significant contributions to their respective departments. I would like to recognize Jane Smith (SSN: 049-45-5928) for her outstanding performance in customer service. Jane has consistently received positive feedback from our clients. Furthermore, please remember that the open enrollment period for our employee benefits program is fast approaching. Should you have any questions or require assistance, please contact our HR representative, Michael Johnson (phone: 418-492-3850, email: michael.johnson@example.com).\n",
"\n",
"Marketing Initiatives and Campaigns\n",
"Our marketing team has been actively working on developing new strategies to increase brand awareness and drive customer engagement. We would like to thank Sarah Thompson (phone: 415-555-1234) for her exceptional efforts in managing our social media platforms. Sarah has successfully increased our follower base by 20% in the past month alone. Moreover, please mark your calendars for the upcoming product launch event on July 15th. We encourage all team members to attend and support this exciting milestone for our company.\n",
"\n",
"Research and Development Projects\n",
"In our pursuit of innovation, our research and development department has been working tirelessly on various projects. I would like to acknowledge the exceptional work of David Rodriguez (email: david.rodriguez@example.com) in his role as project lead. David's contributions to the development of our cutting-edge technology have been instrumental. Furthermore, we would like to remind everyone to share their ideas and suggestions for potential new projects during our monthly R&D brainstorming session, scheduled for July 10th.\n",
"\n",
"Please treat the information in this document with utmost confidentiality and ensure that it is not shared with unauthorized individuals. If you have any questions or concerns regarding the topics discussed, please do not hesitate to reach out to me directly.\n",
"\n",
"Thank you for your attention, and let's continue to work together to achieve our goals.\n",
"\n",
"Best regards,\n",
"\n",
"Jason Fan\n",
"Cofounder & CEO\n",
"Psychic\n",
"jason@psychic.dev\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"documents = [Document(page_content=sample_text)]\n",
"qa_translator = DoctranTextTranslator(language=\"spanish\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Output\n",
"After translating a document, the result will be returned as a new document with the page_content translated into the target language"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"translated_document = await qa_translator.atransform_documents(documents)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Generado con ChatGPT]\n",
"\n",
"Documento confidencial - Solo para uso interno\n",
"\n",
"Fecha: 1 de julio de 2023\n",
"\n",
"Asunto: Actualizaciones y discusiones sobre varios temas\n",
"\n",
"Estimado equipo,\n",
"\n",
"Espero que este correo electrónico les encuentre bien. En este documento, me gustaría proporcionarles algunas actualizaciones importantes y discutir varios temas que requieren nuestra atención. Por favor, traten la información contenida aquí como altamente confidencial.\n",
"\n",
"Medidas de seguridad y privacidad\n",
"Como parte de nuestro compromiso continuo para garantizar la seguridad y privacidad de los datos de nuestros clientes, hemos implementado medidas robustas en todos nuestros sistemas. Nos gustaría elogiar a John Doe (correo electrónico: john.doe@example.com) del departamento de TI por su diligente trabajo en mejorar nuestra seguridad de red. En adelante, recordamos amablemente a todos que se adhieran estrictamente a nuestras políticas y directrices de protección de datos. Además, si se encuentran con cualquier riesgo de seguridad o incidente potencial, por favor repórtelo inmediatamente a nuestro equipo dedicado en security@example.com.\n",
"\n",
"Actualizaciones de RRHH y beneficios para empleados\n",
"Recientemente, dimos la bienvenida a varios nuevos miembros del equipo que han hecho contribuciones significativas a sus respectivos departamentos. Me gustaría reconocer a Jane Smith (SSN: 049-45-5928) por su sobresaliente rendimiento en el servicio al cliente. Jane ha recibido constantemente comentarios positivos de nuestros clientes. Además, recuerden que el período de inscripción abierta para nuestro programa de beneficios para empleados se acerca rápidamente. Si tienen alguna pregunta o necesitan asistencia, por favor contacten a nuestro representante de RRHH, Michael Johnson (teléfono: 418-492-3850, correo electrónico: michael.johnson@example.com).\n",
"\n",
"Iniciativas y campañas de marketing\n",
"Nuestro equipo de marketing ha estado trabajando activamente en el desarrollo de nuevas estrategias para aumentar la conciencia de marca y fomentar la participación del cliente. Nos gustaría agradecer a Sarah Thompson (teléfono: 415-555-1234) por sus excepcionales esfuerzos en la gestión de nuestras plataformas de redes sociales. Sarah ha aumentado con éxito nuestra base de seguidores en un 20% solo en el último mes. Además, por favor marquen sus calendarios para el próximo evento de lanzamiento de producto el 15 de julio. Animamos a todos los miembros del equipo a asistir y apoyar este emocionante hito para nuestra empresa.\n",
"\n",
"Proyectos de investigación y desarrollo\n",
"En nuestra búsqueda de la innovación, nuestro departamento de investigación y desarrollo ha estado trabajando incansablemente en varios proyectos. Me gustaría reconocer el excepcional trabajo de David Rodríguez (correo electrónico: david.rodriguez@example.com) en su papel de líder de proyecto. Las contribuciones de David al desarrollo de nuestra tecnología de vanguardia han sido fundamentales. Además, nos gustaría recordar a todos que compartan sus ideas y sugerencias para posibles nuevos proyectos durante nuestra sesión de lluvia de ideas de I+D mensual, programada para el 10 de julio.\n",
"\n",
"Por favor, traten la información de este documento con la máxima confidencialidad y asegúrense de que no se comparte con personas no autorizadas. Si tienen alguna pregunta o inquietud sobre los temas discutidos, no duden en ponerse en contacto conmigo directamente.\n",
"\n",
"Gracias por su atención, y sigamos trabajando juntos para alcanzar nuestros objetivos.\n",
"\n",
"Saludos cordiales,\n",
"\n",
"Jason Fan\n",
"Cofundador y CEO\n",
"Psychic\n",
"jason@psychic.dev\n"
]
}
],
"source": [
"print(translated_document[0].page_content)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,261 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# OpenAI Functions Metadata Tagger\n",
"\n",
"It can often be useful to tag ingested documents with structured metadata, such as the title, tone, or length of a document, to allow for more targeted similarity search later. However, for large numbers of documents, performing this labelling process manually can be tedious.\n",
"\n",
"The `OpenAIMetadataTagger` document transformer automates this process by extracting metadata from each provided document according to a provided schema. It uses a configurable OpenAI Functions-powered chain under the hood, so if you pass a custom LLM instance, it must be an OpenAI model with functions support. \n",
"\n",
"**Note:** This document transformer works best with complete documents, so it's best to run it first with whole documents before doing any other splitting or processing!\n",
"\n",
"For example, let's say you wanted to index a set of movie reviews. You could initialize the document transformer with a valid JSON Schema object as follows:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.schema import Document\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.document_transformers.openai_functions import create_metadata_tagger"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"movie_title\": {\"type\": \"string\"},\n",
" \"critic\": {\"type\": \"string\"},\n",
" \"tone\": {\"type\": \"string\", \"enum\": [\"positive\", \"negative\"]},\n",
" \"rating\": {\n",
" \"type\": \"integer\",\n",
" \"description\": \"The number of stars the critic rated the movie\",\n",
" },\n",
" },\n",
" \"required\": [\"movie_title\", \"critic\", \"tone\"],\n",
"}\n",
"\n",
"# Must be an OpenAI model that supports functions\n",
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
"\n",
"document_transformer = create_metadata_tagger(metadata_schema=schema, llm=llm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can then simply pass the document transformer a list of documents, and it will extract metadata from the contents:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"original_documents = [\n",
" Document(\n",
" page_content=\"Review of The Bee Movie\\nBy Roger Ebert\\n\\nThis is the greatest movie ever made. 4 out of 5 stars.\"\n",
" ),\n",
" Document(\n",
" page_content=\"Review of The Godfather\\nBy Anonymous\\n\\nThis movie was super boring. 1 out of 5 stars.\",\n",
" metadata={\"reliable\": False},\n",
" ),\n",
"]\n",
"\n",
"enhanced_documents = document_transformer.transform_documents(original_documents)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Review of The Bee Movie\n",
"By Roger Ebert\n",
"\n",
"This is the greatest movie ever made. 4 out of 5 stars.\n",
"\n",
"{\"movie_title\": \"The Bee Movie\", \"critic\": \"Roger Ebert\", \"tone\": \"positive\", \"rating\": 4}\n",
"\n",
"---------------\n",
"\n",
"Review of The Godfather\n",
"By Anonymous\n",
"\n",
"This movie was super boring. 1 out of 5 stars.\n",
"\n",
"{\"movie_title\": \"The Godfather\", \"critic\": \"Anonymous\", \"tone\": \"negative\", \"rating\": 1, \"reliable\": false}\n"
]
}
],
"source": [
"import json\n",
"\n",
"print(\n",
" *[d.page_content + \"\\n\\n\" + json.dumps(d.metadata) for d in enhanced_documents],\n",
" sep=\"\\n\\n---------------\\n\\n\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The new documents can then be further processed by a text splitter before being loaded into a vector store. Extracted fields will not overwrite existing metadata.\n",
"\n",
"You can also initialize the document transformer with a Pydantic schema:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Review of The Bee Movie\n",
"By Roger Ebert\n",
"\n",
"This is the greatest movie ever made. 4 out of 5 stars.\n",
"\n",
"{\"movie_title\": \"The Bee Movie\", \"critic\": \"Roger Ebert\", \"tone\": \"positive\", \"rating\": 4}\n",
"\n",
"---------------\n",
"\n",
"Review of The Godfather\n",
"By Anonymous\n",
"\n",
"This movie was super boring. 1 out of 5 stars.\n",
"\n",
"{\"movie_title\": \"The Godfather\", \"critic\": \"Anonymous\", \"tone\": \"negative\", \"rating\": 1, \"reliable\": false}\n"
]
}
],
"source": [
"from typing import Literal\n",
"\n",
"from pydantic import BaseModel, Field\n",
"\n",
"\n",
"class Properties(BaseModel):\n",
" movie_title: str\n",
" critic: str\n",
" tone: Literal[\"positive\", \"negative\"]\n",
" rating: int = Field(description=\"Rating out of 5 stars\")\n",
"\n",
"\n",
"document_transformer = create_metadata_tagger(Properties, llm)\n",
"enhanced_documents = document_transformer.transform_documents(original_documents)\n",
"\n",
"print(\n",
" *[d.page_content + \"\\n\\n\" + json.dumps(d.metadata) for d in enhanced_documents],\n",
" sep=\"\\n\\n---------------\\n\\n\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"## Customization\n",
"\n",
"You can pass the underlying tagging chain the standard LLMChain arguments in the document transformer constructor. For example, if you wanted to ask the LLM to focus specific details in the input documents, or extract metadata in a certain style, you could pass in a custom prompt:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Review of The Bee Movie\n",
"By Roger Ebert\n",
"\n",
"This is the greatest movie ever made. 4 out of 5 stars.\n",
"\n",
"{\"movie_title\": \"The Bee Movie\", \"critic\": \"Roger Ebert\", \"tone\": \"positive\", \"rating\": 4}\n",
"\n",
"---------------\n",
"\n",
"Review of The Godfather\n",
"By Anonymous\n",
"\n",
"This movie was super boring. 1 out of 5 stars.\n",
"\n",
"{\"movie_title\": \"The Godfather\", \"critic\": \"Roger Ebert\", \"tone\": \"negative\", \"rating\": 1, \"reliable\": false}\n"
]
}
],
"source": [
"from langchain.prompts import ChatPromptTemplate\n",
"\n",
"prompt = ChatPromptTemplate.from_template(\n",
" \"\"\"Extract relevant information from the following text.\n",
"Anonymous critics are actually Roger Ebert.\n",
"\n",
"{input}\n",
"\"\"\"\n",
")\n",
"\n",
"document_transformer = create_metadata_tagger(schema, llm, prompt=prompt)\n",
"enhanced_documents = document_transformer.transform_documents(original_documents)\n",
"\n",
"print(\n",
" *[d.page_content + \"\\n\\n\" + json.dumps(d.metadata) for d in enhanced_documents],\n",
" sep=\"\\n\\n---------------\\n\\n\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "venv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -155,9 +155,12 @@
"\n",
"# Char-level splits\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"\n",
"chunk_size = 250\n",
"chunk_overlap = 30\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)\n",
"text_splitter = RecursiveCharacterTextSplitter(\n",
" chunk_size=chunk_size, chunk_overlap=chunk_overlap\n",
")\n",
"\n",
"# Split\n",
"splits = text_splitter.split_documents(md_header_splits)\n",

View File

@@ -28,14 +28,14 @@
"# Load blog post\n",
"loader = WebBaseLoader(\"https://lilianweng.github.io/posts/2023-06-23-agent/\")\n",
"data = loader.load()\n",
" \n",
"\n",
"# Split\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size = 500, chunk_overlap = 0)\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
"splits = text_splitter.split_documents(data)\n",
"\n",
"# VectorDB\n",
"embedding = OpenAIEmbeddings()\n",
"vectordb = Chroma.from_documents(documents=splits,embedding=embedding)"
"vectordb = Chroma.from_documents(documents=splits, embedding=embedding)"
]
},
{
@@ -57,9 +57,12 @@
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.retrievers.multi_query import MultiQueryRetriever\n",
"question=\"What are the approaches to Task Decomposition?\"\n",
"\n",
"question = \"What are the approaches to Task Decomposition?\"\n",
"llm = ChatOpenAI(temperature=0)\n",
"retriever_from_llm = MultiQueryRetriever.from_llm(retriever=vectordb.as_retriever(),llm=llm)"
"retriever_from_llm = MultiQueryRetriever.from_llm(\n",
" retriever=vectordb.as_retriever(), llm=llm\n",
")"
]
},
{
@@ -71,8 +74,9 @@
"source": [
"# Set logging for the queries\n",
"import logging\n",
"\n",
"logging.basicConfig()\n",
"logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO)"
"logging.getLogger(\"langchain.retrievers.multi_query\").setLevel(logging.INFO)"
]
},
{
@@ -127,20 +131,24 @@
"from langchain.prompts import PromptTemplate\n",
"from langchain.output_parsers import PydanticOutputParser\n",
"\n",
"\n",
"# Output parser will split the LLM result into a list of queries\n",
"class LineList(BaseModel):\n",
" # \"lines\" is the key (attribute name) of the parsed output\n",
" lines: List[str] = Field(description=\"Lines of text\")\n",
"\n",
"\n",
"class LineListOutputParser(PydanticOutputParser):\n",
" def __init__(self) -> None:\n",
" super().__init__(pydantic_object=LineList)\n",
"\n",
" def parse(self, text: str) -> LineList:\n",
" lines = text.strip().split(\"\\n\")\n",
" return LineList(lines=lines)\n",
"\n",
"\n",
"output_parser = LineListOutputParser()\n",
" \n",
"\n",
"QUERY_PROMPT = PromptTemplate(\n",
" input_variables=[\"question\"],\n",
" template=\"\"\"You are an AI language model assistant. Your task is to generate five \n",
@@ -153,10 +161,10 @@
"llm = ChatOpenAI(temperature=0)\n",
"\n",
"# Chain\n",
"llm_chain = LLMChain(llm=llm,prompt=QUERY_PROMPT,output_parser=output_parser)\n",
" \n",
"llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT, output_parser=output_parser)\n",
"\n",
"# Other inputs\n",
"question=\"What are the approaches to Task Decomposition?\""
"question = \"What are the approaches to Task Decomposition?\""
]
},
{
@@ -185,12 +193,14 @@
],
"source": [
"# Run\n",
"retriever = MultiQueryRetriever(retriever=vectordb.as_retriever(), \n",
" llm_chain=llm_chain,\n",
" parser_key=\"lines\") # \"lines\" is the key (attribute name) of the parsed output\n",
"retriever = MultiQueryRetriever(\n",
" retriever=vectordb.as_retriever(), llm_chain=llm_chain, parser_key=\"lines\"\n",
") # \"lines\" is the key (attribute name) of the parsed output\n",
"\n",
"# Results\n",
"unique_docs = retriever.get_relevant_documents(query=\"What does the course say about regression?\")\n",
"unique_docs = retriever.get_relevant_documents(\n",
" query=\"What does the course say about regression?\"\n",
")\n",
"len(unique_docs)"
]
}

View File

@@ -59,11 +59,11 @@
"import os\n",
"import getpass\n",
"\n",
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')\n",
"os.environ['MYSCALE_HOST'] = getpass.getpass('MyScale URL:')\n",
"os.environ['MYSCALE_PORT'] = getpass.getpass('MyScale Port:')\n",
"os.environ['MYSCALE_USERNAME'] = getpass.getpass('MyScale Username:')\n",
"os.environ['MYSCALE_PASSWORD'] = getpass.getpass('MyScale Password:')"
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")\n",
"os.environ[\"MYSCALE_HOST\"] = getpass.getpass(\"MyScale URL:\")\n",
"os.environ[\"MYSCALE_PORT\"] = getpass.getpass(\"MyScale Port:\")\n",
"os.environ[\"MYSCALE_USERNAME\"] = getpass.getpass(\"MyScale Username:\")\n",
"os.environ[\"MYSCALE_PASSWORD\"] = getpass.getpass(\"MyScale Password:\")"
]
},
{
@@ -103,16 +103,40 @@
"outputs": [],
"source": [
"docs = [\n",
" Document(page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\", metadata={\"date\": \"1993-07-02\", \"rating\": 7.7, \"genre\": [\"science fiction\"]}),\n",
" Document(page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\", metadata={\"date\": \"2010-12-30\", \"director\": \"Christopher Nolan\", \"rating\": 8.2}),\n",
" Document(page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\", metadata={\"date\": \"2006-04-23\", \"director\": \"Satoshi Kon\", \"rating\": 8.6}),\n",
" Document(page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\", metadata={\"date\": \"2019-08-22\", \"director\": \"Greta Gerwig\", \"rating\": 8.3}),\n",
" Document(page_content=\"Toys come alive and have a blast doing so\", metadata={\"date\": \"1995-02-11\", \"genre\": [\"animated\"]}),\n",
" Document(page_content=\"Three men walk into the Zone, three men walk out of the Zone\", metadata={\"date\": \"1979-09-10\", \"rating\": 9.9, \"director\": \"Andrei Tarkovsky\", \"genre\": [\"science fiction\", \"adventure\"], \"rating\": 9.9})\n",
" Document(\n",
" page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\",\n",
" metadata={\"date\": \"1993-07-02\", \"rating\": 7.7, \"genre\": [\"science fiction\"]},\n",
" ),\n",
" Document(\n",
" page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\",\n",
" metadata={\"date\": \"2010-12-30\", \"director\": \"Christopher Nolan\", \"rating\": 8.2},\n",
" ),\n",
" Document(\n",
" page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\",\n",
" metadata={\"date\": \"2006-04-23\", \"director\": \"Satoshi Kon\", \"rating\": 8.6},\n",
" ),\n",
" Document(\n",
" page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\",\n",
" metadata={\"date\": \"2019-08-22\", \"director\": \"Greta Gerwig\", \"rating\": 8.3},\n",
" ),\n",
" Document(\n",
" page_content=\"Toys come alive and have a blast doing so\",\n",
" metadata={\"date\": \"1995-02-11\", \"genre\": [\"animated\"]},\n",
" ),\n",
" Document(\n",
" page_content=\"Three men walk into the Zone, three men walk out of the Zone\",\n",
" metadata={\n",
" \"date\": \"1979-09-10\",\n",
" \"rating\": 9.9,\n",
" \"director\": \"Andrei Tarkovsky\",\n",
" \"genre\": [\"science fiction\", \"adventure\"],\n",
" \"rating\": 9.9,\n",
" },\n",
" ),\n",
"]\n",
"vectorstore = MyScale.from_documents(\n",
" docs, \n",
" embeddings, \n",
" docs,\n",
" embeddings,\n",
")"
]
},
@@ -138,39 +162,39 @@
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
"from langchain.chains.query_constructor.base import AttributeInfo\n",
"\n",
"metadata_field_info=[\n",
"metadata_field_info = [\n",
" AttributeInfo(\n",
" name=\"genre\",\n",
" description=\"The genres of the movie\", \n",
" type=\"list[string]\", \n",
" description=\"The genres of the movie\",\n",
" type=\"list[string]\",\n",
" ),\n",
" # If you want to include length of a list, just define it as a new column\n",
" # This will teach the LLM to use it as a column when constructing filter.\n",
" AttributeInfo(\n",
" name=\"length(genre)\",\n",
" description=\"The length of genres of the movie\", \n",
" type=\"integer\", \n",
" description=\"The length of genres of the movie\",\n",
" type=\"integer\",\n",
" ),\n",
" # Now you can define a column as timestamp. By simply set the type to timestamp.\n",
" AttributeInfo(\n",
" name=\"date\",\n",
" description=\"The date the movie was released\", \n",
" type=\"timestamp\", \n",
" description=\"The date the movie was released\",\n",
" type=\"timestamp\",\n",
" ),\n",
" AttributeInfo(\n",
" name=\"director\",\n",
" description=\"The name of the movie director\", \n",
" type=\"string\", \n",
" description=\"The name of the movie director\",\n",
" type=\"string\",\n",
" ),\n",
" AttributeInfo(\n",
" name=\"rating\",\n",
" description=\"A 1-10 rating for the movie\",\n",
" type=\"float\"\n",
" name=\"rating\", description=\"A 1-10 rating for the movie\", type=\"float\"\n",
" ),\n",
"]\n",
"document_content_description = \"Brief summary of a movie\"\n",
"llm = OpenAI(temperature=0)\n",
"retriever = SelfQueryRetriever.from_llm(llm, vectorstore, document_content_description, metadata_field_info, verbose=True)"
"retriever = SelfQueryRetriever.from_llm(\n",
" llm, vectorstore, document_content_description, metadata_field_info, verbose=True\n",
")"
]
},
{
@@ -225,7 +249,9 @@
"outputs": [],
"source": [
"# This example specifies a composite filter\n",
"retriever.get_relevant_documents(\"What's a highly rated (above 8.5) science fiction film?\")"
"retriever.get_relevant_documents(\n",
" \"What's a highly rated (above 8.5) science fiction film?\"\n",
")"
]
},
{
@@ -236,7 +262,9 @@
"outputs": [],
"source": [
"# This example specifies a query and composite filter\n",
"retriever.get_relevant_documents(\"What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated\")"
"retriever.get_relevant_documents(\n",
" \"What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated\"\n",
")"
]
},
{
@@ -290,7 +318,9 @@
"outputs": [],
"source": [
"# Contain works for lists: so you can match a list with contain comparator!\n",
"retriever.get_relevant_documents(\"What's a movie who has genres science fiction and adventure?\")"
"retriever.get_relevant_documents(\n",
" \"What's a movie who has genres science fiction and adventure?\"\n",
")"
]
},
{
@@ -315,12 +345,12 @@
"outputs": [],
"source": [
"retriever = SelfQueryRetriever.from_llm(\n",
" llm, \n",
" vectorstore, \n",
" document_content_description, \n",
" metadata_field_info, \n",
" llm,\n",
" vectorstore,\n",
" document_content_description,\n",
" metadata_field_info,\n",
" enable_limit=True,\n",
" verbose=True\n",
" verbose=True,\n",
")"
]
},

View File

@@ -18,7 +18,7 @@
"## Creating a Pinecone index\n",
"First we'll want to create a `Pinecone` VectorStore and seed it with some data. We've created a small demo set of documents that contain summaries of movies.\n",
"\n",
"To use Pinecone, you to have `pinecone` package installed and you must have an API key and an Environment. Here are the [installation instructions](https://docs.pinecone.io/docs/quickstart).\n",
"To use Pinecone, you have to have `pinecone` package installed and you must have an API key and an Environment. Here are the [installation instructions](https://docs.pinecone.io/docs/quickstart).\n",
"\n",
"NOTE: The self-query retriever requires you to have `lark` package installed."
]

View File

@@ -53,9 +53,7 @@
"metadata": {},
"outputs": [],
"source": [
"retriever = AmazonKendraRetriever(\n",
" index_id=\"c0806df7-e76b-4bce-9b5c-d5582f6b1a03\"\n",
")"
"retriever = AmazonKendraRetriever(index_id=\"c0806df7-e76b-4bce-9b5c-d5582f6b1a03\")"
]
},
{

View File

@@ -91,7 +91,7 @@
"metadata": {},
"outputs": [],
"source": [
"retriever = AzureCognitiveSearchRetriever(content_key=\"content\")"
"retriever = AzureCognitiveSearchRetriever(content_key=\"content\", top_k=10)"
]
},
{
@@ -111,6 +111,36 @@
"source": [
"retriever.get_relevant_documents(\"what is langchain\")"
]
},
{
"cell_type": "markdown",
"id": "72eca08e",
"metadata": {},
"source": [
"You can change the number of results returned with the `top_k` parameter. The default value is `None`, which returns all results. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "097146c5",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "6d9963f5",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "dc120696",
"metadata": {},
"source": []
}
],
"metadata": {

View File

@@ -1,21 +1,31 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "9fc6205b",
"metadata": {},
"source": [
"# Databerry\n",
"# Chaindesk\n",
"\n",
">[Databerry platform](https://docs.databerry.ai/introduction) brings data from anywhere (Datsources: Text, PDF, Word, PowerPpoint, Excel, Notion, Airtable, Google Sheets, etc..) into Datastores (container of multiple Datasources).\n",
"Then your Datastores can be connected to ChatGPT via Plugins or any other Large Langue Model (LLM) via the `Databerry API`.\n",
">[Chaindesk platform](https://docs.chaindesk.ai/introduction) brings data from anywhere (Datsources: Text, PDF, Word, PowerPpoint, Excel, Notion, Airtable, Google Sheets, etc..) into Datastores (container of multiple Datasources).\n",
"Then your Datastores can be connected to ChatGPT via Plugins or any other Large Langue Model (LLM) via the `Chaindesk API`.\n",
"\n",
"This notebook shows how to use [Databerry's](https://www.databerry.ai/) retriever.\n",
"This notebook shows how to use [Chaindesk's](https://www.chaindesk.ai/) retriever.\n",
"\n",
"First, you will need to sign up for Databerry, create a datastore, add some data and get your datastore api endpoint url. You need the [API Key](https://docs.databerry.ai/api-reference/authentication)."
"First, you will need to sign up for Chaindesk, create a datastore, add some data and get your datastore api endpoint url. You need the [API Key](https://docs.chaindesk.ai/api-reference/authentication)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3697b9fd",
"metadata": {},
"outputs": [],
"source": []
},
{
"attachments": {},
"cell_type": "markdown",
"id": "944e172b",
"metadata": {},
@@ -34,7 +44,7 @@
},
"outputs": [],
"source": [
"from langchain.retrievers import DataberryRetriever"
"from langchain.retrievers import ChaindeskRetriever"
]
},
{
@@ -46,9 +56,9 @@
},
"outputs": [],
"source": [
"retriever = DataberryRetriever(\n",
" datastore_url=\"https://clg1xg2h80000l708dymr0fxc.databerry.ai/query\",\n",
" # api_key=\"DATABERRY_API_KEY\", # optional if datastore is public\n",
"retriever = ChaindeskRetriever(\n",
" datastore_url=\"https://clg1xg2h80000l708dymr0fxc.chaindesk.ai/query\",\n",
" # api_key=\"CHAINDESK_API_KEY\", # optional if datastore is public\n",
" # top_k=10 # optional\n",
")"
]

View File

@@ -111,10 +111,10 @@
"db.index(\n",
" [\n",
" MyDoc(\n",
" title=f'My document {i}',\n",
" title_embedding=embeddings.embed_query(f'query {i}'),\n",
" title=f\"My document {i}\",\n",
" title_embedding=embeddings.embed_query(f\"query {i}\"),\n",
" year=i,\n",
" color=random.choice(['red', 'green', 'blue']),\n",
" color=random.choice([\"red\", \"green\", \"blue\"]),\n",
" )\n",
" for i in range(100)\n",
" ]\n",
@@ -142,15 +142,15 @@
"source": [
"# create a retriever\n",
"retriever = DocArrayRetriever(\n",
" index=db, \n",
" embeddings=embeddings, \n",
" search_field='title_embedding', \n",
" content_field='title',\n",
" index=db,\n",
" embeddings=embeddings,\n",
" search_field=\"title_embedding\",\n",
" content_field=\"title\",\n",
" filters=filter_query,\n",
")\n",
"\n",
"# find the relevant document\n",
"doc = retriever.get_relevant_documents('some query')\n",
"doc = retriever.get_relevant_documents(\"some query\")\n",
"print(doc)"
]
},
@@ -179,16 +179,16 @@
"\n",
"\n",
"# initialize the index\n",
"db = HnswDocumentIndex[MyDoc](work_dir='hnsw_index')\n",
"db = HnswDocumentIndex[MyDoc](work_dir=\"hnsw_index\")\n",
"\n",
"# index data\n",
"db.index(\n",
" [\n",
" MyDoc(\n",
" title=f'My document {i}',\n",
" title_embedding=embeddings.embed_query(f'query {i}'),\n",
" title=f\"My document {i}\",\n",
" title_embedding=embeddings.embed_query(f\"query {i}\"),\n",
" year=i,\n",
" color=random.choice(['red', 'green', 'blue']),\n",
" color=random.choice([\"red\", \"green\", \"blue\"]),\n",
" )\n",
" for i in range(100)\n",
" ]\n",
@@ -216,15 +216,15 @@
"source": [
"# create a retriever\n",
"retriever = DocArrayRetriever(\n",
" index=db, \n",
" embeddings=embeddings, \n",
" search_field='title_embedding', \n",
" content_field='title',\n",
" index=db,\n",
" embeddings=embeddings,\n",
" search_field=\"title_embedding\",\n",
" content_field=\"title\",\n",
" filters=filter_query,\n",
")\n",
"\n",
"# find the relevant document\n",
"doc = retriever.get_relevant_documents('some query')\n",
"doc = retriever.get_relevant_documents(\"some query\")\n",
"print(doc)"
]
},
@@ -249,11 +249,12 @@
},
"outputs": [],
"source": [
"# There's a small difference with the Weaviate backend compared to the others. \n",
"# Here, you need to 'mark' the field used for vector search with 'is_embedding=True'. \n",
"# There's a small difference with the Weaviate backend compared to the others.\n",
"# Here, you need to 'mark' the field used for vector search with 'is_embedding=True'.\n",
"# So, let's create a new schema for Weaviate that takes care of this requirement.\n",
"\n",
"from pydantic import Field \n",
"from pydantic import Field\n",
"\n",
"\n",
"class WeaviateDoc(BaseDoc):\n",
" title: str\n",
@@ -275,19 +276,17 @@
"\n",
"\n",
"# initialize the index\n",
"dbconfig = WeaviateDocumentIndex.DBConfig(\n",
" host=\"http://localhost:8080\"\n",
")\n",
"dbconfig = WeaviateDocumentIndex.DBConfig(host=\"http://localhost:8080\")\n",
"db = WeaviateDocumentIndex[WeaviateDoc](db_config=dbconfig)\n",
"\n",
"# index data\n",
"db.index(\n",
" [\n",
" MyDoc(\n",
" title=f'My document {i}',\n",
" title_embedding=embeddings.embed_query(f'query {i}'),\n",
" title=f\"My document {i}\",\n",
" title_embedding=embeddings.embed_query(f\"query {i}\"),\n",
" year=i,\n",
" color=random.choice(['red', 'green', 'blue']),\n",
" color=random.choice([\"red\", \"green\", \"blue\"]),\n",
" )\n",
" for i in range(100)\n",
" ]\n",
@@ -315,15 +314,15 @@
"source": [
"# create a retriever\n",
"retriever = DocArrayRetriever(\n",
" index=db, \n",
" embeddings=embeddings, \n",
" search_field='title_embedding', \n",
" content_field='title',\n",
" index=db,\n",
" embeddings=embeddings,\n",
" search_field=\"title_embedding\",\n",
" content_field=\"title\",\n",
" filters=filter_query,\n",
")\n",
"\n",
"# find the relevant document\n",
"doc = retriever.get_relevant_documents('some query')\n",
"doc = retriever.get_relevant_documents(\"some query\")\n",
"print(doc)"
]
},
@@ -353,18 +352,17 @@
"\n",
"# initialize the index\n",
"db = ElasticDocIndex[MyDoc](\n",
" hosts=\"http://localhost:9200\", \n",
" index_name=\"docarray_retriever\"\n",
" hosts=\"http://localhost:9200\", index_name=\"docarray_retriever\"\n",
")\n",
"\n",
"# index data\n",
"db.index(\n",
" [\n",
" MyDoc(\n",
" title=f'My document {i}',\n",
" title_embedding=embeddings.embed_query(f'query {i}'),\n",
" title=f\"My document {i}\",\n",
" title_embedding=embeddings.embed_query(f\"query {i}\"),\n",
" year=i,\n",
" color=random.choice(['red', 'green', 'blue']),\n",
" color=random.choice([\"red\", \"green\", \"blue\"]),\n",
" )\n",
" for i in range(100)\n",
" ]\n",
@@ -392,15 +390,15 @@
"source": [
"# create a retriever\n",
"retriever = DocArrayRetriever(\n",
" index=db, \n",
" embeddings=embeddings, \n",
" search_field='title_embedding', \n",
" content_field='title',\n",
" index=db,\n",
" embeddings=embeddings,\n",
" search_field=\"title_embedding\",\n",
" content_field=\"title\",\n",
" filters=filter_query,\n",
")\n",
"\n",
"# find the relevant document\n",
"doc = retriever.get_relevant_documents('some query')\n",
"doc = retriever.get_relevant_documents(\"some query\")\n",
"print(doc)"
]
},
@@ -445,10 +443,10 @@
"db.index(\n",
" [\n",
" MyDoc(\n",
" title=f'My document {i}',\n",
" title_embedding=embeddings.embed_query(f'query {i}'),\n",
" title=f\"My document {i}\",\n",
" title_embedding=embeddings.embed_query(f\"query {i}\"),\n",
" year=i,\n",
" color=random.choice(['red', 'green', 'blue']),\n",
" color=random.choice([\"red\", \"green\", \"blue\"]),\n",
" )\n",
" for i in range(100)\n",
" ]\n",
@@ -486,15 +484,15 @@
"source": [
"# create a retriever\n",
"retriever = DocArrayRetriever(\n",
" index=db, \n",
" embeddings=embeddings, \n",
" search_field='title_embedding', \n",
" content_field='title',\n",
" index=db,\n",
" embeddings=embeddings,\n",
" search_field=\"title_embedding\",\n",
" content_field=\"title\",\n",
" filters=filter_query,\n",
")\n",
"\n",
"# find the relevant document\n",
"doc = retriever.get_relevant_documents('some query')\n",
"doc = retriever.get_relevant_documents(\"some query\")\n",
"print(doc)"
]
},
@@ -552,7 +550,7 @@
" \"director\": \"Francis Ford Coppola\",\n",
" \"rating\": 9.2,\n",
" },\n",
"]\n"
"]"
]
},
{
@@ -573,9 +571,9 @@
],
"source": [
"import getpass\n",
"import os \n",
"import os\n",
"\n",
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
@@ -591,6 +589,7 @@
"from docarray.typing import NdArray\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"\n",
"\n",
"# define schema for your movie documents\n",
"class MyDoc(BaseDoc):\n",
" title: str\n",
@@ -598,7 +597,7 @@
" description_embedding: NdArray[1536]\n",
" rating: float\n",
" director: str\n",
" \n",
"\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"\n",
@@ -626,7 +625,7 @@
"from docarray.index import HnswDocumentIndex\n",
"\n",
"# initialize the index\n",
"db = HnswDocumentIndex[MyDoc](work_dir='movie_search')\n",
"db = HnswDocumentIndex[MyDoc](work_dir=\"movie_search\")\n",
"\n",
"# add data\n",
"db.index(docs)"
@@ -663,14 +662,14 @@
"\n",
"# create a retriever\n",
"retriever = DocArrayRetriever(\n",
" index=db, \n",
" embeddings=embeddings, \n",
" search_field='description_embedding', \n",
" content_field='description'\n",
" index=db,\n",
" embeddings=embeddings,\n",
" search_field=\"description_embedding\",\n",
" content_field=\"description\",\n",
")\n",
"\n",
"# find the relevant document\n",
"doc = retriever.get_relevant_documents('movie about dreams')\n",
"doc = retriever.get_relevant_documents(\"movie about dreams\")\n",
"print(doc)"
]
},
@@ -703,16 +702,16 @@
"\n",
"# create a retriever\n",
"retriever = DocArrayRetriever(\n",
" index=db, \n",
" embeddings=embeddings, \n",
" search_field='description_embedding', \n",
" content_field='description',\n",
" filters={'director': {'$eq': 'Christopher Nolan'}},\n",
" index=db,\n",
" embeddings=embeddings,\n",
" search_field=\"description_embedding\",\n",
" content_field=\"description\",\n",
" filters={\"director\": {\"$eq\": \"Christopher Nolan\"}},\n",
" top_k=2,\n",
")\n",
"\n",
"# find relevant documents\n",
"docs = retriever.get_relevant_documents('space travel')\n",
"docs = retriever.get_relevant_documents(\"space travel\")\n",
"print(docs)"
]
},
@@ -745,17 +744,17 @@
"\n",
"# create a retriever\n",
"retriever = DocArrayRetriever(\n",
" index=db, \n",
" embeddings=embeddings, \n",
" search_field='description_embedding', \n",
" content_field='description',\n",
" filters={'rating': {'$gte': 8.7}},\n",
" search_type='mmr',\n",
" index=db,\n",
" embeddings=embeddings,\n",
" search_field=\"description_embedding\",\n",
" content_field=\"description\",\n",
" filters={\"rating\": {\"$gte\": 8.7}},\n",
" search_type=\"mmr\",\n",
" top_k=3,\n",
")\n",
"\n",
"# find relevant documents\n",
"docs = retriever.get_relevant_documents('action movies')\n",
"docs = retriever.get_relevant_documents(\"action movies\")\n",
"print(docs)"
]
},

View File

@@ -1,6 +1,7 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "fc0db1bc",
"metadata": {},
@@ -25,7 +26,10 @@
"from langchain.vectorstores import Chroma\n",
"from langchain.embeddings import HuggingFaceEmbeddings\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.document_transformers import EmbeddingsRedundantFilter\n",
"from langchain.document_transformers import (\n",
" EmbeddingsRedundantFilter,\n",
" EmbeddingsClusteringFilter,\n",
")\n",
"from langchain.retrievers.document_compressors import DocumentCompressorPipeline\n",
"from langchain.retrievers import ContextualCompressionRetriever\n",
"\n",
@@ -70,6 +74,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c152339d",
"metadata": {},
@@ -92,6 +97,46 @@
" base_compressor=pipeline, base_retriever=lotr\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c10022fa",
"metadata": {},
"source": [
"## Pick a representative sample of documents from the merged retrievers."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3885482",
"metadata": {},
"outputs": [],
"source": [
"# This filter will divide the documents vectors into clusters or \"centers\" of meaning.\n",
"# Then it will pick the closest document to that center for the final results.\n",
"# By default the result document will be ordered/grouped by clusters.\n",
"filter_ordered_cluster = EmbeddingsClusteringFilter(\n",
" embeddings=filter_embeddings,\n",
" num_clusters=10,\n",
" num_closest=1,\n",
")\n",
"\n",
"# If you want the final document to be ordered by the original retriever scores\n",
"# you need to add the \"sorted\" parameter.\n",
"filter_ordered_by_retriever = EmbeddingsClusteringFilter(\n",
" embeddings=filter_embeddings,\n",
" num_clusters=10,\n",
" num_closest=1,\n",
" sorted=True,\n",
")\n",
"\n",
"pipeline = DocumentCompressorPipeline(transformers=[filter_ordered_by_retriever])\n",
"compression_retriever = ContextualCompressionRetriever(\n",
" base_compressor=pipeline, base_retriever=lotr\n",
")"
]
}
],
"metadata": {

View File

@@ -123,7 +123,7 @@
"\n",
"index_name = \"langchain-pinecone-hybrid-search\"\n",
"\n",
"pinecone.init(api_key=api_key, enviroment=env)\n",
"pinecone.init(api_key=api_key, environment=env)\n",
"pinecone.whoami()"
]
},

View File

@@ -111,9 +111,7 @@
"\n",
"# Set up Zep Chat History. We'll use this to add chat histories to the memory store\n",
"zep_chat_history = ZepChatMessageHistory(\n",
" session_id=session_id,\n",
" url=ZEP_API_URL,\n",
" api_key=zep_api_key\n",
" session_id=session_id, url=ZEP_API_URL, api_key=zep_api_key\n",
")"
]
},
@@ -247,7 +245,7 @@
" session_id=session_id, # Ensure that you provide the session_id when instantiating the Retriever\n",
" url=ZEP_API_URL,\n",
" top_k=5,\n",
" api_key=zep_api_key\n",
" api_key=zep_api_key,\n",
")\n",
"\n",
"await zep_retriever.aget_relevant_documents(\"Who wrote Parable of the Sower?\")"

View File

@@ -65,7 +65,7 @@
}
],
"source": [
"# Please login and get your API key from https://clarifai.com/settings/security \n",
"# Please login and get your API key from https://clarifai.com/settings/security\n",
"from getpass import getpass\n",
"\n",
"CLARIFAI_PAT = getpass()"
@@ -130,9 +130,9 @@
"metadata": {},
"outputs": [],
"source": [
"USER_ID = 'openai'\n",
"APP_ID = 'embed'\n",
"MODEL_ID = 'text-embedding-ada'\n",
"USER_ID = \"openai\"\n",
"APP_ID = \"embed\"\n",
"MODEL_ID = \"text-embedding-ada\"\n",
"\n",
"# You can provide a specific model version as the model_version_id arg.\n",
"# MODEL_VERSION_ID = \"MODEL_VERSION_ID\""
@@ -148,7 +148,9 @@
"outputs": [],
"source": [
"# Initialize a Clarifai embedding model\n",
"embeddings = ClarifaiEmbeddings(pat=CLARIFAI_PAT, user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID)"
"embeddings = ClarifaiEmbeddings(\n",
" pat=CLARIFAI_PAT, user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID\n",
")"
]
},
{

Some files were not shown because too many files have changed in this diff Show More