Compare commits

..

92 Commits

Author SHA1 Message Date
Ducasse-Arthur
93a84f6182 Update bedrock.py - support of other endpoint url (esp. for users of … (#7592)
Added an _endpoint_url_ attribute to Bedrock(LLM) class - I have access
to Bedrock only via us-west-2 endpoint and needed to change the endpoint
url, this could be useful to other users
2023-07-12 10:43:23 -04:00
Bagatur
22525bad65 bump 231 (#7584) 2023-07-12 10:43:12 -04:00
Subsegment
6e1000dc8d docs : Use more meaningful cnosdb examples (#7587)
This change makes the ecosystem integrations cnosdb documentation more
realistic and easy to understand.

- change examples of question and table
- modify typo and format
2023-07-12 10:31:55 -04:00
Samuel ROZE
f3c9bf5e4b fix(typo): Clarify the point of llm_chain (#7593)
Fixes a typo introduced in
https://github.com/hwchase17/langchain/pull/7080 by @hwchase17.

In the example (visible on [the online
documentation](https://api.python.langchain.com/en/latest/chains/langchain.chains.conversational_retrieval.base.ConversationalRetrievalChain.html#langchain-chains-conversational-retrieval-base-conversationalretrievalchain)),
the `llm_chain` variable is unused as opposed to being used for the
question generator. This change makes it clearer.
2023-07-12 10:31:00 -04:00
Alec Flett
6cdd4b5edc only add handlers if they are new (#7504)
When using callbacks, there are times when callbacks can be added
redundantly: for instance sometimes you might need to create an llm with
specific callbacks, but then also create and agent that uses a chain
that has those callbacks already set. This means that "callbacks" might
get passed down again to the llm at predict() time, resulting in
duplicate calls to the `on_llm_start` callback.

For the sake of simplicity, I made it so that langchain never adds an
exact handler/callbacks object in `add_handler`, thus avoiding the
duplicate handler issue.

Tagging @hwchase17 for callback review

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-12 03:48:29 -04:00
ausboss
50316f6477 Adding LLM wrapper for Kobold AI (#7560)
- Description: add wrapper that lets you use KoboldAI api in langchain
  - Issue: n/a
  - Dependencies: none extra, just what exists in lanchain
  - Tag maintainer: @baskaryan 
  - Twitter handle: @zanzibased
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-12 03:48:12 -04:00
Rohit Kumar Singh
603a0bea29 Fixes incorrect docstore creation in faiss.py (#7026)
- **Description**: Current implementation assumes that the length of
`texts` and `ids` should be same but if the passed `ids` length is not
equal to the passed length of `texts`, current code
`dict(zip(index_to_id.values(), documents))` is not failing or giving
any warning and silently creating docstores only for the passed `ids`
i.e. if `ids = ['A']` and `texts=["I love Open Source","I love
langchain"]` then only one `docstore` will be created. But either two
docstores should be created assuming same id value for all the elements
of `texts` or an error should be raised.
  
- **Issue**: My change fixes this by using dictionary comprehension
instead of `zip`. This was if lengths of `ids` and `texts` mismatches an
explicit `IndexError` will be raised.
  
@rlancemartin, @eyurtsev
2023-07-12 03:35:49 -04:00
Tommy Hyeonwoo Kim
3f7213586e add supported properties for notiondb document loader's metadata (#7570)
fix #7569

add following properties for Notion DB document loader's metadata
- `unique_id`
- `status`
- `people`

@rlancemartin, @eyurtsev (Since this is a change related to
`DataLoaders`)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-12 03:34:54 -04:00
Junlin Zhou
5f17c57174 Update chat agents' output parser to extract action by regex (#7511)
Currently `ChatOutputParser` extracts actions by splitting the text on
"```", and then load the second part as a json string.

But sometimes the LLM will wrap the action in markdown code block like:

````markdown
```json
{
  "action": "foo",
  "action_input": "bar"
}
```
````

Splitting text on "```" will cause `OutputParserException` in such case.

This PR changes the behaviour to extract the `$JSON_BLOB` by regex, so
that it can handle both ` ``` ``` ` and ` ```json ``` `

@hinthornw

---------

Co-authored-by: Junlin Zhou <jlzhou@zjuici.com>
2023-07-12 03:12:02 -04:00
Bagatur
ebcb144342 unit test sqlalachemy (#7582) 2023-07-12 03:03:16 -04:00
Harrison Chase
641fd74baa Harrison/pg vector move (#7580) 2023-07-12 02:22:34 -04:00
os1ma
2667ddc686 Fix make docs_build and related scripts (#7276)
**Description: a description of the change**

Fixed `make docs_build` and related scripts which caused errors. There
are several changes.

First, I made the build of the documentation and the API Reference into
two separate commands. This is because it takes less time to build. The
commands for documents are `make docs_build`, `make docs_clean`, and
`make docs_linkcheck`. The commands for API Reference are `make
api_docs_build`, `api_docs_clean`, and `api_docs_linkcheck`.

It looked like `docs/.local_build.sh` could be used to build the
documentation, so I used that. Since `.local_build.sh` was also building
API Rerefence internally, I removed that process. `.local_build.sh` also
added some Bash options to stop in error or so. Futher more added `cd
"${SCRIPT_DIR}"` at the beginning so that the script will work no matter
which directory it is executed in.

`docs/api_reference/api_reference.rst` is removed, because which is
generated by `docs/api_reference/create_api_rst.py`, and added it to
.gitignore.

Finally, the description of CONTRIBUTING.md was modified.

**Issue: the issue # it fixes (if applicable)**

https://github.com/hwchase17/langchain/issues/6413

**Dependencies: any dependencies required for this change**

`nbdoc` was missing in group docs so it was added. I installed it with
the `poetry add --group docs nbdoc` command. I am concerned if any
modifications are needed to poetry.lock. I would greatly appreciate it
if you could pay close attention to this file during the review.

**Tag maintainer**
- General / Misc / if you don't know who to tag: @baskaryan

If this PR needs any additional changes, I'll be happy to make them!

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-11 22:05:14 -04:00
Pharbie
74c28df363 Update Pinecone Upsert method usage (#7358)
Description: Refactor the upsert method in the Pinecone class to allow
for additional keyword arguments. This change adds flexibility and
extensibility to the method, allowing for future modifications or
enhancements. The upsert method now accepts the `**kwargs` parameter,
which can be used to pass any additional arguments to the Pinecone
index. This change has been made in both the `upsert` method in the
`Pinecone` class and the `upsert` method in the
`similarity_search_with_score` class method. Falls in line with the
usage of the upsert method in
[Pinecone-Python-Client](4640c4cf27/pinecone/index.py (L73))
Issue: [This feature request in Pinecone
Repo](https://github.com/pinecone-io/pinecone-python-client/issues/184)

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - Memory: @hwchase17

---------

Co-authored-by: kwesi <22204443+yankskwesi@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Lance Martin <122662504+rlancemartin@users.noreply.github.com>
2023-07-11 21:14:42 -04:00
Kazuki Maeda
5c3fe8b0d1 Enhance Makefile with 'format_diff' Option and Improved Readability (#7394)
### Description:

This PR introduces a new option format_diff to the existing Makefile.
This option allows us to apply the formatting tools (Black and isort)
only to the changed Python and ipynb files since the last commit. This
will make our development process more efficient as we only format the
codes that we modify. Along with this change, comments were added to
make the Makefile more understandable and maintainable.

### Issue:

N/A

### Dependencies:

Add dependency to black.

### Tag maintainer:

@baskaryan

### Twitter handle:

[kzk_maeda](https://twitter.com/kzk_maeda)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-11 21:03:17 -04:00
Bagatur
2babe3069f Revert pinecone v4 support (#7566)
Revert 9d13dcd
2023-07-11 20:58:59 -04:00
schop-rob
e811c5e8c6 Add OpenAI organization ID to docs (#7398)
Description: I added an example of how to reference the OpenAI API
Organization ID, because I couldn't find it before. In the example, it
is mentioned how to achieve this using environment variables as well as
parameters for the OpenAI()-class
Issue: -
Dependencies: -
Twitter @schop-rob
2023-07-11 20:51:58 -04:00
Kenny
8741e55e7c Template formats documentation (#7404)
Simple addition to the documentation, adding the correct import
statement & showcasing using Python FStrings.
2023-07-11 18:24:24 -04:00
Fielding Johnston
00c466627a minor bug fix: properly await AsyncRunManager's method call in MulitRouteChain (#7487)
This simply awaits `AsyncRunManager`'s method call in `MulitRouteChain`.
Noticed this while playing around with Langchain's implementation of
`MultiPromptChain`. @baskaryan

cheers

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-11 18:18:47 -04:00
tonomura
cc0585af42 Improvement/add finish reason to generation info in chat open ai (#7478)
Description: ChatOpenAI model does not return finish_reason in
generation_info.
Issue: #2702
Dependencies: None
Tag maintainer: @baskaryan 

Thank you

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-11 18:12:57 -04:00
Junlin Zhou
b96ac13f3d Minor update to reference other sql tool by tool names instead of hard coded string. (#7514)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->

Currently there are 4 tools in SQL agent-toolkits, and 2 of them have
reference to the other 2.

This PR change the reference from hard coded string to `{tool.name}`

Co-authored-by: Junlin Zhou <jlzhou@zjuici.com>
2023-07-11 17:44:23 -04:00
OwenElliott
9cb2347453 Fix broken link from Marqo Ecosystem (#7510)
Small fix to a link from the Marqo page in the ecosystem.

The link was not updated correctly when the documentation structure
changed to html pages instead of links to notebooks.
2023-07-11 17:15:15 -04:00
Matt Robinson
c4d53f98dc docs: update unstructured docstrings (#7561)
### Summary

Updates the docstrings in the Unstructured document loaders to display
more useful information on the integrations page.
2023-07-11 17:12:05 -04:00
Ben Auffarth
2c2f0e15a6 clarify about api key (#7540)
I found it unclear, where to get the API keys for JinaChat. Mentioning
this in the docstring should be helpful.
#7490 

Twitter handle: benji1a

@delgermurun
2023-07-11 16:46:06 -04:00
Jona Sassenhagen
0ea7224535 [Minor] Remove tagger from spacy sentencizer (#7534)
@svlandeg gave me a tip for how to improve a bit on
https://github.com/hwchase17/langchain/pull/7442 for some extra speed
and memory gains. The tagger isn't needed for sentencization, so can be
disabled too.
2023-07-11 16:43:46 -04:00
Kacper Łukawski
1f83b5f47e Reuse the existing collection if configured properly in Qdrant.from_texts (#7530)
This PR changes the behavior of `Qdrant.from_texts` so the collection is
reused if not requested to recreate it. Previously, calling
`Qdrant.from_texts` or `Qdrant.from_documents` resulted in removing the
old data which was confusing for many.
2023-07-11 16:24:35 -04:00
Leonid Kuligin
6674b33cf5 Added support for chat_history (#7555)
#7469

Co-authored-by: Leonid Kuligin <kuligin@google.com>
2023-07-11 15:27:26 -04:00
Felix Brockmeier
406a9dc11f Add notebook example for Lemon AI NLP Workflow Automation (#7556)
- Description: Added notebook to LangChain docs that explains how to use
Lemon AI NLP Workflow Automation tool with Langchain
  
- Issue: not applicable
  
- Dependencies: not applicable
  
- Tag maintainer: @agola11
  
- Twitter handle: felixbrockm
2023-07-11 15:15:11 -04:00
Lance Martin
9e067b8cc9 Add env setup (#7550)
Include setup
2023-07-11 09:48:40 -07:00
Bagatur
3c4338470e bump 230 (#7544) 2023-07-11 11:24:08 -04:00
Bagatur
d2137eea9f fix cpal docs (#7545) 2023-07-11 11:07:45 -04:00
Boris
9129318466 CPAL (#6255)
# Causal program-aided language (CPAL) chain

## Motivation

This builds on the recent [PAL](https://arxiv.org/abs/2211.10435) to
stop LLM hallucination. The problem with the
[PAL](https://arxiv.org/abs/2211.10435) approach is that it hallucinates
on a math problem with a nested chain of dependence. The innovation here
is that this new CPAL approach includes causal structure to fix
hallucination.

For example, using the below word problem, PAL answers with 5, and CPAL
answers with 13.

    "Tim buys the same number of pets as Cindy and Boris."
    "Cindy buys the same number of pets as Bill plus Bob."
    "Boris buys the same number of pets as Ben plus Beth."
    "Bill buys the same number of pets as Obama."
    "Bob buys the same number of pets as Obama."
    "Ben buys the same number of pets as Obama."
    "Beth buys the same number of pets as Obama."
    "If Obama buys one pet, how many pets total does everyone buy?"

The CPAL chain represents the causal structure of the above narrative as
a causal graph or DAG, which it can also plot, as shown below.


![complex-graph](https://github.com/hwchase17/langchain/assets/367522/d938db15-f941-493d-8605-536ad530f576)

.

The two major sections below are:

1. Technical overview
2. Future application

Also see [this jupyter
notebook](https://github.com/borisdev/langchain/blob/master/docs/extras/modules/chains/additional/cpal.ipynb)
doc.


## 1. Technical overview

### CPAL versus PAL

Like [PAL](https://arxiv.org/abs/2211.10435), CPAL intends to reduce
large language model (LLM) hallucination.

The CPAL chain is different from the PAL chain for a couple of reasons. 

* CPAL adds a causal structure (or DAG) to link entity actions (or math
expressions).
* The CPAL math expressions are modeling a chain of cause and effect
relations, which can be intervened upon, whereas for the PAL chain math
expressions are projected math identities.

PAL's generated python code is wrong. It hallucinates when complexity
increases.

```python
def solution():
    """Tim buys the same number of pets as Cindy and Boris.Cindy buys the same number of pets as Bill plus Bob.Boris buys the same number of pets as Ben plus Beth.Bill buys the same number of pets as Obama.Bob buys the same number of pets as Obama.Ben buys the same number of pets as Obama.Beth buys the same number of pets as Obama.If Obama buys one pet, how many pets total does everyone buy?"""
    obama_pets = 1
    tim_pets = obama_pets
    cindy_pets = obama_pets + obama_pets
    boris_pets = obama_pets + obama_pets
    total_pets = tim_pets + cindy_pets + boris_pets
    result = total_pets
    return result  # math result is 5
```

CPAL's generated python code is correct.

```python
story outcome data
    name                                   code  value      depends_on
0  obama                                   pass    1.0              []
1   bill               bill.value = obama.value    1.0         [obama]
2    bob                bob.value = obama.value    1.0         [obama]
3    ben                ben.value = obama.value    1.0         [obama]
4   beth               beth.value = obama.value    1.0         [obama]
5  cindy   cindy.value = bill.value + bob.value    2.0     [bill, bob]
6  boris   boris.value = ben.value + beth.value    2.0     [ben, beth]
7    tim  tim.value = cindy.value + boris.value    4.0  [cindy, boris]

query data
{
    "question": "how many pets total does everyone buy?",
    "expression": "SELECT SUM(value) FROM df",
    "llm_error_msg": ""
}
# query result is 13
```

Based on the comments below, CPAL's intended location in the library is
`experimental/chains/cpal` and PAL's location is`chains/pal`.

### CPAL vs Graph QA

Both the CPAL chain and the Graph QA chain extract entity-action-entity
relations into a DAG.

The CPAL chain is different from the Graph QA chain for a few reasons.

* Graph QA does not connect entities to math expressions
* Graph QA does not associate actions in a sequence of dependence.
* Graph QA does not decompose the narrative into these three parts:
  1. Story plot or causal model
  4. Hypothetical question
  5. Hypothetical condition 

### Evaluation

Preliminary evaluation on simple math word problems shows that this CPAL
chain generates less hallucination than the PAL chain on answering
questions about a causal narrative. Two examples are in [this jupyter
notebook](https://github.com/borisdev/langchain/blob/master/docs/extras/modules/chains/additional/cpal.ipynb)
doc.

## 2. Future application

### "Describe as Narrative, Test as Code"

The thesis here is that the Describe as Narrative, Test as Code approach
allows you to represent a causal mental model both as code and as a
narrative, giving you the best of both worlds.

#### Why describe a causal mental mode as a narrative?

The narrative form is quick. At a consensus building meeting, people use
narratives to persuade others of their causal mental model, aka. plan.
You can share, version control and index a narrative.

#### Why test a causal mental model as a code?

Code is testable, complex narratives are not. Though fast, narratives
are problematic as their complexity increases. The problem is LLMs and
humans are prone to hallucination when predicting the outcomes of a
narrative. The cost of building a consensus around the validity of a
narrative outcome grows as its narrative complexity increases. Code does
not require tribal knowledge or social power to validate.

Code is composable, complex narratives are not. The answer of one CPAL
chain can be the hypothetical conditions of another CPAL Chain. For
stochastic simulations, a composable plan can be integrated with the
[DoWhy library](https://github.com/py-why/dowhy). Lastly, for the
futuristic folk, a composable plan as code allows ordinary community
folk to design a plan that can be integrated with a blockchain for
funding.

An explanation of a dependency planning application is
[here.](https://github.com/borisdev/cpal-llm-chain-demo)

--- 
Twitter handle: @boris_dev

---------

Co-authored-by: Boris Dev <borisdev@Boriss-MacBook-Air.local>
2023-07-11 10:11:21 -04:00
Alejandra De Luna
2e4047e5e7 feat: support generate as an early stopping method for OpenAIFunctionsAgent (#7229)
This PR proposes an implementation to support `generate` as an
`early_stopping_method` for the new `OpenAIFunctionsAgent` class.

The motivation behind is to facilitate the user to set a maximum number
of actions the agent can take with `max_iterations` and force a final
response with this new agent (as with the `Agent` class).

The following changes were made:

- The `OpenAIFunctionsAgent.return_stopped_response` method was
overwritten to support `generate` as an `early_stopping_method`
- A boolean `with_functions` parameter was added to the
`OpenAIFunctionsAgent.plan` method

This way the `OpenAIFunctionsAgent.return_stopped_response` method can
call the `OpenAIFunctionsAgent.plan` method with `with_function=False`
when the `early_stopping_method` is set to `generate`, making a call to
the LLM with no functions and forcing a final response from the
`"assistant"`.

  - Relevant maintainer: @hinthornw
  - Twitter handle: @aledelunap

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-11 09:25:02 -04:00
Hashem Alsaket
1dd4236177 Fix HF endpoint returns blank for text-generation (#7386)
Description: Current `_call` function in the
`langchain.llms.HuggingFaceEndpoint` class truncates response when
`task=text-generation`. Same error discussed a few days ago on Hugging
Face: https://huggingface.co/tiiuae/falcon-40b-instruct/discussions/51
Issue: Fixes #7353 
Tag maintainer: @hwchase17 @baskaryan @hinthornw

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-11 03:06:05 -04:00
Lance Martin
4a94f56258 Minor edits to QA docs (#7507)
Small clean-ups
2023-07-10 22:15:05 -07:00
Raymond Yuan
5171c3bcca Refactor vector storage to correctly handle relevancy scores (#6570)
Description: This pull request aims to support generating the correct
generic relevancy scores for different vector stores by refactoring the
relevance score functions and their selection in the base class and
subclasses of VectorStore. This is especially relevant with VectorStores
that require a distance metric upon initialization. Note many of the
current implenetations of `_similarity_search_with_relevance_scores` are
not technically correct, as they just return
`self.similarity_search_with_score(query, k, **kwargs)` without applying
the relevant score function

Also includes changes associated with:
https://github.com/hwchase17/langchain/pull/6564 and
https://github.com/hwchase17/langchain/pull/6494

See more indepth discussion in thread in #6494 

Issue: 
https://github.com/hwchase17/langchain/issues/6526
https://github.com/hwchase17/langchain/issues/6481
https://github.com/hwchase17/langchain/issues/6346

Dependencies: None

The changes include:
- Properly handling score thresholding in FAISS
`similarity_search_with_score_by_vector` for the corresponding distance
metric.
- Refactoring the `_similarity_search_with_relevance_scores` method in
the base class and removing it from the subclasses for incorrectly
implemented subclasses.
- Adding a `_select_relevance_score_fn` method in the base class and
implementing it in the subclasses to select the appropriate relevance
score function based on the distance strategy.
- Updating the `__init__` methods of the subclasses to set the
`relevance_score_fn` attribute.
- Removing the `_default_relevance_score_fn` function from the FAISS
class and using the base class's `_euclidean_relevance_score_fn`
instead.
- Adding the `DistanceStrategy` enum to the `utils.py` file and updating
the imports in the vector store classes.
- Updating the tests to import the `DistanceStrategy` enum from the
`utils.py` file.

---------

Co-authored-by: Hanit <37485638+hanit-com@users.noreply.github.com>
2023-07-10 20:37:03 -07:00
Lance Martin
bd0c6381f5 Minor update to clarify map-reduce custom prompt usage (#7453)
Update docs for map-reduce custom prompt usage
2023-07-10 16:43:44 -07:00
Lance Martin
28d2b213a4 Update landing page for "question answering over documents" (#7152)
Improve documentation for a central use-case, qa / chat over documents.

This will be merged as an update to `index.mdx`
[here](https://python.langchain.com/docs/use_cases/question_answering/).

Testing w/ local Docusaurus server:

```
From `docs` directory:
mkdir _dist
cp -r {docs_skeleton,snippets} _dist
cp -r extras/* _dist/docs_skeleton/docs
cd _dist/docs_skeleton
yarn install
yarn start
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-10 14:15:13 -07:00
William FH
dd648183fa Rm create_project line (#7486)
not needed
2023-07-10 10:49:55 -07:00
Leonid Ganeline
5eec74d9a5 docstrings document_loaders 3 (#6937)
- Updated docstrings for `document_loaders`
- Mass update `"""Loader that loads` to `"""Loads`

@baskaryan  - please, review
2023-07-10 08:56:53 -07:00
Stanko Kuveljic
9d13dcd17c Pinecone: Add V4 support (#7473) 2023-07-10 08:39:47 -07:00
Adilkhan Sarsen
5debd5043e Added deeplake use case examples of the new features (#6528)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
 
 1. Added use cases of the new features
 2. Done some code refactoring

---------

Co-authored-by: Ivo Stranic <istranic@gmail.com>
2023-07-10 07:04:29 -07:00
Bagatur
9b615022e2 bump 229 (#7467) 2023-07-10 04:38:55 -04:00
Kazuki Maeda
92b4418c8c Datadog logs loader (#7356)
### Description
Created a Loader to get a list of specific logs from Datadog Logs.

### Dependencies
`datadog_api_client` is required.

### Twitter handle
[kzk_maeda](https://twitter.com/kzk_maeda)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-10 04:27:55 -04:00
Yifei Song
7d29bb2c02 Add Xorbits Dataframe as a Document Loader (#7319)
- [Xorbits](https://doc.xorbits.io/en/latest/) is an open-source
computing framework that makes it easy to scale data science and machine
learning workloads in parallel. Xorbits can leverage multi cores or GPUs
to accelerate computation on a single machine, or scale out up to
thousands of machines to support processing terabytes of data.

- This PR added support for the Xorbits document loader, which allows
langchain to leverage Xorbits to parallelize and distribute the loading
of data.
- Dependencies: This change requires the Xorbits library to be installed
in order to be used.
`pip install xorbits`
- Request for review: @rlancemartin, @eyurtsev
- Twitter handle: https://twitter.com/Xorbitsio

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-10 04:24:47 -04:00
Sergio Moreno
21a353e9c2 feat: ctransformers support async chain (#6859)
- Description: Adding async method for CTransformers 
- Issue: I've found impossible without this code to run Websockets
inside a FastAPI micro service and a CTransformers model.
  - Tag maintainer: Not necessary yet, I don't like to mention directly 
  - Twitter handle: @_semoal
2023-07-10 04:23:41 -04:00
Paul-Emile Brotons
d2cf0d16b3 adding max_marginal_relevance_search method to MongoDBAtlasVectorSearch (#7310)
Adding a maximal_marginal_relevance method to the
MongoDBAtlasVectorSearch vectorstore enhances the user experience by
providing more diverse search results

Issue: #7304
2023-07-10 04:04:19 -04:00
Bagatur
04cddfba0d Add lark import error (#7465) 2023-07-10 03:21:23 -04:00
Matt Robinson
bcab894f4e feat: Add UnstructuredTSVLoader (#7367)
### Summary

Adds an `UnstructuredTSVLoader` for TSV files. Also updates the doc
strings for `UnstructuredCSV` and `UnstructuredExcel` loaders.

### Testing

```python
from langchain.document_loaders.tsv import UnstructuredTSVLoader

loader = UnstructuredTSVLoader(
    file_path="example_data/mlb_teams_2012.csv", mode="elements"
)
docs = loader.load()
```
2023-07-10 03:07:10 -04:00
Ronald Li
490f4a9ff0 Fixes KeyError in AmazonKendraRetriever initializer (#7464)
### Description
argument variable client is marked as required in commit
81e5b1ad36 which breaks the default way of
initialization providing only index_id. This commit avoid KeyError
exception when it is initialized without a client variable
### Dependencies
no dependency required
2023-07-10 03:02:36 -04:00
Jona Sassenhagen
7ffc431b3a Add spacy sentencizer (#7442)
`SpacyTextSplitter` currently uses spacy's statistics-based
`en_core_web_sm` model for sentence splitting. This is a good splitter,
but it's also pretty slow, and in this case it's doing a lot of work
that's not needed given that the spacy parse is then just thrown away.
However, there is also a simple rules-based spacy sentencizer. Using
this is at least an order of magnitude faster than using
`en_core_web_sm` according to my local tests.
Also, spacy sentence tokenization based on `en_core_web_sm` can be sped
up in this case by not doing the NER stage. This shaves some cycles too,
both when loading the model and when parsing the text.

Consequently, this PR adds the option to use the basic spacy
sentencizer, and it disables the NER stage for the current approach,
*which is kept as the default*.

Lastly, when extracting the tokenized sentences, the `text` attribute is
called directly instead of doing the string conversion, which is IMO a
bit more idiomatic.
2023-07-10 02:52:05 -04:00
charosen
50a9fcccb0 feat(module): add param ids to ElasticVectorSearch.from_texts method (#7425)
# add param ids to ElasticVectorSearch.from_texts method.

- Description: add param ids to ElasticVectorSearch.from_texts method.
- Issue: NA. It seems `add_texts` already supports passing in document
ids, but param `ids` is omitted in `from_texts` classmethod,
- Dependencies: None,
- Tag maintainer: @rlancemartin, @eyurtsev please have a look, thanks

```
    # ElasticVectorSearch add_texts
    def add_texts(
        self,
        texts: Iterable[str],
        metadatas: Optional[List[dict]] = None,
        refresh_indices: bool = True,
        ids: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> List[str]:
        ...

```

```
    # ElasticVectorSearch from_texts
    @classmethod
    def from_texts(
        cls,
        texts: List[str],
        embedding: Embeddings,
        metadatas: Optional[List[dict]] = None,
        elasticsearch_url: Optional[str] = None,
        index_name: Optional[str] = None,
        refresh_indices: bool = True,
        **kwargs: Any,
    ) -> ElasticVectorSearch:

```


Co-authored-by: charosen <charosen@bupt.cn>
2023-07-10 02:25:35 -04:00
James Yin
a5fd8873b1 fix: type hint of get_chat_history in BaseConversationalRetrievalChain (#7461)
The type hint of `get_chat_history` property in
`BaseConversationalRetrievalChain` is incorrect. @baskaryan
2023-07-10 02:14:00 -04:00
nikkie
dfc3f83b0f docs(vectorstores/integrations/chroma): Fix loading and saving (#7437)
- Description: Fix loading and saving code about Chroma
- Issue: the issue #7436 
- Dependencies: -
- Twitter handle: https://twitter.com/ftnext
2023-07-10 02:05:15 -04:00
Daniel Chalef
c7f7788d0b Add ZepMemory; improve ZepChatMessageHistory handling of metadata; Fix bugs (#7444)
Hey @hwchase17 - 

This PR adds a `ZepMemory` class, improves handling of Zep's message
metadata, and makes it easier for folks building custom chains to
persist metadata alongside their chat history.

We've had plenty confused users unfamiliar with ChatMessageHistory
classes and how to wrap the `ZepChatMessageHistory` in a
`ConversationBufferMemory`. So we've created the `ZepMemory` class as a
light wrapper for `ZepChatMessageHistory`.

Details:
- add ZepMemory, modify notebook to demo use of ZepMemory
- Modify summary to be SystemMessage
- add metadata argument to add_message; add Zep metadata to
Message.additional_kwargs
- support passing in metadata
2023-07-10 01:53:49 -04:00
Saurabh Chaturvedi
8f8e8d701e Fix info about YouTube (#7447)
(Unintentionally mean 😅) nit: YouTube wasn't created by Google, this PR
fixes the mention in docs.
2023-07-10 01:52:55 -04:00
Leonid Ganeline
560c4dfc98 docstrings: docstore and client (#6783)
updated docstrings in `docstore/` and `client/`

@baskaryan
2023-07-09 01:34:28 -04:00
Jeroen Van Goey
f5bd88757e Fix typo (#7416)
`quesitons` -> `questions`.
2023-07-09 00:54:48 -04:00
Alejandro Garrido Mota
ea9c3cc9c9 Fix syntax erros in documentation (#7409)
- Description: Tiny documentation fix. In Python, when defining function
parameters or providing arguments to a function or class constructor, we
do not use the `:` character.
- Issue: N/A
- Dependencies: N/A,
- Tag maintainer: @rlancemartin, @eyurtsev
- Twitter handle: @mogaal
2023-07-08 19:52:01 -04:00
Nolan
5da9f9abcb docs(agents/toolkits): Fix error in document_comparison_toolkit.ipynb (#7417)
Replace this comment with:
- Description: Removes unneeded output warning in documentation at
https://python.langchain.com/docs/modules/agents/toolkits/document_comparison_toolkit
  - Issue: -
  - Dependencies: -
  - Tag maintainer: @baskaryan
  - Twitter handle: @finnless
2023-07-08 19:51:08 -04:00
nikkie
2eb4a2ceea docs(retrievers/get-started): Fix broken state_of_the_union.txt link (#7399)
Thank you for this awesome library.

- Description: Fix broken link in documentation 
- Issue:
-
https://python.langchain.com/docs/modules/data_connection/retrievers/#get-started
- the URL:
https://github.com/hwchase17/langchain/blob/master/docs/modules/state_of_the_union.txt
- I think the right one is
https://github.com/hwchase17/langchain/blob/master/docs/extras/modules/state_of_the_union.txt
- Dependencies: -
- Tag maintainer: @baskaryan
- Twitter handle: -
2023-07-08 11:11:05 -04:00
Delgermurun
e7420789e4 improve description of JinaChat (#7397)
very small doc string change in the `JinaChat` class.
2023-07-08 10:57:11 -04:00
Bagatur
26c86a197c bump 228 (#7393) 2023-07-08 03:05:20 -04:00
SvMax
1d649b127e Added param to return only a structured json from the get_format_instructions method (#5848)
I just added a parameter to the method get_format_instructions, to
return directly the JSON instructions without the leading instruction
sentence. I'm planning to use it to define the structure of a JSON
object passed in input, the get_format_instructions().

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-08 02:57:26 -04:00
Bagatur
362bc301df fix jina (#7392) 2023-07-08 02:41:54 -04:00
Delgermurun
a1603fccfb integrate JinaChat (#6927)
Integration with https://chat.jina.ai/api. It is OpenAI compatible API.

- Twitter handle:
[https://twitter.com/JinaAI_](https://twitter.com/JinaAI_)

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-07-08 02:17:04 -04:00
William FH
4ba7396f96 Add single run eval loader (#7390)
Plus 
- add evaluation name to make string and embedding validators work with
the run evaluator loader.
- Rm unused root validator
2023-07-07 23:06:49 -07:00
Roger Yu
633b673b85 Update pinecone.ipynb (#7382)
Fix typo
2023-07-08 01:48:03 -04:00
Oleg Zabluda
4d697d3f24 Allow passing custom prompts to GraphIndexCreator (#7381)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-08 01:47:53 -04:00
William FH
612a74eb7e Make Ref Example Threadsafe (#7383)
Have noticed transient ref example misalignment. I believe this is
caused by the logic of assigning an example within the thread executor
rather than before.
2023-07-07 21:50:42 -07:00
William FH
4789c99bc2 Add String Distance and Embedding Evaluators (#7123)
Add a string evaluator and pairwise string evaluator implementation for:
- Embedding distance
- String distance

Update docs
2023-07-07 21:44:31 -07:00
ljeagle
fb6e63dc36 Upgrade the AwaDB from 0.3.5 to 0.3.6 (#7363) 2023-07-07 20:41:17 -07:00
William FH
c5edbea34a Load Run Evaluator (#7101)
Current problems:
1. Evaluating LLMs or Chat models isn't smooth. Even specifying
'generations' as the output inserts a redundant list into the eval
template
2. Configuring input / prediction / reference keys in the
`get_qa_evaluator` function is confusing. Unless you are using a chain
with the default keys, you have to specify all the variables and need to
reason about whether the key corresponds to the traced run's inputs,
outputs or the examples inputs or outputs.


Proposal:
- Configure the run evaluator according to a model. Use the model type
and input/output keys to assert compatibility where possible. Only need
to specify a reference_key for certain evaluators (which is less
confusing than specifying input keys)


When does this work:
- If you have your langchain model available (assumed always for
run_on_dataset flow)
- If you are evaluating an LLM, Chat model, or chain
- If the LLM or chat models are traced by langchain (wouldn't work if
you add an incompatible schema via the REST API)

When would this fail:
- Currently if you directly create an example from an LLM run, the
outputs are generations with all the extra metadata present. A simple
`example_key` and dumping all to the template could make the evaluations
unreliable
- Doesn't help if you're not using the low level API
- If you want to instantiate the evaluator without instantiating your
chain or LLM (maybe common for monitoring, for instance) -> could also
load from run or run type though

What's ugly:
- Personally think it's better to load evaluators one by one since
passing a config down is pretty confusing.
- Lots of testing needs to be added
- Inconsistent in that it makes a separate run and example input mapper
instead of the original `RunEvaluatorInputMapper`, which maps a run and
example to a single input.

Example usage running the for an LLM, Chat Model, and Agent.

```
# Test running for the string evaluators
evaluator_names = ["qa", "criteria"]

model = ChatOpenAI()
configured_evaluators = load_run_evaluators_for_model(evaluator_names, model=model, reference_key="answer")
run_on_dataset(ds_name, model, run_evaluators=configured_evaluators)
```


<details>
  <summary>Full code with dataset upload</summary>
```
## Create dataset
from langchain.evaluation.run_evaluators.loading import load_run_evaluators_for_model
from langchain.evaluation import load_dataset
import pandas as pd

lcds = load_dataset("llm-math")
df = pd.DataFrame(lcds)

from uuid import uuid4
from langsmith import Client
client = Client()
ds_name = "llm-math - " + str(uuid4())[0:8]
ds = client.upload_dataframe(df, name=ds_name, input_keys=["question"], output_keys=["answer"])



## Define the models we'll test over
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, AgentType

from langchain.tools import tool

llm = OpenAI(temperature=0)
chat_model = ChatOpenAI(temperature=0)

@tool
    def sum(a: float, b: float) -> float:
        """Add two numbers"""
        return a + b
    
def construct_agent():
    return initialize_agent(
        llm=chat_model,
        tools=[sum],
        agent=AgentType.OPENAI_MULTI_FUNCTIONS,
    )

agent = construct_agent()

# Test running for the string evaluators
evaluator_names = ["qa", "criteria"]

models = [llm, chat_model, agent]
run_evaluators = []
for model in models:
    run_evaluators.append(load_run_evaluators_for_model(evaluator_names, model=model, reference_key="answer"))
    

# Run on LLM, Chat Model, and Agent
from langchain.client.runner_utils import run_on_dataset

to_test = [llm, chat_model, construct_agent]

for model, configured_evaluators in zip(to_test, run_evaluators):
    run_on_dataset(ds_name, model, run_evaluators=configured_evaluators, verbose=True)
```
</details>

---------

Co-authored-by: Nuno Campos <nuno@boringbits.io>
2023-07-07 19:57:59 -07:00
Bagatur
1ac347b4e3 update databerry-chaindesk redirect (#7378) 2023-07-07 19:11:46 -04:00
Joshua Carroll
705d2f5b92 Update the API Reference link in Streamlit integration docs (#7377)
This page:


https://python.langchain.com/docs/modules/callbacks/integrations/streamlit

Has a bad API Reference link currently. This PR fixes it to the correct
link.

Also updates the embedded app link to
https://langchain-mrkl.streamlit.app/ (better name) which is hosted in
langchain-ai/streamlit-agent repo
2023-07-07 17:35:57 -04:00
Georges Petrov
ec033ae277 Rename Databerry to Chaindesk (#7022)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-07 17:28:04 -04:00
Philip Meier
da5b0723d2 update MosaicML inputs and outputs (#7348)
As of today (July 7, 2023), the [MosaicML
API](https://docs.mosaicml.com/en/latest/inference.html#text-completion-requests)
uses `"inputs"` for the prompt

This PR adds support for this new format.
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-07 17:23:11 -04:00
Bearnardd
184ede4e48 Fix buggy output from GraphQAChain (#7372)
fixes https://github.com/hwchase17/langchain/issues/7289
A simple fix of the buggy output of `graph_qa`. If we have several
entities with triplets then the last entry of `triplets` for a given
entity merges with the first entry of the `triplets` of the next entity.
2023-07-07 17:19:53 -04:00
Harrison Chase
7cdf97ba9b Harrison/add to imports (#7370)
pgvector cleanup
2023-07-07 16:27:44 -04:00
Bagatur
4d427b2397 Base language model docstrings (#7104) 2023-07-07 16:09:10 -04:00
ॐ shivam mamgain
2179d4eef8 Fix for KeyError in MlflowCallbackHandler (#7051)
- Description: `MlflowCallbackHandler` fails with `KeyError: "['name']
not in index"`. See https://github.com/hwchase17/langchain/issues/5770
for more details. Root cause is that LangChain does not pass "name" as a
part of `serialized` argument to `on_llm_start()` callback method. The
commit where this change was made is probably this:
18af149e91.
My bug fix derives "name" from "id" field.
  - Issue: https://github.com/hwchase17/langchain/issues/5770
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-07 16:08:06 -04:00
Alex Gamble
df746ad821 Add a callback handler for Context (https://getcontext.ai) (#7151)
### Description

Adding a callback handler for Context. Context is a product analytics
platform for AI chat experiences to help you understand how users are
interacting with your product.

I've added the callback library + an example notebook showing its use.

### Dependencies

Requires the user to install the `context-python` library. The library
is lazily-loaded when the callback is instantiated.

### Announcing the feature

We spoke with Harrison a few weeks ago about also doing a blog post
announcing our integration, so will coordinate this with him. Our
Twitter handle for the company is @getcontextai, and the founders are
@_agamble and @HenrySG.

Thanks in advance!
2023-07-07 15:33:29 -04:00
Austin
c9a0f24646 Add verbose parameter for llamacpp (#7253)
**Title:** Add verbose parameter for llamacpp

**Description:**
This pull request adds a 'verbose' parameter to the llamacpp module. The
'verbose' parameter, when set to True, will enable the output of
detailed logs during the execution of the Llama model. This added
parameter can aid in debugging and understanding the internal processes
of the module.

The verbose parameter is a boolean that prints verbose output to stderr
when set to True. By default, the verbose parameter is set to True but
can be toggled off if less output is desired. This new parameter has
been added to the `validate_environment` method of the `LlamaCpp` class
which initializes the `llama_cpp.Llama` API:

```python
class LlamaCpp(LLM):
    ...
    @root_validator()
    def validate_environment(cls, values: Dict) -> Dict:
        ...
        model_param_names = [
            ...
            "verbose",  # New verbose parameter added
        ]
        ...
        values["client"] = Llama(model_path, **model_params)
        ...
```
---------

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
2023-07-07 15:08:25 -04:00
Kenny
34a2755a54 Allow passing api key into OpenAIWhisperParser (#7281)
This just allows the user to pass in an api_key directly into
OpenAIWhisperParser. Very simple addition.
2023-07-07 15:07:45 -04:00
mrkhalil6
4e7d0c115b Add support for filters and namespaces in similarity search in Pinecone similarity_score_threshold (#7301)
At the moment, pinecone vectorStore does not support filters and
namespaces when using similarity_score_threshold search type.
In this PR, I've implemented that. It passes all the kwargs except
"score_threshold" as that is not a supported argument for method
"similarity_search_with_score".
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-07 15:03:59 -04:00
Manuel Saelices
01dca1e438 Add context to an output parsing error on Pydantic schema to improve exception handling (#7344)
## Changes

- [X] Fill the `llm_output` param when there is an output parsing error
in a Pydantic schema so that we can get the original text that failed to
parse when handling the exception

## Background

With this change, we could do something like this:

```
output_parser = PydanticOutputParser(pydantic_object=pydantic_obj)
chain = ConversationChain(..., output_parser=output_parser)
try:
    response: PydanticSchema = chain.predict(input=input)
except OutputParserException as exc:
    logger.error(
        'OutputParserException while parsing chatbot response: %s', exc.llm_output,
    )
```
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-07-07 14:49:37 -04:00
Raouf Chebri
1ac6deda89 update extension name (#7359)
hi @rlancemartin ,

We had a new deployment and the `pg_extension` creation command was
updated from `CREATE EXTENSION pg_embedding` to `CREATE EXTENSION
embedding`.

https://github.com/neondatabase/neon/pull/4646

The extension not made public yet. No users will be affected by this.
Will be public next week.

Please let me know if you have any questions.

Thank you in advance 🙏
2023-07-07 11:35:51 -07:00
William FH
4e180dc54e Unset Cache in Tests (#7362)
This is impacting other unit tests that use callbacks since the cache is
still set (just empty)
2023-07-07 11:05:09 -07:00
German Martin
3ce4e46c8c The Fellowship of the Vectors: New Embeddings Filter using clustering. (#7015)
Continuing with Tolkien inspired series of langchain tools. I bring to
you:
**The Fellowship of the Vectors**, AKA EmbeddingsClusteringFilter.
This document filter uses embeddings to group vectors together into
clusters, then allows you to pick an arbitrary number of documents
vector based on proximity to the cluster centers. That's a
representative sample of the cluster.

The original idea is from [Greg Kamradt](https://github.com/gkamradt)
from this video (Level4):
https://www.youtube.com/watch?v=qaPMdcCqtWk&t=365s

I added few tricks to make it a bit more versatile, so you can
parametrize what to do with duplicate documents in case of cluster
overlap: replace the duplicates with the next closest document or remove
it. This allow you to use it as an special kind of redundant filter too.
Additionally you can choose 2 diff orders: grouped by cluster or
respecting the original retriever scores.
In my use case I was using the docs grouped by cluster to run refine
chains per cluster to generate summarization over a large corpus of
documents.
Let me know if you want to change anything!

@rlancemartin, @eyurtsev, @hwchase17,

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-07-07 10:28:17 -07:00
Leonid Ganeline
b489466488 docs: dependents update 4 (#7360)
Updated links and counters of the `dependents` page.
2023-07-07 13:22:30 -04:00
William FH
38ca5c84cb Explicitly list requires_reference in function (#7357) 2023-07-07 10:04:03 -07:00
Harrison Chase
49b2b0e3c0 change embedding to None (#7355) 2023-07-07 12:33:03 -04:00
imaprogrammer
a2830e3056 Update chroma.py: Persist directory from client_settings if provided there (#7087)
Change details:
- Description: When calling db.persist(), a check prevents from it
proceeding as the constructor only sets member `_persist_directory` from
parameters. But the ChromaDB client settings also has this parameter,
and if the client_settings parameter is used without passing the
persist_directory (which is optional), the `persist` method raises
`ValueError` for not setting `_persist_directory`. This change fixes it
by setting the member `_persist_directory` variable from client_settings
if it is set, else uses the constructor parameter.
- Issue: I didn't find any github issue of this, but I discovered it
after calling the persist method
  - Dependencies: None
- Tag maintainer: vectorstore related change - @rlancemartin, @eyurtsev
  - Twitter handle: Don't have one :(

*Additional discussion*: We may need to discuss the way I implemented
the fallback using `or`.

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-07-07 09:20:27 -07:00
397 changed files with 13768 additions and 4905 deletions

View File

@@ -95,6 +95,14 @@ To run formatting for this project:
make format
```
Additionally, you can run the formatter only on the files that have been modified in your current branch as compared to the master branch using the format_diff command:
```bash
make format_diff
```
This is especially useful when you have made changes to a subset of the project and want to ensure your changes are properly formatted without affecting the rest of the codebase.
### Linting
Linting for this project is done via a combination of [Black](https://black.readthedocs.io/en/stable/), [isort](https://pycqa.github.io/isort/), [flake8](https://flake8.pycqa.org/en/latest/), and [mypy](http://mypy-lang.org/).
@@ -105,6 +113,14 @@ To run linting for this project:
make lint
```
In addition, you can run the linter only on the files that have been modified in your current branch as compared to the master branch using the lint_diff command:
```bash
make lint_diff
```
This can be very helpful when you've made changes to only certain parts of the project and want to ensure your changes meet the linting standards without having to check the entire codebase.
We recognize linting can be annoying - if you do not want to do it, please contact a project maintainer, and they can help you with it. We do not want this to be a blocker for good code getting contributed.
### Coverage
@@ -208,30 +224,38 @@ When you run `poetry install`, the `langchain` package is installed as editable
### Contribute Documentation
Docs are largely autogenerated by [sphinx](https://www.sphinx-doc.org/en/master/) from the code.
The docs directory contains Documentation and API Reference.
Documentation is built using [Docusaurus 2](https://docusaurus.io/).
API Reference are largely autogenerated by [sphinx](https://www.sphinx-doc.org/en/master/) from the code.
For that reason, we ask that you add good documentation to all classes and methods.
Similar to linting, we recognize documentation can be annoying. If you do not want to do it, please contact a project maintainer, and they can help you with it. We do not want this to be a blocker for good code getting contributed.
### Build Documentation Locally
In the following commands, the prefix `api_` indicates that those are operations for the API Reference.
Before building the documentation, it is always a good idea to clean the build directory:
```bash
make docs_clean
make api_docs_clean
```
Next, you can run the linkchecker to make sure all links are valid:
```bash
make docs_linkcheck
```
Finally, you can build the documentation as outlined below:
Next, you can build the documentation as outlined below:
```bash
make docs_build
make api_docs_build
```
Finally, you can run the linkchecker to make sure all links are valid:
```bash
make docs_linkcheck
make api_docs_linkcheck
```
## 🏭 Release Process

5
.gitignore vendored
View File

@@ -161,7 +161,12 @@ docs/node_modules/
docs/.docusaurus/
docs/.cache-loader/
docs/_dist
docs/api_reference/api_reference.rst
docs/api_reference/_build
docs/api_reference/*/
!docs/api_reference/_static/
!docs/api_reference/templates/
!docs/api_reference/themes/
docs/docs_skeleton/build
docs/docs_skeleton/node_modules
docs/docs_skeleton/yarn.lock

View File

@@ -1,40 +1,47 @@
.PHONY: all clean format lint test tests test_watch integration_tests docker_tests help extended_tests
.PHONY: all clean docs_build docs_clean docs_linkcheck api_docs_build api_docs_clean api_docs_linkcheck format lint test tests test_watch integration_tests docker_tests help extended_tests
# Default target executed when no arguments are given to make.
all: help
######################
# TESTING AND COVERAGE
######################
# Run unit tests and generate a coverage report.
coverage:
poetry run pytest --cov \
--cov-config=.coveragerc \
--cov-report xml \
--cov-report term-missing:skip-covered
clean: docs_clean
######################
# DOCUMENTATION
######################
clean: docs_clean api_docs_clean
docs_compile:
poetry run nbdoc_build --srcdir $(srcdir)
docs_build:
cd docs && poetry run make html
docs/.local_build.sh
docs_clean:
cd docs && poetry run make clean
rm -r docs/_dist
docs_linkcheck:
poetry run linkchecker docs/_build/html/index.html
poetry run linkchecker docs/_dist/docs_skeleton/ --ignore-url node_modules
format:
poetry run black .
poetry run ruff --select I --fix .
api_docs_build:
poetry run python docs/api_reference/create_api_rst.py
cd docs/api_reference && poetry run make html
PYTHON_FILES=.
lint: PYTHON_FILES=.
lint_diff: PYTHON_FILES=$(shell git diff --name-only --diff-filter=d master | grep -E '\.py$$')
api_docs_clean:
rm -f docs/api_reference/api_reference.rst
cd docs/api_reference && poetry run make clean
lint lint_diff:
poetry run mypy $(PYTHON_FILES)
poetry run black $(PYTHON_FILES) --check
poetry run ruff .
api_docs_linkcheck:
poetry run linkchecker docs/api_reference/_build/html/index.html
# Define a variable for the test file path.
TEST_FILE ?= tests/unit_tests/
test:
@@ -56,6 +63,28 @@ docker_tests:
docker build -t my-langchain-image:test .
docker run --rm my-langchain-image:test
######################
# LINTING AND FORMATTING
######################
# Define a variable for Python and notebook files.
PYTHON_FILES=.
lint format: PYTHON_FILES=.
lint_diff format_diff: PYTHON_FILES=$(shell git diff --name-only --diff-filter=d master | grep -E '\.py$$|\.ipynb$$')
lint lint_diff:
poetry run mypy $(PYTHON_FILES)
poetry run black $(PYTHON_FILES) --check
poetry run ruff .
format format_diff:
poetry run black $(PYTHON_FILES)
poetry run ruff --select I --fix $(PYTHON_FILES)
######################
# HELP
######################
help:
@echo '----'
@echo 'coverage - run unit tests and generate coverage report'

View File

@@ -1,10 +1,15 @@
mkdir _dist
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail
set -o xtrace
SCRIPT_DIR="$(cd "$(dirname "$0")"; pwd)"
cd "${SCRIPT_DIR}"
mkdir -p _dist/docs_skeleton
cp -r {docs_skeleton,snippets} _dist
mkdir -p _dist/docs_skeleton/static/api_reference
cd api_reference
poetry run make html
cp -r _build/* ../_dist/docs_skeleton/static/api_reference
cd ..
cp -r extras/* _dist/docs_skeleton/docs
cd _dist/docs_skeleton
poetry run nbdoc_build

File diff suppressed because it is too large Load Diff

View File

Before

Width:  |  Height:  |  Size: 157 KiB

After

Width:  |  Height:  |  Size: 157 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 116 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 237 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 173 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 164 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 118 KiB

View File

@@ -138,7 +138,11 @@
},
{
"source": "/en/latest/integrations/databerry.html",
"destination": "/docs/ecosystem/integrations/databerry"
"destination": "/docs/ecosystem/integrations/chaindesk"
},
{
"source": "/docs/ecosystem/integrations/databerry",
"destination": "/docs/ecosystem/integrations/chaindesk"
},
{
"source": "/en/latest/integrations/databricks/databricks.html",
@@ -1330,7 +1334,11 @@
},
{
"source": "/en/latest/modules/indexes/retrievers/examples/databerry.html",
"destination": "/docs/modules/data_connection/retrievers/integrations/databerry"
"destination": "/docs/modules/data_connection/retrievers/integrations/chaindesk"
},
{
"source": "/docs/modules/data_connection/retrievers/integrations/databerry",
"destination": "/docs/modules/data_connection/retrievers/integrations/chaindesk"
},
{
"source": "/en/latest/modules/indexes/retrievers/examples/elastic_search_bm25.html",
@@ -2125,4 +2133,4 @@
"destination": "/docs/:path*"
}
]
}
}

View File

@@ -2,188 +2,261 @@
Dependents stats for `hwchase17/langchain`
[![](https://img.shields.io/static/v1?label=Used%20by&message=5152&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[![](https://img.shields.io/static/v1?label=Used%20by%20(public)&message=172&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[![](https://img.shields.io/static/v1?label=Used%20by%20(private)&message=4980&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[![](https://img.shields.io/static/v1?label=Used%20by%20(stars)&message=17239&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[![](https://img.shields.io/static/v1?label=Used%20by&message=9941&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[![](https://img.shields.io/static/v1?label=Used%20by%20(public)&message=244&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[![](https://img.shields.io/static/v1?label=Used%20by%20(private)&message=9697&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[![](https://img.shields.io/static/v1?label=Used%20by%20(stars)&message=19827&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[update: 2023-05-17; only dependent repositories with Stars > 100]
[update: 2023-07-07; only dependent repositories with Stars > 100]
| Repository | Stars |
| :-------- | -----: |
|[openai/openai-cookbook](https://github.com/openai/openai-cookbook) | 35401 |
|[LAION-AI/Open-Assistant](https://github.com/LAION-AI/Open-Assistant) | 32861 |
|[microsoft/TaskMatrix](https://github.com/microsoft/TaskMatrix) | 32766 |
|[hpcaitech/ColossalAI](https://github.com/hpcaitech/ColossalAI) | 29560 |
|[reworkd/AgentGPT](https://github.com/reworkd/AgentGPT) | 22315 |
|[imartinez/privateGPT](https://github.com/imartinez/privateGPT) | 17474 |
|[openai/chatgpt-retrieval-plugin](https://github.com/openai/chatgpt-retrieval-plugin) | 16923 |
|[mindsdb/mindsdb](https://github.com/mindsdb/mindsdb) | 16112 |
|[jerryjliu/llama_index](https://github.com/jerryjliu/llama_index) | 15407 |
|[mlflow/mlflow](https://github.com/mlflow/mlflow) | 14345 |
|[GaiZhenbiao/ChuanhuChatGPT](https://github.com/GaiZhenbiao/ChuanhuChatGPT) | 10372 |
|[databrickslabs/dolly](https://github.com/databrickslabs/dolly) | 9919 |
|[AIGC-Audio/AudioGPT](https://github.com/AIGC-Audio/AudioGPT) | 8177 |
|[logspace-ai/langflow](https://github.com/logspace-ai/langflow) | 6807 |
|[imClumsyPanda/langchain-ChatGLM](https://github.com/imClumsyPanda/langchain-ChatGLM) | 6087 |
|[arc53/DocsGPT](https://github.com/arc53/DocsGPT) | 5292 |
|[e2b-dev/e2b](https://github.com/e2b-dev/e2b) | 4622 |
|[nsarrazin/serge](https://github.com/nsarrazin/serge) | 4076 |
|[madawei2699/myGPTReader](https://github.com/madawei2699/myGPTReader) | 3952 |
|[zauberzeug/nicegui](https://github.com/zauberzeug/nicegui) | 3952 |
|[go-skynet/LocalAI](https://github.com/go-skynet/LocalAI) | 3762 |
|[GreyDGL/PentestGPT](https://github.com/GreyDGL/PentestGPT) | 3388 |
|[mmabrouk/chatgpt-wrapper](https://github.com/mmabrouk/chatgpt-wrapper) | 3243 |
|[zilliztech/GPTCache](https://github.com/zilliztech/GPTCache) | 3189 |
|[wenda-LLM/wenda](https://github.com/wenda-LLM/wenda) | 3050 |
|[marqo-ai/marqo](https://github.com/marqo-ai/marqo) | 2930 |
|[gkamradt/langchain-tutorials](https://github.com/gkamradt/langchain-tutorials) | 2710 |
|[PrefectHQ/marvin](https://github.com/PrefectHQ/marvin) | 2545 |
|[project-baize/baize-chatbot](https://github.com/project-baize/baize-chatbot) | 2479 |
|[whitead/paper-qa](https://github.com/whitead/paper-qa) | 2399 |
|[langgenius/dify](https://github.com/langgenius/dify) | 2344 |
|[GerevAI/gerev](https://github.com/GerevAI/gerev) | 2283 |
|[hwchase17/chat-langchain](https://github.com/hwchase17/chat-langchain) | 2266 |
|[guangzhengli/ChatFiles](https://github.com/guangzhengli/ChatFiles) | 1903 |
|[Azure-Samples/azure-search-openai-demo](https://github.com/Azure-Samples/azure-search-openai-demo) | 1884 |
|[OpenBMB/BMTools](https://github.com/OpenBMB/BMTools) | 1860 |
|[Farama-Foundation/PettingZoo](https://github.com/Farama-Foundation/PettingZoo) | 1813 |
|[OpenGVLab/Ask-Anything](https://github.com/OpenGVLab/Ask-Anything) | 1571 |
|[IntelligenzaArtificiale/Free-Auto-GPT](https://github.com/IntelligenzaArtificiale/Free-Auto-GPT) | 1480 |
|[hwchase17/notion-qa](https://github.com/hwchase17/notion-qa) | 1464 |
|[NVIDIA/NeMo-Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) | 1419 |
|[Unstructured-IO/unstructured](https://github.com/Unstructured-IO/unstructured) | 1410 |
|[Kav-K/GPTDiscord](https://github.com/Kav-K/GPTDiscord) | 1363 |
|[paulpierre/RasaGPT](https://github.com/paulpierre/RasaGPT) | 1344 |
|[StanGirard/quivr](https://github.com/StanGirard/quivr) | 1330 |
|[lunasec-io/lunasec](https://github.com/lunasec-io/lunasec) | 1318 |
|[vocodedev/vocode-python](https://github.com/vocodedev/vocode-python) | 1286 |
|[agiresearch/OpenAGI](https://github.com/agiresearch/OpenAGI) | 1156 |
|[h2oai/h2ogpt](https://github.com/h2oai/h2ogpt) | 1141 |
|[jina-ai/thinkgpt](https://github.com/jina-ai/thinkgpt) | 1106 |
|[yanqiangmiffy/Chinese-LangChain](https://github.com/yanqiangmiffy/Chinese-LangChain) | 1072 |
|[ttengwang/Caption-Anything](https://github.com/ttengwang/Caption-Anything) | 1064 |
|[jina-ai/dev-gpt](https://github.com/jina-ai/dev-gpt) | 1057 |
|[juncongmoo/chatllama](https://github.com/juncongmoo/chatllama) | 1003 |
|[greshake/llm-security](https://github.com/greshake/llm-security) | 1002 |
|[visual-openllm/visual-openllm](https://github.com/visual-openllm/visual-openllm) | 957 |
|[richardyc/Chrome-GPT](https://github.com/richardyc/Chrome-GPT) | 918 |
|[irgolic/AutoPR](https://github.com/irgolic/AutoPR) | 886 |
|[mmz-001/knowledge_gpt](https://github.com/mmz-001/knowledge_gpt) | 867 |
|[thomas-yanxin/LangChain-ChatGLM-Webui](https://github.com/thomas-yanxin/LangChain-ChatGLM-Webui) | 850 |
|[microsoft/X-Decoder](https://github.com/microsoft/X-Decoder) | 837 |
|[peterw/Chat-with-Github-Repo](https://github.com/peterw/Chat-with-Github-Repo) | 826 |
|[cirediatpl/FigmaChain](https://github.com/cirediatpl/FigmaChain) | 782 |
|[hashintel/hash](https://github.com/hashintel/hash) | 778 |
|[seanpixel/Teenage-AGI](https://github.com/seanpixel/Teenage-AGI) | 773 |
|[jina-ai/langchain-serve](https://github.com/jina-ai/langchain-serve) | 738 |
|[corca-ai/EVAL](https://github.com/corca-ai/EVAL) | 737 |
|[ai-sidekick/sidekick](https://github.com/ai-sidekick/sidekick) | 717 |
|[rlancemartin/auto-evaluator](https://github.com/rlancemartin/auto-evaluator) | 703 |
|[poe-platform/api-bot-tutorial](https://github.com/poe-platform/api-bot-tutorial) | 689 |
|[SamurAIGPT/Camel-AutoGPT](https://github.com/SamurAIGPT/Camel-AutoGPT) | 666 |
|[eyurtsev/kor](https://github.com/eyurtsev/kor) | 608 |
|[run-llama/llama-lab](https://github.com/run-llama/llama-lab) | 559 |
|[namuan/dr-doc-search](https://github.com/namuan/dr-doc-search) | 544 |
|[pieroit/cheshire-cat](https://github.com/pieroit/cheshire-cat) | 520 |
|[griptape-ai/griptape](https://github.com/griptape-ai/griptape) | 514 |
|[getmetal/motorhead](https://github.com/getmetal/motorhead) | 481 |
|[hwchase17/chat-your-data](https://github.com/hwchase17/chat-your-data) | 462 |
|[langchain-ai/langchain-aiplugin](https://github.com/langchain-ai/langchain-aiplugin) | 452 |
|[jina-ai/agentchain](https://github.com/jina-ai/agentchain) | 439 |
|[SamurAIGPT/ChatGPT-Developer-Plugins](https://github.com/SamurAIGPT/ChatGPT-Developer-Plugins) | 437 |
|[alexanderatallah/window.ai](https://github.com/alexanderatallah/window.ai) | 433 |
|[michaelthwan/searchGPT](https://github.com/michaelthwan/searchGPT) | 427 |
|[mpaepper/content-chatbot](https://github.com/mpaepper/content-chatbot) | 425 |
|[mckaywrigley/repo-chat](https://github.com/mckaywrigley/repo-chat) | 422 |
|[whyiyhw/chatgpt-wechat](https://github.com/whyiyhw/chatgpt-wechat) | 421 |
|[freddyaboulton/gradio-tools](https://github.com/freddyaboulton/gradio-tools) | 407 |
|[jonra1993/fastapi-alembic-sqlmodel-async](https://github.com/jonra1993/fastapi-alembic-sqlmodel-async) | 395 |
|[yeagerai/yeagerai-agent](https://github.com/yeagerai/yeagerai-agent) | 383 |
|[akshata29/chatpdf](https://github.com/akshata29/chatpdf) | 374 |
|[OpenGVLab/InternGPT](https://github.com/OpenGVLab/InternGPT) | 368 |
|[ruoccofabrizio/azure-open-ai-embeddings-qna](https://github.com/ruoccofabrizio/azure-open-ai-embeddings-qna) | 358 |
|[101dotxyz/GPTeam](https://github.com/101dotxyz/GPTeam) | 357 |
|[mtenenholtz/chat-twitter](https://github.com/mtenenholtz/chat-twitter) | 354 |
|[amosjyng/langchain-visualizer](https://github.com/amosjyng/langchain-visualizer) | 343 |
|[msoedov/langcorn](https://github.com/msoedov/langcorn) | 334 |
|[showlab/VLog](https://github.com/showlab/VLog) | 330 |
|[continuum-llms/chatgpt-memory](https://github.com/continuum-llms/chatgpt-memory) | 324 |
|[steamship-core/steamship-langchain](https://github.com/steamship-core/steamship-langchain) | 323 |
|[daodao97/chatdoc](https://github.com/daodao97/chatdoc) | 320 |
|[xuwenhao/geektime-ai-course](https://github.com/xuwenhao/geektime-ai-course) | 308 |
|[StevenGrove/GPT4Tools](https://github.com/StevenGrove/GPT4Tools) | 301 |
|[logan-markewich/llama_index_starter_pack](https://github.com/logan-markewich/llama_index_starter_pack) | 300 |
|[andylokandy/gpt-4-search](https://github.com/andylokandy/gpt-4-search) | 299 |
|[Anil-matcha/ChatPDF](https://github.com/Anil-matcha/ChatPDF) | 287 |
|[itamargol/openai](https://github.com/itamargol/openai) | 273 |
|[BlackHC/llm-strategy](https://github.com/BlackHC/llm-strategy) | 267 |
|[momegas/megabots](https://github.com/momegas/megabots) | 259 |
|[bborn/howdoi.ai](https://github.com/bborn/howdoi.ai) | 238 |
|[Cheems-Seminar/grounded-segment-any-parts](https://github.com/Cheems-Seminar/grounded-segment-any-parts) | 232 |
|[ur-whitelab/exmol](https://github.com/ur-whitelab/exmol) | 227 |
|[sullivan-sean/chat-langchainjs](https://github.com/sullivan-sean/chat-langchainjs) | 227 |
|[explosion/spacy-llm](https://github.com/explosion/spacy-llm) | 226 |
|[recalign/RecAlign](https://github.com/recalign/RecAlign) | 218 |
|[jupyterlab/jupyter-ai](https://github.com/jupyterlab/jupyter-ai) | 218 |
|[alvarosevilla95/autolang](https://github.com/alvarosevilla95/autolang) | 215 |
|[conceptofmind/toolformer](https://github.com/conceptofmind/toolformer) | 213 |
|[MagnivOrg/prompt-layer-library](https://github.com/MagnivOrg/prompt-layer-library) | 209 |
|[JohnSnowLabs/nlptest](https://github.com/JohnSnowLabs/nlptest) | 208 |
|[airobotlab/KoChatGPT](https://github.com/airobotlab/KoChatGPT) | 197 |
|[langchain-ai/auto-evaluator](https://github.com/langchain-ai/auto-evaluator) | 195 |
|[yvann-hub/Robby-chatbot](https://github.com/yvann-hub/Robby-chatbot) | 195 |
|[alejandro-ao/langchain-ask-pdf](https://github.com/alejandro-ao/langchain-ask-pdf) | 192 |
|[daveebbelaar/langchain-experiments](https://github.com/daveebbelaar/langchain-experiments) | 189 |
|[NimbleBoxAI/ChainFury](https://github.com/NimbleBoxAI/ChainFury) | 187 |
|[kaleido-lab/dolphin](https://github.com/kaleido-lab/dolphin) | 184 |
|[Anil-matcha/Website-to-Chatbot](https://github.com/Anil-matcha/Website-to-Chatbot) | 183 |
|[plchld/InsightFlow](https://github.com/plchld/InsightFlow) | 180 |
|[OpenBMB/AgentVerse](https://github.com/OpenBMB/AgentVerse) | 166 |
|[benthecoder/ClassGPT](https://github.com/benthecoder/ClassGPT) | 166 |
|[jbrukh/gpt-jargon](https://github.com/jbrukh/gpt-jargon) | 161 |
|[hardbyte/qabot](https://github.com/hardbyte/qabot) | 160 |
|[shaman-ai/agent-actors](https://github.com/shaman-ai/agent-actors) | 153 |
|[radi-cho/datasetGPT](https://github.com/radi-cho/datasetGPT) | 153 |
|[poe-platform/poe-protocol](https://github.com/poe-platform/poe-protocol) | 152 |
|[paolorechia/learn-langchain](https://github.com/paolorechia/learn-langchain) | 149 |
|[ajndkr/lanarky](https://github.com/ajndkr/lanarky) | 149 |
|[fengyuli-dev/multimedia-gpt](https://github.com/fengyuli-dev/multimedia-gpt) | 147 |
|[yasyf/compress-gpt](https://github.com/yasyf/compress-gpt) | 144 |
|[homanp/superagent](https://github.com/homanp/superagent) | 143 |
|[realminchoi/babyagi-ui](https://github.com/realminchoi/babyagi-ui) | 141 |
|[ethanyanjiali/minChatGPT](https://github.com/ethanyanjiali/minChatGPT) | 141 |
|[ccurme/yolopandas](https://github.com/ccurme/yolopandas) | 139 |
|[hwchase17/langchain-streamlit-template](https://github.com/hwchase17/langchain-streamlit-template) | 138 |
|[Jaseci-Labs/jaseci](https://github.com/Jaseci-Labs/jaseci) | 136 |
|[hirokidaichi/wanna](https://github.com/hirokidaichi/wanna) | 135 |
|[Haste171/langchain-chatbot](https://github.com/Haste171/langchain-chatbot) | 134 |
|[jmpaz/promptlib](https://github.com/jmpaz/promptlib) | 130 |
|[Klingefjord/chatgpt-telegram](https://github.com/Klingefjord/chatgpt-telegram) | 130 |
|[filip-michalsky/SalesGPT](https://github.com/filip-michalsky/SalesGPT) | 128 |
|[handrew/browserpilot](https://github.com/handrew/browserpilot) | 128 |
|[shauryr/S2QA](https://github.com/shauryr/S2QA) | 127 |
|[steamship-core/vercel-examples](https://github.com/steamship-core/vercel-examples) | 127 |
|[yasyf/summ](https://github.com/yasyf/summ) | 127 |
|[gia-guar/JARVIS-ChatGPT](https://github.com/gia-guar/JARVIS-ChatGPT) | 126 |
|[jerlendds/osintbuddy](https://github.com/jerlendds/osintbuddy) | 125 |
|[ibiscp/LLM-IMDB](https://github.com/ibiscp/LLM-IMDB) | 124 |
|[Teahouse-Studios/akari-bot](https://github.com/Teahouse-Studios/akari-bot) | 124 |
|[hwchase17/chroma-langchain](https://github.com/hwchase17/chroma-langchain) | 124 |
|[menloparklab/langchain-cohere-qdrant-doc-retrieval](https://github.com/menloparklab/langchain-cohere-qdrant-doc-retrieval) | 123 |
|[peterw/StoryStorm](https://github.com/peterw/StoryStorm) | 123 |
|[chakkaradeep/pyCodeAGI](https://github.com/chakkaradeep/pyCodeAGI) | 123 |
|[petehunt/langchain-github-bot](https://github.com/petehunt/langchain-github-bot) | 115 |
|[su77ungr/CASALIOY](https://github.com/su77ungr/CASALIOY) | 113 |
|[eunomia-bpf/GPTtrace](https://github.com/eunomia-bpf/GPTtrace) | 113 |
|[zenml-io/zenml-projects](https://github.com/zenml-io/zenml-projects) | 112 |
|[pablomarin/GPT-Azure-Search-Engine](https://github.com/pablomarin/GPT-Azure-Search-Engine) | 111 |
|[shamspias/customizable-gpt-chatbot](https://github.com/shamspias/customizable-gpt-chatbot) | 109 |
|[WongSaang/chatgpt-ui-server](https://github.com/WongSaang/chatgpt-ui-server) | 108 |
|[davila7/file-gpt](https://github.com/davila7/file-gpt) | 104 |
|[enhancedocs/enhancedocs](https://github.com/enhancedocs/enhancedocs) | 102 |
|[aurelio-labs/arxiv-bot](https://github.com/aurelio-labs/arxiv-bot) | 101 |
|[openai/openai-cookbook](https://github.com/openai/openai-cookbook) | 41047 |
|[LAION-AI/Open-Assistant](https://github.com/LAION-AI/Open-Assistant) | 33983 |
|[microsoft/TaskMatrix](https://github.com/microsoft/TaskMatrix) | 33375 |
|[imartinez/privateGPT](https://github.com/imartinez/privateGPT) | 31114 |
|[hpcaitech/ColossalAI](https://github.com/hpcaitech/ColossalAI) | 30369 |
|[reworkd/AgentGPT](https://github.com/reworkd/AgentGPT) | 24116 |
|[OpenBB-finance/OpenBBTerminal](https://github.com/OpenBB-finance/OpenBBTerminal) | 22565 |
|[openai/chatgpt-retrieval-plugin](https://github.com/openai/chatgpt-retrieval-plugin) | 18375 |
|[jerryjliu/llama_index](https://github.com/jerryjliu/llama_index) | 17723 |
|[mindsdb/mindsdb](https://github.com/mindsdb/mindsdb) | 16958 |
|[mlflow/mlflow](https://github.com/mlflow/mlflow) | 14632 |
|[GaiZhenbiao/ChuanhuChatGPT](https://github.com/GaiZhenbiao/ChuanhuChatGPT) | 11273 |
|[openai/evals](https://github.com/openai/evals) | 10745 |
|[databrickslabs/dolly](https://github.com/databrickslabs/dolly) | 10298 |
|[imClumsyPanda/langchain-ChatGLM](https://github.com/imClumsyPanda/langchain-ChatGLM) | 9838 |
|[logspace-ai/langflow](https://github.com/logspace-ai/langflow) | 9247 |
|[AIGC-Audio/AudioGPT](https://github.com/AIGC-Audio/AudioGPT) | 8768 |
|[PromtEngineer/localGPT](https://github.com/PromtEngineer/localGPT) | 8651 |
|[StanGirard/quivr](https://github.com/StanGirard/quivr) | 8119 |
|[go-skynet/LocalAI](https://github.com/go-skynet/LocalAI) | 7418 |
|[gventuri/pandas-ai](https://github.com/gventuri/pandas-ai) | 7301 |
|[PipedreamHQ/pipedream](https://github.com/PipedreamHQ/pipedream) | 6636 |
|[arc53/DocsGPT](https://github.com/arc53/DocsGPT) | 5849 |
|[e2b-dev/e2b](https://github.com/e2b-dev/e2b) | 5129 |
|[langgenius/dify](https://github.com/langgenius/dify) | 4804 |
|[serge-chat/serge](https://github.com/serge-chat/serge) | 4448 |
|[csunny/DB-GPT](https://github.com/csunny/DB-GPT) | 4350 |
|[wenda-LLM/wenda](https://github.com/wenda-LLM/wenda) | 4268 |
|[zauberzeug/nicegui](https://github.com/zauberzeug/nicegui) | 4244 |
|[intitni/CopilotForXcode](https://github.com/intitni/CopilotForXcode) | 4232 |
|[GreyDGL/PentestGPT](https://github.com/GreyDGL/PentestGPT) | 4154 |
|[madawei2699/myGPTReader](https://github.com/madawei2699/myGPTReader) | 4080 |
|[zilliztech/GPTCache](https://github.com/zilliztech/GPTCache) | 3949 |
|[gkamradt/langchain-tutorials](https://github.com/gkamradt/langchain-tutorials) | 3920 |
|[bentoml/OpenLLM](https://github.com/bentoml/OpenLLM) | 3481 |
|[MineDojo/Voyager](https://github.com/MineDojo/Voyager) | 3453 |
|[mmabrouk/chatgpt-wrapper](https://github.com/mmabrouk/chatgpt-wrapper) | 3355 |
|[postgresml/postgresml](https://github.com/postgresml/postgresml) | 3328 |
|[marqo-ai/marqo](https://github.com/marqo-ai/marqo) | 3100 |
|[kyegomez/tree-of-thoughts](https://github.com/kyegomez/tree-of-thoughts) | 3049 |
|[PrefectHQ/marvin](https://github.com/PrefectHQ/marvin) | 2844 |
|[project-baize/baize-chatbot](https://github.com/project-baize/baize-chatbot) | 2833 |
|[h2oai/h2ogpt](https://github.com/h2oai/h2ogpt) | 2809 |
|[hwchase17/chat-langchain](https://github.com/hwchase17/chat-langchain) | 2809 |
|[whitead/paper-qa](https://github.com/whitead/paper-qa) | 2664 |
|[Azure-Samples/azure-search-openai-demo](https://github.com/Azure-Samples/azure-search-openai-demo) | 2650 |
|[OpenGVLab/InternGPT](https://github.com/OpenGVLab/InternGPT) | 2525 |
|[GerevAI/gerev](https://github.com/GerevAI/gerev) | 2372 |
|[ParisNeo/lollms-webui](https://github.com/ParisNeo/lollms-webui) | 2287 |
|[OpenBMB/BMTools](https://github.com/OpenBMB/BMTools) | 2265 |
|[SamurAIGPT/privateGPT](https://github.com/SamurAIGPT/privateGPT) | 2084 |
|[Chainlit/chainlit](https://github.com/Chainlit/chainlit) | 1912 |
|[Farama-Foundation/PettingZoo](https://github.com/Farama-Foundation/PettingZoo) | 1869 |
|[OpenGVLab/Ask-Anything](https://github.com/OpenGVLab/Ask-Anything) | 1864 |
|[IntelligenzaArtificiale/Free-Auto-GPT](https://github.com/IntelligenzaArtificiale/Free-Auto-GPT) | 1849 |
|[Unstructured-IO/unstructured](https://github.com/Unstructured-IO/unstructured) | 1766 |
|[yanqiangmiffy/Chinese-LangChain](https://github.com/yanqiangmiffy/Chinese-LangChain) | 1745 |
|[NVIDIA/NeMo-Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) | 1732 |
|[hwchase17/notion-qa](https://github.com/hwchase17/notion-qa) | 1716 |
|[paulpierre/RasaGPT](https://github.com/paulpierre/RasaGPT) | 1619 |
|[pinterest/querybook](https://github.com/pinterest/querybook) | 1468 |
|[vocodedev/vocode-python](https://github.com/vocodedev/vocode-python) | 1446 |
|[thomas-yanxin/LangChain-ChatGLM-Webui](https://github.com/thomas-yanxin/LangChain-ChatGLM-Webui) | 1430 |
|[Mintplex-Labs/anything-llm](https://github.com/Mintplex-Labs/anything-llm) | 1419 |
|[Kav-K/GPTDiscord](https://github.com/Kav-K/GPTDiscord) | 1416 |
|[lunasec-io/lunasec](https://github.com/lunasec-io/lunasec) | 1327 |
|[psychic-api/psychic](https://github.com/psychic-api/psychic) | 1307 |
|[jina-ai/thinkgpt](https://github.com/jina-ai/thinkgpt) | 1242 |
|[agiresearch/OpenAGI](https://github.com/agiresearch/OpenAGI) | 1239 |
|[ttengwang/Caption-Anything](https://github.com/ttengwang/Caption-Anything) | 1203 |
|[jina-ai/dev-gpt](https://github.com/jina-ai/dev-gpt) | 1179 |
|[keephq/keep](https://github.com/keephq/keep) | 1169 |
|[greshake/llm-security](https://github.com/greshake/llm-security) | 1156 |
|[richardyc/Chrome-GPT](https://github.com/richardyc/Chrome-GPT) | 1090 |
|[jina-ai/langchain-serve](https://github.com/jina-ai/langchain-serve) | 1088 |
|[mmz-001/knowledge_gpt](https://github.com/mmz-001/knowledge_gpt) | 1074 |
|[juncongmoo/chatllama](https://github.com/juncongmoo/chatllama) | 1057 |
|[noahshinn024/reflexion](https://github.com/noahshinn024/reflexion) | 1045 |
|[visual-openllm/visual-openllm](https://github.com/visual-openllm/visual-openllm) | 1036 |
|[101dotxyz/GPTeam](https://github.com/101dotxyz/GPTeam) | 999 |
|[poe-platform/api-bot-tutorial](https://github.com/poe-platform/api-bot-tutorial) | 989 |
|[irgolic/AutoPR](https://github.com/irgolic/AutoPR) | 974 |
|[homanp/superagent](https://github.com/homanp/superagent) | 970 |
|[microsoft/X-Decoder](https://github.com/microsoft/X-Decoder) | 941 |
|[peterw/Chat-with-Github-Repo](https://github.com/peterw/Chat-with-Github-Repo) | 896 |
|[SamurAIGPT/Camel-AutoGPT](https://github.com/SamurAIGPT/Camel-AutoGPT) | 856 |
|[cirediatpl/FigmaChain](https://github.com/cirediatpl/FigmaChain) | 840 |
|[chatarena/chatarena](https://github.com/chatarena/chatarena) | 829 |
|[rlancemartin/auto-evaluator](https://github.com/rlancemartin/auto-evaluator) | 816 |
|[seanpixel/Teenage-AGI](https://github.com/seanpixel/Teenage-AGI) | 816 |
|[hashintel/hash](https://github.com/hashintel/hash) | 806 |
|[corca-ai/EVAL](https://github.com/corca-ai/EVAL) | 790 |
|[eyurtsev/kor](https://github.com/eyurtsev/kor) | 752 |
|[cheshire-cat-ai/core](https://github.com/cheshire-cat-ai/core) | 713 |
|[e-johnstonn/BriefGPT](https://github.com/e-johnstonn/BriefGPT) | 686 |
|[run-llama/llama-lab](https://github.com/run-llama/llama-lab) | 685 |
|[refuel-ai/autolabel](https://github.com/refuel-ai/autolabel) | 673 |
|[griptape-ai/griptape](https://github.com/griptape-ai/griptape) | 617 |
|[billxbf/ReWOO](https://github.com/billxbf/ReWOO) | 616 |
|[Anil-matcha/ChatPDF](https://github.com/Anil-matcha/ChatPDF) | 609 |
|[NimbleBoxAI/ChainFury](https://github.com/NimbleBoxAI/ChainFury) | 592 |
|[getmetal/motorhead](https://github.com/getmetal/motorhead) | 581 |
|[ajndkr/lanarky](https://github.com/ajndkr/lanarky) | 574 |
|[namuan/dr-doc-search](https://github.com/namuan/dr-doc-search) | 572 |
|[kreneskyp/ix](https://github.com/kreneskyp/ix) | 564 |
|[akshata29/chatpdf](https://github.com/akshata29/chatpdf) | 540 |
|[hwchase17/chat-your-data](https://github.com/hwchase17/chat-your-data) | 540 |
|[whyiyhw/chatgpt-wechat](https://github.com/whyiyhw/chatgpt-wechat) | 537 |
|[khoj-ai/khoj](https://github.com/khoj-ai/khoj) | 531 |
|[SamurAIGPT/ChatGPT-Developer-Plugins](https://github.com/SamurAIGPT/ChatGPT-Developer-Plugins) | 528 |
|[microsoft/PodcastCopilot](https://github.com/microsoft/PodcastCopilot) | 526 |
|[ruoccofabrizio/azure-open-ai-embeddings-qna](https://github.com/ruoccofabrizio/azure-open-ai-embeddings-qna) | 515 |
|[alexanderatallah/window.ai](https://github.com/alexanderatallah/window.ai) | 494 |
|[StevenGrove/GPT4Tools](https://github.com/StevenGrove/GPT4Tools) | 483 |
|[jina-ai/agentchain](https://github.com/jina-ai/agentchain) | 472 |
|[mckaywrigley/repo-chat](https://github.com/mckaywrigley/repo-chat) | 465 |
|[yeagerai/yeagerai-agent](https://github.com/yeagerai/yeagerai-agent) | 464 |
|[langchain-ai/langchain-aiplugin](https://github.com/langchain-ai/langchain-aiplugin) | 464 |
|[mpaepper/content-chatbot](https://github.com/mpaepper/content-chatbot) | 455 |
|[michaelthwan/searchGPT](https://github.com/michaelthwan/searchGPT) | 455 |
|[freddyaboulton/gradio-tools](https://github.com/freddyaboulton/gradio-tools) | 450 |
|[amosjyng/langchain-visualizer](https://github.com/amosjyng/langchain-visualizer) | 446 |
|[msoedov/langcorn](https://github.com/msoedov/langcorn) | 445 |
|[plastic-labs/tutor-gpt](https://github.com/plastic-labs/tutor-gpt) | 426 |
|[poe-platform/poe-protocol](https://github.com/poe-platform/poe-protocol) | 426 |
|[jonra1993/fastapi-alembic-sqlmodel-async](https://github.com/jonra1993/fastapi-alembic-sqlmodel-async) | 418 |
|[langchain-ai/auto-evaluator](https://github.com/langchain-ai/auto-evaluator) | 416 |
|[steamship-core/steamship-langchain](https://github.com/steamship-core/steamship-langchain) | 401 |
|[xuwenhao/geektime-ai-course](https://github.com/xuwenhao/geektime-ai-course) | 400 |
|[continuum-llms/chatgpt-memory](https://github.com/continuum-llms/chatgpt-memory) | 386 |
|[mtenenholtz/chat-twitter](https://github.com/mtenenholtz/chat-twitter) | 382 |
|[explosion/spacy-llm](https://github.com/explosion/spacy-llm) | 368 |
|[showlab/VLog](https://github.com/showlab/VLog) | 363 |
|[yvann-hub/Robby-chatbot](https://github.com/yvann-hub/Robby-chatbot) | 363 |
|[daodao97/chatdoc](https://github.com/daodao97/chatdoc) | 361 |
|[opentensor/bittensor](https://github.com/opentensor/bittensor) | 360 |
|[alejandro-ao/langchain-ask-pdf](https://github.com/alejandro-ao/langchain-ask-pdf) | 355 |
|[logan-markewich/llama_index_starter_pack](https://github.com/logan-markewich/llama_index_starter_pack) | 351 |
|[jupyterlab/jupyter-ai](https://github.com/jupyterlab/jupyter-ai) | 348 |
|[alejandro-ao/ask-multiple-pdfs](https://github.com/alejandro-ao/ask-multiple-pdfs) | 321 |
|[andylokandy/gpt-4-search](https://github.com/andylokandy/gpt-4-search) | 314 |
|[mosaicml/examples](https://github.com/mosaicml/examples) | 313 |
|[personoids/personoids-lite](https://github.com/personoids/personoids-lite) | 306 |
|[itamargol/openai](https://github.com/itamargol/openai) | 304 |
|[Anil-matcha/Website-to-Chatbot](https://github.com/Anil-matcha/Website-to-Chatbot) | 299 |
|[momegas/megabots](https://github.com/momegas/megabots) | 299 |
|[BlackHC/llm-strategy](https://github.com/BlackHC/llm-strategy) | 289 |
|[daveebbelaar/langchain-experiments](https://github.com/daveebbelaar/langchain-experiments) | 283 |
|[wandb/weave](https://github.com/wandb/weave) | 279 |
|[Cheems-Seminar/grounded-segment-any-parts](https://github.com/Cheems-Seminar/grounded-segment-any-parts) | 273 |
|[jerlendds/osintbuddy](https://github.com/jerlendds/osintbuddy) | 271 |
|[OpenBMB/AgentVerse](https://github.com/OpenBMB/AgentVerse) | 270 |
|[MagnivOrg/prompt-layer-library](https://github.com/MagnivOrg/prompt-layer-library) | 269 |
|[sullivan-sean/chat-langchainjs](https://github.com/sullivan-sean/chat-langchainjs) | 259 |
|[Azure-Samples/openai](https://github.com/Azure-Samples/openai) | 252 |
|[bborn/howdoi.ai](https://github.com/bborn/howdoi.ai) | 248 |
|[hnawaz007/pythondataanalysis](https://github.com/hnawaz007/pythondataanalysis) | 247 |
|[conceptofmind/toolformer](https://github.com/conceptofmind/toolformer) | 243 |
|[truera/trulens](https://github.com/truera/trulens) | 239 |
|[ur-whitelab/exmol](https://github.com/ur-whitelab/exmol) | 238 |
|[intel/intel-extension-for-transformers](https://github.com/intel/intel-extension-for-transformers) | 237 |
|[monarch-initiative/ontogpt](https://github.com/monarch-initiative/ontogpt) | 236 |
|[wandb/edu](https://github.com/wandb/edu) | 231 |
|[recalign/RecAlign](https://github.com/recalign/RecAlign) | 229 |
|[alvarosevilla95/autolang](https://github.com/alvarosevilla95/autolang) | 223 |
|[kaleido-lab/dolphin](https://github.com/kaleido-lab/dolphin) | 221 |
|[JohnSnowLabs/nlptest](https://github.com/JohnSnowLabs/nlptest) | 220 |
|[paolorechia/learn-langchain](https://github.com/paolorechia/learn-langchain) | 219 |
|[Safiullah-Rahu/CSV-AI](https://github.com/Safiullah-Rahu/CSV-AI) | 215 |
|[Haste171/langchain-chatbot](https://github.com/Haste171/langchain-chatbot) | 215 |
|[steamship-packages/langchain-agent-production-starter](https://github.com/steamship-packages/langchain-agent-production-starter) | 214 |
|[airobotlab/KoChatGPT](https://github.com/airobotlab/KoChatGPT) | 213 |
|[filip-michalsky/SalesGPT](https://github.com/filip-michalsky/SalesGPT) | 211 |
|[marella/chatdocs](https://github.com/marella/chatdocs) | 207 |
|[su77ungr/CASALIOY](https://github.com/su77ungr/CASALIOY) | 200 |
|[shaman-ai/agent-actors](https://github.com/shaman-ai/agent-actors) | 195 |
|[plchld/InsightFlow](https://github.com/plchld/InsightFlow) | 189 |
|[jbrukh/gpt-jargon](https://github.com/jbrukh/gpt-jargon) | 186 |
|[hwchase17/langchain-streamlit-template](https://github.com/hwchase17/langchain-streamlit-template) | 185 |
|[huchenxucs/ChatDB](https://github.com/huchenxucs/ChatDB) | 179 |
|[benthecoder/ClassGPT](https://github.com/benthecoder/ClassGPT) | 178 |
|[hwchase17/chroma-langchain](https://github.com/hwchase17/chroma-langchain) | 178 |
|[radi-cho/datasetGPT](https://github.com/radi-cho/datasetGPT) | 177 |
|[jiran214/GPT-vup](https://github.com/jiran214/GPT-vup) | 176 |
|[rsaryev/talk-codebase](https://github.com/rsaryev/talk-codebase) | 174 |
|[edreisMD/plugnplai](https://github.com/edreisMD/plugnplai) | 174 |
|[gia-guar/JARVIS-ChatGPT](https://github.com/gia-guar/JARVIS-ChatGPT) | 172 |
|[hardbyte/qabot](https://github.com/hardbyte/qabot) | 171 |
|[shamspias/customizable-gpt-chatbot](https://github.com/shamspias/customizable-gpt-chatbot) | 165 |
|[gustavz/DataChad](https://github.com/gustavz/DataChad) | 164 |
|[yasyf/compress-gpt](https://github.com/yasyf/compress-gpt) | 163 |
|[SamPink/dev-gpt](https://github.com/SamPink/dev-gpt) | 161 |
|[yuanjie-ai/ChatLLM](https://github.com/yuanjie-ai/ChatLLM) | 161 |
|[pablomarin/GPT-Azure-Search-Engine](https://github.com/pablomarin/GPT-Azure-Search-Engine) | 160 |
|[jondurbin/airoboros](https://github.com/jondurbin/airoboros) | 157 |
|[fengyuli-dev/multimedia-gpt](https://github.com/fengyuli-dev/multimedia-gpt) | 157 |
|[PradipNichite/Youtube-Tutorials](https://github.com/PradipNichite/Youtube-Tutorials) | 156 |
|[nicknochnack/LangchainDocuments](https://github.com/nicknochnack/LangchainDocuments) | 155 |
|[ethanyanjiali/minChatGPT](https://github.com/ethanyanjiali/minChatGPT) | 155 |
|[ccurme/yolopandas](https://github.com/ccurme/yolopandas) | 154 |
|[chakkaradeep/pyCodeAGI](https://github.com/chakkaradeep/pyCodeAGI) | 153 |
|[preset-io/promptimize](https://github.com/preset-io/promptimize) | 150 |
|[onlyphantom/llm-python](https://github.com/onlyphantom/llm-python) | 148 |
|[Azure-Samples/azure-search-power-skills](https://github.com/Azure-Samples/azure-search-power-skills) | 146 |
|[realminchoi/babyagi-ui](https://github.com/realminchoi/babyagi-ui) | 144 |
|[microsoft/azure-openai-in-a-day-workshop](https://github.com/microsoft/azure-openai-in-a-day-workshop) | 144 |
|[jmpaz/promptlib](https://github.com/jmpaz/promptlib) | 143 |
|[shauryr/S2QA](https://github.com/shauryr/S2QA) | 142 |
|[handrew/browserpilot](https://github.com/handrew/browserpilot) | 141 |
|[Jaseci-Labs/jaseci](https://github.com/Jaseci-Labs/jaseci) | 140 |
|[Klingefjord/chatgpt-telegram](https://github.com/Klingefjord/chatgpt-telegram) | 140 |
|[WongSaang/chatgpt-ui-server](https://github.com/WongSaang/chatgpt-ui-server) | 139 |
|[ibiscp/LLM-IMDB](https://github.com/ibiscp/LLM-IMDB) | 139 |
|[menloparklab/langchain-cohere-qdrant-doc-retrieval](https://github.com/menloparklab/langchain-cohere-qdrant-doc-retrieval) | 138 |
|[hirokidaichi/wanna](https://github.com/hirokidaichi/wanna) | 137 |
|[steamship-core/vercel-examples](https://github.com/steamship-core/vercel-examples) | 137 |
|[deeppavlov/dream](https://github.com/deeppavlov/dream) | 136 |
|[miaoshouai/miaoshouai-assistant](https://github.com/miaoshouai/miaoshouai-assistant) | 135 |
|[sugarforever/LangChain-Tutorials](https://github.com/sugarforever/LangChain-Tutorials) | 135 |
|[yasyf/summ](https://github.com/yasyf/summ) | 135 |
|[peterw/StoryStorm](https://github.com/peterw/StoryStorm) | 134 |
|[vaibkumr/prompt-optimizer](https://github.com/vaibkumr/prompt-optimizer) | 132 |
|[ju-bezdek/langchain-decorators](https://github.com/ju-bezdek/langchain-decorators) | 130 |
|[homanp/vercel-langchain](https://github.com/homanp/vercel-langchain) | 128 |
|[Teahouse-Studios/akari-bot](https://github.com/Teahouse-Studios/akari-bot) | 127 |
|[petehunt/langchain-github-bot](https://github.com/petehunt/langchain-github-bot) | 125 |
|[eunomia-bpf/GPTtrace](https://github.com/eunomia-bpf/GPTtrace) | 122 |
|[fixie-ai/fixie-examples](https://github.com/fixie-ai/fixie-examples) | 122 |
|[Aggregate-Intellect/practical-llms](https://github.com/Aggregate-Intellect/practical-llms) | 120 |
|[davila7/file-gpt](https://github.com/davila7/file-gpt) | 120 |
|[Azure-Samples/azure-search-openai-demo-csharp](https://github.com/Azure-Samples/azure-search-openai-demo-csharp) | 119 |
|[prof-frink-lab/slangchain](https://github.com/prof-frink-lab/slangchain) | 117 |
|[aurelio-labs/arxiv-bot](https://github.com/aurelio-labs/arxiv-bot) | 117 |
|[zenml-io/zenml-projects](https://github.com/zenml-io/zenml-projects) | 116 |
|[flurb18/AgentOoba](https://github.com/flurb18/AgentOoba) | 114 |
|[kaarthik108/snowChat](https://github.com/kaarthik108/snowChat) | 112 |
|[RedisVentures/redis-openai-qna](https://github.com/RedisVentures/redis-openai-qna) | 111 |
|[solana-labs/chatgpt-plugin](https://github.com/solana-labs/chatgpt-plugin) | 111 |
|[kulltc/chatgpt-sql](https://github.com/kulltc/chatgpt-sql) | 109 |
|[summarizepaper/summarizepaper](https://github.com/summarizepaper/summarizepaper) | 109 |
|[Azure-Samples/miyagi](https://github.com/Azure-Samples/miyagi) | 106 |
|[ssheng/BentoChain](https://github.com/ssheng/BentoChain) | 106 |
|[voxel51/voxelgpt](https://github.com/voxel51/voxelgpt) | 105 |
|[mallahyari/drqa](https://github.com/mallahyari/drqa) | 103 |

View File

@@ -120,7 +120,8 @@
" history = []\n",
" while True:\n",
" user_input = input(\"\\n>>> input >>>\\n>>>: \")\n",
" if user_input == 'q': break\n",
" if user_input == \"q\":\n",
" break\n",
" history.append(HumanMessage(content=user_input))\n",
" history.append(llm(history))"
]

View File

@@ -1,17 +1,17 @@
# Databerry
# Chaindesk
>[Databerry](https://databerry.ai) is an [open source](https://github.com/gmpetrov/databerry) document retrieval platform that helps to connect your personal data with Large Language Models.
>[Chaindesk](https://chaindesk.ai) is an [open source](https://github.com/gmpetrov/databerry) document retrieval platform that helps to connect your personal data with Large Language Models.
## Installation and Setup
We need to sign up for Databerry, create a datastore, add some data and get your datastore api endpoint url.
We need the [API Key](https://docs.databerry.ai/api-reference/authentication).
We need to sign up for Chaindesk, create a datastore, add some data and get your datastore api endpoint url.
We need the [API Key](https://docs.chaindesk.ai/api-reference/authentication).
## Retriever
See a [usage example](/docs/modules/data_connection/retrievers/integrations/databerry.html).
See a [usage example](/docs/modules/data_connection/retrievers/integrations/chaindesk.html).
```python
from langchain.retrievers import DataberryRetriever
from langchain.retrievers import ChaindeskRetriever
```

View File

@@ -8,7 +8,7 @@ pip install cnos-connector
```
## Connecting to CnosDB
You can connect to CnosDB using the SQLDatabase.from_cnosdb() method.
You can connect to CnosDB using the `SQLDatabase.from_cnosdb()` method.
### Syntax
```python
def SQLDatabase.from_cnosdb(url: str = "127.0.0.1:8902",
@@ -31,7 +31,6 @@ Args:
## Examples
```python
# Connecting to CnosDB with SQLDatabase Wrapper
from cnosdb_connector import make_cnosdb_langchain_uri
from langchain import SQLDatabase
db = SQLDatabase.from_cnosdb()
@@ -43,7 +42,7 @@ from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")
```
### SQL Chain
### SQL Database Chain
This example demonstrates the use of the SQL Chain for answering a question over a CnosDB.
```python
from langchain import SQLDatabaseChain
@@ -51,15 +50,15 @@ from langchain import SQLDatabaseChain
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)
db_chain.run(
"What is the average fa of test table that time between November 3,2022 and November 4, 2022?"
"What is the average temperature of air at station XiaoMaiDao between October 19, 2022 and Occtober 20, 2022?"
)
```
```shell
> Entering new chain...
What is the average fa of test table that time between November 3, 2022 and November 4, 2022?
SQLQuery:SELECT AVG(fa) FROM test WHERE time >= '2022-11-03' AND time < '2022-11-04'
SQLResult: [(2.0,)]
Answer:The average fa of the test table between November 3, 2022, and November 4, 2022, is 2.0.
What is the average temperature of air at station XiaoMaiDao between October 19, 2022 and Occtober 20, 2022?
SQLQuery:SELECT AVG(temperature) FROM air WHERE station = 'XiaoMaiDao' AND time >= '2022-10-19' AND time < '2022-10-20'
SQLResult: [(68.0,)]
Answer:The average temperature of air at station XiaoMaiDao between October 19, 2022 and October 20, 2022 is 68.0.
> Finished chain.
```
### SQL Database Agent
@@ -73,36 +72,39 @@ agent = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)
```
```python
agent.run(
"What is the average fa of test table that time between November 3, 2022 and November 4, 2022?"
"What is the average temperature of air at station XiaoMaiDao between October 19, 2022 and Occtober 20, 2022?"
)
```
```shell
> Entering new chain...
Action: sql_db_list_tables
Action Input: ""
Observation: test
Thought:The relevant table is "test". I should query the schema of this table to see the column names.
Observation: air
Thought:The "air" table seems relevant to the question. I should query the schema of the "air" table to see what columns are available.
Action: sql_db_schema
Action Input: "test"
Action Input: "air"
Observation:
CREATE TABLE test (
CREATE TABLE air (
pressure FLOAT,
station STRING,
temperature FLOAT,
time TIMESTAMP,
fa BIGINT
visibility FLOAT
)
/*
3 rows from test table:
fa time
1 2022-11-03T06:20:11
2 2022-11-03T06:20:11.000000001
3 2022-11-03T06:20:11.000000002
3 rows from air table:
pressure station temperature time visibility
75.0 XiaoMaiDao 67.0 2022-10-19T03:40:00 54.0
77.0 XiaoMaiDao 69.0 2022-10-19T04:40:00 56.0
76.0 XiaoMaiDao 68.0 2022-10-19T05:40:00 55.0
*/
Thought:The relevant column is "fa" in the "test" table. I can now construct the query to calculate the average "fa" between the specified time range.
Thought:The "temperature" column in the "air" table is relevant to the question. I can query the average temperature between the specified dates.
Action: sql_db_query
Action Input: "SELECT AVG(fa) FROM test WHERE time >= '2022-11-03' AND time < '2022-11-04'"
Observation: [(2.0,)]
Thought:The average "fa" of the "test" table between November 3, 2022 and November 4, 2022 is 2.0.
Final Answer: 2.0
Action Input: "SELECT AVG(temperature) FROM air WHERE station = 'XiaoMaiDao' AND time >= '2022-10-19' AND time <= '2022-10-20'"
Observation: [(68.0,)]
Thought:The average temperature of air at station XiaoMaiDao between October 19, 2022 and October 20, 2022 is 68.0.
Final Answer: 68.0
> Finished chain.
```

View File

@@ -0,0 +1,19 @@
# Datadog Logs
>[Datadog](https://www.datadoghq.com/) is a monitoring and analytics platform for cloud-scale applications.
## Installation and Setup
```bash
pip install datadog_api_client
```
We must initialize the loader with the Datadog API key and APP key, and we need to set up the query to extract the desired logs.
## Document Loader
See a [usage example](/docs/modules/data_connection/document_loaders/integrations/datadog_logs.html).
```python
from langchain.document_loaders import DatadogLogsLoader
```

View File

@@ -28,4 +28,4 @@ To import this vectorstore:
from langchain.vectorstores import Marqo
```
For a more detailed walkthrough of the Marqo wrapper and some of its unique features, see [this notebook](../modules/data_connection/vectorstores/integrations/marqo.ipynb)
For a more detailed walkthrough of the Marqo wrapper and some of its unique features, see [this notebook](/docs/modules/data_connection/vectorstores/integrations/marqo.html)

View File

@@ -1,6 +1,6 @@
# YouTube
>[YouTube](https://www.youtube.com/) is an online video sharing and social media platform created by Google.
>[YouTube](https://www.youtube.com/) is an online video sharing and social media platform by Google.
> We download the `YouTube` transcripts and video information.
## Installation and Setup

View File

@@ -117,11 +117,11 @@
"\n",
"\n",
"# Initialize the language model\n",
"# You can add your own OpenAI API key by adding openai_api_key=\"<your_api_key>\" \n",
"# You can add your own OpenAI API key by adding openai_api_key=\"<your_api_key>\"\n",
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
"\n",
"# Initialize the SerpAPIWrapper for search functionality\n",
"#Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual SerpAPI key.\n",
"# Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual SerpAPI key.\n",
"search = SerpAPIWrapper()\n",
"\n",
"# Define a list of tools offered by the agent\n",
@@ -130,7 +130,7 @@
" name=\"Search\",\n",
" func=search.run,\n",
" coroutine=search.arun,\n",
" description=\"Useful when you need to answer questions about current events. You should ask targeted questions.\"\n",
" description=\"Useful when you need to answer questions about current events. You should ask targeted questions.\",\n",
" ),\n",
"]"
]
@@ -143,8 +143,12 @@
},
"outputs": [],
"source": [
"functions_agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_MULTI_FUNCTIONS, verbose=False)\n",
"conversations_agent = initialize_agent(tools, llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=False)"
"functions_agent = initialize_agent(\n",
" tools, llm, agent=AgentType.OPENAI_MULTI_FUNCTIONS, verbose=False\n",
")\n",
"conversations_agent = initialize_agent(\n",
" tools, llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=False\n",
")"
]
},
{
@@ -193,20 +197,20 @@
"\n",
"results = []\n",
"agents = [functions_agent, conversations_agent]\n",
"concurrency_level = 6 # How many concurrent agents to run. May need to decrease if OpenAI is rate limiting.\n",
"concurrency_level = 6 # How many concurrent agents to run. May need to decrease if OpenAI is rate limiting.\n",
"\n",
"# We will only run the first 20 examples of this dataset to speed things up\n",
"# This will lead to larger confidence intervals downstream.\n",
"batch = []\n",
"for example in tqdm(dataset[:20]):\n",
" batch.extend([agent.acall(example['inputs']) for agent in agents])\n",
" batch.extend([agent.acall(example[\"inputs\"]) for agent in agents])\n",
" if len(batch) >= concurrency_level:\n",
" batch_results = await asyncio.gather(*batch, return_exceptions=True)\n",
" results.extend(list(zip(*[iter(batch_results)]*2)))\n",
" results.extend(list(zip(*[iter(batch_results)] * 2)))\n",
" batch = []\n",
"if batch:\n",
" batch_results = await asyncio.gather(*batch, return_exceptions=True)\n",
" results.extend(list(zip(*[iter(batch_results)]*2)))"
" results.extend(list(zip(*[iter(batch_results)] * 2)))"
]
},
{
@@ -230,11 +234,12 @@
"source": [
"import random\n",
"\n",
"\n",
"def predict_preferences(dataset, results) -> list:\n",
" preferences = []\n",
"\n",
" for example, (res_a, res_b) in zip(dataset, results):\n",
" input_ = example['inputs']\n",
" input_ = example[\"inputs\"]\n",
" # Flip a coin to reduce persistent position bias\n",
" if random.random() < 0.5:\n",
" pred_a, pred_b = res_a, res_b\n",
@@ -243,16 +248,16 @@
" pred_a, pred_b = res_b, res_a\n",
" a, b = \"b\", \"a\"\n",
" eval_res = eval_chain.evaluate_string_pairs(\n",
" prediction=pred_a['output'] if isinstance(pred_a, dict) else str(pred_a),\n",
" prediction_b=pred_b['output'] if isinstance(pred_b, dict) else str(pred_b),\n",
" input=input_\n",
" prediction=pred_a[\"output\"] if isinstance(pred_a, dict) else str(pred_a),\n",
" prediction_b=pred_b[\"output\"] if isinstance(pred_b, dict) else str(pred_b),\n",
" input=input_,\n",
" )\n",
" if eval_res[\"value\"] == \"A\":\n",
" preferences.append(a)\n",
" elif eval_res[\"value\"] == \"B\":\n",
" preferences.append(b)\n",
" else:\n",
" preferences.append(None) # No preference\n",
" preferences.append(None) # No preference\n",
" return preferences"
]
},
@@ -298,10 +303,7 @@
" \"b\": \"Structured Chat Agent\",\n",
"}\n",
"counts = Counter(preferences)\n",
"pref_ratios = {\n",
" k: v/len(preferences) for k, v in\n",
" counts.items()\n",
"}\n",
"pref_ratios = {k: v / len(preferences) for k, v in counts.items()}\n",
"for k, v in pref_ratios.items():\n",
" print(f\"{name_map.get(k)}: {v:.2%}\")"
]
@@ -327,13 +329,16 @@
"source": [
"from math import sqrt\n",
"\n",
"def wilson_score_interval(preferences: list, which: str = \"a\", z: float = 1.96) -> tuple:\n",
"\n",
"def wilson_score_interval(\n",
" preferences: list, which: str = \"a\", z: float = 1.96\n",
") -> tuple:\n",
" \"\"\"Estimate the confidence interval using the Wilson score.\n",
" \n",
"\n",
" See: https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Wilson_score_interval\n",
" for more details, including when to use it and when it should not be used.\n",
" \"\"\"\n",
" total_preferences = preferences.count('a') + preferences.count('b')\n",
" total_preferences = preferences.count(\"a\") + preferences.count(\"b\")\n",
" n_s = preferences.count(which)\n",
"\n",
" if total_preferences == 0:\n",
@@ -342,8 +347,11 @@
" p_hat = n_s / total_preferences\n",
"\n",
" denominator = 1 + (z**2) / total_preferences\n",
" adjustment = (z / denominator) * sqrt(p_hat*(1-p_hat)/total_preferences + (z**2)/(4*total_preferences*total_preferences))\n",
" center = (p_hat + (z**2) / (2*total_preferences)) / denominator\n",
" adjustment = (z / denominator) * sqrt(\n",
" p_hat * (1 - p_hat) / total_preferences\n",
" + (z**2) / (4 * total_preferences * total_preferences)\n",
" )\n",
" center = (p_hat + (z**2) / (2 * total_preferences)) / denominator\n",
" lower_bound = min(max(center - adjustment, 0.0), 1.0)\n",
" upper_bound = min(max(center + adjustment, 0.0), 1.0)\n",
"\n",
@@ -369,7 +377,9 @@
"source": [
"for which_, name in name_map.items():\n",
" low, high = wilson_score_interval(preferences, which=which_)\n",
" print(f'The \"{name}\" would be preferred between {low:.2%} and {high:.2%} percent of the time (with 95% confidence).')"
" print(\n",
" f'The \"{name}\" would be preferred between {low:.2%} and {high:.2%} percent of the time (with 95% confidence).'\n",
" )"
]
},
{
@@ -398,13 +408,16 @@
],
"source": [
"from scipy import stats\n",
"\n",
"preferred_model = max(pref_ratios, key=pref_ratios.get)\n",
"successes = preferences.count(preferred_model)\n",
"n = len(preferences) - preferences.count(None)\n",
"p_value = stats.binom_test(successes, n, p=0.5, alternative='two-sided')\n",
"print(f\"\"\"The p-value is {p_value:.5f}. If the null hypothesis is true (i.e., if the selected eval chain actually has no preference between the models),\n",
"p_value = stats.binom_test(successes, n, p=0.5, alternative=\"two-sided\")\n",
"print(\n",
" f\"\"\"The p-value is {p_value:.5f}. If the null hypothesis is true (i.e., if the selected eval chain actually has no preference between the models),\n",
"then there is a {p_value:.5%} chance of observing the {name_map.get(preferred_model)} be preferred at least {successes}\n",
"times out of {n} trials.\"\"\")"
"times out of {n} trials.\"\"\"\n",
")"
]
},
{

View File

@@ -54,7 +54,7 @@
"outputs": [],
"source": [
"llm = ChatOpenAI(temperature=0)\n",
"query=\"What's the origin of the term synecdoche?\"\n",
"query = \"What's the origin of the term synecdoche?\"\n",
"prediction = llm.predict(query)"
]
},
@@ -151,19 +151,22 @@
}
],
"source": [
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=\"correctness\", requires_reference=True)\n",
"eval_chain = CriteriaEvalChain.from_llm(\n",
" llm=llm, criteria=\"correctness\", requires_reference=True\n",
")\n",
"\n",
"# We can even override the model's learned knowledge using ground truth labels\n",
"eval_result = eval_chain.evaluate_strings(\n",
" input=\"What is the capital of the US?\",\n",
" prediction=\"Topeka, KS\", \n",
" reference=\"The capital of the US is Topeka, KS, where it permanently moved from Washington D.C. on May 16, 2023\")\n",
" prediction=\"Topeka, KS\",\n",
" reference=\"The capital of the US is Topeka, KS, where it permanently moved from Washington D.C. on May 16, 2023\",\n",
")\n",
"print(f'With ground truth: {eval_result[\"score\"]}')\n",
"\n",
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=\"correctness\")\n",
"eval_result = eval_chain.evaluate_strings(\n",
" input=\"What is the capital of the US?\",\n",
" prediction=\"Topeka, KS\", \n",
" prediction=\"Topeka, KS\",\n",
")\n",
"print(f'Withoutg ground truth: {eval_result[\"score\"]}')"
]
@@ -230,9 +233,7 @@
}
],
"source": [
"custom_criterion = {\n",
" \"numeric\": \"Does the output contain numeric information?\"\n",
"}\n",
"custom_criterion = {\"numeric\": \"Does the output contain numeric information?\"}\n",
"\n",
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=custom_criterion)\n",
"eval_result = eval_chain.evaluate_strings(prediction=prediction, input=query)\n",
@@ -269,11 +270,17 @@
"\n",
"# Example that complies\n",
"query = \"What's the population of lagos?\"\n",
"eval_result = eval_chain.evaluate_strings(prediction=\"I think that's a great question, you're really curious! About 30 million people live in Lagos, Nigeria, as of 2023.\", input=query)\n",
"eval_result = eval_chain.evaluate_strings(\n",
" prediction=\"I think that's a great question, you're really curious! About 30 million people live in Lagos, Nigeria, as of 2023.\",\n",
" input=query,\n",
")\n",
"print(\"Meets criteria: \", eval_result[\"score\"])\n",
"\n",
"# Example that does not comply\n",
"eval_result = eval_chain.evaluate_strings(prediction=\"The population of Lagos, Nigeria, is about 30 million people.\", input=query)\n",
"eval_result = eval_chain.evaluate_strings(\n",
" prediction=\"The population of Lagos, Nigeria, is about 30 million people.\",\n",
" input=query,\n",
")\n",
"print(\"Does not meet criteria: \", eval_result[\"score\"])"
]
},
@@ -350,8 +357,13 @@
}
],
"source": [
"eval_chain = CriteriaEvalChain.from_llm(llm=llm, criteria=[PRINCIPLES[\"harmful1\"], PRINCIPLES[\"harmful2\"]])\n",
"eval_result = eval_chain.evaluate_strings(prediction=\"I say that man is a lilly-livered nincompoop\", input=\"What do you think of Will?\")\n",
"eval_chain = CriteriaEvalChain.from_llm(\n",
" llm=llm, criteria=[PRINCIPLES[\"harmful1\"], PRINCIPLES[\"harmful2\"]]\n",
")\n",
"eval_result = eval_chain.evaluate_strings(\n",
" prediction=\"I say that man is a lilly-livered nincompoop\",\n",
" input=\"What do you think of Will?\",\n",
")\n",
"eval_result"
]
},

View File

@@ -339,9 +339,9 @@
" agent_trajectory=test_outputs_one[\"intermediate_steps\"],\n",
" reference=(\n",
" \"You need many more than 100,000 ping-pong balls in the empire state building.\"\n",
" )\n",
" ),\n",
")\n",
" \n",
"\n",
"\n",
"print(\"Score from 1 to 5: \", evaluation[\"score\"])\n",
"print(\"Reasoning: \", evaluation[\"reasoning\"])"

View File

@@ -16,7 +16,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 1,
"id": "c0a83623",
"metadata": {},
"outputs": [],
@@ -38,6 +38,20 @@
">This initializes the SerpAPIWrapper for search functionality (search).\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a2b0a215",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\n",
" \"SERPAPI_API_KEY\"\n",
"] = \"897780527132b5f31d8d73c40c820d5ef2c2279687efa69f413a61f752027747\""
]
},
{
"cell_type": "code",
"execution_count": 3,
@@ -46,11 +60,11 @@
"outputs": [],
"source": [
"# Initialize the OpenAI language model\n",
"#Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual OpenAI key.\n",
"# Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual OpenAI key.\n",
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
"\n",
"# Initialize the SerpAPIWrapper for search functionality\n",
"#Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual SerpAPI key.\n",
"# Replace <your_api_key> in openai_api_key=\"<your_api_key>\" with your actual SerpAPI key.\n",
"search = SerpAPIWrapper()\n",
"\n",
"# Define a list of tools offered by the agent\n",
@@ -58,9 +72,9 @@
" Tool(\n",
" name=\"Search\",\n",
" func=search.run,\n",
" description=\"Useful when you need to answer questions about current events. You should ask targeted questions.\"\n",
" description=\"Useful when you need to answer questions about current events. You should ask targeted questions.\",\n",
" ),\n",
"]\n"
"]"
]
},
{
@@ -70,7 +84,9 @@
"metadata": {},
"outputs": [],
"source": [
"mrkl = initialize_agent(tools, llm, agent=AgentType.OPENAI_MULTI_FUNCTIONS, verbose=True)"
"mrkl = initialize_agent(\n",
" tools, llm, agent=AgentType.OPENAI_MULTI_FUNCTIONS, verbose=True\n",
")"
]
},
{
@@ -82,6 +98,7 @@
"source": [
"# Do this so we can see exactly what's going on under the hood\n",
"import langchain\n",
"\n",
"langchain.debug = True"
]
},
@@ -194,15 +211,223 @@
}
],
"source": [
"mrkl.run(\n",
" \"What is the weather in LA and SF?\"\n",
"mrkl.run(\"What is the weather in LA and SF?\")"
]
},
{
"cell_type": "markdown",
"id": "d31d4c09",
"metadata": {},
"source": [
"## Configuring max iteration behavior\n",
"\n",
"To make sure that our agent doesn't get stuck in excessively long loops, we can set max_iterations. We can also set an early stopping method, which will determine our agent's behavior once the number of max iterations is hit. By default, the early stopping uses method `force` which just returns that constant string. Alternatively, you could specify method `generate` which then does one FINAL pass through the LLM to generate an output."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "9f5f6743",
"metadata": {},
"outputs": [],
"source": [
"mrkl = initialize_agent(\n",
" tools,\n",
" llm,\n",
" agent=AgentType.OPENAI_FUNCTIONS,\n",
" verbose=True,\n",
" max_iterations=2,\n",
" early_stopping_method=\"generate\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "4362ebc7",
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[32;1m\u001b[1;3m[chain/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor] Entering Chain run with input:\n",
"\u001b[0m{\n",
" \"input\": \"What is the weather in NYC today, yesterday, and the day before?\"\n",
"}\n",
"\u001b[32;1m\u001b[1;3m[llm/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 2:llm:ChatOpenAI] Entering LLM run with input:\n",
"\u001b[0m{\n",
" \"prompts\": [\n",
" \"System: You are a helpful AI assistant.\\nHuman: What is the weather in NYC today, yesterday, and the day before?\"\n",
" ]\n",
"}\n",
"\u001b[36;1m\u001b[1;3m[llm/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 2:llm:ChatOpenAI] [1.27s] Exiting LLM run with output:\n",
"\u001b[0m{\n",
" \"generations\": [\n",
" [\n",
" {\n",
" \"text\": \"\",\n",
" \"generation_info\": null,\n",
" \"message\": {\n",
" \"lc\": 1,\n",
" \"type\": \"constructor\",\n",
" \"id\": [\n",
" \"langchain\",\n",
" \"schema\",\n",
" \"messages\",\n",
" \"AIMessage\"\n",
" ],\n",
" \"kwargs\": {\n",
" \"content\": \"\",\n",
" \"additional_kwargs\": {\n",
" \"function_call\": {\n",
" \"name\": \"Search\",\n",
" \"arguments\": \"{\\n \\\"query\\\": \\\"weather in NYC today\\\"\\n}\"\n",
" }\n",
" }\n",
" }\n",
" }\n",
" }\n",
" ]\n",
" ],\n",
" \"llm_output\": {\n",
" \"token_usage\": {\n",
" \"prompt_tokens\": 79,\n",
" \"completion_tokens\": 17,\n",
" \"total_tokens\": 96\n",
" },\n",
" \"model_name\": \"gpt-3.5-turbo-0613\"\n",
" },\n",
" \"run\": null\n",
"}\n",
"\u001b[32;1m\u001b[1;3m[tool/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 3:tool:Search] Entering Tool run with input:\n",
"\u001b[0m\"{'query': 'weather in NYC today'}\"\n",
"\u001b[36;1m\u001b[1;3m[tool/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 3:tool:Search] [3.84s] Exiting Tool run with output:\n",
"\u001b[0m\"10:00 am · Feels Like85° · WindSE 4 mph · Humidity78% · UV Index3 of 11 · Cloud Cover81% · Rain Amount0 in ...\"\n",
"\u001b[32;1m\u001b[1;3m[llm/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 4:llm:ChatOpenAI] Entering LLM run with input:\n",
"\u001b[0m{\n",
" \"prompts\": [\n",
" \"System: You are a helpful AI assistant.\\nHuman: What is the weather in NYC today, yesterday, and the day before?\\nAI: {'name': 'Search', 'arguments': '{\\\\n \\\"query\\\": \\\"weather in NYC today\\\"\\\\n}'}\\nFunction: 10:00 am · Feels Like85° · WindSE 4 mph · Humidity78% · UV Index3 of 11 · Cloud Cover81% · Rain Amount0 in ...\"\n",
" ]\n",
"}\n",
"\u001b[36;1m\u001b[1;3m[llm/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 4:llm:ChatOpenAI] [1.24s] Exiting LLM run with output:\n",
"\u001b[0m{\n",
" \"generations\": [\n",
" [\n",
" {\n",
" \"text\": \"\",\n",
" \"generation_info\": null,\n",
" \"message\": {\n",
" \"lc\": 1,\n",
" \"type\": \"constructor\",\n",
" \"id\": [\n",
" \"langchain\",\n",
" \"schema\",\n",
" \"messages\",\n",
" \"AIMessage\"\n",
" ],\n",
" \"kwargs\": {\n",
" \"content\": \"\",\n",
" \"additional_kwargs\": {\n",
" \"function_call\": {\n",
" \"name\": \"Search\",\n",
" \"arguments\": \"{\\n \\\"query\\\": \\\"weather in NYC yesterday\\\"\\n}\"\n",
" }\n",
" }\n",
" }\n",
" }\n",
" }\n",
" ]\n",
" ],\n",
" \"llm_output\": {\n",
" \"token_usage\": {\n",
" \"prompt_tokens\": 142,\n",
" \"completion_tokens\": 17,\n",
" \"total_tokens\": 159\n",
" },\n",
" \"model_name\": \"gpt-3.5-turbo-0613\"\n",
" },\n",
" \"run\": null\n",
"}\n",
"\u001b[32;1m\u001b[1;3m[tool/start]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 5:tool:Search] Entering Tool run with input:\n",
"\u001b[0m\"{'query': 'weather in NYC yesterday'}\"\n",
"\u001b[36;1m\u001b[1;3m[tool/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor > 5:tool:Search] [1.15s] Exiting Tool run with output:\n",
"\u001b[0m\"New York Temperature Yesterday. Maximum temperature yesterday: 81 °F (at 1:51 pm) Minimum temperature yesterday: 72 °F (at 7:17 pm) Average temperature ...\"\n",
"\u001b[32;1m\u001b[1;3m[llm/start]\u001b[0m \u001b[1m[1:llm:ChatOpenAI] Entering LLM run with input:\n",
"\u001b[0m{\n",
" \"prompts\": [\n",
" \"System: You are a helpful AI assistant.\\nHuman: What is the weather in NYC today, yesterday, and the day before?\\nAI: {'name': 'Search', 'arguments': '{\\\\n \\\"query\\\": \\\"weather in NYC today\\\"\\\\n}'}\\nFunction: 10:00 am · Feels Like85° · WindSE 4 mph · Humidity78% · UV Index3 of 11 · Cloud Cover81% · Rain Amount0 in ...\\nAI: {'name': 'Search', 'arguments': '{\\\\n \\\"query\\\": \\\"weather in NYC yesterday\\\"\\\\n}'}\\nFunction: New York Temperature Yesterday. Maximum temperature yesterday: 81 °F (at 1:51 pm) Minimum temperature yesterday: 72 °F (at 7:17 pm) Average temperature ...\"\n",
" ]\n",
"}\n",
"\u001b[36;1m\u001b[1;3m[llm/end]\u001b[0m \u001b[1m[1:llm:ChatOpenAI] [2.68s] Exiting LLM run with output:\n",
"\u001b[0m{\n",
" \"generations\": [\n",
" [\n",
" {\n",
" \"text\": \"Today in NYC, the weather is currently 85°F with a southeast wind of 4 mph. The humidity is at 78% and there is 81% cloud cover. There is no rain expected today.\\n\\nYesterday in NYC, the maximum temperature was 81°F at 1:51 pm, and the minimum temperature was 72°F at 7:17 pm.\\n\\nFor the day before yesterday, I do not have the specific weather information.\",\n",
" \"generation_info\": null,\n",
" \"message\": {\n",
" \"lc\": 1,\n",
" \"type\": \"constructor\",\n",
" \"id\": [\n",
" \"langchain\",\n",
" \"schema\",\n",
" \"messages\",\n",
" \"AIMessage\"\n",
" ],\n",
" \"kwargs\": {\n",
" \"content\": \"Today in NYC, the weather is currently 85°F with a southeast wind of 4 mph. The humidity is at 78% and there is 81% cloud cover. There is no rain expected today.\\n\\nYesterday in NYC, the maximum temperature was 81°F at 1:51 pm, and the minimum temperature was 72°F at 7:17 pm.\\n\\nFor the day before yesterday, I do not have the specific weather information.\",\n",
" \"additional_kwargs\": {}\n",
" }\n",
" }\n",
" }\n",
" ]\n",
" ],\n",
" \"llm_output\": {\n",
" \"token_usage\": {\n",
" \"prompt_tokens\": 160,\n",
" \"completion_tokens\": 91,\n",
" \"total_tokens\": 251\n",
" },\n",
" \"model_name\": \"gpt-3.5-turbo-0613\"\n",
" },\n",
" \"run\": null\n",
"}\n",
"\u001b[36;1m\u001b[1;3m[chain/end]\u001b[0m \u001b[1m[1:chain:AgentExecutor] [10.18s] Exiting Chain run with output:\n",
"\u001b[0m{\n",
" \"output\": \"Today in NYC, the weather is currently 85°F with a southeast wind of 4 mph. The humidity is at 78% and there is 81% cloud cover. There is no rain expected today.\\n\\nYesterday in NYC, the maximum temperature was 81°F at 1:51 pm, and the minimum temperature was 72°F at 7:17 pm.\\n\\nFor the day before yesterday, I do not have the specific weather information.\"\n",
"}\n"
]
},
{
"data": {
"text/plain": [
"'Today in NYC, the weather is currently 85°F with a southeast wind of 4 mph. The humidity is at 78% and there is 81% cloud cover. There is no rain expected today.\\n\\nYesterday in NYC, the maximum temperature was 81°F at 1:51 pm, and the minimum temperature was 72°F at 7:17 pm.\\n\\nFor the day before yesterday, I do not have the specific weather information.'"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mrkl.run(\"What is the weather in NYC today, yesterday, and the day before?\")"
]
},
{
"cell_type": "markdown",
"id": "067a8d3e",
"metadata": {},
"source": [
"Notice that we never get around to looking up the weather the day before yesterday, due to hitting our max_iterations limit."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f5f6743",
"id": "c3318a11",
"metadata": {},
"outputs": [],
"source": []
@@ -210,9 +435,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "venv",
"language": "python",
"name": "python3"
"name": "venv"
},
"language_info": {
"codemirror_mode": {
@@ -224,7 +449,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.11.3"
}
},
"nbformat": 4,

View File

@@ -78,6 +78,7 @@
"source": [
"from langchain.prompts import MessagesPlaceholder\n",
"from langchain.memory import ConversationBufferMemory\n",
"\n",
"agent_kwargs = {\n",
" \"extra_prompt_messages\": [MessagesPlaceholder(variable_name=\"memory\")],\n",
"}\n",
@@ -92,12 +93,12 @@
"outputs": [],
"source": [
"agent = initialize_agent(\n",
" tools, \n",
" llm, \n",
" agent=AgentType.OPENAI_FUNCTIONS, \n",
" verbose=True, \n",
" agent_kwargs=agent_kwargs, \n",
" memory=memory\n",
" tools,\n",
" llm,\n",
" agent=AgentType.OPENAI_FUNCTIONS,\n",
" verbose=True,\n",
" agent_kwargs=agent_kwargs,\n",
" memory=memory,\n",
")"
]
},

View File

@@ -42,15 +42,14 @@
"import yfinance as yf\n",
"from datetime import datetime, timedelta\n",
"\n",
"\n",
"def get_current_stock_price(ticker):\n",
" \"\"\"Method to get current stock price\"\"\"\n",
"\n",
" ticker_data = yf.Ticker(ticker)\n",
" recent = ticker_data.history(period='1d')\n",
" return {\n",
" 'price': recent.iloc[0]['Close'],\n",
" 'currency': ticker_data.info['currency']\n",
" }\n",
" recent = ticker_data.history(period=\"1d\")\n",
" return {\"price\": recent.iloc[0][\"Close\"], \"currency\": ticker_data.info[\"currency\"]}\n",
"\n",
"\n",
"def get_stock_performance(ticker, days):\n",
" \"\"\"Method to get stock price change in percentage\"\"\"\n",
@@ -58,11 +57,9 @@
" past_date = datetime.today() - timedelta(days=days)\n",
" ticker_data = yf.Ticker(ticker)\n",
" history = ticker_data.history(start=past_date)\n",
" old_price = history.iloc[0]['Close']\n",
" current_price = history.iloc[-1]['Close']\n",
" return {\n",
" 'percent_change': ((current_price - old_price)/old_price)*100\n",
" }"
" old_price = history.iloc[0][\"Close\"]\n",
" current_price = history.iloc[-1][\"Close\"]\n",
" return {\"percent_change\": ((current_price - old_price) / old_price) * 100}"
]
},
{
@@ -88,7 +85,7 @@
}
],
"source": [
"get_current_stock_price('MSFT')"
"get_current_stock_price(\"MSFT\")"
]
},
{
@@ -114,7 +111,7 @@
}
],
"source": [
"get_stock_performance('MSFT', 30)"
"get_stock_performance(\"MSFT\", 30)"
]
},
{
@@ -138,10 +135,13 @@
"from pydantic import BaseModel, Field\n",
"from langchain.tools import BaseTool\n",
"\n",
"\n",
"class CurrentStockPriceInput(BaseModel):\n",
" \"\"\"Inputs for get_current_stock_price\"\"\"\n",
"\n",
" ticker: str = Field(description=\"Ticker symbol of the stock\")\n",
"\n",
"\n",
"class CurrentStockPriceTool(BaseTool):\n",
" name = \"get_current_stock_price\"\n",
" description = \"\"\"\n",
@@ -160,8 +160,10 @@
"\n",
"class StockPercentChangeInput(BaseModel):\n",
" \"\"\"Inputs for get_stock_performance\"\"\"\n",
"\n",
" ticker: str = Field(description=\"Ticker symbol of the stock\")\n",
" days: int = Field(description='Timedelta days to get past date from current date')\n",
" days: int = Field(description=\"Timedelta days to get past date from current date\")\n",
"\n",
"\n",
"class StockPerformanceTool(BaseTool):\n",
" name = \"get_stock_performance\"\n",
@@ -202,15 +204,9 @@
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.agents import initialize_agent\n",
"\n",
"llm = ChatOpenAI(\n",
" model=\"gpt-3.5-turbo-0613\",\n",
" temperature=0\n",
")\n",
"llm = ChatOpenAI(model=\"gpt-3.5-turbo-0613\", temperature=0)\n",
"\n",
"tools = [\n",
" CurrentStockPriceTool(),\n",
" StockPerformanceTool()\n",
"]\n",
"tools = [CurrentStockPriceTool(), StockPerformanceTool()]\n",
"\n",
"agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=True)"
]
@@ -261,7 +257,9 @@
}
],
"source": [
"agent.run(\"What is the current price of Microsoft stock? How it has performed over past 6 months?\")"
"agent.run(\n",
" \"What is the current price of Microsoft stock? How it has performed over past 6 months?\"\n",
")"
]
},
{
@@ -355,7 +353,9 @@
}
],
"source": [
"agent.run('In the past 3 months, which stock between Microsoft and Google has performed the best?')"
"agent.run(\n",
" \"In the past 3 months, which stock between Microsoft and Google has performed the best?\"\n",
")"
]
}
],

View File

@@ -79,10 +79,10 @@
"source": [
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
"agent = initialize_agent(\n",
" toolkit.get_tools(), \n",
" llm, \n",
" agent=AgentType.OPENAI_FUNCTIONS, \n",
" verbose=True, \n",
" toolkit.get_tools(),\n",
" llm,\n",
" agent=AgentType.OPENAI_FUNCTIONS,\n",
" verbose=True,\n",
" agent_kwargs=agent_kwargs,\n",
")"
]

View File

@@ -17,16 +17,7 @@
"execution_count": 1,
"id": "8632a37c",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/deeplake/util/check_latest_version.py:32: UserWarning: A newer version of deeplake (3.6.5) is available. It's recommended that you update to the latest version using `pip install -U deeplake`.\n",
" warnings.warn(\n"
]
}
],
"outputs": [],
"source": [
"from pydantic import BaseModel, Field\n",
"\n",
@@ -56,14 +47,14 @@
"files = [\n",
" # https://abc.xyz/investor/static/pdf/2023Q1_alphabet_earnings_release.pdf\n",
" {\n",
" \"name\": \"alphabet-earnings\", \n",
" \"name\": \"alphabet-earnings\",\n",
" \"path\": \"/Users/harrisonchase/Downloads/2023Q1_alphabet_earnings_release.pdf\",\n",
" }, \n",
" },\n",
" # https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q1-2023-Update\n",
" {\n",
" \"name\": \"tesla-earnings\", \n",
" \"path\": \"/Users/harrisonchase/Downloads/TSLA-Q1-2023-Update.pdf\"\n",
" }\n",
" \"name\": \"tesla-earnings\",\n",
" \"path\": \"/Users/harrisonchase/Downloads/TSLA-Q1-2023-Update.pdf\",\n",
" },\n",
"]\n",
"\n",
"for file in files:\n",
@@ -73,14 +64,14 @@
" docs = text_splitter.split_documents(pages)\n",
" embeddings = OpenAIEmbeddings()\n",
" retriever = FAISS.from_documents(docs, embeddings).as_retriever()\n",
" \n",
"\n",
" # Wrap retrievers in a Tool\n",
" tools.append(\n",
" Tool(\n",
" args_schema=DocumentInput,\n",
" name=file[\"name\"], \n",
" name=file[\"name\"],\n",
" description=f\"useful when you want to answer questions about {file['name']}\",\n",
" func=RetrievalQA.from_chain_type(llm=llm, retriever=retriever)\n",
" func=RetrievalQA.from_chain_type(llm=llm, retriever=retriever),\n",
" )\n",
" )"
]
@@ -139,7 +130,7 @@
"source": [
"llm = ChatOpenAI(\n",
" temperature=0,\n",
" model=\"gpt-3.5-turbo-0613\", \n",
" model=\"gpt-3.5-turbo-0613\",\n",
")\n",
"\n",
"agent = initialize_agent(\n",
@@ -170,6 +161,7 @@
"outputs": [],
"source": [
"import langchain\n",
"\n",
"langchain.debug = True"
]
},
@@ -405,7 +397,7 @@
"source": [
"llm = ChatOpenAI(\n",
" temperature=0,\n",
" model=\"gpt-3.5-turbo-0613\", \n",
" model=\"gpt-3.5-turbo-0613\",\n",
")\n",
"\n",
"agent = initialize_agent(\n",

View File

@@ -136,9 +136,11 @@
}
],
"source": [
"agent.run(\"Create an email draft for me to edit of a letter from the perspective of a sentient parrot\"\n",
" \" who is looking to collaborate on some research with her\"\n",
" \" estranged friend, a cat. Under no circumstances may you send the message, however.\")"
"agent.run(\n",
" \"Create an email draft for me to edit of a letter from the perspective of a sentient parrot\"\n",
" \" who is looking to collaborate on some research with her\"\n",
" \" estranged friend, a cat. Under no circumstances may you send the message, however.\"\n",
")"
]
},
{
@@ -160,7 +162,9 @@
}
],
"source": [
"agent.run(\"Could you search in my drafts folder and let me know if any of them are about collaboration?\")"
"agent.run(\n",
" \"Could you search in my drafts folder and let me know if any of them are about collaboration?\"\n",
")"
]
},
{
@@ -190,7 +194,9 @@
}
],
"source": [
"agent.run(\"Can you schedule a 30 minute meeting with a sentient parrot to discuss research collaborations on October 3, 2023 at 2 pm Easter Time?\")"
"agent.run(\n",
" \"Can you schedule a 30 minute meeting with a sentient parrot to discuss research collaborations on October 3, 2023 at 2 pm Easter Time?\"\n",
")"
]
},
{
@@ -210,7 +216,9 @@
}
],
"source": [
"agent.run(\"Can you tell me if I have any events on October 3, 2023 in Eastern Time, and if so, tell me if any of them are with a sentient parrot?\")"
"agent.run(\n",
" \"Can you tell me if I have any events on October 3, 2023 in Eastern Time, and if so, tell me if any of them are with a sentient parrot?\"\n",
")"
]
}
],

View File

@@ -34,6 +34,7 @@
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"DATAFORSEO_LOGIN\"] = \"your_api_access_username\"\n",
"os.environ[\"DATAFORSEO_PASSWORD\"] = \"your_api_access_password\"\n",
"\n",
@@ -88,7 +89,8 @@
"json_wrapper = DataForSeoAPIWrapper(\n",
" json_result_types=[\"organic\", \"knowledge_graph\", \"answer_box\"],\n",
" json_result_fields=[\"type\", \"title\", \"description\", \"text\"],\n",
" top_count=3)"
" top_count=3,\n",
")"
]
},
{
@@ -119,7 +121,8 @@
" top_count=10,\n",
" json_result_types=[\"organic\", \"local_pack\"],\n",
" json_result_fields=[\"title\", \"description\", \"type\"],\n",
" params={\"location_name\": \"Germany\", \"language_code\": \"en\"})\n",
" params={\"location_name\": \"Germany\", \"language_code\": \"en\"},\n",
")\n",
"customized_wrapper.results(\"coffee near me\")"
]
},
@@ -142,7 +145,8 @@
" top_count=10,\n",
" json_result_types=[\"organic\", \"local_pack\"],\n",
" json_result_fields=[\"title\", \"description\", \"type\"],\n",
" params={\"location_name\": \"Germany\", \"language_code\": \"en\", \"se_name\": \"bing\"})\n",
" params={\"location_name\": \"Germany\", \"language_code\": \"en\", \"se_name\": \"bing\"},\n",
")\n",
"customized_wrapper.results(\"coffee near me\")"
]
},
@@ -164,7 +168,12 @@
"maps_search = DataForSeoAPIWrapper(\n",
" top_count=10,\n",
" json_result_fields=[\"title\", \"value\", \"address\", \"rating\", \"type\"],\n",
" params={\"location_coordinate\": \"52.512,13.36,12z\", \"language_code\": \"en\", \"se_type\": \"maps\"})\n",
" params={\n",
" \"location_coordinate\": \"52.512,13.36,12z\",\n",
" \"language_code\": \"en\",\n",
" \"se_type\": \"maps\",\n",
" },\n",
")\n",
"maps_search.results(\"coffee near me\")"
]
},
@@ -184,10 +193,12 @@
"outputs": [],
"source": [
"from langchain.agents import Tool\n",
"\n",
"search = DataForSeoAPIWrapper(\n",
" top_count=3,\n",
" json_result_types=[\"organic\"],\n",
" json_result_fields=[\"title\", \"description\", \"type\"])\n",
" json_result_fields=[\"title\", \"description\", \"type\"],\n",
")\n",
"tool = Tool(\n",
" name=\"google-search-answer\",\n",
" description=\"My new answer tool\",\n",

View File

@@ -0,0 +1,233 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "16763ed3",
"metadata": {},
"source": [
"# Lemon AI NLP Workflow Automation\n",
"\\\n",
"Full docs are available at: https://github.com/felixbrock/lemonai-py-client\n",
"\n",
"**Lemon AI helps you build powerful AI assistants in minutes and automate workflows by allowing for accurate and reliable read and write operations in tools like Airtable, Hubspot, Discord, Notion, Slack and Github.**\n",
"\n",
"Most connectors available today are focused on read-only operations, limiting the potential of LLMs. Agents, on the other hand, have a tendency to hallucinate from time to time due to missing context or instructions.\n",
"\n",
"With Lemon AI, it is possible to give your agents access to well-defined APIs for reliable read and write operations. In addition, Lemon AI functions allow you to further reduce the risk of hallucinations by providing a way to statically define workflows that the model can rely on in case of uncertainty."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "4881b484-1b97-478f-b206-aec407ceff66",
"metadata": {},
"source": [
"## Quick Start\n",
"\n",
"The following quick start demonstrates how to use Lemon AI in combination with Agents to automate workflows that involve interaction with internal tooling."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "ff91b41a",
"metadata": {},
"source": [
"### 1. Install Lemon AI\n",
"\n",
"Requires Python 3.8.1 and above.\n",
"\n",
"To use Lemon AI in your Python project run `pip install lemonai`\n",
"\n",
"This will install the corresponding Lemon AI client which you can then import into your script.\n",
"\n",
"The tool uses Python packages langchain and loguru. In case of any installation errors with Lemon AI, install both packages first and then install the Lemon AI package."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "340ff63d",
"metadata": {},
"source": [
"### 2. Launch the Server\n",
"\n",
"The interaction of your agents and all tools provided by Lemon AI is handled by the [Lemon AI Server](https://github.com/felixbrock/lemonai-server). To use Lemon AI you need to run the server on your local machine so the Lemon AI Python client can connect to it."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e845f402",
"metadata": {},
"source": [
"### 3. Use Lemon AI with Langchain"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "d3ae6a82",
"metadata": {},
"source": [
"Lemon AI automatically solves given tasks by finding the right combination of relevant tools or uses Lemon AI Functions as an alternative. The following example demonstrates how to retrieve a user from Hackernews and write it to a table in Airtable:"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "43476a22",
"metadata": {},
"source": [
"#### (Optional) Define your Lemon AI Functions"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "cb038670",
"metadata": {},
"source": [
"Similar to [OpenAI functions](https://openai.com/blog/function-calling-and-other-api-updates), Lemon AI provides the option to define workflows as reusable functions. These functions can be defined for use cases where it is especially important to move as close as possible to near-deterministic behavior. Specific workflows can be defined in a separate lemonai.json:"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e423ebbb",
"metadata": {},
"source": [
"```json\n",
"[\n",
" {\n",
" \"name\": \"Hackernews Airtable User Workflow\",\n",
" \"description\": \"retrieves user data from Hackernews and appends it to a table in Airtable\",\n",
" \"tools\": [\"hackernews-get-user\", \"airtable-append-data\"]\n",
" }\n",
"]\n",
"```"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "3fdb36ce",
"metadata": {},
"source": [
"Your model will have access to these functions and will prefer them over self-selecting tools to solve a given task. All you have to do is to let the agent know that it should use a given function by including the function name in the prompt."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "ebfb8b5d",
"metadata": {},
"source": [
"#### Include Lemon AI in your Langchain project "
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "5318715d",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from lemonai import execute_workflow\n",
"from langchain import OpenAI"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c9d082cb",
"metadata": {},
"source": [
"#### Load API Keys and Access Tokens\n",
"\n",
"To use tools that require authentication, you have to store the corresponding access credentials in your environment in the format \"{tool name}_{authentication string}\" where the authentication string is one of [\"API_KEY\", \"SECRET_KEY\", \"SUBSCRIPTION_KEY\", \"ACCESS_KEY\"] for API keys or [\"ACCESS_TOKEN\", \"SECRET_TOKEN\"] for authentication tokens. Examples are \"OPENAI_API_KEY\", \"BING_SUBSCRIPTION_KEY\", \"AIRTABLE_ACCESS_TOKEN\"."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a370d999",
"metadata": {},
"outputs": [],
"source": [
"\"\"\" Load all relevant API Keys and Access Tokens into your environment variables \"\"\"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"*INSERT OPENAI API KEY HERE*\"\n",
"os.environ[\"AIRTABLE_ACCESS_TOKEN\"] = \"*INSERT AIRTABLE TOKEN HERE*\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "38d158e7",
"metadata": {},
"outputs": [],
"source": [
"hackernews_username = \"*INSERT HACKERNEWS USERNAME HERE*\"\n",
"airtable_base_id = \"*INSERT BASE ID HERE*\"\n",
"airtable_table_id = \"*INSERT TABLE ID HERE*\"\n",
"\n",
"\"\"\" Define your instruction to be given to your LLM \"\"\"\n",
"prompt = f\"\"\"Read information from Hackernews for user {hackernews_username} and then write the results to\n",
"Airtable (baseId: {airtable_base_id}, tableId: {airtable_table_id}). Only write the fields \"username\", \"karma\"\n",
"and \"created_at_i\". Please make sure that Airtable does NOT automatically convert the field types.\n",
"\"\"\"\n",
"\n",
"\"\"\"\n",
"Use the Lemon AI execute_workflow wrapper \n",
"to run your Langchain agent in combination with Lemon AI \n",
"\"\"\"\n",
"model = OpenAI(temperature=0)\n",
"\n",
"execute_workflow(llm=model, prompt_string=prompt)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "aef3e801",
"metadata": {},
"source": [
"### 4. Gain transparency on your Agent's decision making\n",
"\n",
"To gain transparency on how your Agent interacts with Lemon AI tools to solve a given task, all decisions made, tools used and operations performed are written to a local `lemonai.log` file. Every time your LLM agent is interacting with the Lemon AI tool stack a corresponding log entry is created.\n",
"\n",
"```log\n",
"2023-06-26T11:50:27.708785+0100 - b5f91c59-8487-45c2-800a-156eac0c7dae - hackernews-get-user\n",
"2023-06-26T11:50:39.624035+0100 - b5f91c59-8487-45c2-800a-156eac0c7dae - airtable-append-data\n",
"2023-06-26T11:58:32.925228+0100 - 5efe603c-9898-4143-b99a-55b50007ed9d - hackernews-get-user\n",
"2023-06-26T11:58:43.988788+0100 - 5efe603c-9898-4143-b99a-55b50007ed9d - airtable-append-data\n",
"```\n",
"\n",
"By using the [Lemon AI Analytics Tool](https://github.com/felixbrock/lemonai-analytics) you can easily gain a better understanding of how frequently and in which order tools are used. As a result, you can identify weak spots in your agents decision-making capabilities and move to a more deterministic behavior by defining Lemon AI functions."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -90,7 +90,12 @@
"metadata": {},
"outputs": [],
"source": [
"search.results(\"The best blog post about AI safety is definitely this: \", 10, include_domains=[\"lesswrong.com\"], start_published_date=\"2019-01-01\")"
"search.results(\n",
" \"The best blog post about AI safety is definitely this: \",\n",
" 10,\n",
" include_domains=[\"lesswrong.com\"],\n",
" start_published_date=\"2019-01-01\",\n",
")"
]
},
{

View File

@@ -341,7 +341,7 @@
"outputs": [],
"source": [
"llm = OpenAI(temperature=0)\n",
"zapier = ZapierNLAWrapper(zapier_nla_oauth_access_token='<fill in access token here>')\n",
"zapier = ZapierNLAWrapper(zapier_nla_oauth_access_token=\"<fill in access token here>\")\n",
"toolkit = ZapierToolkit.from_zapier_nla_wrapper(zapier)\n",
"agent = initialize_agent(\n",
" toolkit.get_tools(), llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",

View File

@@ -0,0 +1,220 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Context\n",
"\n",
"![Context - Product Analytics for AI Chatbots](https://go.getcontext.ai/langchain.png)\n",
"\n",
"[Context](https://getcontext.ai/) provides product analytics for AI chatbots.\n",
"\n",
"Context helps you understand how users are interacting with your AI chat products.\n",
"Gain critical insights, optimise poor experiences, and minimise brand risks.\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In this guide we will show you how to integrate with Context."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"## Installation and Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"source": [
"$ pip install context-python --upgrade"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Getting API Credentials\n",
"\n",
"To get your Context API token:\n",
"\n",
"1. Go to the settings page within your Context account (https://go.getcontext.ai/settings).\n",
"2. Generate a new API Token.\n",
"3. Store this token somewhere secure."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setup Context\n",
"\n",
"To use the `ContextCallbackHandler`, import the handler from Langchain and instantiate it with your Context API token.\n",
"\n",
"Ensure you have installed the `context-python` package before using the handler."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain.callbacks import ContextCallbackHandler\n",
"\n",
"token = os.environ[\"CONTEXT_API_TOKEN\"]\n",
"\n",
"context_callback = ContextCallbackHandler(token)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage\n",
"### Using the Context callback within a Chat Model\n",
"\n",
"The Context callback handler can be used to directly record transcripts between users and AI assistants.\n",
"\n",
"#### Example"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.schema import (\n",
" SystemMessage,\n",
" HumanMessage,\n",
")\n",
"from langchain.callbacks import ContextCallbackHandler\n",
"\n",
"token = os.environ[\"CONTEXT_API_TOKEN\"]\n",
"\n",
"chat = ChatOpenAI(\n",
" headers={\"user_id\": \"123\"}, temperature=0, callbacks=[ContextCallbackHandler(token)]\n",
")\n",
"\n",
"messages = [\n",
" SystemMessage(\n",
" content=\"You are a helpful assistant that translates English to French.\"\n",
" ),\n",
" HumanMessage(content=\"I love programming.\"),\n",
"]\n",
"\n",
"print(chat(messages))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using the Context callback within Chains\n",
"\n",
"The Context callback handler can also be used to record the inputs and outputs of chains. Note that intermediate steps of the chain are not recorded - only the starting inputs and final outputs.\n",
"\n",
"__Note:__ Ensure that you pass the same context object to the chat model and the chain.\n",
"\n",
"Wrong:\n",
"> ```python\n",
"> chat = ChatOpenAI(temperature=0.9, callbacks=[ContextCallbackHandler(token)])\n",
"> chain = LLMChain(llm=chat, prompt=chat_prompt_template, callbacks=[ContextCallbackHandler(token)])\n",
"> ```\n",
"\n",
"Correct:\n",
">```python\n",
">handler = ContextCallbackHandler(token)\n",
">chat = ChatOpenAI(temperature=0.9, callbacks=[callback])\n",
">chain = LLMChain(llm=chat, prompt=chat_prompt_template, callbacks=[callback])\n",
">```\n",
"\n",
"#### Example"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain import LLMChain\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.prompts.chat import (\n",
" ChatPromptTemplate,\n",
" HumanMessagePromptTemplate,\n",
")\n",
"from langchain.callbacks import ContextCallbackHandler\n",
"\n",
"token = os.environ[\"CONTEXT_API_TOKEN\"]\n",
"\n",
"human_message_prompt = HumanMessagePromptTemplate(\n",
" prompt=PromptTemplate(\n",
" template=\"What is a good name for a company that makes {product}?\",\n",
" input_variables=[\"product\"],\n",
" )\n",
")\n",
"chat_prompt_template = ChatPromptTemplate.from_messages([human_message_prompt])\n",
"callback = ContextCallbackHandler(token)\n",
"chat = ChatOpenAI(temperature=0.9, callbacks=[callback])\n",
"chain = LLMChain(llm=chat, prompt=chat_prompt_template, callbacks=[callback])\n",
"print(chain.run(\"colorful socks\"))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
},
"vscode": {
"interpreter": {
"hash": "a53ebf4a859167383b364e7e7521d0add3c2dbbdecce4edf676e8c4634ff3fbb"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -57,6 +57,7 @@
"\n",
"# Remove the (1) import sys and sys.path.append(..) and (2) uncomment `!pip install langchain` after merging the PR for Infino/LangChain integration.\n",
"import sys\n",
"\n",
"sys.path.append(\"../../../../../langchain\")\n",
"#!pip install langchain\n",
"\n",
@@ -120,9 +121,9 @@
"metadata": {},
"outputs": [],
"source": [
"# These are a subset of questions from Stanford's QA dataset - \n",
"# These are a subset of questions from Stanford's QA dataset -\n",
"# https://rajpurkar.github.io/SQuAD-explorer/\n",
"data = '''In what country is Normandy located?\n",
"data = \"\"\"In what country is Normandy located?\n",
"When were the Normans in Normandy?\n",
"From which countries did the Norse originate?\n",
"Who was the Norse leader?\n",
@@ -141,9 +142,9 @@
"What principality did William the conquerer found?\n",
"What is the original meaning of the word Norman?\n",
"When was the Latin version of the word Norman first recorded?\n",
"What name comes from the English words Normans/Normanz?'''\n",
"What name comes from the English words Normans/Normanz?\"\"\"\n",
"\n",
"questions = data.split('\\n')"
"questions = data.split(\"\\n\")"
]
},
{
@@ -190,10 +191,12 @@
],
"source": [
"# Set your key here.\n",
"#os.environ[\"OPENAI_API_KEY\"] = \"YOUR_API_KEY\"\n",
"# os.environ[\"OPENAI_API_KEY\"] = \"YOUR_API_KEY\"\n",
"\n",
"# Create callback handler. This logs latency, errors, token usage, prompts as well as prompt responses to Infino.\n",
"handler = InfinoCallbackHandler(model_id=\"test_openai\", model_version=\"0.1\", verbose=False)\n",
"handler = InfinoCallbackHandler(\n",
" model_id=\"test_openai\", model_version=\"0.1\", verbose=False\n",
")\n",
"\n",
"# Create LLM.\n",
"llm = OpenAI(temperature=0.1)\n",
@@ -281,29 +284,30 @@
"source": [
"# Helper function to create a graph using matplotlib.\n",
"def plot(data, title):\n",
" data = json.loads(data)\n",
" data = json.loads(data)\n",
"\n",
" # Extract x and y values from the data\n",
" timestamps = [item[\"time\"] for item in data]\n",
" dates=[dt.datetime.fromtimestamp(ts) for ts in timestamps]\n",
" y = [item[\"value\"] for item in data]\n",
" # Extract x and y values from the data\n",
" timestamps = [item[\"time\"] for item in data]\n",
" dates = [dt.datetime.fromtimestamp(ts) for ts in timestamps]\n",
" y = [item[\"value\"] for item in data]\n",
"\n",
" plt.rcParams['figure.figsize'] = [6, 4]\n",
" plt.subplots_adjust(bottom=0.2)\n",
" plt.xticks(rotation=25 )\n",
" ax=plt.gca()\n",
" xfmt = md.DateFormatter('%Y-%m-%d %H:%M:%S')\n",
" ax.xaxis.set_major_formatter(xfmt)\n",
" \n",
" # Create the plot\n",
" plt.plot(dates, y)\n",
" plt.rcParams[\"figure.figsize\"] = [6, 4]\n",
" plt.subplots_adjust(bottom=0.2)\n",
" plt.xticks(rotation=25)\n",
" ax = plt.gca()\n",
" xfmt = md.DateFormatter(\"%Y-%m-%d %H:%M:%S\")\n",
" ax.xaxis.set_major_formatter(xfmt)\n",
"\n",
" # Set labels and title\n",
" plt.xlabel(\"Time\")\n",
" plt.ylabel(\"Value\")\n",
" plt.title(title)\n",
" # Create the plot\n",
" plt.plot(dates, y)\n",
"\n",
" # Set labels and title\n",
" plt.xlabel(\"Time\")\n",
" plt.ylabel(\"Value\")\n",
" plt.title(title)\n",
"\n",
" plt.show()\n",
"\n",
" plt.show()\n",
"\n",
"response = client.search_ts(\"__name__\", \"latency\", 0, int(time.time()))\n",
"plot(response.text, \"Latency\")\n",
@@ -318,7 +322,7 @@
"plot(response.text, \"Completion Tokens\")\n",
"\n",
"response = client.search_ts(\"__name__\", \"total_tokens\", 0, int(time.time()))\n",
"plot(response.text, \"Total Tokens\")\n"
"plot(response.text, \"Total Tokens\")"
]
},
{
@@ -356,7 +360,7 @@
"\n",
"query = \"king charles III\"\n",
"response = client.search_log(\"king charles III\", 0, int(time.time()))\n",
"print(\"Results for\", query, \":\", response.text)\n"
"print(\"Results for\", query, \":\", response.text)"
]
},
{

View File

@@ -9,7 +9,7 @@
In this guide we will demonstrate how to use `StreamlitCallbackHandler` to display the thoughts and actions of an agent in an
interactive Streamlit app. Try it out with the running app below using the [MRKL agent](/docs/modules/agents/how_to/mrkl/):
<iframe loading="lazy" src="https://mrkl-minimal.streamlit.app/?embed=true&embed_options=light_theme"
<iframe loading="lazy" src="https://langchain-mrkl.streamlit.app/?embed=true&embed_options=light_theme"
style={{ width: 100 + '%', border: 'none', marginBottom: 1 + 'rem', height: 600 }}
allow="camera;clipboard-read;clipboard-write;"
></iframe>
@@ -35,7 +35,7 @@ st_callback = StreamlitCallbackHandler(st.container())
```
Additional keyword arguments to customize the display behavior are described in the
[API reference](https://api.python.langchain.com/en/latest/modules/callbacks.html#langchain.callbacks.StreamlitCallbackHandler).
[API reference](https://api.python.langchain.com/en/latest/callbacks/langchain.callbacks.streamlit.streamlit_callback_handler.StreamlitCallbackHandler.html).
### Scenario 1: Using an Agent with Tools

View File

@@ -0,0 +1,921 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "82f3f65d-fbcb-4e8e-b04b-959856283643",
"metadata": {},
"source": [
"# Causal program-aided language (CPAL) chain\n",
"\n",
"The CPAL chain builds on the recent PAL to stop LLM hallucination. The problem with the PAL approach is that it hallucinates on a math problem with a nested chain of dependence. The innovation here is that this new CPAL approach includes causal structure to fix hallucination.\n",
"\n",
"The original [PR's description](https://github.com/hwchase17/langchain/pull/6255) contains a full overview.\n",
"\n",
"Using the CPAL chain, the LLM translated this\n",
"\n",
" \"Tim buys the same number of pets as Cindy and Boris.\"\n",
" \"Cindy buys the same number of pets as Bill plus Bob.\"\n",
" \"Boris buys the same number of pets as Ben plus Beth.\"\n",
" \"Bill buys the same number of pets as Obama.\"\n",
" \"Bob buys the same number of pets as Obama.\"\n",
" \"Ben buys the same number of pets as Obama.\"\n",
" \"Beth buys the same number of pets as Obama.\"\n",
" \"If Obama buys one pet, how many pets total does everyone buy?\"\n",
"\n",
"\n",
"into this\n",
"\n",
"![complex-graph.png](/img/cpal_diagram.png).\n",
"\n",
"Outline of code examples demoed in this notebook.\n",
"\n",
"1. CPAL's value against hallucination: CPAL vs PAL \n",
" 1.1 Complex narrative \n",
" 1.2 Unanswerable math word problem \n",
"2. CPAL's three types of causal diagrams ([The Book of Why](https://en.wikipedia.org/wiki/The_Book_of_Why)). \n",
" 2.1 Mediator \n",
" 2.2 Collider \n",
" 2.3 Confounder "
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "1370e40f",
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import SVG\n",
"\n",
"from langchain.experimental.cpal.base import CPALChain\n",
"from langchain.chains import PALChain\n",
"from langchain import OpenAI\n",
"\n",
"llm = OpenAI(temperature=0, max_tokens=512)\n",
"cpal_chain = CPALChain.from_univariate_prompt(llm=llm, verbose=True)\n",
"pal_chain = PALChain.from_math_prompt(llm=llm, verbose=True)"
]
},
{
"cell_type": "markdown",
"id": "858a87d9-a9bd-4850-9687-9af4b0856b62",
"metadata": {},
"source": [
"## CPAL's value against hallucination: CPAL vs PAL\n",
"\n",
"Like PAL, CPAL intends to reduce large language model (LLM) hallucination.\n",
"\n",
"The CPAL chain is different from the PAL chain for a couple of reasons.\n",
"\n",
"CPAL adds a causal structure (or DAG) to link entity actions (or math expressions).\n",
"The CPAL math expressions are modeling a chain of cause and effect relations, which can be intervened upon, whereas for the PAL chain math expressions are projected math identities.\n"
]
},
{
"cell_type": "markdown",
"id": "496403c5-d268-43ae-8852-2bd9903ce444",
"metadata": {},
"source": [
"### 1.1 Complex narrative\n",
"\n",
"Takeaway: PAL hallucinates, CPAL does not hallucinate."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d5dad768-2892-4825-8093-9b840f643a8a",
"metadata": {},
"outputs": [],
"source": [
"question = (\n",
" \"Tim buys the same number of pets as Cindy and Boris.\"\n",
" \"Cindy buys the same number of pets as Bill plus Bob.\"\n",
" \"Boris buys the same number of pets as Ben plus Beth.\"\n",
" \"Bill buys the same number of pets as Obama.\"\n",
" \"Bob buys the same number of pets as Obama.\"\n",
" \"Ben buys the same number of pets as Obama.\"\n",
" \"Beth buys the same number of pets as Obama.\"\n",
" \"If Obama buys one pet, how many pets total does everyone buy?\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "bbffa7a0-3c22-4a1d-ab2d-f230973073b0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mdef solution():\n",
" \"\"\"Tim buys the same number of pets as Cindy and Boris.Cindy buys the same number of pets as Bill plus Bob.Boris buys the same number of pets as Ben plus Beth.Bill buys the same number of pets as Obama.Bob buys the same number of pets as Obama.Ben buys the same number of pets as Obama.Beth buys the same number of pets as Obama.If Obama buys one pet, how many pets total does everyone buy?\"\"\"\n",
" obama_pets = 1\n",
" tim_pets = obama_pets\n",
" cindy_pets = obama_pets + obama_pets\n",
" boris_pets = obama_pets + obama_pets\n",
" total_pets = tim_pets + cindy_pets + boris_pets\n",
" result = total_pets\n",
" return result\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'5'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pal_chain.run(question)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "35a70d1d-86f8-4abc-b818-fbd083f072e9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
" name code value depends_on\n",
"0 obama pass 1.0 []\n",
"1 bill bill.value = obama.value 1.0 [obama]\n",
"2 bob bob.value = obama.value 1.0 [obama]\n",
"3 ben ben.value = obama.value 1.0 [obama]\n",
"4 beth beth.value = obama.value 1.0 [obama]\n",
"5 cindy cindy.value = bill.value + bob.value 2.0 [bill, bob]\n",
"6 boris boris.value = ben.value + beth.value 2.0 [ben, beth]\n",
"7 tim tim.value = cindy.value + boris.value 4.0 [cindy, boris]\u001b[0m\n",
"\n",
"\u001b[36;1m\u001b[1;3mquery data\n",
"{\n",
" \"question\": \"how many pets total does everyone buy?\",\n",
" \"expression\": \"SELECT SUM(value) FROM df\",\n",
" \"llm_error_msg\": \"\"\n",
"}\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"13.0"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cpal_chain.run(question)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ccb6b2b0-9de6-4f66-a8fb-fc59229ee316",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"292pt\" height=\"260pt\" viewBox=\"0.00 0.00 292.00 260.00\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 256)\">\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-256 288,-256 288,4 -4,4\"/>\n",
"<!-- obama -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>obama</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"137\" cy=\"-234\" rx=\"41.69\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"137\" y=\"-230.3\" font-family=\"Times,serif\" font-size=\"14.00\">obama</text>\n",
"</g>\n",
"<!-- bill -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>bill</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"27\" cy=\"-162\" rx=\"27\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"27\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">bill</text>\n",
"</g>\n",
"<!-- obama&#45;&gt;bill -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>obama-&gt;bill</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M114.47,-218.67C97.08,-207.6 72.94,-192.23 54.42,-180.45\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"56.15,-177.4 45.84,-174.99 52.4,-183.31 56.15,-177.4\"/>\n",
"</g>\n",
"<!-- bob -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>bob</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"100\" cy=\"-162\" rx=\"28\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"100\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">bob</text>\n",
"</g>\n",
"<!-- obama&#45;&gt;bob -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>obama-&gt;bob</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M128.04,-216.05C123.66,-207.77 118.3,-197.62 113.44,-188.42\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"116.39,-186.51 108.62,-179.31 110.2,-189.79 116.39,-186.51\"/>\n",
"</g>\n",
"<!-- ben -->\n",
"<g id=\"node4\" class=\"node\">\n",
"<title>ben</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"174\" cy=\"-162\" rx=\"28\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"174\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">ben</text>\n",
"</g>\n",
"<!-- obama&#45;&gt;ben -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>obama-&gt;ben</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M145.96,-216.05C150.34,-207.77 155.7,-197.62 160.56,-188.42\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"163.8,-189.79 165.38,-179.31 157.61,-186.51 163.8,-189.79\"/>\n",
"</g>\n",
"<!-- beth -->\n",
"<g id=\"node5\" class=\"node\">\n",
"<title>beth</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"252\" cy=\"-162\" rx=\"32\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"252\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">beth</text>\n",
"</g>\n",
"<!-- obama&#45;&gt;beth -->\n",
"<g id=\"edge4\" class=\"edge\">\n",
"<title>obama-&gt;beth</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M160.27,-218.83C178.18,-207.94 203.04,-192.8 222.37,-181.04\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"224.36,-183.92 231.08,-175.73 220.72,-177.95 224.36,-183.92\"/>\n",
"</g>\n",
"<!-- cindy -->\n",
"<g id=\"node6\" class=\"node\">\n",
"<title>cindy</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"93\" cy=\"-90\" rx=\"36\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"93\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">cindy</text>\n",
"</g>\n",
"<!-- bill&#45;&gt;cindy -->\n",
"<g id=\"edge5\" class=\"edge\">\n",
"<title>bill-&gt;cindy</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M41,-146.15C49.77,-136.85 61.25,-124.67 71.2,-114.12\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"73.79,-116.47 78.11,-106.8 68.7,-111.67 73.79,-116.47\"/>\n",
"</g>\n",
"<!-- bob&#45;&gt;cindy -->\n",
"<g id=\"edge6\" class=\"edge\">\n",
"<title>bob-&gt;cindy</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M98.27,-143.7C97.5,-135.98 96.57,-126.71 95.71,-118.11\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"99.19,-117.7 94.71,-108.1 92.22,-118.4 99.19,-117.7\"/>\n",
"</g>\n",
"<!-- boris -->\n",
"<g id=\"node7\" class=\"node\">\n",
"<title>boris</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"181\" cy=\"-90\" rx=\"34.5\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"181\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">boris</text>\n",
"</g>\n",
"<!-- ben&#45;&gt;boris -->\n",
"<g id=\"edge7\" class=\"edge\">\n",
"<title>ben-&gt;boris</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M175.73,-143.7C176.5,-135.98 177.43,-126.71 178.29,-118.11\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"181.78,-118.4 179.29,-108.1 174.81,-117.7 181.78,-118.4\"/>\n",
"</g>\n",
"<!-- beth&#45;&gt;boris -->\n",
"<g id=\"edge8\" class=\"edge\">\n",
"<title>beth-&gt;boris</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M236.59,-145.81C227.01,-136.36 214.51,-124.04 203.8,-113.48\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"205.96,-110.69 196.38,-106.16 201.04,-115.67 205.96,-110.69\"/>\n",
"</g>\n",
"<!-- tim -->\n",
"<g id=\"node8\" class=\"node\">\n",
"<title>tim</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"137\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"137\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\">tim</text>\n",
"</g>\n",
"<!-- cindy&#45;&gt;tim -->\n",
"<g id=\"edge9\" class=\"edge\">\n",
"<title>cindy-&gt;tim</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M103.43,-72.41C108.82,-63.83 115.51,-53.19 121.49,-43.67\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"124.59,-45.32 126.95,-34.99 118.66,-41.59 124.59,-45.32\"/>\n",
"</g>\n",
"<!-- boris&#45;&gt;tim -->\n",
"<g id=\"edge10\" class=\"edge\">\n",
"<title>boris-&gt;tim</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M170.79,-72.77C165.41,-64.19 158.68,-53.49 152.65,-43.9\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"155.43,-41.75 147.15,-35.15 149.51,-45.48 155.43,-41.75\"/>\n",
"</g>\n",
"</g>\n",
"</svg>"
],
"text/plain": [
"<IPython.core.display.SVG object>"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# wait 20 secs to see display\n",
"cpal_chain.draw(path=\"web.svg\")\n",
"SVG(\"web.svg\")"
]
},
{
"cell_type": "markdown",
"id": "1f6f345a-bb16-4e64-83c4-cbbc789a8325",
"metadata": {},
"source": [
"### Unanswerable math\n",
"\n",
"Takeaway: PAL hallucinates, where CPAL, rather than hallucinate, answers with _\"unanswerable, narrative question and plot are incoherent\"_"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "068afd79-fd41-4ec2-b4d0-c64140dc413f",
"metadata": {},
"outputs": [],
"source": [
"question = (\n",
" \"Jan has three times the number of pets as Marcia.\"\n",
" \"Marcia has two more pets than Cindy.\"\n",
" \"If Cindy has ten pets, how many pets does Barak have?\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "02f77db2-72e8-46c2-90b3-5e37ca42f80d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mdef solution():\n",
" \"\"\"Jan has three times the number of pets as Marcia.Marcia has two more pets than Cindy.If Cindy has ten pets, how many pets does Barak have?\"\"\"\n",
" cindy_pets = 10\n",
" marcia_pets = cindy_pets + 2\n",
" jan_pets = marcia_pets * 3\n",
" result = jan_pets\n",
" return result\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'36'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pal_chain.run(question)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "925958de-e998-4ffa-8b2e-5a00ddae5026",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
" name code value depends_on\n",
"0 cindy pass 10.0 []\n",
"1 marcia marcia.value = cindy.value + 2 12.0 [cindy]\n",
"2 jan jan.value = marcia.value * 3 36.0 [marcia]\u001b[0m\n",
"\n",
"\u001b[36;1m\u001b[1;3mquery data\n",
"{\n",
" \"question\": \"how many pets does barak have?\",\n",
" \"expression\": \"SELECT name, value FROM df WHERE name = 'barak'\",\n",
" \"llm_error_msg\": \"\"\n",
"}\u001b[0m\n",
"\n",
"unanswerable, query and outcome are incoherent\n",
"\n",
"outcome:\n",
" name code value depends_on\n",
"0 cindy pass 10.0 []\n",
"1 marcia marcia.value = cindy.value + 2 12.0 [cindy]\n",
"2 jan jan.value = marcia.value * 3 36.0 [marcia]\n",
"query:\n",
"{'question': 'how many pets does barak have?', 'expression': \"SELECT name, value FROM df WHERE name = 'barak'\", 'llm_error_msg': ''}\n"
]
}
],
"source": [
"try:\n",
" cpal_chain.run(question)\n",
"except Exception as e_msg:\n",
" print(e_msg)"
]
},
{
"cell_type": "markdown",
"id": "095adc76",
"metadata": {},
"source": [
"### Basic math\n",
"\n",
"#### Causal mediator"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "3ecf03fa-8350-4c4e-8080-84a307ba6ad4",
"metadata": {},
"outputs": [],
"source": [
"question = (\n",
" \"Jan has three times the number of pets as Marcia. \"\n",
" \"Marcia has two more pets than Cindy. \"\n",
" \"If Cindy has four pets, how many total pets do the three have?\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "74e49c47-3eed-4abe-98b7-8e97bcd15944",
"metadata": {},
"source": [
"---\n",
"PAL"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "2e88395f-d014-4362-abb0-88f6800860bb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mdef solution():\n",
" \"\"\"Jan has three times the number of pets as Marcia. Marcia has two more pets than Cindy. If Cindy has four pets, how many total pets do the three have?\"\"\"\n",
" cindy_pets = 4\n",
" marcia_pets = cindy_pets + 2\n",
" jan_pets = marcia_pets * 3\n",
" total_pets = cindy_pets + marcia_pets + jan_pets\n",
" result = total_pets\n",
" return result\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'28'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pal_chain.run(question)"
]
},
{
"cell_type": "markdown",
"id": "20ba6640-3d17-4b59-8101-aaba89d68cf4",
"metadata": {},
"source": [
"---\n",
"CPAL"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "312a0943-a482-4ed0-a064-1e7a72e9479b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
" name code value depends_on\n",
"0 cindy pass 4.0 []\n",
"1 marcia marcia.value = cindy.value + 2 6.0 [cindy]\n",
"2 jan jan.value = marcia.value * 3 18.0 [marcia]\u001b[0m\n",
"\n",
"\u001b[36;1m\u001b[1;3mquery data\n",
"{\n",
" \"question\": \"how many total pets do the three have?\",\n",
" \"expression\": \"SELECT SUM(value) FROM df\",\n",
" \"llm_error_msg\": \"\"\n",
"}\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"28.0"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cpal_chain.run(question)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "4466b975-ae2b-4252-972b-b3182a089ade",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"92pt\" height=\"188pt\" viewBox=\"0.00 0.00 92.49 188.00\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-184 88.49,-184 88.49,4 -4,4\"/>\n",
"<!-- cindy -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>cindy</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-162\" rx=\"36\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">cindy</text>\n",
"</g>\n",
"<!-- marcia -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>marcia</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-90\" rx=\"42.49\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">marcia</text>\n",
"</g>\n",
"<!-- cindy&#45;&gt;marcia -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>cindy-&gt;marcia</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M42.25,-143.7C42.25,-135.98 42.25,-126.71 42.25,-118.11\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"45.75,-118.1 42.25,-108.1 38.75,-118.1 45.75,-118.1\"/>\n",
"</g>\n",
"<!-- jan -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>jan</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\">jan</text>\n",
"</g>\n",
"<!-- marcia&#45;&gt;jan -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>marcia-&gt;jan</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M42.25,-71.7C42.25,-63.98 42.25,-54.71 42.25,-46.11\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"45.75,-46.1 42.25,-36.1 38.75,-46.1 45.75,-46.1\"/>\n",
"</g>\n",
"</g>\n",
"</svg>"
],
"text/plain": [
"<IPython.core.display.SVG object>"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# wait 20 secs to see display\n",
"cpal_chain.draw(path=\"web.svg\")\n",
"SVG(\"web.svg\")"
]
},
{
"cell_type": "markdown",
"id": "29fa7b8a-75a3-4270-82a2-2c31939cd7e0",
"metadata": {},
"source": [
"### Causal collider"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "618eddac-f0ef-4ab5-90ed-72e880fdeba3",
"metadata": {},
"outputs": [],
"source": [
"question = (\n",
" \"Jan has the number of pets as Marcia plus the number of pets as Cindy. \"\n",
" \"Marcia has no pets. \"\n",
" \"If Cindy has four pets, how many total pets do the three have?\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "a01563f3-7974-4de4-8bd9-0b7d710aa0d3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
" name code value depends_on\n",
"0 marcia pass 0.0 []\n",
"1 cindy pass 4.0 []\n",
"2 jan jan.value = marcia.value + cindy.value 4.0 [marcia, cindy]\u001b[0m\n",
"\n",
"\u001b[36;1m\u001b[1;3mquery data\n",
"{\n",
" \"question\": \"how many total pets do the three have?\",\n",
" \"expression\": \"SELECT SUM(value) FROM df\",\n",
" \"llm_error_msg\": \"\"\n",
"}\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"8.0"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cpal_chain.run(question)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "0fbe7243-0522-4946-b9a2-6e21e7c49a42",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"182pt\" height=\"116pt\" viewBox=\"0.00 0.00 182.00 116.00\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 112)\">\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-112 178,-112 178,4 -4,4\"/>\n",
"<!-- marcia -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>marcia</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-90\" rx=\"42.49\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">marcia</text>\n",
"</g>\n",
"<!-- jan -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>jan</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"90.25\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"90.25\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\">jan</text>\n",
"</g>\n",
"<!-- marcia&#45;&gt;jan -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>marcia-&gt;jan</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M53.62,-72.41C59.57,-63.74 66.95,-52.97 73.53,-43.38\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"76.51,-45.21 79.28,-34.99 70.74,-41.26 76.51,-45.21\"/>\n",
"</g>\n",
"<!-- cindy -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>cindy</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"138.25\" cy=\"-90\" rx=\"36\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"138.25\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">cindy</text>\n",
"</g>\n",
"<!-- cindy&#45;&gt;jan -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>cindy-&gt;jan</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M127.11,-72.77C121.09,-63.98 113.54,-52.96 106.83,-43.19\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"109.53,-40.94 100.99,-34.67 103.75,-44.89 109.53,-40.94\"/>\n",
"</g>\n",
"</g>\n",
"</svg>"
],
"text/plain": [
"<IPython.core.display.SVG object>"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# wait 20 secs to see display\n",
"cpal_chain.draw(path=\"web.svg\")\n",
"SVG(\"web.svg\")"
]
},
{
"cell_type": "markdown",
"id": "d4082538-ec03-44f0-aac3-07e03aad7555",
"metadata": {},
"source": [
"### Causal confounder"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "83932c30-950b-435a-b328-7993ce8cc6bd",
"metadata": {},
"outputs": [],
"source": [
"question = (\n",
" \"Jan has the number of pets as Marcia plus the number of pets as Cindy. \"\n",
" \"Marcia has two more pets than Cindy. \"\n",
" \"If Cindy has four pets, how many total pets do the three have?\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "570de307-7c6b-4fdc-80c3-4361daa8a629",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mstory outcome data\n",
" name code value depends_on\n",
"0 cindy pass 4.0 []\n",
"1 marcia marcia.value = cindy.value + 2 6.0 [cindy]\n",
"2 jan jan.value = cindy.value + marcia.value 10.0 [cindy, marcia]\u001b[0m\n",
"\n",
"\u001b[36;1m\u001b[1;3mquery data\n",
"{\n",
" \"question\": \"how many total pets do the three have?\",\n",
" \"expression\": \"SELECT SUM(value) FROM df\",\n",
" \"llm_error_msg\": \"\"\n",
"}\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"20.0"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cpal_chain.run(question)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "00375615-6b6d-4357-bdb8-f64f682f7605",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<svg xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"121pt\" height=\"188pt\" viewBox=\"0.00 0.00 120.99 188.00\">\n",
"<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 184)\">\n",
"<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-184 116.99,-184 116.99,4 -4,4\"/>\n",
"<!-- cindy -->\n",
"<g id=\"node1\" class=\"node\">\n",
"<title>cindy</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"77.25\" cy=\"-162\" rx=\"36\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"77.25\" y=\"-158.3\" font-family=\"Times,serif\" font-size=\"14.00\">cindy</text>\n",
"</g>\n",
"<!-- marcia -->\n",
"<g id=\"node2\" class=\"node\">\n",
"<title>marcia</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"42.25\" cy=\"-90\" rx=\"42.49\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"42.25\" y=\"-86.3\" font-family=\"Times,serif\" font-size=\"14.00\">marcia</text>\n",
"</g>\n",
"<!-- cindy&#45;&gt;marcia -->\n",
"<g id=\"edge1\" class=\"edge\">\n",
"<title>cindy-&gt;marcia</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M68.95,-144.41C64.87,-136.25 59.86,-126.22 55.28,-117.07\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"58.33,-115.34 50.72,-107.96 52.07,-118.47 58.33,-115.34\"/>\n",
"</g>\n",
"<!-- jan -->\n",
"<g id=\"node3\" class=\"node\">\n",
"<title>jan</title>\n",
"<ellipse fill=\"none\" stroke=\"black\" cx=\"77.25\" cy=\"-18\" rx=\"27\" ry=\"18\"/>\n",
"<text text-anchor=\"middle\" x=\"77.25\" y=\"-14.3\" font-family=\"Times,serif\" font-size=\"14.00\">jan</text>\n",
"</g>\n",
"<!-- cindy&#45;&gt;jan -->\n",
"<g id=\"edge2\" class=\"edge\">\n",
"<title>cindy-&gt;jan</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M83.73,-144.1C87.32,-133.84 91.42,-120.36 93.25,-108 95.58,-92.17 95.58,-87.83 93.25,-72 91.95,-63.21 89.5,-53.86 86.91,-45.5\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"90.19,-44.29 83.73,-35.9 83.55,-46.49 90.19,-44.29\"/>\n",
"</g>\n",
"<!-- marcia&#45;&gt;jan -->\n",
"<g id=\"edge3\" class=\"edge\">\n",
"<title>marcia-&gt;jan</title>\n",
"<path fill=\"none\" stroke=\"black\" d=\"M50.72,-72.06C54.86,-63.77 59.94,-53.62 64.53,-44.42\"/>\n",
"<polygon fill=\"black\" stroke=\"black\" points=\"67.75,-45.82 69.09,-35.31 61.49,-42.69 67.75,-45.82\"/>\n",
"</g>\n",
"</g>\n",
"</svg>"
],
"text/plain": [
"<IPython.core.display.SVG object>"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# wait 20 secs to see display\n",
"cpal_chain.draw(path=\"web.svg\")\n",
"SVG(\"web.svg\")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "255683de-0c1c-4131-b277-99d09f5ac1fc",
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -72,7 +72,10 @@
"import numpy as np\n",
"\n",
"from langchain.schema import BaseRetriever\n",
"from langchain.callbacks.manager import AsyncCallbackManagerForRetrieverRun, CallbackManagerForRetrieverRun\n",
"from langchain.callbacks.manager import (\n",
" AsyncCallbackManagerForRetrieverRun,\n",
" CallbackManagerForRetrieverRun,\n",
")\n",
"from langchain.utilities import GoogleSerperAPIWrapper\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.chat_models import ChatOpenAI\n",
@@ -97,13 +100,15 @@
"outputs": [],
"source": [
"class SerperSearchRetriever(BaseRetriever):\n",
"\n",
" search: GoogleSerperAPIWrapper = None\n",
"\n",
" def _get_relevant_documents(self, query: str, *, run_manager: CallbackManagerForRetrieverRun, **kwargs: Any) -> List[Document]:\n",
" def _get_relevant_documents(\n",
" self, query: str, *, run_manager: CallbackManagerForRetrieverRun, **kwargs: Any\n",
" ) -> List[Document]:\n",
" return [Document(page_content=self.search.run(query))]\n",
"\n",
" async def _aget_relevant_documents(self,\n",
" async def _aget_relevant_documents(\n",
" self,\n",
" query: str,\n",
" *,\n",
" run_manager: AsyncCallbackManagerForRetrieverRun,\n",

View File

@@ -83,9 +83,15 @@
"schema = client.schema()\n",
"schema.propertyKey(\"name\").asText().ifNotExist().create()\n",
"schema.propertyKey(\"birthDate\").asText().ifNotExist().create()\n",
"schema.vertexLabel(\"Person\").properties(\"name\", \"birthDate\").usePrimaryKeyId().primaryKeys(\"name\").ifNotExist().create()\n",
"schema.vertexLabel(\"Movie\").properties(\"name\").usePrimaryKeyId().primaryKeys(\"name\").ifNotExist().create()\n",
"schema.edgeLabel(\"ActedIn\").sourceLabel(\"Person\").targetLabel(\"Movie\").ifNotExist().create()"
"schema.vertexLabel(\"Person\").properties(\n",
" \"name\", \"birthDate\"\n",
").usePrimaryKeyId().primaryKeys(\"name\").ifNotExist().create()\n",
"schema.vertexLabel(\"Movie\").properties(\"name\").usePrimaryKeyId().primaryKeys(\n",
" \"name\"\n",
").ifNotExist().create()\n",
"schema.edgeLabel(\"ActedIn\").sourceLabel(\"Person\").targetLabel(\n",
" \"Movie\"\n",
").ifNotExist().create()"
]
},
{
@@ -124,7 +130,9 @@
"\n",
"g.addEdge(\"ActedIn\", \"1:Al Pacino\", \"2:The Godfather\", {})\n",
"g.addEdge(\"ActedIn\", \"1:Al Pacino\", \"2:The Godfather Part II\", {})\n",
"g.addEdge(\"ActedIn\", \"1:Al Pacino\", \"2:The Godfather Coda The Death of Michael Corleone\", {})\n",
"g.addEdge(\n",
" \"ActedIn\", \"1:Al Pacino\", \"2:The Godfather Coda The Death of Michael Corleone\", {}\n",
")\n",
"g.addEdge(\"ActedIn\", \"1:Robert De Niro\", \"2:The Godfather Part II\", {})"
]
},
@@ -164,7 +172,7 @@
" password=\"admin\",\n",
" address=\"localhost\",\n",
" port=8080,\n",
" graph=\"hugegraph\"\n",
" graph=\"hugegraph\",\n",
")"
]
},
@@ -228,9 +236,7 @@
"metadata": {},
"outputs": [],
"source": [
"chain = HugeGraphQAChain.from_llm(\n",
" ChatOpenAI(temperature=0), graph=graph, verbose=True\n",
")"
"chain = HugeGraphQAChain.from_llm(ChatOpenAI(temperature=0), graph=graph, verbose=True)"
]
},
{

View File

@@ -31,6 +31,7 @@
"outputs": [],
"source": [
"import kuzu\n",
"\n",
"db = kuzu.Database(\"test_db\")\n",
"conn = kuzu.Connection(db)"
]
@@ -61,7 +62,9 @@
],
"source": [
"conn.execute(\"CREATE NODE TABLE Movie (name STRING, PRIMARY KEY(name))\")\n",
"conn.execute(\"CREATE NODE TABLE Person (name STRING, birthDate STRING, PRIMARY KEY(name))\")\n",
"conn.execute(\n",
" \"CREATE NODE TABLE Person (name STRING, birthDate STRING, PRIMARY KEY(name))\"\n",
")\n",
"conn.execute(\"CREATE REL TABLE ActedIn (FROM Person TO Movie)\")"
]
},
@@ -94,11 +97,21 @@
"conn.execute(\"CREATE (:Person {name: 'Robert De Niro', birthDate: '1943-08-17'})\")\n",
"conn.execute(\"CREATE (:Movie {name: 'The Godfather'})\")\n",
"conn.execute(\"CREATE (:Movie {name: 'The Godfather: Part II'})\")\n",
"conn.execute(\"CREATE (:Movie {name: 'The Godfather Coda: The Death of Michael Corleone'})\")\n",
"conn.execute(\"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather' CREATE (p)-[:ActedIn]->(m)\")\n",
"conn.execute(\"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)\")\n",
"conn.execute(\"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather Coda: The Death of Michael Corleone' CREATE (p)-[:ActedIn]->(m)\")\n",
"conn.execute(\"MATCH (p:Person), (m:Movie) WHERE p.name = 'Robert De Niro' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)\")"
"conn.execute(\n",
" \"CREATE (:Movie {name: 'The Godfather Coda: The Death of Michael Corleone'})\"\n",
")\n",
"conn.execute(\n",
" \"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather' CREATE (p)-[:ActedIn]->(m)\"\n",
")\n",
"conn.execute(\n",
" \"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)\"\n",
")\n",
"conn.execute(\n",
" \"MATCH (p:Person), (m:Movie) WHERE p.name = 'Al Pacino' AND m.name = 'The Godfather Coda: The Death of Michael Corleone' CREATE (p)-[:ActedIn]->(m)\"\n",
")\n",
"conn.execute(\n",
" \"MATCH (p:Person), (m:Movie) WHERE p.name = 'Robert De Niro' AND m.name = 'The Godfather: Part II' CREATE (p)-[:ActedIn]->(m)\"\n",
")"
]
},
{
@@ -137,9 +150,7 @@
"metadata": {},
"outputs": [],
"source": [
"chain = KuzuQAChain.from_llm(\n",
" ChatOpenAI(temperature=0), graph=graph, verbose=True\n",
")"
"chain = KuzuQAChain.from_llm(ChatOpenAI(temperature=0), graph=graph, verbose=True)"
]
},
{

View File

@@ -60,7 +60,8 @@
],
"metadata": {
"collapsed": false
}
},
"id": "7af596b5"
},
{
"cell_type": "markdown",
@@ -150,20 +151,20 @@
"text": [
"\n",
"\n",
"\u001B[1m> Entering new GraphSparqlQAChain chain...\u001B[0m\n",
"\u001b[1m> Entering new GraphSparqlQAChain chain...\u001b[0m\n",
"Identified intent:\n",
"\u001B[32;1m\u001B[1;3mSELECT\u001B[0m\n",
"\u001b[32;1m\u001b[1;3mSELECT\u001b[0m\n",
"Generated SPARQL:\n",
"\u001B[32;1m\u001B[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
"\u001b[32;1m\u001b[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
"SELECT ?homepage\n",
"WHERE {\n",
" ?person foaf:name \"Tim Berners-Lee\" .\n",
" ?person foaf:workplaceHomepage ?homepage .\n",
"}\u001B[0m\n",
"}\u001b[0m\n",
"Full Context:\n",
"\u001B[32;1m\u001B[1;3m[]\u001B[0m\n",
"\u001b[32;1m\u001b[1;3m[]\u001b[0m\n",
"\n",
"\u001B[1m> Finished chain.\u001B[0m\n"
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
@@ -207,19 +208,19 @@
"text": [
"\n",
"\n",
"\u001B[1m> Entering new GraphSparqlQAChain chain...\u001B[0m\n",
"\u001b[1m> Entering new GraphSparqlQAChain chain...\u001b[0m\n",
"Identified intent:\n",
"\u001B[32;1m\u001B[1;3mUPDATE\u001B[0m\n",
"\u001b[32;1m\u001b[1;3mUPDATE\u001b[0m\n",
"Generated SPARQL:\n",
"\u001B[32;1m\u001B[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
"\u001b[32;1m\u001b[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
"INSERT {\n",
" ?person foaf:workplaceHomepage <http://www.w3.org/foo/bar/> .\n",
"}\n",
"WHERE {\n",
" ?person foaf:name \"Timothy Berners-Lee\" .\n",
"}\u001B[0m\n",
"}\u001b[0m\n",
"\n",
"\u001B[1m> Finished chain.\u001B[0m\n"
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
@@ -234,7 +235,9 @@
}
],
"source": [
"chain.run(\"Save that the person with the name 'Timothy Berners-Lee' has a work homepage at 'http://www.w3.org/foo/bar/'\")"
"chain.run(\n",
" \"Save that the person with the name 'Timothy Berners-Lee' has a work homepage at 'http://www.w3.org/foo/bar/'\"\n",
")"
]
},
{
@@ -297,4 +300,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}

View File

@@ -38,7 +38,7 @@
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_documents(documents)\n",
"for i, text in enumerate(texts):\n",
" text.metadata['source'] = f\"{i}-pl\"\n",
" text.metadata[\"source\"] = f\"{i}-pl\"\n",
"embeddings = OpenAIEmbeddings()\n",
"docsearch = Chroma.from_documents(texts, embeddings)"
]
@@ -97,8 +97,8 @@
"outputs": [],
"source": [
"final_qa_chain = StuffDocumentsChain(\n",
" llm_chain=qa_chain, \n",
" document_variable_name='context',\n",
" llm_chain=qa_chain,\n",
" document_variable_name=\"context\",\n",
" document_prompt=doc_prompt,\n",
")"
]
@@ -111,8 +111,7 @@
"outputs": [],
"source": [
"retrieval_qa = RetrievalQA(\n",
" retriever=docsearch.as_retriever(),\n",
" combine_documents_chain=final_qa_chain\n",
" retriever=docsearch.as_retriever(), combine_documents_chain=final_qa_chain\n",
")"
]
},
@@ -175,8 +174,8 @@
"outputs": [],
"source": [
"final_qa_chain_pydantic = StuffDocumentsChain(\n",
" llm_chain=qa_chain_pydantic, \n",
" document_variable_name='context',\n",
" llm_chain=qa_chain_pydantic,\n",
" document_variable_name=\"context\",\n",
" document_prompt=doc_prompt,\n",
")"
]
@@ -189,8 +188,7 @@
"outputs": [],
"source": [
"retrieval_qa_pydantic = RetrievalQA(\n",
" retriever=docsearch.as_retriever(),\n",
" combine_documents_chain=final_qa_chain_pydantic\n",
" retriever=docsearch.as_retriever(), combine_documents_chain=final_qa_chain_pydantic\n",
")"
]
},
@@ -235,6 +233,7 @@
"from langchain.chains import ConversationalRetrievalChain\n",
"from langchain.memory import ConversationBufferMemory\n",
"from langchain.chains import LLMChain\n",
"\n",
"memory = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)\n",
"_template = \"\"\"Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\\\n",
"Make sure to avoid using any unclear pronouns.\n",
@@ -258,10 +257,10 @@
"outputs": [],
"source": [
"qa = ConversationalRetrievalChain(\n",
" question_generator=condense_question_chain, \n",
" question_generator=condense_question_chain,\n",
" retriever=docsearch.as_retriever(),\n",
" memory=memory, \n",
" combine_docs_chain=final_qa_chain\n",
" memory=memory,\n",
" combine_docs_chain=final_qa_chain,\n",
")"
]
},
@@ -389,7 +388,9 @@
" \"\"\"An answer to the question being asked, with sources.\"\"\"\n",
"\n",
" answer: str = Field(..., description=\"Answer to the question that was asked\")\n",
" countries_referenced: List[str] = Field(..., description=\"All of the countries mentioned in the sources\")\n",
" countries_referenced: List[str] = Field(\n",
" ..., description=\"All of the countries mentioned in the sources\"\n",
" )\n",
" sources: List[str] = Field(\n",
" ..., description=\"List of sources used to answer the question\"\n",
" )\n",
@@ -405,20 +406,23 @@
" HumanMessage(content=\"Answer question using the following context\"),\n",
" HumanMessagePromptTemplate.from_template(\"{context}\"),\n",
" HumanMessagePromptTemplate.from_template(\"Question: {question}\"),\n",
" HumanMessage(content=\"Tips: Make sure to answer in the correct format. Return all of the countries mentioned in the sources in uppercase characters.\"),\n",
" HumanMessage(\n",
" content=\"Tips: Make sure to answer in the correct format. Return all of the countries mentioned in the sources in uppercase characters.\"\n",
" ),\n",
"]\n",
"\n",
"chain_prompt = ChatPromptTemplate(messages=prompt_messages)\n",
"\n",
"qa_chain_pydantic = create_qa_with_structure_chain(llm, CustomResponseSchema, output_parser=\"pydantic\", prompt=chain_prompt)\n",
"qa_chain_pydantic = create_qa_with_structure_chain(\n",
" llm, CustomResponseSchema, output_parser=\"pydantic\", prompt=chain_prompt\n",
")\n",
"final_qa_chain_pydantic = StuffDocumentsChain(\n",
" llm_chain=qa_chain_pydantic,\n",
" document_variable_name='context',\n",
" document_variable_name=\"context\",\n",
" document_prompt=doc_prompt,\n",
")\n",
"retrieval_qa_pydantic = RetrievalQA(\n",
" retriever=docsearch.as_retriever(),\n",
" combine_documents_chain=final_qa_chain_pydantic\n",
" retriever=docsearch.as_retriever(), combine_documents_chain=final_qa_chain_pydantic\n",
")\n",
"query = \"What did he say about russia\"\n",
"retrieval_qa_pydantic.run(query)"

View File

@@ -35,7 +35,9 @@
"metadata": {},
"outputs": [],
"source": [
"chain = get_openapi_chain(\"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\")"
"chain = get_openapi_chain(\n",
" \"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\"\n",
")"
]
},
{
@@ -186,7 +188,9 @@
},
"outputs": [],
"source": [
"chain = get_openapi_chain(\"https://gist.githubusercontent.com/roaldnefs/053e505b2b7a807290908fe9aa3e1f00/raw/0a212622ebfef501163f91e23803552411ed00e4/openapi.yaml\")"
"chain = get_openapi_chain(\n",
" \"https://gist.githubusercontent.com/roaldnefs/053e505b2b7a807290908fe9aa3e1f00/raw/0a212622ebfef501163f91e23803552411ed00e4/openapi.yaml\"\n",
")"
]
},
{

View File

@@ -28,7 +28,7 @@
"\n",
"from pydantic import Extra\n",
"\n",
"from langchain.base_language import BaseLanguageModel\n",
"from langchain.schemea import BaseLanguageModel\n",
"from langchain.callbacks.manager import (\n",
" AsyncCallbackManagerForChainRun,\n",
" CallbackManagerForChainRun,\n",

View File

@@ -22,7 +22,8 @@
"from typing import Optional\n",
"\n",
"from langchain.chains.openai_functions import (\n",
" create_openai_fn_chain, create_structured_output_chain\n",
" create_openai_fn_chain,\n",
" create_structured_output_chain,\n",
")\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate\n",
@@ -58,8 +59,10 @@
"source": [
"from pydantic import BaseModel, Field\n",
"\n",
"\n",
"class Person(BaseModel):\n",
" \"\"\"Identifying information about a person.\"\"\"\n",
"\n",
" name: str = Field(..., description=\"The person's name\")\n",
" age: int = Field(..., description=\"The person's age\")\n",
" fav_food: Optional[str] = Field(None, description=\"The person's favorite food\")"
@@ -103,13 +106,15 @@
"llm = ChatOpenAI(model=\"gpt-3.5-turbo-0613\", temperature=0)\n",
"\n",
"prompt_msgs = [\n",
" SystemMessage(\n",
" content=\"You are a world class algorithm for extracting information in structured formats.\"\n",
" ),\n",
" HumanMessage(content=\"Use the given format to extract information from the following input:\"),\n",
" HumanMessagePromptTemplate.from_template(\"{input}\"),\n",
" HumanMessage(content=\"Tips: Make sure to answer in the correct format\"),\n",
" ]\n",
" SystemMessage(\n",
" content=\"You are a world class algorithm for extracting information in structured formats.\"\n",
" ),\n",
" HumanMessage(\n",
" content=\"Use the given format to extract information from the following input:\"\n",
" ),\n",
" HumanMessagePromptTemplate.from_template(\"{input}\"),\n",
" HumanMessage(content=\"Tips: Make sure to answer in the correct format\"),\n",
"]\n",
"prompt = ChatPromptTemplate(messages=prompt_msgs)\n",
"\n",
"chain = create_structured_output_chain(Person, llm, prompt, verbose=True)\n",
@@ -162,12 +167,17 @@
"source": [
"from typing import Sequence\n",
"\n",
"\n",
"class People(BaseModel):\n",
" \"\"\"Identifying information about all people in a text.\"\"\"\n",
"\n",
" people: Sequence[Person] = Field(..., description=\"The people in the text\")\n",
" \n",
"\n",
"\n",
"chain = create_structured_output_chain(People, llm, prompt, verbose=True)\n",
"chain.run(\"Sally is 13, Joey just turned 12 and loves spinach. Caroline is 10 years older than Sally, so she's 23.\")"
"chain.run(\n",
" \"Sally is 13, Joey just turned 12 and loves spinach. Caroline is 10 years older than Sally, so she's 23.\"\n",
")"
]
},
{
@@ -192,27 +202,16 @@
" \"description\": \"Identifying information about a person.\",\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"name\": {\n",
" \"title\": \"Name\",\n",
" \"description\": \"The person's name\",\n",
" \"type\": \"string\"\n",
" },\n",
" \"age\": {\n",
" \"title\": \"Age\",\n",
" \"description\": \"The person's age\",\n",
" \"type\": \"integer\"\n",
" },\n",
" \"fav_food\": {\n",
" \"title\": \"Fav Food\",\n",
" \"description\": \"The person's favorite food\",\n",
" \"type\": \"string\"\n",
" }\n",
" \"name\": {\"title\": \"Name\", \"description\": \"The person's name\", \"type\": \"string\"},\n",
" \"age\": {\"title\": \"Age\", \"description\": \"The person's age\", \"type\": \"integer\"},\n",
" \"fav_food\": {\n",
" \"title\": \"Fav Food\",\n",
" \"description\": \"The person's favorite food\",\n",
" \"type\": \"string\",\n",
" },\n",
" },\n",
" \"required\": [\n",
" \"name\",\n",
" \"age\"\n",
" ]\n",
"}\n"
" \"required\": [\"name\", \"age\"],\n",
"}"
]
},
{
@@ -286,13 +285,15 @@
"source": [
"class RecordPerson(BaseModel):\n",
" \"\"\"Record some identifying information about a pe.\"\"\"\n",
"\n",
" name: str = Field(..., description=\"The person's name\")\n",
" age: int = Field(..., description=\"The person's age\")\n",
" fav_food: Optional[str] = Field(None, description=\"The person's favorite food\")\n",
"\n",
" \n",
"\n",
"class RecordDog(BaseModel):\n",
" \"\"\"Record some identifying information about a dog.\"\"\"\n",
"\n",
" name: str = Field(..., description=\"The dog's name\")\n",
" color: str = Field(..., description=\"The dog's color\")\n",
" fav_food: Optional[str] = Field(None, description=\"The dog's favorite food\")"
@@ -333,10 +334,10 @@
],
"source": [
"prompt_msgs = [\n",
" SystemMessage(\n",
" content=\"You are a world class algorithm for recording entities\"\n",
" SystemMessage(content=\"You are a world class algorithm for recording entities\"),\n",
" HumanMessage(\n",
" content=\"Make calls to the relevant function to record the entities in the following input:\"\n",
" ),\n",
" HumanMessage(content=\"Make calls to the relevant function to record the entities in the following input:\"),\n",
" HumanMessagePromptTemplate.from_template(\"{input}\"),\n",
" HumanMessage(content=\"Tips: Make sure to answer in the correct format\"),\n",
"]\n",
@@ -393,11 +394,16 @@
"source": [
"class OptionalFavFood(BaseModel):\n",
" \"\"\"Either a food or null.\"\"\"\n",
" food: Optional[str] = Field(None, description=\"Either the name of a food or null. Should be null if the food isn't known.\")\n",
"\n",
" food: Optional[str] = Field(\n",
" None,\n",
" description=\"Either the name of a food or null. Should be null if the food isn't known.\",\n",
" )\n",
"\n",
"\n",
"def record_person(name: str, age: int, fav_food: OptionalFavFood) -> str:\n",
" \"\"\"Record some basic identifying information about a person.\n",
" \n",
"\n",
" Args:\n",
" name: The person's name.\n",
" age: The person's age in years.\n",
@@ -405,9 +411,11 @@
" \"\"\"\n",
" return f\"Recording person {name} of age {age} with favorite food {fav_food.food}!\"\n",
"\n",
" \n",
"\n",
"chain = create_openai_fn_chain([record_person], llm, prompt, verbose=True)\n",
"chain.run(\"The most important thing to remember about Tommy, my 12 year old, is that he'll do anything for apple pie.\")"
"chain.run(\n",
" \"The most important thing to remember about Tommy, my 12 year old, is that he'll do anything for apple pie.\"\n",
")"
]
},
{
@@ -458,7 +466,7 @@
"source": [
"def record_dog(name: str, color: str, fav_food: OptionalFavFood) -> str:\n",
" \"\"\"Record some basic identifying information about a dog.\n",
" \n",
"\n",
" Args:\n",
" name: The dog's name.\n",
" color: The dog's color.\n",
@@ -468,7 +476,9 @@
"\n",
"\n",
"chain = create_openai_fn_chain([record_person, record_dog], llm, prompt, verbose=True)\n",
"chain.run(\"I can't find my dog Henry anywhere, he's a small brown beagle. Could you send a message about him?\")"
"chain.run(\n",
" \"I can't find my dog Henry anywhere, he's a small brown beagle. Could you send a message about him?\"\n",
")"
]
},
{

View File

@@ -77,7 +77,9 @@
}
],
"source": [
"loader = BraveSearchLoader(query=\"obama middle name\", api_key=api_key, search_kwargs={\"count\": 3})\n",
"loader = BraveSearchLoader(\n",
" query=\"obama middle name\", api_key=api_key, search_kwargs={\"count\": 3}\n",
")\n",
"docs = loader.load()\n",
"len(docs)"
]

View File

@@ -0,0 +1,96 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Datadog Logs\n",
"\n",
">[Datadog](https://www.datadoghq.com/) is a monitoring and analytics platform for cloud-scale applications.\n",
"\n",
"This loader fetches the logs from your applications in Datadog using the `datadog_api_client` Python package. You must initialize the loader with your `Datadog API key` and `APP key`, and you need to pass in the query to extract the desired logs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import DatadogLogsLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#!pip install datadog-api-client"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"query = \"service:agent status:error\"\n",
"\n",
"loader = DatadogLogsLoader(\n",
" query=query,\n",
" api_key=DD_API_KEY,\n",
" app_key=DD_APP_KEY,\n",
" from_time=1688732708951, # Optional, timestamp in milliseconds\n",
" to_time=1688736308951, # Optional, timestamp in milliseconds\n",
" limit=100, # Optional, default is 100\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='message: grep: /etc/datadog-agent/system-probe.yaml: No such file or directory', metadata={'id': 'AgAAAYkwpLImvkjRpQAAAAAAAAAYAAAAAEFZa3dwTUFsQUFEWmZfLU5QdElnM3dBWQAAACQAAAAAMDE4OTMwYTQtYzk3OS00MmJjLTlhNDAtOTY4N2EwY2I5ZDdk', 'status': 'error', 'service': 'agent', 'tags': ['accessible-from-goog-gke-node', 'allow-external-ingress-high-ports', 'allow-external-ingress-http', 'allow-external-ingress-https', 'container_id:c7d8ecd27b5b3cfdf3b0df04b8965af6f233f56b7c3c2ffabfab5e3b6ccbd6a5', 'container_name:lab_datadog_1', 'datadog.pipelines:false', 'datadog.submission_auth:private_api_key', 'docker_image:datadog/agent:7.41.1', 'env:dd101-dev', 'hostname:lab-host', 'image_name:datadog/agent', 'image_tag:7.41.1', 'instance-id:7497601202021312403', 'instance-type:custom-1-4096', 'instruqt_aws_accounts:', 'instruqt_azure_subscriptions:', 'instruqt_gcp_projects:', 'internal-hostname:lab-host.d4rjybavkary.svc.cluster.local', 'numeric_project_id:3390740675', 'p-d4rjybavkary', 'project:instruqt-prod', 'service:agent', 'short_image:agent', 'source:agent', 'zone:europe-west1-b'], 'timestamp': datetime.datetime(2023, 7, 7, 13, 57, 27, 206000, tzinfo=tzutc())}),\n",
" Document(page_content='message: grep: /etc/datadog-agent/system-probe.yaml: No such file or directory', metadata={'id': 'AgAAAYkwpLImvkjRpgAAAAAAAAAYAAAAAEFZa3dwTUFsQUFEWmZfLU5QdElnM3dBWgAAACQAAAAAMDE4OTMwYTQtYzk3OS00MmJjLTlhNDAtOTY4N2EwY2I5ZDdk', 'status': 'error', 'service': 'agent', 'tags': ['accessible-from-goog-gke-node', 'allow-external-ingress-high-ports', 'allow-external-ingress-http', 'allow-external-ingress-https', 'container_id:c7d8ecd27b5b3cfdf3b0df04b8965af6f233f56b7c3c2ffabfab5e3b6ccbd6a5', 'container_name:lab_datadog_1', 'datadog.pipelines:false', 'datadog.submission_auth:private_api_key', 'docker_image:datadog/agent:7.41.1', 'env:dd101-dev', 'hostname:lab-host', 'image_name:datadog/agent', 'image_tag:7.41.1', 'instance-id:7497601202021312403', 'instance-type:custom-1-4096', 'instruqt_aws_accounts:', 'instruqt_azure_subscriptions:', 'instruqt_gcp_projects:', 'internal-hostname:lab-host.d4rjybavkary.svc.cluster.local', 'numeric_project_id:3390740675', 'p-d4rjybavkary', 'project:instruqt-prod', 'service:agent', 'short_image:agent', 'source:agent', 'zone:europe-west1-b'], 'timestamp': datetime.datetime(2023, 7, 7, 13, 57, 27, 206000, tzinfo=tzutc())})]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"documents = loader.load()\n",
"documents"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.11"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,5 @@
Stanley Cups
Team Location Stanley Cups
Blues STL 1
Flyers PHI 2
Maple Leafs TOR 13
1 Stanley Cups
2 Team Location Stanley Cups
3 Blues STL 1
4 Flyers PHI 2
5 Maple Leafs TOR 13

View File

@@ -126,11 +126,11 @@
"metadata": {},
"outputs": [],
"source": [
"file_id=\"1x9WBtFPWMEAdjcJzPScRsjpjQvpSo_kz\"\n",
"file_id = \"1x9WBtFPWMEAdjcJzPScRsjpjQvpSo_kz\"\n",
"loader = GoogleDriveLoader(\n",
" file_ids=[file_id],\n",
" file_loader_cls=UnstructuredFileIOLoader,\n",
" file_loader_kwargs={\"mode\": \"elements\"}\n",
" file_loader_kwargs={\"mode\": \"elements\"},\n",
")"
]
},
@@ -180,11 +180,11 @@
"metadata": {},
"outputs": [],
"source": [
"folder_id=\"1asMOHY1BqBS84JcRbOag5LOJac74gpmD\"\n",
"folder_id = \"1asMOHY1BqBS84JcRbOag5LOJac74gpmD\"\n",
"loader = GoogleDriveLoader(\n",
" folder_id=folder_id,\n",
" file_loader_cls=UnstructuredFileIOLoader,\n",
" file_loader_kwargs={\"mode\": \"elements\"}\n",
" file_loader_kwargs={\"mode\": \"elements\"},\n",
")"
]
},

View File

@@ -101,7 +101,7 @@
" \"../Papers/\",\n",
" glob=\"*\",\n",
" suffixes=[\".pdf\"],\n",
" parser= GrobidParser(segment_sentences=False)\n",
" parser=GrobidParser(segment_sentences=False),\n",
")\n",
"docs = loader.load()"
]

View File

@@ -18,7 +18,10 @@
"outputs": [],
"source": [
"from langchain.document_loaders import WebBaseLoader\n",
"loader_web = WebBaseLoader(\"https://github.com/basecamp/handbook/blob/master/37signals-is-you.md\")"
"\n",
"loader_web = WebBaseLoader(\n",
" \"https://github.com/basecamp/handbook/blob/master/37signals-is-you.md\"\n",
")"
]
},
{
@@ -29,6 +32,7 @@
"outputs": [],
"source": [
"from langchain.document_loaders import PyPDFLoader\n",
"\n",
"loader_pdf = PyPDFLoader(\"../MachineLearning-Lecture01.pdf\")"
]
},
@@ -40,7 +44,8 @@
"outputs": [],
"source": [
"from langchain.document_loaders.merge import MergedDataLoader\n",
"loader_all=MergedDataLoader(loaders=[loader_web,loader_pdf])"
"\n",
"loader_all = MergedDataLoader(loaders=[loader_web, loader_pdf])"
]
},
{
@@ -50,7 +55,7 @@
"metadata": {},
"outputs": [],
"source": [
"docs_all=loader_all.load()"
"docs_all = loader_all.load()"
]
},
{

View File

@@ -36,7 +36,9 @@
],
"source": [
"# Create a new loader object for the MHTML file\n",
"loader = MHTMLLoader(file_path='../../../../../../tests/integration_tests/examples/example.mht')\n",
"loader = MHTMLLoader(\n",
" file_path=\"../../../../../../tests/integration_tests/examples/example.mht\"\n",
")\n",
"\n",
"# Load the document from the file\n",
"documents = loader.load()\n",

View File

@@ -53,11 +53,9 @@
"metadata": {},
"outputs": [],
"source": [
"dataset = \"vw6y-z8j6\" # 311 data\n",
"dataset = \"tmnf-yvry\" # crime data\n",
"loader = OpenCityDataLoader(city_id=\"data.sfgov.org\",\n",
" dataset_id=dataset,\n",
" limit=2000)"
"dataset = \"vw6y-z8j6\" # 311 data\n",
"dataset = \"tmnf-yvry\" # crime data\n",
"loader = OpenCityDataLoader(city_id=\"data.sfgov.org\", dataset_id=dataset, limit=2000)"
]
},
{

View File

@@ -33,9 +33,7 @@
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredOrgModeLoader(\n",
" file_path=\"example_data/README.org\", mode=\"elements\"\n",
")\n",
"loader = UnstructuredOrgModeLoader(file_path=\"example_data/README.org\", mode=\"elements\")\n",
"docs = loader.load()"
]
},

View File

@@ -239,7 +239,7 @@
}
],
"source": [
"# Use lazy load for larger table, which won't read the full table into memory \n",
"# Use lazy load for larger table, which won't read the full table into memory\n",
"for i in loader.lazy_load():\n",
" print(i)"
]

View File

@@ -49,9 +49,9 @@
"metadata": {},
"outputs": [],
"source": [
"url = 'https://js.langchain.com/docs/modules/memory/examples/'\n",
"loader=RecursiveUrlLoader(url=url)\n",
"docs=loader.load()"
"url = \"https://js.langchain.com/docs/modules/memory/examples/\"\n",
"loader = RecursiveUrlLoader(url=url)\n",
"docs = loader.load()"
]
},
{
@@ -138,10 +138,10 @@
"metadata": {},
"outputs": [],
"source": [
"url = 'https://js.langchain.com/docs/'\n",
"exclude_dirs=['https://js.langchain.com/docs/api/']\n",
"loader=RecursiveUrlLoader(url=url,exclude_dirs=exclude_dirs)\n",
"docs=loader.load()"
"url = \"https://js.langchain.com/docs/\"\n",
"exclude_dirs = [\"https://js.langchain.com/docs/api/\"]\n",
"loader = RecursiveUrlLoader(url=url, exclude_dirs=exclude_dirs)\n",
"docs = loader.load()"
]
},
{

View File

@@ -33,9 +33,7 @@
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredRSTLoader(\n",
" file_path=\"example_data/README.rst\", mode=\"elements\"\n",
")\n",
"loader = UnstructuredRSTLoader(file_path=\"example_data/README.rst\", mode=\"elements\")\n",
"docs = loader.load()"
]
},

View File

@@ -30,7 +30,8 @@
"outputs": [],
"source": [
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"warnings.filterwarnings(\"ignore\")\n",
"from pprint import pprint\n",
"from langchain.text_splitter import Language\n",
"from langchain.document_loaders.generic import GenericLoader\n",
@@ -48,7 +49,7 @@
" \"./example_data/source_code\",\n",
" glob=\"*\",\n",
" suffixes=[\".py\", \".js\"],\n",
" parser=LanguageParser()\n",
" parser=LanguageParser(),\n",
")\n",
"docs = loader.load()"
]
@@ -200,7 +201,7 @@
" \"./example_data/source_code\",\n",
" glob=\"*\",\n",
" suffixes=[\".py\"],\n",
" parser=LanguageParser(language=Language.PYTHON, parser_threshold=1000)\n",
" parser=LanguageParser(language=Language.PYTHON, parser_threshold=1000),\n",
")\n",
"docs = loader.load()"
]
@@ -281,7 +282,7 @@
" \"./example_data/source_code\",\n",
" glob=\"*\",\n",
" suffixes=[\".js\"],\n",
" parser=LanguageParser(language=Language.JS)\n",
" parser=LanguageParser(language=Language.JS),\n",
")\n",
"docs = loader.load()"
]

View File

@@ -43,10 +43,10 @@
"outputs": [],
"source": [
"conf = CosConfig(\n",
" Region=\"your cos region\",\n",
" SecretId=\"your cos secret_id\",\n",
" SecretKey=\"your cos secret_key\",\n",
" )\n",
" Region=\"your cos region\",\n",
" SecretId=\"your cos secret_id\",\n",
" SecretKey=\"your cos secret_key\",\n",
")\n",
"loader = TencentCOSDirectoryLoader(conf=conf, bucket=\"you_cos_bucket\")"
]
},

View File

@@ -43,10 +43,10 @@
"outputs": [],
"source": [
"conf = CosConfig(\n",
" Region=\"your cos region\",\n",
" SecretId=\"your cos secret_id\",\n",
" SecretKey=\"your cos secret_key\",\n",
" )\n",
" Region=\"your cos region\",\n",
" SecretId=\"your cos secret_id\",\n",
" SecretKey=\"your cos secret_key\",\n",
")\n",
"loader = TencentCOSFileLoader(conf=conf, bucket=\"you_cos_bucket\", key=\"fake.docx\")"
]
},

View File

@@ -0,0 +1,181 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# TSV\n",
"\n",
">A [tab-separated values (TSV)](https://en.wikipedia.org/wiki/Tab-separated_values) file is a simple, text-based file format for storing tabular data.[3] Records are separated by newlines, and values within a record are separated by tab characters."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `UnstructuredTSVLoader`\n",
"\n",
"You can also load the table using the `UnstructuredTSVLoader`. One advantage of using `UnstructuredTSVLoader` is that if you use it in `\"elements\"` mode, an HTML representation of the table will be available in the metadata."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders.tsv import UnstructuredTSVLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredTSVLoader(\n",
" file_path=\"example_data/mlb_teams_2012.csv\", mode=\"elements\"\n",
")\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<table border=\"1\" class=\"dataframe\">\n",
" <tbody>\n",
" <tr>\n",
" <td>Nationals, 81.34, 98</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Reds, 82.20, 97</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Yankees, 197.96, 95</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Giants, 117.62, 94</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Braves, 83.31, 94</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Athletics, 55.37, 94</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Rangers, 120.51, 93</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Orioles, 81.43, 93</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Rays, 64.17, 90</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Angels, 154.49, 89</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Tigers, 132.30, 88</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Cardinals, 110.30, 88</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Dodgers, 95.14, 86</td>\n",
" </tr>\n",
" <tr>\n",
" <td>White Sox, 96.92, 85</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Brewers, 97.65, 83</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Phillies, 174.54, 81</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Diamondbacks, 74.28, 81</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Pirates, 63.43, 79</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Padres, 55.24, 76</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Mariners, 81.97, 75</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Mets, 93.35, 74</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Blue Jays, 75.48, 73</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Royals, 60.91, 72</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Marlins, 118.07, 69</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Red Sox, 173.18, 69</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Indians, 78.43, 68</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Twins, 94.08, 66</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Rockies, 78.06, 64</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Cubs, 88.19, 61</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Astros, 60.65, 55</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
]
}
],
"source": [
"print(docs[0].metadata[\"text_as_html\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -233,7 +233,8 @@
],
"metadata": {
"collapsed": false
}
},
"id": "672264ad"
},
{
"cell_type": "code",
@@ -241,16 +242,18 @@
"outputs": [],
"source": [
"loader = WebBaseLoader(\n",
" \"https://www.walmart.com/search?q=parrots\", proxies={\n",
" \"https://www.walmart.com/search?q=parrots\",\n",
" proxies={\n",
" \"http\": \"http://{username}:{password}:@proxy.service.com:6666/\",\n",
" \"https\": \"https://{username}:{password}:@proxy.service.com:6666/\"\n",
" }\n",
" \"https\": \"https://{username}:{password}:@proxy.service.com:6666/\",\n",
" },\n",
")\n",
"docs = loader.load()\n"
"docs = loader.load()"
],
"metadata": {
"collapsed": false
}
},
"id": "9caf0310"
}
],
"metadata": {
@@ -274,4 +277,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}

View File

@@ -0,0 +1,304 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Xorbits Pandas DataFrame\n",
"\n",
"This notebook goes over how to load data from a [xorbits.pandas](https://doc.xorbits.io/en/latest/reference/pandas/frame.html) DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"#!pip install xorbits"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import xorbits.pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv(\"example_data/mlb_teams_2012.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "b0d1d84e23c04f1296f63b3ea3dd1e5b",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0.00/100 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Team</th>\n",
" <th>\"Payroll (millions)\"</th>\n",
" <th>\"Wins\"</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Nationals</td>\n",
" <td>81.34</td>\n",
" <td>98</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Reds</td>\n",
" <td>82.20</td>\n",
" <td>97</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Yankees</td>\n",
" <td>197.96</td>\n",
" <td>95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Giants</td>\n",
" <td>117.62</td>\n",
" <td>94</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Braves</td>\n",
" <td>83.31</td>\n",
" <td>94</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Team \"Payroll (millions)\" \"Wins\"\n",
"0 Nationals 81.34 98\n",
"1 Reds 82.20 97\n",
"2 Yankees 197.96 95\n",
"3 Giants 117.62 94\n",
"4 Braves 83.31 94"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import XorbitsLoader"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"loader = XorbitsLoader(df, page_content_column=\"Team\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "c8c8b67f1aae4a3c9de7734bb6cf738e",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0.00/100 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"[Document(page_content='Nationals', metadata={' \"Payroll (millions)\"': 81.34, ' \"Wins\"': 98}),\n",
" Document(page_content='Reds', metadata={' \"Payroll (millions)\"': 82.2, ' \"Wins\"': 97}),\n",
" Document(page_content='Yankees', metadata={' \"Payroll (millions)\"': 197.96, ' \"Wins\"': 95}),\n",
" Document(page_content='Giants', metadata={' \"Payroll (millions)\"': 117.62, ' \"Wins\"': 94}),\n",
" Document(page_content='Braves', metadata={' \"Payroll (millions)\"': 83.31, ' \"Wins\"': 94}),\n",
" Document(page_content='Athletics', metadata={' \"Payroll (millions)\"': 55.37, ' \"Wins\"': 94}),\n",
" Document(page_content='Rangers', metadata={' \"Payroll (millions)\"': 120.51, ' \"Wins\"': 93}),\n",
" Document(page_content='Orioles', metadata={' \"Payroll (millions)\"': 81.43, ' \"Wins\"': 93}),\n",
" Document(page_content='Rays', metadata={' \"Payroll (millions)\"': 64.17, ' \"Wins\"': 90}),\n",
" Document(page_content='Angels', metadata={' \"Payroll (millions)\"': 154.49, ' \"Wins\"': 89}),\n",
" Document(page_content='Tigers', metadata={' \"Payroll (millions)\"': 132.3, ' \"Wins\"': 88}),\n",
" Document(page_content='Cardinals', metadata={' \"Payroll (millions)\"': 110.3, ' \"Wins\"': 88}),\n",
" Document(page_content='Dodgers', metadata={' \"Payroll (millions)\"': 95.14, ' \"Wins\"': 86}),\n",
" Document(page_content='White Sox', metadata={' \"Payroll (millions)\"': 96.92, ' \"Wins\"': 85}),\n",
" Document(page_content='Brewers', metadata={' \"Payroll (millions)\"': 97.65, ' \"Wins\"': 83}),\n",
" Document(page_content='Phillies', metadata={' \"Payroll (millions)\"': 174.54, ' \"Wins\"': 81}),\n",
" Document(page_content='Diamondbacks', metadata={' \"Payroll (millions)\"': 74.28, ' \"Wins\"': 81}),\n",
" Document(page_content='Pirates', metadata={' \"Payroll (millions)\"': 63.43, ' \"Wins\"': 79}),\n",
" Document(page_content='Padres', metadata={' \"Payroll (millions)\"': 55.24, ' \"Wins\"': 76}),\n",
" Document(page_content='Mariners', metadata={' \"Payroll (millions)\"': 81.97, ' \"Wins\"': 75}),\n",
" Document(page_content='Mets', metadata={' \"Payroll (millions)\"': 93.35, ' \"Wins\"': 74}),\n",
" Document(page_content='Blue Jays', metadata={' \"Payroll (millions)\"': 75.48, ' \"Wins\"': 73}),\n",
" Document(page_content='Royals', metadata={' \"Payroll (millions)\"': 60.91, ' \"Wins\"': 72}),\n",
" Document(page_content='Marlins', metadata={' \"Payroll (millions)\"': 118.07, ' \"Wins\"': 69}),\n",
" Document(page_content='Red Sox', metadata={' \"Payroll (millions)\"': 173.18, ' \"Wins\"': 69}),\n",
" Document(page_content='Indians', metadata={' \"Payroll (millions)\"': 78.43, ' \"Wins\"': 68}),\n",
" Document(page_content='Twins', metadata={' \"Payroll (millions)\"': 94.08, ' \"Wins\"': 66}),\n",
" Document(page_content='Rockies', metadata={' \"Payroll (millions)\"': 78.06, ' \"Wins\"': 64}),\n",
" Document(page_content='Cubs', metadata={' \"Payroll (millions)\"': 88.19, ' \"Wins\"': 61}),\n",
" Document(page_content='Astros', metadata={' \"Payroll (millions)\"': 60.65, ' \"Wins\"': 55})]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "fc85c9f59b3644689d05853159fbd358",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0.00/100 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='Nationals' metadata={' \"Payroll (millions)\"': 81.34, ' \"Wins\"': 98}\n",
"page_content='Reds' metadata={' \"Payroll (millions)\"': 82.2, ' \"Wins\"': 97}\n",
"page_content='Yankees' metadata={' \"Payroll (millions)\"': 197.96, ' \"Wins\"': 95}\n",
"page_content='Giants' metadata={' \"Payroll (millions)\"': 117.62, ' \"Wins\"': 94}\n",
"page_content='Braves' metadata={' \"Payroll (millions)\"': 83.31, ' \"Wins\"': 94}\n",
"page_content='Athletics' metadata={' \"Payroll (millions)\"': 55.37, ' \"Wins\"': 94}\n",
"page_content='Rangers' metadata={' \"Payroll (millions)\"': 120.51, ' \"Wins\"': 93}\n",
"page_content='Orioles' metadata={' \"Payroll (millions)\"': 81.43, ' \"Wins\"': 93}\n",
"page_content='Rays' metadata={' \"Payroll (millions)\"': 64.17, ' \"Wins\"': 90}\n",
"page_content='Angels' metadata={' \"Payroll (millions)\"': 154.49, ' \"Wins\"': 89}\n",
"page_content='Tigers' metadata={' \"Payroll (millions)\"': 132.3, ' \"Wins\"': 88}\n",
"page_content='Cardinals' metadata={' \"Payroll (millions)\"': 110.3, ' \"Wins\"': 88}\n",
"page_content='Dodgers' metadata={' \"Payroll (millions)\"': 95.14, ' \"Wins\"': 86}\n",
"page_content='White Sox' metadata={' \"Payroll (millions)\"': 96.92, ' \"Wins\"': 85}\n",
"page_content='Brewers' metadata={' \"Payroll (millions)\"': 97.65, ' \"Wins\"': 83}\n",
"page_content='Phillies' metadata={' \"Payroll (millions)\"': 174.54, ' \"Wins\"': 81}\n",
"page_content='Diamondbacks' metadata={' \"Payroll (millions)\"': 74.28, ' \"Wins\"': 81}\n",
"page_content='Pirates' metadata={' \"Payroll (millions)\"': 63.43, ' \"Wins\"': 79}\n",
"page_content='Padres' metadata={' \"Payroll (millions)\"': 55.24, ' \"Wins\"': 76}\n",
"page_content='Mariners' metadata={' \"Payroll (millions)\"': 81.97, ' \"Wins\"': 75}\n",
"page_content='Mets' metadata={' \"Payroll (millions)\"': 93.35, ' \"Wins\"': 74}\n",
"page_content='Blue Jays' metadata={' \"Payroll (millions)\"': 75.48, ' \"Wins\"': 73}\n",
"page_content='Royals' metadata={' \"Payroll (millions)\"': 60.91, ' \"Wins\"': 72}\n",
"page_content='Marlins' metadata={' \"Payroll (millions)\"': 118.07, ' \"Wins\"': 69}\n",
"page_content='Red Sox' metadata={' \"Payroll (millions)\"': 173.18, ' \"Wins\"': 69}\n",
"page_content='Indians' metadata={' \"Payroll (millions)\"': 78.43, ' \"Wins\"': 68}\n",
"page_content='Twins' metadata={' \"Payroll (millions)\"': 94.08, ' \"Wins\"': 66}\n",
"page_content='Rockies' metadata={' \"Payroll (millions)\"': 78.06, ' \"Wins\"': 64}\n",
"page_content='Cubs' metadata={' \"Payroll (millions)\"': 88.19, ' \"Wins\"': 61}\n",
"page_content='Astros' metadata={' \"Payroll (millions)\"': 60.65, ' \"Wins\"': 55}\n"
]
}
],
"source": [
"# Use lazy load for larger table, which won't read the full table into memory\n",
"for i in loader.lazy_load():\n",
" print(i)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "base",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -155,9 +155,12 @@
"\n",
"# Char-level splits\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"\n",
"chunk_size = 250\n",
"chunk_overlap = 30\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)\n",
"text_splitter = RecursiveCharacterTextSplitter(\n",
" chunk_size=chunk_size, chunk_overlap=chunk_overlap\n",
")\n",
"\n",
"# Split\n",
"splits = text_splitter.split_documents(md_header_splits)\n",

View File

@@ -28,14 +28,14 @@
"# Load blog post\n",
"loader = WebBaseLoader(\"https://lilianweng.github.io/posts/2023-06-23-agent/\")\n",
"data = loader.load()\n",
" \n",
"\n",
"# Split\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size = 500, chunk_overlap = 0)\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
"splits = text_splitter.split_documents(data)\n",
"\n",
"# VectorDB\n",
"embedding = OpenAIEmbeddings()\n",
"vectordb = Chroma.from_documents(documents=splits,embedding=embedding)"
"vectordb = Chroma.from_documents(documents=splits, embedding=embedding)"
]
},
{
@@ -57,9 +57,12 @@
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.retrievers.multi_query import MultiQueryRetriever\n",
"question=\"What are the approaches to Task Decomposition?\"\n",
"\n",
"question = \"What are the approaches to Task Decomposition?\"\n",
"llm = ChatOpenAI(temperature=0)\n",
"retriever_from_llm = MultiQueryRetriever.from_llm(retriever=vectordb.as_retriever(),llm=llm)"
"retriever_from_llm = MultiQueryRetriever.from_llm(\n",
" retriever=vectordb.as_retriever(), llm=llm\n",
")"
]
},
{
@@ -71,8 +74,9 @@
"source": [
"# Set logging for the queries\n",
"import logging\n",
"\n",
"logging.basicConfig()\n",
"logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO)"
"logging.getLogger(\"langchain.retrievers.multi_query\").setLevel(logging.INFO)"
]
},
{
@@ -127,20 +131,24 @@
"from langchain.prompts import PromptTemplate\n",
"from langchain.output_parsers import PydanticOutputParser\n",
"\n",
"\n",
"# Output parser will split the LLM result into a list of queries\n",
"class LineList(BaseModel):\n",
" # \"lines\" is the key (attribute name) of the parsed output\n",
" lines: List[str] = Field(description=\"Lines of text\")\n",
"\n",
"\n",
"class LineListOutputParser(PydanticOutputParser):\n",
" def __init__(self) -> None:\n",
" super().__init__(pydantic_object=LineList)\n",
"\n",
" def parse(self, text: str) -> LineList:\n",
" lines = text.strip().split(\"\\n\")\n",
" return LineList(lines=lines)\n",
"\n",
"\n",
"output_parser = LineListOutputParser()\n",
" \n",
"\n",
"QUERY_PROMPT = PromptTemplate(\n",
" input_variables=[\"question\"],\n",
" template=\"\"\"You are an AI language model assistant. Your task is to generate five \n",
@@ -153,10 +161,10 @@
"llm = ChatOpenAI(temperature=0)\n",
"\n",
"# Chain\n",
"llm_chain = LLMChain(llm=llm,prompt=QUERY_PROMPT,output_parser=output_parser)\n",
" \n",
"llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT, output_parser=output_parser)\n",
"\n",
"# Other inputs\n",
"question=\"What are the approaches to Task Decomposition?\""
"question = \"What are the approaches to Task Decomposition?\""
]
},
{
@@ -185,12 +193,14 @@
],
"source": [
"# Run\n",
"retriever = MultiQueryRetriever(retriever=vectordb.as_retriever(), \n",
" llm_chain=llm_chain,\n",
" parser_key=\"lines\") # \"lines\" is the key (attribute name) of the parsed output\n",
"retriever = MultiQueryRetriever(\n",
" retriever=vectordb.as_retriever(), llm_chain=llm_chain, parser_key=\"lines\"\n",
") # \"lines\" is the key (attribute name) of the parsed output\n",
"\n",
"# Results\n",
"unique_docs = retriever.get_relevant_documents(query=\"What does the course say about regression?\")\n",
"unique_docs = retriever.get_relevant_documents(\n",
" query=\"What does the course say about regression?\"\n",
")\n",
"len(unique_docs)"
]
}

View File

@@ -59,11 +59,11 @@
"import os\n",
"import getpass\n",
"\n",
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')\n",
"os.environ['MYSCALE_HOST'] = getpass.getpass('MyScale URL:')\n",
"os.environ['MYSCALE_PORT'] = getpass.getpass('MyScale Port:')\n",
"os.environ['MYSCALE_USERNAME'] = getpass.getpass('MyScale Username:')\n",
"os.environ['MYSCALE_PASSWORD'] = getpass.getpass('MyScale Password:')"
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")\n",
"os.environ[\"MYSCALE_HOST\"] = getpass.getpass(\"MyScale URL:\")\n",
"os.environ[\"MYSCALE_PORT\"] = getpass.getpass(\"MyScale Port:\")\n",
"os.environ[\"MYSCALE_USERNAME\"] = getpass.getpass(\"MyScale Username:\")\n",
"os.environ[\"MYSCALE_PASSWORD\"] = getpass.getpass(\"MyScale Password:\")"
]
},
{
@@ -103,16 +103,40 @@
"outputs": [],
"source": [
"docs = [\n",
" Document(page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\", metadata={\"date\": \"1993-07-02\", \"rating\": 7.7, \"genre\": [\"science fiction\"]}),\n",
" Document(page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\", metadata={\"date\": \"2010-12-30\", \"director\": \"Christopher Nolan\", \"rating\": 8.2}),\n",
" Document(page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\", metadata={\"date\": \"2006-04-23\", \"director\": \"Satoshi Kon\", \"rating\": 8.6}),\n",
" Document(page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\", metadata={\"date\": \"2019-08-22\", \"director\": \"Greta Gerwig\", \"rating\": 8.3}),\n",
" Document(page_content=\"Toys come alive and have a blast doing so\", metadata={\"date\": \"1995-02-11\", \"genre\": [\"animated\"]}),\n",
" Document(page_content=\"Three men walk into the Zone, three men walk out of the Zone\", metadata={\"date\": \"1979-09-10\", \"rating\": 9.9, \"director\": \"Andrei Tarkovsky\", \"genre\": [\"science fiction\", \"adventure\"], \"rating\": 9.9})\n",
" Document(\n",
" page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\",\n",
" metadata={\"date\": \"1993-07-02\", \"rating\": 7.7, \"genre\": [\"science fiction\"]},\n",
" ),\n",
" Document(\n",
" page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\",\n",
" metadata={\"date\": \"2010-12-30\", \"director\": \"Christopher Nolan\", \"rating\": 8.2},\n",
" ),\n",
" Document(\n",
" page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\",\n",
" metadata={\"date\": \"2006-04-23\", \"director\": \"Satoshi Kon\", \"rating\": 8.6},\n",
" ),\n",
" Document(\n",
" page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\",\n",
" metadata={\"date\": \"2019-08-22\", \"director\": \"Greta Gerwig\", \"rating\": 8.3},\n",
" ),\n",
" Document(\n",
" page_content=\"Toys come alive and have a blast doing so\",\n",
" metadata={\"date\": \"1995-02-11\", \"genre\": [\"animated\"]},\n",
" ),\n",
" Document(\n",
" page_content=\"Three men walk into the Zone, three men walk out of the Zone\",\n",
" metadata={\n",
" \"date\": \"1979-09-10\",\n",
" \"rating\": 9.9,\n",
" \"director\": \"Andrei Tarkovsky\",\n",
" \"genre\": [\"science fiction\", \"adventure\"],\n",
" \"rating\": 9.9,\n",
" },\n",
" ),\n",
"]\n",
"vectorstore = MyScale.from_documents(\n",
" docs, \n",
" embeddings, \n",
" docs,\n",
" embeddings,\n",
")"
]
},
@@ -138,39 +162,39 @@
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n",
"from langchain.chains.query_constructor.base import AttributeInfo\n",
"\n",
"metadata_field_info=[\n",
"metadata_field_info = [\n",
" AttributeInfo(\n",
" name=\"genre\",\n",
" description=\"The genres of the movie\", \n",
" type=\"list[string]\", \n",
" description=\"The genres of the movie\",\n",
" type=\"list[string]\",\n",
" ),\n",
" # If you want to include length of a list, just define it as a new column\n",
" # This will teach the LLM to use it as a column when constructing filter.\n",
" AttributeInfo(\n",
" name=\"length(genre)\",\n",
" description=\"The length of genres of the movie\", \n",
" type=\"integer\", \n",
" description=\"The length of genres of the movie\",\n",
" type=\"integer\",\n",
" ),\n",
" # Now you can define a column as timestamp. By simply set the type to timestamp.\n",
" AttributeInfo(\n",
" name=\"date\",\n",
" description=\"The date the movie was released\", \n",
" type=\"timestamp\", \n",
" description=\"The date the movie was released\",\n",
" type=\"timestamp\",\n",
" ),\n",
" AttributeInfo(\n",
" name=\"director\",\n",
" description=\"The name of the movie director\", \n",
" type=\"string\", \n",
" description=\"The name of the movie director\",\n",
" type=\"string\",\n",
" ),\n",
" AttributeInfo(\n",
" name=\"rating\",\n",
" description=\"A 1-10 rating for the movie\",\n",
" type=\"float\"\n",
" name=\"rating\", description=\"A 1-10 rating for the movie\", type=\"float\"\n",
" ),\n",
"]\n",
"document_content_description = \"Brief summary of a movie\"\n",
"llm = OpenAI(temperature=0)\n",
"retriever = SelfQueryRetriever.from_llm(llm, vectorstore, document_content_description, metadata_field_info, verbose=True)"
"retriever = SelfQueryRetriever.from_llm(\n",
" llm, vectorstore, document_content_description, metadata_field_info, verbose=True\n",
")"
]
},
{
@@ -225,7 +249,9 @@
"outputs": [],
"source": [
"# This example specifies a composite filter\n",
"retriever.get_relevant_documents(\"What's a highly rated (above 8.5) science fiction film?\")"
"retriever.get_relevant_documents(\n",
" \"What's a highly rated (above 8.5) science fiction film?\"\n",
")"
]
},
{
@@ -236,7 +262,9 @@
"outputs": [],
"source": [
"# This example specifies a query and composite filter\n",
"retriever.get_relevant_documents(\"What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated\")"
"retriever.get_relevant_documents(\n",
" \"What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated\"\n",
")"
]
},
{
@@ -290,7 +318,9 @@
"outputs": [],
"source": [
"# Contain works for lists: so you can match a list with contain comparator!\n",
"retriever.get_relevant_documents(\"What's a movie who has genres science fiction and adventure?\")"
"retriever.get_relevant_documents(\n",
" \"What's a movie who has genres science fiction and adventure?\"\n",
")"
]
},
{
@@ -315,12 +345,12 @@
"outputs": [],
"source": [
"retriever = SelfQueryRetriever.from_llm(\n",
" llm, \n",
" vectorstore, \n",
" document_content_description, \n",
" metadata_field_info, \n",
" llm,\n",
" vectorstore,\n",
" document_content_description,\n",
" metadata_field_info,\n",
" enable_limit=True,\n",
" verbose=True\n",
" verbose=True,\n",
")"
]
},

View File

@@ -18,7 +18,7 @@
"## Creating a Pinecone index\n",
"First we'll want to create a `Pinecone` VectorStore and seed it with some data. We've created a small demo set of documents that contain summaries of movies.\n",
"\n",
"To use Pinecone, you to have `pinecone` package installed and you must have an API key and an Environment. Here are the [installation instructions](https://docs.pinecone.io/docs/quickstart).\n",
"To use Pinecone, you have to have `pinecone` package installed and you must have an API key and an Environment. Here are the [installation instructions](https://docs.pinecone.io/docs/quickstart).\n",
"\n",
"NOTE: The self-query retriever requires you to have `lark` package installed."
]

View File

@@ -53,9 +53,7 @@
"metadata": {},
"outputs": [],
"source": [
"retriever = AmazonKendraRetriever(\n",
" index_id=\"c0806df7-e76b-4bce-9b5c-d5582f6b1a03\"\n",
")"
"retriever = AmazonKendraRetriever(index_id=\"c0806df7-e76b-4bce-9b5c-d5582f6b1a03\")"
]
},
{

View File

@@ -1,21 +1,31 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "9fc6205b",
"metadata": {},
"source": [
"# Databerry\n",
"# Chaindesk\n",
"\n",
">[Databerry platform](https://docs.databerry.ai/introduction) brings data from anywhere (Datsources: Text, PDF, Word, PowerPpoint, Excel, Notion, Airtable, Google Sheets, etc..) into Datastores (container of multiple Datasources).\n",
"Then your Datastores can be connected to ChatGPT via Plugins or any other Large Langue Model (LLM) via the `Databerry API`.\n",
">[Chaindesk platform](https://docs.chaindesk.ai/introduction) brings data from anywhere (Datsources: Text, PDF, Word, PowerPpoint, Excel, Notion, Airtable, Google Sheets, etc..) into Datastores (container of multiple Datasources).\n",
"Then your Datastores can be connected to ChatGPT via Plugins or any other Large Langue Model (LLM) via the `Chaindesk API`.\n",
"\n",
"This notebook shows how to use [Databerry's](https://www.databerry.ai/) retriever.\n",
"This notebook shows how to use [Chaindesk's](https://www.chaindesk.ai/) retriever.\n",
"\n",
"First, you will need to sign up for Databerry, create a datastore, add some data and get your datastore api endpoint url. You need the [API Key](https://docs.databerry.ai/api-reference/authentication)."
"First, you will need to sign up for Chaindesk, create a datastore, add some data and get your datastore api endpoint url. You need the [API Key](https://docs.chaindesk.ai/api-reference/authentication)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3697b9fd",
"metadata": {},
"outputs": [],
"source": []
},
{
"attachments": {},
"cell_type": "markdown",
"id": "944e172b",
"metadata": {},
@@ -34,7 +44,7 @@
},
"outputs": [],
"source": [
"from langchain.retrievers import DataberryRetriever"
"from langchain.retrievers import ChaindeskRetriever"
]
},
{
@@ -46,9 +56,9 @@
},
"outputs": [],
"source": [
"retriever = DataberryRetriever(\n",
" datastore_url=\"https://clg1xg2h80000l708dymr0fxc.databerry.ai/query\",\n",
" # api_key=\"DATABERRY_API_KEY\", # optional if datastore is public\n",
"retriever = ChaindeskRetriever(\n",
" datastore_url=\"https://clg1xg2h80000l708dymr0fxc.chaindesk.ai/query\",\n",
" # api_key=\"CHAINDESK_API_KEY\", # optional if datastore is public\n",
" # top_k=10 # optional\n",
")"
]

View File

@@ -111,10 +111,10 @@
"db.index(\n",
" [\n",
" MyDoc(\n",
" title=f'My document {i}',\n",
" title_embedding=embeddings.embed_query(f'query {i}'),\n",
" title=f\"My document {i}\",\n",
" title_embedding=embeddings.embed_query(f\"query {i}\"),\n",
" year=i,\n",
" color=random.choice(['red', 'green', 'blue']),\n",
" color=random.choice([\"red\", \"green\", \"blue\"]),\n",
" )\n",
" for i in range(100)\n",
" ]\n",
@@ -142,15 +142,15 @@
"source": [
"# create a retriever\n",
"retriever = DocArrayRetriever(\n",
" index=db, \n",
" embeddings=embeddings, \n",
" search_field='title_embedding', \n",
" content_field='title',\n",
" index=db,\n",
" embeddings=embeddings,\n",
" search_field=\"title_embedding\",\n",
" content_field=\"title\",\n",
" filters=filter_query,\n",
")\n",
"\n",
"# find the relevant document\n",
"doc = retriever.get_relevant_documents('some query')\n",
"doc = retriever.get_relevant_documents(\"some query\")\n",
"print(doc)"
]
},
@@ -179,16 +179,16 @@
"\n",
"\n",
"# initialize the index\n",
"db = HnswDocumentIndex[MyDoc](work_dir='hnsw_index')\n",
"db = HnswDocumentIndex[MyDoc](work_dir=\"hnsw_index\")\n",
"\n",
"# index data\n",
"db.index(\n",
" [\n",
" MyDoc(\n",
" title=f'My document {i}',\n",
" title_embedding=embeddings.embed_query(f'query {i}'),\n",
" title=f\"My document {i}\",\n",
" title_embedding=embeddings.embed_query(f\"query {i}\"),\n",
" year=i,\n",
" color=random.choice(['red', 'green', 'blue']),\n",
" color=random.choice([\"red\", \"green\", \"blue\"]),\n",
" )\n",
" for i in range(100)\n",
" ]\n",
@@ -216,15 +216,15 @@
"source": [
"# create a retriever\n",
"retriever = DocArrayRetriever(\n",
" index=db, \n",
" embeddings=embeddings, \n",
" search_field='title_embedding', \n",
" content_field='title',\n",
" index=db,\n",
" embeddings=embeddings,\n",
" search_field=\"title_embedding\",\n",
" content_field=\"title\",\n",
" filters=filter_query,\n",
")\n",
"\n",
"# find the relevant document\n",
"doc = retriever.get_relevant_documents('some query')\n",
"doc = retriever.get_relevant_documents(\"some query\")\n",
"print(doc)"
]
},
@@ -249,11 +249,12 @@
},
"outputs": [],
"source": [
"# There's a small difference with the Weaviate backend compared to the others. \n",
"# Here, you need to 'mark' the field used for vector search with 'is_embedding=True'. \n",
"# There's a small difference with the Weaviate backend compared to the others.\n",
"# Here, you need to 'mark' the field used for vector search with 'is_embedding=True'.\n",
"# So, let's create a new schema for Weaviate that takes care of this requirement.\n",
"\n",
"from pydantic import Field \n",
"from pydantic import Field\n",
"\n",
"\n",
"class WeaviateDoc(BaseDoc):\n",
" title: str\n",
@@ -275,19 +276,17 @@
"\n",
"\n",
"# initialize the index\n",
"dbconfig = WeaviateDocumentIndex.DBConfig(\n",
" host=\"http://localhost:8080\"\n",
")\n",
"dbconfig = WeaviateDocumentIndex.DBConfig(host=\"http://localhost:8080\")\n",
"db = WeaviateDocumentIndex[WeaviateDoc](db_config=dbconfig)\n",
"\n",
"# index data\n",
"db.index(\n",
" [\n",
" MyDoc(\n",
" title=f'My document {i}',\n",
" title_embedding=embeddings.embed_query(f'query {i}'),\n",
" title=f\"My document {i}\",\n",
" title_embedding=embeddings.embed_query(f\"query {i}\"),\n",
" year=i,\n",
" color=random.choice(['red', 'green', 'blue']),\n",
" color=random.choice([\"red\", \"green\", \"blue\"]),\n",
" )\n",
" for i in range(100)\n",
" ]\n",
@@ -315,15 +314,15 @@
"source": [
"# create a retriever\n",
"retriever = DocArrayRetriever(\n",
" index=db, \n",
" embeddings=embeddings, \n",
" search_field='title_embedding', \n",
" content_field='title',\n",
" index=db,\n",
" embeddings=embeddings,\n",
" search_field=\"title_embedding\",\n",
" content_field=\"title\",\n",
" filters=filter_query,\n",
")\n",
"\n",
"# find the relevant document\n",
"doc = retriever.get_relevant_documents('some query')\n",
"doc = retriever.get_relevant_documents(\"some query\")\n",
"print(doc)"
]
},
@@ -353,18 +352,17 @@
"\n",
"# initialize the index\n",
"db = ElasticDocIndex[MyDoc](\n",
" hosts=\"http://localhost:9200\", \n",
" index_name=\"docarray_retriever\"\n",
" hosts=\"http://localhost:9200\", index_name=\"docarray_retriever\"\n",
")\n",
"\n",
"# index data\n",
"db.index(\n",
" [\n",
" MyDoc(\n",
" title=f'My document {i}',\n",
" title_embedding=embeddings.embed_query(f'query {i}'),\n",
" title=f\"My document {i}\",\n",
" title_embedding=embeddings.embed_query(f\"query {i}\"),\n",
" year=i,\n",
" color=random.choice(['red', 'green', 'blue']),\n",
" color=random.choice([\"red\", \"green\", \"blue\"]),\n",
" )\n",
" for i in range(100)\n",
" ]\n",
@@ -392,15 +390,15 @@
"source": [
"# create a retriever\n",
"retriever = DocArrayRetriever(\n",
" index=db, \n",
" embeddings=embeddings, \n",
" search_field='title_embedding', \n",
" content_field='title',\n",
" index=db,\n",
" embeddings=embeddings,\n",
" search_field=\"title_embedding\",\n",
" content_field=\"title\",\n",
" filters=filter_query,\n",
")\n",
"\n",
"# find the relevant document\n",
"doc = retriever.get_relevant_documents('some query')\n",
"doc = retriever.get_relevant_documents(\"some query\")\n",
"print(doc)"
]
},
@@ -445,10 +443,10 @@
"db.index(\n",
" [\n",
" MyDoc(\n",
" title=f'My document {i}',\n",
" title_embedding=embeddings.embed_query(f'query {i}'),\n",
" title=f\"My document {i}\",\n",
" title_embedding=embeddings.embed_query(f\"query {i}\"),\n",
" year=i,\n",
" color=random.choice(['red', 'green', 'blue']),\n",
" color=random.choice([\"red\", \"green\", \"blue\"]),\n",
" )\n",
" for i in range(100)\n",
" ]\n",
@@ -486,15 +484,15 @@
"source": [
"# create a retriever\n",
"retriever = DocArrayRetriever(\n",
" index=db, \n",
" embeddings=embeddings, \n",
" search_field='title_embedding', \n",
" content_field='title',\n",
" index=db,\n",
" embeddings=embeddings,\n",
" search_field=\"title_embedding\",\n",
" content_field=\"title\",\n",
" filters=filter_query,\n",
")\n",
"\n",
"# find the relevant document\n",
"doc = retriever.get_relevant_documents('some query')\n",
"doc = retriever.get_relevant_documents(\"some query\")\n",
"print(doc)"
]
},
@@ -552,7 +550,7 @@
" \"director\": \"Francis Ford Coppola\",\n",
" \"rating\": 9.2,\n",
" },\n",
"]\n"
"]"
]
},
{
@@ -573,9 +571,9 @@
],
"source": [
"import getpass\n",
"import os \n",
"import os\n",
"\n",
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
@@ -591,6 +589,7 @@
"from docarray.typing import NdArray\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"\n",
"\n",
"# define schema for your movie documents\n",
"class MyDoc(BaseDoc):\n",
" title: str\n",
@@ -598,7 +597,7 @@
" description_embedding: NdArray[1536]\n",
" rating: float\n",
" director: str\n",
" \n",
"\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"\n",
@@ -626,7 +625,7 @@
"from docarray.index import HnswDocumentIndex\n",
"\n",
"# initialize the index\n",
"db = HnswDocumentIndex[MyDoc](work_dir='movie_search')\n",
"db = HnswDocumentIndex[MyDoc](work_dir=\"movie_search\")\n",
"\n",
"# add data\n",
"db.index(docs)"
@@ -663,14 +662,14 @@
"\n",
"# create a retriever\n",
"retriever = DocArrayRetriever(\n",
" index=db, \n",
" embeddings=embeddings, \n",
" search_field='description_embedding', \n",
" content_field='description'\n",
" index=db,\n",
" embeddings=embeddings,\n",
" search_field=\"description_embedding\",\n",
" content_field=\"description\",\n",
")\n",
"\n",
"# find the relevant document\n",
"doc = retriever.get_relevant_documents('movie about dreams')\n",
"doc = retriever.get_relevant_documents(\"movie about dreams\")\n",
"print(doc)"
]
},
@@ -703,16 +702,16 @@
"\n",
"# create a retriever\n",
"retriever = DocArrayRetriever(\n",
" index=db, \n",
" embeddings=embeddings, \n",
" search_field='description_embedding', \n",
" content_field='description',\n",
" filters={'director': {'$eq': 'Christopher Nolan'}},\n",
" index=db,\n",
" embeddings=embeddings,\n",
" search_field=\"description_embedding\",\n",
" content_field=\"description\",\n",
" filters={\"director\": {\"$eq\": \"Christopher Nolan\"}},\n",
" top_k=2,\n",
")\n",
"\n",
"# find relevant documents\n",
"docs = retriever.get_relevant_documents('space travel')\n",
"docs = retriever.get_relevant_documents(\"space travel\")\n",
"print(docs)"
]
},
@@ -745,17 +744,17 @@
"\n",
"# create a retriever\n",
"retriever = DocArrayRetriever(\n",
" index=db, \n",
" embeddings=embeddings, \n",
" search_field='description_embedding', \n",
" content_field='description',\n",
" filters={'rating': {'$gte': 8.7}},\n",
" search_type='mmr',\n",
" index=db,\n",
" embeddings=embeddings,\n",
" search_field=\"description_embedding\",\n",
" content_field=\"description\",\n",
" filters={\"rating\": {\"$gte\": 8.7}},\n",
" search_type=\"mmr\",\n",
" top_k=3,\n",
")\n",
"\n",
"# find relevant documents\n",
"docs = retriever.get_relevant_documents('action movies')\n",
"docs = retriever.get_relevant_documents(\"action movies\")\n",
"print(docs)"
]
},

View File

@@ -1,6 +1,7 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "fc0db1bc",
"metadata": {},
@@ -25,7 +26,10 @@
"from langchain.vectorstores import Chroma\n",
"from langchain.embeddings import HuggingFaceEmbeddings\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.document_transformers import EmbeddingsRedundantFilter\n",
"from langchain.document_transformers import (\n",
" EmbeddingsRedundantFilter,\n",
" EmbeddingsClusteringFilter,\n",
")\n",
"from langchain.retrievers.document_compressors import DocumentCompressorPipeline\n",
"from langchain.retrievers import ContextualCompressionRetriever\n",
"\n",
@@ -70,6 +74,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c152339d",
"metadata": {},
@@ -92,6 +97,46 @@
" base_compressor=pipeline, base_retriever=lotr\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c10022fa",
"metadata": {},
"source": [
"## Pick a representative sample of documents from the merged retrievers."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3885482",
"metadata": {},
"outputs": [],
"source": [
"# This filter will divide the documents vectors into clusters or \"centers\" of meaning.\n",
"# Then it will pick the closest document to that center for the final results.\n",
"# By default the result document will be ordered/grouped by clusters.\n",
"filter_ordered_cluster = EmbeddingsClusteringFilter(\n",
" embeddings=filter_embeddings,\n",
" num_clusters=10,\n",
" num_closest=1,\n",
")\n",
"\n",
"# If you want the final document to be ordered by the original retriever scores\n",
"# you need to add the \"sorted\" parameter.\n",
"filter_ordered_by_retriever = EmbeddingsClusteringFilter(\n",
" embeddings=filter_embeddings,\n",
" num_clusters=10,\n",
" num_closest=1,\n",
" sorted=True,\n",
")\n",
"\n",
"pipeline = DocumentCompressorPipeline(transformers=[filter_ordered_by_retriever])\n",
"compression_retriever = ContextualCompressionRetriever(\n",
" base_compressor=pipeline, base_retriever=lotr\n",
")"
]
}
],
"metadata": {

View File

@@ -111,9 +111,7 @@
"\n",
"# Set up Zep Chat History. We'll use this to add chat histories to the memory store\n",
"zep_chat_history = ZepChatMessageHistory(\n",
" session_id=session_id,\n",
" url=ZEP_API_URL,\n",
" api_key=zep_api_key\n",
" session_id=session_id, url=ZEP_API_URL, api_key=zep_api_key\n",
")"
]
},
@@ -247,7 +245,7 @@
" session_id=session_id, # Ensure that you provide the session_id when instantiating the Retriever\n",
" url=ZEP_API_URL,\n",
" top_k=5,\n",
" api_key=zep_api_key\n",
" api_key=zep_api_key,\n",
")\n",
"\n",
"await zep_retriever.aget_relevant_documents(\"Who wrote Parable of the Sower?\")"

View File

@@ -65,7 +65,7 @@
}
],
"source": [
"# Please login and get your API key from https://clarifai.com/settings/security \n",
"# Please login and get your API key from https://clarifai.com/settings/security\n",
"from getpass import getpass\n",
"\n",
"CLARIFAI_PAT = getpass()"
@@ -130,9 +130,9 @@
"metadata": {},
"outputs": [],
"source": [
"USER_ID = 'openai'\n",
"APP_ID = 'embed'\n",
"MODEL_ID = 'text-embedding-ada'\n",
"USER_ID = \"openai\"\n",
"APP_ID = \"embed\"\n",
"MODEL_ID = \"text-embedding-ada\"\n",
"\n",
"# You can provide a specific model version as the model_version_id arg.\n",
"# MODEL_VERSION_ID = \"MODEL_VERSION_ID\""
@@ -148,7 +148,9 @@
"outputs": [],
"source": [
"# Initialize a Clarifai embedding model\n",
"embeddings = ClarifaiEmbeddings(pat=CLARIFAI_PAT, user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID)"
"embeddings = ClarifaiEmbeddings(\n",
" pat=CLARIFAI_PAT, user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID\n",
")"
]
},
{

View File

@@ -24,10 +24,7 @@
"metadata": {},
"outputs": [],
"source": [
"\n",
"from langchain.embeddings.spacy_embeddings import SpacyEmbeddings\n",
"\n",
"\n"
"from langchain.embeddings.spacy_embeddings import SpacyEmbeddings"
]
},
{
@@ -44,8 +41,7 @@
"metadata": {},
"outputs": [],
"source": [
"\n",
"embedder = SpacyEmbeddings()\n"
"embedder = SpacyEmbeddings()"
]
},
{
@@ -62,15 +58,12 @@
"metadata": {},
"outputs": [],
"source": [
"\n",
"\n",
"texts = [\n",
" \"The quick brown fox jumps over the lazy dog.\",\n",
" \"Pack my box with five dozen liquor jugs.\",\n",
" \"How vexingly quick daft zebras jump!\",\n",
" \"Bright vixens jump; dozy fowl quack.\"\n",
"]\n",
"\n"
" \"Bright vixens jump; dozy fowl quack.\",\n",
"]"
]
},
{
@@ -87,11 +80,9 @@
"metadata": {},
"outputs": [],
"source": [
"\n",
"embeddings = embedder.embed_documents(texts)\n",
"for i, embedding in enumerate(embeddings):\n",
" print(f\"Embedding for document {i+1}: {embedding}\")\n",
"\n"
" print(f\"Embedding for document {i+1}: {embedding}\")"
]
},
{
@@ -108,7 +99,6 @@
"metadata": {},
"outputs": [],
"source": [
"\n",
"query = \"Quick foxes and lazy dogs.\"\n",
"query_embedding = embedder.embed_query(query)\n",
"print(f\"Embedding for query: {query_embedding}\")"

View File

@@ -62,7 +62,7 @@
"import os\n",
"import getpass\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")\n"
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
],
"metadata": {
"collapsed": false,

View File

@@ -40,7 +40,7 @@
"metadata": {},
"outputs": [],
"source": [
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)"
@@ -154,10 +154,11 @@
"outputs": [],
"source": [
"awadb_client = awadb.Client()\n",
"ret = awadb_client.Load('langchain_awadb')\n",
"if ret : print('awadb load table success')\n",
"ret = awadb_client.Load(\"langchain_awadb\")\n",
"if ret:\n",
" print(\"awadb load table success\")\n",
"else:\n",
" print('awadb load table failed')"
" print(\"awadb load table failed\")"
]
},
{

View File

@@ -44,16 +44,18 @@
"import os\n",
"import getpass\n",
"\n",
"database_mode = (input('\\n(C)assandra or (A)stra DB? ')).upper()\n",
"database_mode = (input(\"\\n(C)assandra or (A)stra DB? \")).upper()\n",
"\n",
"keyspace_name = input('\\nKeyspace name? ')\n",
"keyspace_name = input(\"\\nKeyspace name? \")\n",
"\n",
"if database_mode == 'A':\n",
"if database_mode == \"A\":\n",
" ASTRA_DB_APPLICATION_TOKEN = getpass.getpass('\\nAstra DB Token (\"AstraCS:...\") ')\n",
" #\n",
" ASTRA_DB_SECURE_BUNDLE_PATH = input('Full path to your Secure Connect Bundle? ')\n",
"elif database_mode == 'C':\n",
" CASSANDRA_CONTACT_POINTS = input('Contact points? (comma-separated, empty for localhost) ').strip()"
" ASTRA_DB_SECURE_BUNDLE_PATH = input(\"Full path to your Secure Connect Bundle? \")\n",
"elif database_mode == \"C\":\n",
" CASSANDRA_CONTACT_POINTS = input(\n",
" \"Contact points? (comma-separated, empty for localhost) \"\n",
" ).strip()"
]
},
{
@@ -74,17 +76,15 @@
"from cassandra.cluster import Cluster\n",
"from cassandra.auth import PlainTextAuthProvider\n",
"\n",
"if database_mode == 'C':\n",
"if database_mode == \"C\":\n",
" if CASSANDRA_CONTACT_POINTS:\n",
" cluster = Cluster([\n",
" cp.strip()\n",
" for cp in CASSANDRA_CONTACT_POINTS.split(',')\n",
" if cp.strip()\n",
" ])\n",
" cluster = Cluster(\n",
" [cp.strip() for cp in CASSANDRA_CONTACT_POINTS.split(\",\") if cp.strip()]\n",
" )\n",
" else:\n",
" cluster = Cluster()\n",
" session = cluster.connect()\n",
"elif database_mode == 'A':\n",
"elif database_mode == \"A\":\n",
" ASTRA_DB_CLIENT_ID = \"token\"\n",
" cluster = Cluster(\n",
" cloud={\n",
@@ -117,7 +117,7 @@
"metadata": {},
"outputs": [],
"source": [
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
@@ -151,7 +151,8 @@
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
@@ -166,7 +167,7 @@
"metadata": {},
"outputs": [],
"source": [
"table_name = 'my_vector_db_table'\n",
"table_name = \"my_vector_db_table\"\n",
"\n",
"docsearch = Cassandra.from_documents(\n",
" documents=docs,\n",

View File

@@ -146,11 +146,11 @@
"# save to disk\n",
"db2 = Chroma.from_documents(docs, embedding_function, persist_directory=\"./chroma_db\")\n",
"db2.persist()\n",
"docs = db.similarity_search(query)\n",
"docs = db2.similarity_search(query)\n",
"\n",
"# load from disk\n",
"db3 = Chroma(persist_directory=\"./chroma_db\")\n",
"docs = db.similarity_search(query)\n",
"db3 = Chroma(persist_directory=\"./chroma_db\", embedding_function=embedding_function)\n",
"docs = db3.similarity_search(query)\n",
"print(docs[0].page_content)"
]
},
@@ -205,19 +205,25 @@
"import chromadb\n",
"import uuid\n",
"from chromadb.config import Settings\n",
"client = chromadb.Client(Settings(chroma_api_impl=\"rest\",\n",
" chroma_server_host=\"localhost\",\n",
" chroma_server_http_port=\"8000\"\n",
" ))\n",
"client.reset() # resets the database\n",
"\n",
"client = chromadb.Client(\n",
" Settings(\n",
" chroma_api_impl=\"rest\",\n",
" chroma_server_host=\"localhost\",\n",
" chroma_server_http_port=\"8000\",\n",
" )\n",
")\n",
"client.reset() # resets the database\n",
"collection = client.create_collection(\"my_collection\")\n",
"for doc in docs:\n",
" collection.add(ids=[str(uuid.uuid1())], metadatas=doc.metadata, documents=doc.page_content)\n",
" collection.add(\n",
" ids=[str(uuid.uuid1())], metadatas=doc.metadata, documents=doc.page_content\n",
" )\n",
"\n",
"# tell LangChain to use our client and collection name\n",
"db4 = Chroma(client=client, collection_name=\"my_collection\")\n",
"docs = db.similarity_search(query)\n",
"print(docs[0].page_content)\n"
"print(docs[0].page_content)"
]
},
{
@@ -262,7 +268,7 @@
],
"source": [
"# create simple ids\n",
"ids = [str(i) for i in range(1, len(docs)+1)]\n",
"ids = [str(i) for i in range(1, len(docs) + 1)]\n",
"\n",
"# add data\n",
"example_db = Chroma.from_documents(docs, embedding_function, ids=ids)\n",
@@ -270,14 +276,17 @@
"print(docs[0].metadata)\n",
"\n",
"# update the metadata for a document\n",
"docs[0].metadata = {'source': '../../../state_of_the_union.txt', 'new_value': 'hello world'}\n",
"docs[0].metadata = {\n",
" \"source\": \"../../../state_of_the_union.txt\",\n",
" \"new_value\": \"hello world\",\n",
"}\n",
"example_db.update_document(ids[0], docs[0])\n",
"print(example_db._collection.get(ids=[ids[0]]))\n",
"\n",
"# delete the last document\n",
"print(\"count before\", example_db._collection.count())\n",
"example_db._collection.delete(ids=[ids[-1]])\n",
"print(\"count after\", example_db._collection.count())\n"
"print(\"count after\", example_db._collection.count())"
]
},
{
@@ -317,6 +326,7 @@
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY"
]
},

View File

@@ -55,7 +55,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Please login and get your API key from https://clarifai.com/settings/security \n",
"# Please login and get your API key from https://clarifai.com/settings/security\n",
"from getpass import getpass\n",
"\n",
"CLARIFAI_PAT = getpass()"
@@ -104,8 +104,8 @@
"metadata": {},
"outputs": [],
"source": [
"USER_ID = 'USERNAME_ID'\n",
"APP_ID = 'APPLICATION_ID'\n",
"USER_ID = \"USERNAME_ID\"\n",
"APP_ID = \"APPLICATION_ID\"\n",
"NUMBER_OF_DOCS = 4"
]
},
@@ -126,8 +126,13 @@
"metadata": {},
"outputs": [],
"source": [
"texts = [\"I really enjoy spending time with you\", \"I hate spending time with my dog\", \"I want to go for a run\", \\\n",
" \"I went to the movies yesterday\", \"I love playing soccer with my friends\"]\n",
"texts = [\n",
" \"I really enjoy spending time with you\",\n",
" \"I hate spending time with my dog\",\n",
" \"I want to go for a run\",\n",
" \"I went to the movies yesterday\",\n",
" \"I love playing soccer with my friends\",\n",
"]\n",
"\n",
"metadatas = [{\"id\": i, \"text\": text} for i, text in enumerate(texts)]"
]
@@ -139,7 +144,14 @@
"metadata": {},
"outputs": [],
"source": [
"clarifai_vector_db = Clarifai.from_texts(user_id=USER_ID, app_id=APP_ID, texts=texts, pat=CLARIFAI_PAT, number_of_docs=NUMBER_OF_DOCS, metadatas = metadatas)"
"clarifai_vector_db = Clarifai.from_texts(\n",
" user_id=USER_ID,\n",
" app_id=APP_ID,\n",
" texts=texts,\n",
" pat=CLARIFAI_PAT,\n",
" number_of_docs=NUMBER_OF_DOCS,\n",
" metadatas=metadatas,\n",
")"
]
},
{
@@ -184,7 +196,7 @@
"metadata": {},
"outputs": [],
"source": [
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)"
@@ -221,8 +233,8 @@
"metadata": {},
"outputs": [],
"source": [
"USER_ID = 'USERNAME_ID'\n",
"APP_ID = 'APPLICATION_ID'\n",
"USER_ID = \"USERNAME_ID\"\n",
"APP_ID = \"APPLICATION_ID\"\n",
"NUMBER_OF_DOCS = 4"
]
},
@@ -233,7 +245,13 @@
"metadata": {},
"outputs": [],
"source": [
"clarifai_vector_db = Clarifai.from_documents(user_id=USER_ID, app_id=APP_ID, documents=docs, pat=CLARIFAI_PAT_KEY, number_of_docs=NUMBER_OF_DOCS)"
"clarifai_vector_db = Clarifai.from_documents(\n",
" user_id=USER_ID,\n",
" app_id=APP_ID,\n",
" documents=docs,\n",
" pat=CLARIFAI_PAT_KEY,\n",
" number_of_docs=NUMBER_OF_DOCS,\n",
")"
]
},
{

View File

@@ -84,8 +84,8 @@
"import os\n",
"import getpass\n",
"\n",
"if not os.environ['OPENAI_API_KEY']:\n",
" os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
"if not os.environ[\"OPENAI_API_KEY\"]:\n",
" os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
@@ -120,7 +120,8 @@
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
@@ -149,7 +150,7 @@
],
"source": [
"for d in docs:\n",
" d.metadata = {'some': 'metadata'}\n",
" d.metadata = {\"some\": \"metadata\"}\n",
"settings = ClickhouseSettings(table=\"clickhouse_vector_search_example\")\n",
"docsearch = Clickhouse.from_documents(docs, embeddings, config=settings)\n",
"\n",
@@ -308,7 +309,7 @@
"from langchain.vectorstores import Clickhouse, ClickhouseSettings\n",
"from langchain.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
@@ -316,7 +317,7 @@
"embeddings = OpenAIEmbeddings()\n",
"\n",
"for i, d in enumerate(docs):\n",
" d.metadata = {'doc_id': i}\n",
" d.metadata = {\"doc_id\": i}\n",
"\n",
"docsearch = Clickhouse.from_documents(docs, embeddings)"
]
@@ -345,10 +346,13 @@
],
"source": [
"meta = docsearch.metadata_column\n",
"output = docsearch.similarity_search_with_relevance_scores('What did the president say about Ketanji Brown Jackson?', \n",
" k=4, where_str=f\"{meta}.doc_id<10\")\n",
"output = docsearch.similarity_search_with_relevance_scores(\n",
" \"What did the president say about Ketanji Brown Jackson?\",\n",
" k=4,\n",
" where_str=f\"{meta}.doc_id<10\",\n",
")\n",
"for d, dist in output:\n",
" print(dist, d.metadata, d.page_content[:20] + '...')"
" print(dist, d.metadata, d.page_content[:20] + \"...\")"
]
},
{

View File

@@ -1,14 +1,15 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deep Lake\n",
"# Activeloop's Deep Lake\n",
"\n",
">[Deep Lake](https://docs.activeloop.ai/) as a Multi-Modal Vector Store that stores embeddings and their metadata including text, jsons, images, audio, video, and more. It saves the data locally, in your cloud, or on Activeloop storage. It performs hybrid search including embeddings and their attributes.\n",
">[Activeloop's Deep Lake](https://docs.activeloop.ai/) as a Multi-Modal Vector Store that stores embeddings and their metadata including text, jsons, images, audio, video, and more. It saves the data locally, in your cloud, or on Activeloop storage. It performs hybrid search including embeddings and their attributes.\n",
"\n",
"This notebook showcases basic functionality related to `Deep Lake`. While `Deep Lake` can store embeddings, it is capable of storing any type of data. It is a fully fledged serverless data lake with version control, query engine and streaming dataloader to deep learning frameworks. \n",
"This notebook showcases basic functionality related to `Activeloop's Deep Lake`. While `Deep Lake` can store embeddings, it is capable of storing any type of data. It is a serverless data lake with version control, query engine and streaming dataloaders to deep learning frameworks. \n",
"\n",
"For more information, please see the Deep Lake [documentation](https://docs.activeloop.ai) or [api reference](https://docs.deeplake.ai)"
]
@@ -16,12 +17,10 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"metadata": {},
"outputs": [],
"source": [
"!pip install openai deeplake tiktoken"
"!pip install openai 'deeplake[enterprise]' tiktoken"
]
},
{
@@ -61,7 +60,7 @@
"source": [
"from langchain.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader(\"docs/modules/state_of_the_union.txt\")\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
@@ -70,6 +69,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -78,31 +78,9 @@
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": []
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='./my_deeplake/', tensors=['embedding', 'id', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding embedding (42, 1536) float32 None \n",
" id text (42, 1) str None \n",
" metadata json (42, 1) str None \n",
" text text (42, 1) str None \n"
]
}
],
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"db = DeepLake(\n",
" dataset_path=\"./my_deeplake/\", embedding_function=embeddings, overwrite=True\n",
@@ -116,30 +94,15 @@
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -148,19 +111,9 @@
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Deep Lake Dataset in ./my_deeplake/ already exists, loading from the storage\n"
]
}
],
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"db = DeepLake(\n",
" dataset_path=\"./my_deeplake/\", embedding_function=embeddings, read_only=True\n",
@@ -169,6 +122,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -176,6 +130,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -184,20 +139,9 @@
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/adilkhansarsen/Documents/work/LangChain/langchain/langchain/llms/openai.py:751: UserWarning: You are trying to use a chat model. This way of initializing it is no longer supported. Instead, please use: `from langchain.chat_models import ChatOpenAI`\n",
" warnings.warn(\n"
]
}
],
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import RetrievalQA\n",
"from langchain.llms import OpenAIChat\n",
@@ -211,28 +155,16 @@
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'The President nominated Ketanji Brown Jackson to serve on the United States Supreme Court and spoke highly of her legal expertise and reputation as a consensus builder.'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"qa.run(query)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -240,35 +172,18 @@
]
},
{
"cell_type": "code",
"execution_count": 9,
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": []
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='./my_deeplake/', tensors=['embedding', 'id', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding embedding (4, 1536) float32 None \n",
" id text (4, 1) str None \n",
" metadata json (4, 1) str None \n",
" text text (4, 1) str None \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": []
}
],
"source": [
"Let's create another vector store containing metadata with the year the documents were created."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
@@ -282,29 +197,9 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 4/4 [00:00<00:00, 3300.00it/s]\n"
]
},
{
"data": {
"text/plain": [
"[Document(lc_kwargs={'page_content': 'Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}}, page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}),\n",
" Document(lc_kwargs={'page_content': 'A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}}, page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}),\n",
" Document(lc_kwargs={'page_content': 'Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers. \\n\\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \\n\\nThat ends on my watch. \\n\\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \\n\\nWell also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \\n\\nLets pass the Paycheck Fairness Act and paid leave. \\n\\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \\n\\nLets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}}, page_content='Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers. \\n\\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \\n\\nThat ends on my watch. \\n\\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \\n\\nWell also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \\n\\nLets pass the Paycheck Fairness Act and paid leave. \\n\\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \\n\\nLets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2013})]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"db.similarity_search(\n",
" \"What did the president say about Ketanji Brown Jackson\",\n",
@@ -313,6 +208,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -322,23 +218,9 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(lc_kwargs={'page_content': 'Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}}, page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}),\n",
" Document(lc_kwargs={'page_content': 'A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}}, page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}),\n",
" Document(lc_kwargs={'page_content': 'Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers. \\n\\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \\n\\nThat ends on my watch. \\n\\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \\n\\nWell also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \\n\\nLets pass the Paycheck Fairness Act and paid leave. \\n\\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \\n\\nLets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}}, page_content='Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers. \\n\\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \\n\\nThat ends on my watch. \\n\\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \\n\\nWell also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \\n\\nLets pass the Paycheck Fairness Act and paid leave. \\n\\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \\n\\nLets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}),\n",
" Document(lc_kwargs={'page_content': 'And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \\n\\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \\n\\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \\n\\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \\n\\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \\n\\nFirst, beat the opioid epidemic.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2012}}, page_content='And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \\n\\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \\n\\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \\n\\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \\n\\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \\n\\nFirst, beat the opioid epidemic.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2012})]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"db.similarity_search(\n",
" \"What did the president say about Ketanji Brown Jackson?\", distance_metric=\"cos\"\n",
@@ -346,6 +228,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -355,23 +238,9 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(lc_kwargs={'page_content': 'Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}}, page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}),\n",
" Document(lc_kwargs={'page_content': 'Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers. \\n\\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \\n\\nThat ends on my watch. \\n\\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \\n\\nWell also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \\n\\nLets pass the Paycheck Fairness Act and paid leave. \\n\\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \\n\\nLets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}}, page_content='Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers. \\n\\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \\n\\nThat ends on my watch. \\n\\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \\n\\nWell also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \\n\\nLets pass the Paycheck Fairness Act and paid leave. \\n\\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \\n\\nLets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}),\n",
" Document(lc_kwargs={'page_content': 'A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}}, page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2013}),\n",
" Document(lc_kwargs={'page_content': 'And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \\n\\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \\n\\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \\n\\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \\n\\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \\n\\nFirst, beat the opioid epidemic.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt', 'year': 2012}}, page_content='And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \\n\\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \\n\\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \\n\\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \\n\\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \\n\\nFirst, beat the opioid epidemic.', metadata={'source': 'docs/modules/state_of_the_union.txt', 'year': 2012})]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"db.max_marginal_relevance_search(\n",
" \"What did the president say about Ketanji Brown Jackson?\"\n",
@@ -379,6 +248,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -401,6 +271,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -423,11 +294,12 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deep Lake datasets on cloud (Activeloop, AWS, GCS, etc.) or in memory\n",
"By default deep lake datasets are stored locally, in case you want to store them in memory, in the Deep Lake Managed DB, or in any object storage, you can provide the [corresponding path to the dataset](https://docs.activeloop.ai/storage-and-credentials/storage-options). You can retrieve your user token from [app.activeloop.ai](https://app.activeloop.ai/)"
"By default, Deep Lake datasets are stored locally. To store them in memory, in the Deep Lake Managed DB, or in any object storage, you can provide the [corresponding path and credentials when creating the vector store](https://docs.activeloop.ai/storage-and-credentials/storage-options). Some paths require registration with Activeloop and creation of an API token that can be [retrieved here](https://app.activeloop.ai/)"
]
},
{
@@ -439,106 +311,11 @@
"os.environ[\"ACTIVELOOP_TOKEN\"] = activeloop_token"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Deeplake now supports running the inference in 3 modes. `python` naive way of searching inside of the data, `tensor_db` which is managed database, it runs tql on a remote optimized engine and sends results back, and `compute_engine` which is C++ implementation of search that runs locally."
]
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Your Deep Lake dataset has been successfully created!\n",
"The dataset is private so make sure you are logged in!\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"-"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='hub://adilkhan/langchain_testing_python', tensors=['embedding', 'id', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding embedding (42, 1536) float32 None \n",
" id text (42, 1) str None \n",
" metadata json (42, 1) str None \n",
" text text (42, 1) str None \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"data": {
"text/plain": [
"['d604b1ac-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b238-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b260-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b27e-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b29c-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b2ba-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b2d8-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b2f6-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b314-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b332-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b350-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b36e-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b38c-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b3a0-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b3be-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b3dc-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b3fa-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b418-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b436-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b454-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b472-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b490-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b4a4-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b4c2-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b4e0-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b4fe-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b51c-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b53a-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b558-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b576-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b594-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b5b2-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b5c6-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b5e4-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b602-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b620-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b63e-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b65c-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b67a-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b698-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b6b6-093c-11ee-bdba-76d8a30504e0',\n",
" 'd604b6d4-093c-11ee-bdba-76d8a30504e0']"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"# Embed and store the texts\n",
"username = \"<username>\" # your username on app.activeloop.ai\n",
@@ -553,23 +330,9 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = db.similarity_search(query)\n",
@@ -577,102 +340,30 @@
]
},
{
"cell_type": "code",
"execution_count": 20,
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Your Deep Lake dataset has been successfully created!\n",
"The dataset is private so make sure you are logged in!\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"|"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='hub://adilkhan/langchain_testing', tensors=['embedding', 'id', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding embedding (42, 1536) float32 None \n",
" id text (42, 1) str None \n",
" metadata json (42, 1) str None \n",
" text text (42, 1) str None \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"data": {
"text/plain": [
"['6584c33a-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c3ee-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c420-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c43e-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c466-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c484-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c4a2-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c4c0-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c4de-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c4fc-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c51a-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c538-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c556-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c574-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c592-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c5b0-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c5ce-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c5f6-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c614-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c632-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c646-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c66e-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c682-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c6a0-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c6be-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c6e6-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c704-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c722-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c740-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c75e-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c77c-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c79a-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c7ae-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c7cc-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c7ea-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c808-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c826-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c844-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c862-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c876-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c894-093d-11ee-bdba-76d8a30504e0',\n",
" '6584c8bc-093d-11ee-bdba-76d8a30504e0']"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#### `tensor_db` execution option "
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to utilize Deep Lake's Managed Tensor Database, it is necessary to specify the runtime parameter as {'tensor_db': True} during the creation of the vector store. This configuration enables the execution of queries on the Managed Tensor Database, rather than on the client side. It should be noted that this functionality is not applicable to datasets stored locally or in-memory. In the event that a vector store has already been created outside of the Managed Tensor Database, it is possible to transfer it to the Managed Tensor Database by following the prescribed steps."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Embed and store the texts\n",
"username = \"adilkhan\" # your username on app.activeloop.ai\n",
"dataset_path = f\"hub://{username}/langchain_testing\" # could be also ./local/path (much faster locally), s3://bucket/path/to/dataset, gcs://path/to/dataset, etc.\n",
"dataset_path = f\"hub://{username}/langchain_testing\"\n",
"\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
@@ -681,44 +372,13 @@
" dataset_path=dataset_path,\n",
" embedding_function=embeddings,\n",
" overwrite=True,\n",
" exec_option=\"tensor_db\",\n",
" runtime={\"tensor_db\": True},\n",
")\n",
"db.add_documents(docs)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n"
]
}
],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = db.similarity_search(query, exec_option=\"tensor_db\")\n",
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### The difference will be apparent on a bigger datasets (~10000 rows)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -726,15 +386,16 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"now we can use tql search with DeepLake"
"Furthermore, the execution of queries is also supported within the similarity_search method, whereby the query can be specified utilizing Deep Lake's Tensor Query Language (TQL)."
]
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
@@ -743,42 +404,31 @@
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"docs = db.similarity_search(\n",
" query=None,\n",
" tql_query=f\"SELECT * WHERE id == '{search_id[0]}'\",\n",
" exec_option=\"tensor_db\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(lc_kwargs={'page_content': 'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \\n\\nLast year COVID-19 kept us apart. This year we are finally together again. \\n\\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \\n\\nWith a duty to one another to the American people to the Constitution. \\n\\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \\n\\nSix days ago, Russias Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \\n\\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \\n\\nHe met the Ukrainian people. \\n\\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.', 'metadata': {'source': 'docs/modules/state_of_the_union.txt'}}, page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \\n\\nLast year COVID-19 kept us apart. This year we are finally together again. \\n\\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \\n\\nWith a duty to one another to the American people to the Constitution. \\n\\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \\n\\nSix days ago, Russias Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \\n\\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \\n\\nHe met the Ukrainian people. \\n\\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.', metadata={'source': 'docs/modules/state_of_the_union.txt'})]"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"docs"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating dataset on AWS S3"
"### Creating vector stores on AWS S3"
]
},
{
@@ -841,11 +491,12 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deep Lake API\n",
"you can access the Deep Lake dataset at `db.ds`"
"you can access the Deep Lake dataset at `db.vectorstore`"
]
},
{
@@ -884,6 +535,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [

View File

@@ -51,7 +51,8 @@
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)"
@@ -72,11 +73,11 @@
}
],
"source": [
"import marqo \n",
"import marqo\n",
"\n",
"# initialize marqo\n",
"marqo_url = \"http://localhost:8882\" # if using marqo cloud replace with your endpoint (console.marqo.ai)\n",
"marqo_api_key = \"\" # if using marqo cloud replace with your api key (console.marqo.ai)\n",
"marqo_url = \"http://localhost:8882\" # if using marqo cloud replace with your endpoint (console.marqo.ai)\n",
"marqo_api_key = \"\" # if using marqo cloud replace with your api key (console.marqo.ai)\n",
"\n",
"client = marqo.Client(url=marqo_url, api_key=marqo_api_key)\n",
"\n",
@@ -190,7 +191,6 @@
}
],
"source": [
"\n",
"# use a new index\n",
"index_name = \"langchain-multimodal-demo\"\n",
"\n",
@@ -199,22 +199,22 @@
" client.delete_index(index_name)\n",
"except Exception:\n",
" print(f\"Creating {index_name}\")\n",
" \n",
"\n",
"# This index could have been created by another system\n",
"settings = {\"treat_urls_and_pointers_as_images\": True, \"model\": \"ViT-L/14\"}\n",
"client.create_index(index_name, **settings)\n",
"client.index(index_name).add_documents(\n",
" [ \n",
" [\n",
" # image of a bus\n",
" {\n",
" \"caption\": \"Bus\",\n",
" \"image\": \"https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image4.jpg\"\n",
" \"image\": \"https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image4.jpg\",\n",
" },\n",
" # image of a plane\n",
" { \n",
" \"caption\": \"Plane\", \n",
" \"image\": \"https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image2.jpg\"\n",
" }\n",
" {\n",
" \"caption\": \"Plane\",\n",
" \"image\": \"https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image2.jpg\",\n",
" },\n",
" ],\n",
")"
]
@@ -230,6 +230,7 @@
" \"\"\"Helper to format Marqo's documents into text to be used as page_content\"\"\"\n",
" return f\"{res['caption']}: {res['image']}\"\n",
"\n",
"\n",
"docsearch = Marqo(client, index_name, page_content_builder=get_content)\n",
"\n",
"\n",
@@ -292,7 +293,6 @@
}
],
"source": [
"\n",
"# use a new index\n",
"index_name = \"langchain-byo-index-demo\"\n",
"\n",
@@ -305,17 +305,17 @@
"# This index could have been created by another system\n",
"client.create_index(index_name)\n",
"client.index(index_name).add_documents(\n",
" [ \n",
" [\n",
" {\n",
" \"Title\": \"Smartphone\",\n",
" \"Description\": \"A smartphone is a portable computer device that combines mobile telephone \"\n",
" \"functions and computing functions into one unit.\",\n",
" },\n",
" { \n",
" {\n",
" \"Title\": \"Telephone\",\n",
" \"Description\": \"A telephone is a telecommunications device that permits two or more users to\"\n",
" \"conduct a conversation when they are too far apart to be easily heard directly.\",\n",
" }\n",
" },\n",
" ],\n",
")"
]
@@ -341,16 +341,17 @@
"# Note text indexes retain the ability to use add_texts despite different field names in documents\n",
"# this is because the page_content_builder callback lets you handle these document fields as required\n",
"\n",
"\n",
"def get_content(res):\n",
" \"\"\"Helper to format Marqo's documents into text to be used as page_content\"\"\"\n",
" if 'text' in res:\n",
" return res['text']\n",
" return res['Description']\n",
" if \"text\" in res:\n",
" return res[\"text\"]\n",
" return res[\"Description\"]\n",
"\n",
"\n",
"docsearch = Marqo(client, index_name, page_content_builder=get_content)\n",
"\n",
"docsearch.add_texts([\"This is a document that is about elephants\"])\n"
"docsearch.add_texts([\"This is a document that is about elephants\"])"
]
},
{
@@ -421,7 +422,7 @@
}
],
"source": [
"query = {\"communications devices\" : 1.0}\n",
"query = {\"communications devices\": 1.0}\n",
"doc_results = docsearch.similarity_search(query)\n",
"print(doc_results[0].page_content)"
]
@@ -441,7 +442,7 @@
}
],
"source": [
"query = {\"communications devices\" : 1.0, \"technology post 2000\": -1.0}\n",
"query = {\"communications devices\": 1.0, \"technology post 2000\": -1.0}\n",
"doc_results = docsearch.similarity_search(query)\n",
"print(doc_results[0].page_content)"
]

View File

@@ -40,8 +40,7 @@
"import os\n",
"import getpass\n",
"\n",
"MONGODB_ATLAS_CLUSTER_URI = getpass.getpass(\"MongoDB Atlas Cluster URI:\")\n",
"MONGODB_ATLAS_CLUSTER_URI = os.environ[\"MONGODB_ATLAS_CLUSTER_URI\"]"
"MONGODB_ATLAS_CLUSTER_URI = getpass.getpass(\"MongoDB Atlas Cluster URI:\")"
]
},
{
@@ -60,8 +59,7 @@
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")\n",
"OPENAI_API_KEY = os.environ[\"OPENAI_API_KEY\"]"
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
@@ -101,16 +99,6 @@
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import MongoDBAtlasVectorSearch\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
@@ -164,7 +152,7 @@
"id": "851a2ec9-9390-49a4-8412-3e132c9f789d",
"metadata": {},
"source": [
"You can reuse the vector search index you created, make sure the `OPENAI_API_KEY` environment variable is set up, then execute another query."
"You can also instantiate the vector store directly and execute a query as follows:"
]
},
{
@@ -174,29 +162,14 @@
"metadata": {},
"outputs": [],
"source": [
"from pymongo import MongoClient\n",
"from langchain.vectorstores import MongoDBAtlasVectorSearch\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"import os\n",
"\n",
"MONGODB_ATLAS_URI = os.environ[\"MONGODB_ATLAS_URI\"]\n",
"\n",
"# initialize MongoDB python client\n",
"client = MongoClient(MONGODB_ATLAS_URI)\n",
"\n",
"db_name = \"langchain_db\"\n",
"collection_name = \"langchain_col\"\n",
"collection = client[db_name][collection_name]\n",
"index_name = \"langchain_demo\"\n",
"\n",
"# initialize vector store\n",
"vectorStore = MongoDBAtlasVectorSearch(\n",
"vectorstore = MongoDBAtlasVectorSearch(\n",
" collection, OpenAIEmbeddings(), index_name=index_name\n",
")\n",
"\n",
"# perform a similarity search between a query and the ingested documents\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = vectorStore.similarity_search(query)\n",
"docs = vectorstore.similarity_search(query)\n",
"\n",
"print(docs[0].page_content)"
]

View File

@@ -123,7 +123,7 @@
},
{
"cell_type": "code",
"execution_count": 62,
"execution_count": 1,
"metadata": {
"tags": []
},
@@ -138,7 +138,7 @@
},
{
"cell_type": "code",
"execution_count": 63,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
@@ -152,49 +152,24 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"## PGVector needs the connection string to the database.\n",
"## We will load it from the environment variables.\n",
"import os\n",
"# PGVector needs the connection string to the database.\n",
"CONNECTION_STRING = \"postgresql+psycopg2://harrisonchase@localhost:5432/test3\"\n",
"\n",
"CONNECTION_STRING = PGVector.connection_string_from_db_params(\n",
" driver=os.environ.get(\"PGVECTOR_DRIVER\", \"psycopg2\"),\n",
" host=os.environ.get(\"PGVECTOR_HOST\", \"localhost\"),\n",
" port=int(os.environ.get(\"PGVECTOR_PORT\", \"5432\")),\n",
" database=os.environ.get(\"PGVECTOR_DATABASE\", \"postgres\"),\n",
" user=os.environ.get(\"PGVECTOR_USER\", \"postgres\"),\n",
" password=os.environ.get(\"PGVECTOR_PASSWORD\", \"postgres\"),\n",
")\n",
"\n",
"\n",
"## Example\n",
"# postgresql+psycopg2://username:password@localhost:5432/database_name"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"# ## PGVector needs the connection string to the database.\n",
"# ## We will load it from the environment variables.\n",
"# # Alternatively, you can create it from enviornment variables.\n",
"# import os\n",
"\n",
"# CONNECTION_STRING = PGVector.connection_string_from_db_params(\n",
"# driver=os.environ.get(\"PGVECTOR_DRIVER\", \"psycopg2\"),\n",
"# host=os.environ.get(\"PGVECTOR_HOST\", \"localhost\"),\n",
"# port=int(os.environ.get(\"PGVECTOR_PORT\", \"5432\")),\n",
"# database=os.environ.get(\"PGVECTOR_DATABASE\", \"rd-embeddings\"),\n",
"# user=os.environ.get(\"PGVECTOR_USER\", \"admin\"),\n",
"# password=os.environ.get(\"PGVECTOR_PASSWORD\", \"password\"),\n",
"# )\n",
"\n",
"\n",
"# ## Example\n",
"# # postgresql+psycopg2://username:password@localhost:5432/database_name"
"# database=os.environ.get(\"PGVECTOR_DATABASE\", \"postgres\"),\n",
"# user=os.environ.get(\"PGVECTOR_USER\", \"postgres\"),\n",
"# password=os.environ.get(\"PGVECTOR_PASSWORD\", \"postgres\"),\n",
"# )"
]
},
{
@@ -206,27 +181,36 @@
},
{
"cell_type": "code",
"execution_count": 69,
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"# The PGVector Module will try to create a table with the name of the collection. So, make sure that the collection name is unique and the user has the\n",
"# permission to create a table.\n",
"# The PGVector Module will try to create a table with the name of the collection.\n",
"# So, make sure that the collection name is unique and the user has the permission to create a table.\n",
"\n",
"COLLECTION_NAME = \"state_of_the_union_test\"\n",
"\n",
"db = PGVector.from_documents(\n",
" embedding=embeddings,\n",
" documents=docs,\n",
" collection_name=\"state_of_the_union\",\n",
" collection_name=COLLECTION_NAME,\n",
" connection_string=CONNECTION_STRING,\n",
")\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs_with_score: List[Tuple[Document, float]] = db.similarity_search_with_score(query)"
")"
]
},
{
"cell_type": "code",
"execution_count": 70,
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs_with_score = db.similarity_search_with_score(query)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
@@ -234,7 +218,7 @@
"output_type": "stream",
"text": [
"--------------------------------------------------------------------------------\n",
"Score: 0.6076804864602984\n",
"Score: 0.18460171628856903\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
@@ -244,7 +228,7 @@
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"Score: 0.6076804864602984\n",
"Score: 0.18460171628856903\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
@@ -254,21 +238,17 @@
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"Score: 0.659062774389974\n",
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
"Score: 0.18470284560586236\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"We can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"Weve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n",
"\n",
"Were putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n",
"\n",
"Were securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"Score: 0.659062774389974\n",
"Score: 0.21730864082247825\n",
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
"\n",
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n",
@@ -296,32 +276,22 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Working with vectorstore"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Uploading a vectorstore"
"## Working with vectorstore\n",
"\n",
"Above, we created a vectorstore from scratch. However, often times we want to work with an existing vectorstore.\n",
"In order to do that, we can initialize it directly."
]
},
{
"cell_type": "code",
"execution_count": 55,
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"data = docs\n",
"api_key = os.environ[\"OPENAI_API_KEY\"]\n",
"db = PGVector.from_documents(\n",
" documents=docs,\n",
" embedding=embeddings,\n",
" collection_name=collection_name,\n",
" connection_string=connection_string,\n",
" distance_strategy=DistanceStrategy.COSINE,\n",
" openai_api_key=api_key,\n",
" pre_delete_collection=False,\n",
"store = PGVector(\n",
" collection_name=COLLECTION_NAME,\n",
" connection_string=CONNECTION_STRING,\n",
" embedding_function=embeddings,\n",
")"
]
},
@@ -329,150 +299,166 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieving a vectorstore"
"### Add documents\n",
"We can add documents to the existing vectorstore."
]
},
{
"cell_type": "code",
"execution_count": 56,
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['048c2e14-1cf3-11ee-8777-e65801318980']"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"store.add_documents([Document(page_content=\"foo\")])"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"connection_string = CONNECTION_STRING\n",
"embedding = embeddings\n",
"collection_name = \"state_of_the_union\"\n",
"from langchain.vectorstores.pgvector import DistanceStrategy\n",
"\n",
"store = PGVector(\n",
" connection_string=connection_string,\n",
" embedding_function=embedding,\n",
" collection_name=collection_name,\n",
" distance_strategy=DistanceStrategy.COSINE,\n",
")\n",
"docs_with_score = db.similarity_search_with_score(\"foo\")"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Document(page_content='foo', metadata={}), 3.3203430005457335e-09)"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs_with_score[0]"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../../state_of_the_union.txt'}),\n",
" 0.2404395365581814)"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs_with_score[1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Overriding a vectorstore\n",
"\n",
"If you have an existing collection, you override it by doing `from_documents` and setting `pre_delete_collection` = True"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"db = PGVector.from_documents(\n",
" documents=docs,\n",
" embedding=embeddings,\n",
" collection_name=COLLECTION_NAME,\n",
" connection_string=CONNECTION_STRING,\n",
" pre_delete_collection=True,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"docs_with_score = db.similarity_search_with_score(\"foo\")"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../../state_of_the_union.txt'}),\n",
" 0.2404115088144465)"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs_with_score[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using a VectorStore as a Retriever"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"retriever = store.as_retriever()"
]
},
{
"cell_type": "code",
"execution_count": 57,
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"vectorstore=<langchain.vectorstores.pgvector.PGVector object at 0x7fe9a1b1c670> search_type='similarity' search_kwargs={}\n"
"tags=None metadata=None vectorstore=<langchain.vectorstores.pgvector.PGVector object at 0x29f94f880> search_type='similarity' search_kwargs={}\n"
]
}
],
"source": [
"print(retriever)"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'}), 0.6075870262188066), (Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'}), 0.6075870262188066), (Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../../state_of_the_union.txt'}), 0.6589478388546668), (Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../../state_of_the_union.txt'}), 0.6589478388546668)]\n"
]
}
],
"source": [
"# When we have an existing PG VEctor\n",
"DEFAULT_DISTANCE_STRATEGY = DistanceStrategy.EUCLIDEAN\n",
"db1 = PGVector.from_existing_index(\n",
" embedding=embeddings,\n",
" collection_name=\"state_of_the_union\",\n",
" distance_strategy=DEFAULT_DISTANCE_STRATEGY,\n",
" pre_delete_collection=False,\n",
" connection_string=CONNECTION_STRING,\n",
")\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs_with_score: List[Tuple[Document, float]] = db1.similarity_search_with_score(query)\n",
"print(docs_with_score)"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------------------------------------------\n",
"Score: 0.6075870262188066\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"Score: 0.6075870262188066\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"Score: 0.6589478388546668\n",
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
"\n",
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n",
"\n",
"We can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \n",
"\n",
"Weve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n",
"\n",
"Were putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n",
"\n",
"Were securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n",
"--------------------------------------------------------------------------------\n",
"--------------------------------------------------------------------------------\n",
"Score: 0.6589478388546668\n",
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
"\n",
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n",
"\n",
"We can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \n",
"\n",
"Weve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n",
"\n",
"Were putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n",
"\n",
"Were securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n",
"--------------------------------------------------------------------------------\n"
]
}
],
"source": [
"for doc, score in docs_with_score:\n",
" print(\"-\" * 80)\n",
" print(\"Score: \", score)\n",
" print(doc.page_content)\n",
" print(\"-\" * 80)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -491,7 +477,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.9.1"
}
},
"nbformat": 4,

View File

@@ -249,24 +249,9 @@
"id": "93540013",
"metadata": {},
"source": [
"## Reusing the same collection\n",
"## Recreating the collection\n",
"\n",
"Both `Qdrant.from_texts` and `Qdrant.from_documents` methods are great to start using Qdrant with LangChain, but **they are going to destroy the collection and create it from scratch**! If you want to reuse the existing collection, you can always create an instance of `Qdrant` on your own and pass the `QdrantClient` instance with the connection details."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "b7b432d7",
"metadata": {
"ExecuteTime": {
"end_time": "2023-04-04T10:51:24.843090Z",
"start_time": "2023-04-04T10:51:24.840041Z"
}
},
"outputs": [],
"source": [
"del qdrant"
"Both `Qdrant.from_texts` and `Qdrant.from_documents` methods are great to start using Qdrant with Langchain. In the previous versions the collection was recreated every time you called any of them. That behaviour has changed. Currently, the collection is going to be reused if it already exists. Setting `force_recreate` to `True` allows to remove the old collection and start from scratch."
]
},
{
@@ -281,10 +266,15 @@
},
"outputs": [],
"source": [
"import qdrant_client\n",
"\n",
"client = qdrant_client.QdrantClient(path=\"/tmp/local_qdrant\", prefer_grpc=True)\n",
"qdrant = Qdrant(client=client, collection_name=\"my_documents\", embeddings=embeddings)"
"url = \"<---qdrant url here --->\"\n",
"qdrant = Qdrant.from_documents(\n",
" docs,\n",
" embeddings,\n",
" url,\n",
" prefer_grpc=True,\n",
" collection_name=\"my_documents\",\n",
" force_recreate=True,\n",
")"
]
},
{

View File

@@ -118,10 +118,9 @@
"texts = [d.page_content for d in docs]\n",
"metadatas = [d.metadata for d in docs]\n",
"\n",
"rds, keys = Redis.from_texts_return_keys(texts,\n",
" embeddings,\n",
" redis_url=\"redis://localhost:6379\",\n",
" index_name=\"link\")"
"rds, keys = Redis.from_texts_return_keys(\n",
" texts, embeddings, redis_url=\"redis://localhost:6379\", index_name=\"link\"\n",
")"
]
},
{

View File

@@ -94,12 +94,14 @@
"\n",
"# Make sure env variable ROCKSET_API_KEY is set\n",
"ROCKSET_API_KEY = os.environ.get(\"ROCKSET_API_KEY\")\n",
"ROCKSET_API_SERVER = rockset.Regions.usw2a1 # Make sure this points to the correct Rockset region\n",
"ROCKSET_API_SERVER = (\n",
" rockset.Regions.usw2a1\n",
") # Make sure this points to the correct Rockset region\n",
"rockset_client = rockset.RocksetClient(ROCKSET_API_SERVER, ROCKSET_API_KEY)\n",
"\n",
"COLLECTION_NAME='langchain_demo'\n",
"TEXT_KEY='description'\n",
"EMBEDDING_KEY='description_embedding'"
"COLLECTION_NAME = \"langchain_demo\"\n",
"TEXT_KEY = \"description\"\n",
"EMBEDDING_KEY = \"description_embedding\""
]
},
{
@@ -124,7 +126,7 @@
"from langchain.document_loaders import TextLoader\n",
"from langchain.vectorstores.rocksetdb import RocksetDB\n",
"\n",
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)"
@@ -148,7 +150,7 @@
"# Make sure the environment variable OPENAI_API_KEY is set up\n",
"embeddings = OpenAIEmbeddings()\n",
"\n",
"docsearch=RocksetDB(\n",
"docsearch = RocksetDB(\n",
" client=rockset_client,\n",
" embeddings=embeddings,\n",
" collection_name=COLLECTION_NAME,\n",
@@ -156,7 +158,7 @@
" embedding_key=EMBEDDING_KEY,\n",
")\n",
"\n",
"ids=docsearch.add_texts(\n",
"ids = docsearch.add_texts(\n",
" texts=[d.page_content for d in docs],\n",
" metadatas=[d.metadata for d in docs],\n",
")\n",
@@ -182,11 +184,13 @@
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"output = docsearch.similarity_search_with_relevance_scores(query, 4, RocksetDB.DistanceFunction.COSINE_SIM)\n",
"output = docsearch.similarity_search_with_relevance_scores(\n",
" query, 4, RocksetDB.DistanceFunction.COSINE_SIM\n",
")\n",
"print(\"output length:\", len(output))\n",
"for d, dist in output:\n",
" print(dist, d.metadata, d.page_content[:20] + '...')\n",
" \n",
" print(dist, d.metadata, d.page_content[:20] + \"...\")\n",
"\n",
"##\n",
"# output length: 4\n",
"# 0.764990692109871 {'source': '../../../state_of_the_union.txt'} Madam Speaker, Madam...\n",
@@ -218,12 +222,12 @@
" query,\n",
" 4,\n",
" RocksetDB.DistanceFunction.COSINE_SIM,\n",
" where_str=\"{} NOT LIKE '%citizens%'\".format(TEXT_KEY)\n",
" where_str=\"{} NOT LIKE '%citizens%'\".format(TEXT_KEY),\n",
")\n",
"print(\"output length:\", len(output))\n",
"for d, dist in output:\n",
" print(dist, d.metadata, d.page_content[:20] + '...')\n",
" \n",
" print(dist, d.metadata, d.page_content[:20] + \"...\")\n",
"\n",
"##\n",
"# output length: 4\n",
"# 0.7651359650263554 {'source': '../../../state_of_the_union.txt'} Madam Speaker, Madam...\n",

View File

@@ -59,8 +59,8 @@
"metadata": {},
"outputs": [],
"source": [
"# Load text samples \n",
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"# Load text samples\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
@@ -90,7 +90,7 @@
"docsearch = SingleStoreDB.from_documents(\n",
" docs,\n",
" embeddings,\n",
" table_name = \"notebook\", # use table with a custom name \n",
" table_name=\"notebook\", # use table with a custom name\n",
")"
]
},

View File

@@ -62,7 +62,7 @@
"from langchain.vectorstores.starrocks import StarRocksSettings\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.text_splitter import CharacterTextSplitter, TokenTextSplitter\n",
"from langchain import OpenAI,VectorDBQA\n",
"from langchain import OpenAI, VectorDBQA\n",
"from langchain.document_loaders import DirectoryLoader\n",
"from langchain.chains import RetrievalQA\n",
"from langchain.document_loaders import TextLoader, UnstructuredMarkdownLoader\n",
@@ -95,7 +95,9 @@
"metadata": {},
"outputs": [],
"source": [
"loader = DirectoryLoader('./docs', glob='**/*.md', loader_cls=UnstructuredMarkdownLoader)\n",
"loader = DirectoryLoader(\n",
" \"./docs\", glob=\"**/*.md\", loader_cls=UnstructuredMarkdownLoader\n",
")\n",
"documents = loader.load()"
]
},
@@ -158,7 +160,7 @@
}
],
"source": [
"print('# docs = %d, # splits = %d' % (len(documents), len(split_docs)))"
"print(\"# docs = %d, # splits = %d\" % (len(documents), len(split_docs)))"
]
},
{
@@ -186,10 +188,10 @@
"source": [
"def gen_starrocks(update_vectordb, embeddings, settings):\n",
" if update_vectordb:\n",
" docsearch = StarRocks.from_documents(split_docs, embeddings, config = settings) \n",
" docsearch = StarRocks.from_documents(split_docs, embeddings, config=settings)\n",
" else:\n",
" docsearch = StarRocks(embeddings, settings) \n",
" return docsearch\n"
" docsearch = StarRocks(embeddings, settings)\n",
" return docsearch"
]
},
{
@@ -255,10 +257,10 @@
"# configure starrocks settings(host/port/user/pw/db)\n",
"settings = StarRocksSettings()\n",
"settings.port = 41003\n",
"settings.host = '127.0.0.1'\n",
"settings.username = 'root'\n",
"settings.password = ''\n",
"settings.database = 'zya'\n",
"settings.host = \"127.0.0.1\"\n",
"settings.username = \"root\"\n",
"settings.password = \"\"\n",
"settings.database = \"zya\"\n",
"docsearch = gen_starrocks(update_vectordb, embeddings, settings)\n",
"\n",
"print(docsearch)\n",
@@ -290,7 +292,9 @@
],
"source": [
"llm = OpenAI()\n",
"qa = RetrievalQA.from_chain_type(llm=llm, chain_type=\"stuff\", retriever=docsearch.as_retriever())\n",
"qa = RetrievalQA.from_chain_type(\n",
" llm=llm, chain_type=\"stuff\", retriever=docsearch.as_retriever()\n",
")\n",
"query = \"is profile enabled by default? if not, how to enable profile?\"\n",
"resp = qa.run(query)\n",
"print(resp)"

View File

@@ -55,10 +55,10 @@
"metadata": {},
"outputs": [],
"source": [
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n"
"docs = text_splitter.split_documents(documents)"
]
},
{
@@ -74,9 +74,11 @@
},
"outputs": [],
"source": [
"vectara = Vectara.from_documents(docs, \n",
" embedding=FakeEmbeddings(size=768), \n",
" doc_metadata = {\"speech\": \"state-of-the-union\"})"
"vectara = Vectara.from_documents(\n",
" docs,\n",
" embedding=FakeEmbeddings(size=768),\n",
" doc_metadata={\"speech\": \"state-of-the-union\"},\n",
")"
]
},
{
@@ -104,11 +106,17 @@
"import urllib.request\n",
"\n",
"urls = [\n",
" ['https://www.gilderlehrman.org/sites/default/files/inline-pdfs/king.dreamspeech.excerpts.pdf', 'I-have-a-dream'],\n",
" ['https://www.parkwayschools.net/cms/lib/MO01931486/Centricity/Domain/1578/Churchill_Beaches_Speech.pdf', 'we shall fight on the beaches'],\n",
" [\n",
" \"https://www.gilderlehrman.org/sites/default/files/inline-pdfs/king.dreamspeech.excerpts.pdf\",\n",
" \"I-have-a-dream\",\n",
" ],\n",
" [\n",
" \"https://www.parkwayschools.net/cms/lib/MO01931486/Centricity/Domain/1578/Churchill_Beaches_Speech.pdf\",\n",
" \"we shall fight on the beaches\",\n",
" ],\n",
"]\n",
"files_list = []\n",
"for url,_ in urls:\n",
"for url, _ in urls:\n",
" name = tempfile.NamedTemporaryFile().name\n",
" urllib.request.urlretrieve(url, name)\n",
" files_list.append(name)\n",
@@ -116,7 +124,7 @@
"docsearch: Vectara = Vectara.from_files(\n",
" files=files_list,\n",
" embedding=FakeEmbeddings(size=768),\n",
" metadatas=[{\"url\": url, \"speech\": title} for url,title in urls],\n",
" metadatas=[{\"url\": url, \"speech\": title} for url, title in urls],\n",
")"
]
},
@@ -150,7 +158,9 @@
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"found_docs = vectara.similarity_search(query, n_sentence_context=0, filter=\"doc.speech = 'state-of-the-union'\")"
"found_docs = vectara.similarity_search(\n",
" query, n_sentence_context=0, filter=\"doc.speech = 'state-of-the-union'\"\n",
")"
]
},
{
@@ -207,7 +217,9 @@
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"found_docs = vectara.similarity_search_with_score(query, filter=\"doc.speech = 'state-of-the-union'\")"
"found_docs = vectara.similarity_search_with_score(\n",
" query, filter=\"doc.speech = 'state-of-the-union'\"\n",
")"
]
},
{
@@ -269,7 +281,9 @@
],
"source": [
"query = \"We must forever conduct our struggle\"\n",
"found_docs = vectara.similarity_search_with_score(query, filter=\"doc.speech = 'I-have-a-dream'\")\n",
"found_docs = vectara.similarity_search_with_score(\n",
" query, filter=\"doc.speech = 'I-have-a-dream'\"\n",
")\n",
"print(found_docs[0])\n",
"print(found_docs[1])"
]

View File

@@ -44,16 +44,18 @@
"import os\n",
"import getpass\n",
"\n",
"database_mode = (input('\\n(C)assandra or (A)stra DB? ')).upper()\n",
"database_mode = (input(\"\\n(C)assandra or (A)stra DB? \")).upper()\n",
"\n",
"keyspace_name = input('\\nKeyspace name? ')\n",
"keyspace_name = input(\"\\nKeyspace name? \")\n",
"\n",
"if database_mode == 'A':\n",
"if database_mode == \"A\":\n",
" ASTRA_DB_APPLICATION_TOKEN = getpass.getpass('\\nAstra DB Token (\"AstraCS:...\") ')\n",
" #\n",
" ASTRA_DB_SECURE_BUNDLE_PATH = input('Full path to your Secure Connect Bundle? ')\n",
"elif database_mode == 'C':\n",
" CASSANDRA_CONTACT_POINTS = input('Contact points? (comma-separated, empty for localhost) ').strip()"
" ASTRA_DB_SECURE_BUNDLE_PATH = input(\"Full path to your Secure Connect Bundle? \")\n",
"elif database_mode == \"C\":\n",
" CASSANDRA_CONTACT_POINTS = input(\n",
" \"Contact points? (comma-separated, empty for localhost) \"\n",
" ).strip()"
]
},
{
@@ -74,17 +76,15 @@
"from cassandra.cluster import Cluster\n",
"from cassandra.auth import PlainTextAuthProvider\n",
"\n",
"if database_mode == 'C':\n",
"if database_mode == \"C\":\n",
" if CASSANDRA_CONTACT_POINTS:\n",
" cluster = Cluster([\n",
" cp.strip()\n",
" for cp in CASSANDRA_CONTACT_POINTS.split(',')\n",
" if cp.strip()\n",
" ])\n",
" cluster = Cluster(\n",
" [cp.strip() for cp in CASSANDRA_CONTACT_POINTS.split(\",\") if cp.strip()]\n",
" )\n",
" else:\n",
" cluster = Cluster()\n",
" session = cluster.connect()\n",
"elif database_mode == 'A':\n",
"elif database_mode == \"A\":\n",
" ASTRA_DB_CLIENT_ID = \"token\"\n",
" cluster = Cluster(\n",
" cloud={\n",

View File

@@ -40,12 +40,15 @@
"cell_type": "code",
"execution_count": 1,
"metadata": {
"is_executing": true
"ExecuteTime": {
"end_time": "2023-07-09T19:20:49.003167Z",
"start_time": "2023-07-09T19:20:47.446370Z"
}
},
"outputs": [],
"source": [
"from langchain.memory.chat_message_histories import ZepChatMessageHistory\n",
"from langchain.memory import ConversationBufferMemory\n",
"from langchain.memory import ZepMemory\n",
"from langchain.retrievers import ZepRetriever\n",
"from langchain import OpenAI\n",
"from langchain.schema import HumanMessage, AIMessage\n",
"from langchain.utilities import WikipediaAPIWrapper\n",
@@ -64,19 +67,11 @@
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-25T15:09:41.762056Z",
"start_time": "2023-05-25T15:09:41.755238Z"
"end_time": "2023-07-09T19:23:14.378234Z",
"start_time": "2023-07-09T19:20:49.005041Z"
}
},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"outputs": [],
"source": [
"# Provide your OpenAI key\n",
"import getpass\n",
@@ -87,16 +82,13 @@
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" ········\n"
]
"metadata": {
"ExecuteTime": {
"end_time": "2023-07-09T19:23:16.329934Z",
"start_time": "2023-07-09T19:23:14.345580Z"
}
],
},
"outputs": [],
"source": [
"# Provide your Zep API key. Note that this is optional. See https://docs.getzep.com/deployment/auth\n",
"\n",
@@ -116,8 +108,8 @@
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-25T15:09:41.840440Z",
"start_time": "2023-05-25T15:09:41.762277Z"
"end_time": "2023-07-09T19:23:16.528212Z",
"start_time": "2023-07-09T19:23:16.279045Z"
}
},
"outputs": [],
@@ -132,15 +124,11 @@
"]\n",
"\n",
"# Set up Zep Chat History\n",
"zep_chat_history = ZepChatMessageHistory(\n",
"memory = ZepMemory(\n",
" session_id=session_id,\n",
" url=ZEP_API_URL,\n",
" api_key=zep_api_key\n",
")\n",
"\n",
"# Use a standard ConversationBufferMemory to encapsulate the Zep chat history\n",
"memory = ConversationBufferMemory(\n",
" memory_key=\"chat_history\", chat_memory=zep_chat_history\n",
" api_key=zep_api_key,\n",
" memory_key=\"chat_history\",\n",
")\n",
"\n",
"# Initialize the agent\n",
@@ -167,8 +155,8 @@
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-25T15:09:41.960661Z",
"start_time": "2023-05-25T15:09:41.842656Z"
"end_time": "2023-07-09T19:23:16.659484Z",
"start_time": "2023-07-09T19:23:16.532090Z"
}
},
"outputs": [],
@@ -230,14 +218,16 @@
" \" living in a dystopian future where society has collapsed due to\"\n",
" \" environmental disasters, poverty, and violence.\"\n",
" ),\n",
" \"metadata\": {\"foo\": \"bar\"},\n",
" },\n",
"]\n",
"\n",
"for msg in test_history:\n",
" zep_chat_history.add_message(\n",
" memory.chat_memory.add_message(\n",
" HumanMessage(content=msg[\"content\"])\n",
" if msg[\"role\"] == \"human\"\n",
" else AIMessage(content=msg[\"content\"])\n",
" else AIMessage(content=msg[\"content\"]),\n",
" metadata=msg.get(\"metadata\", {}),\n",
" )"
]
},
@@ -256,8 +246,8 @@
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-25T15:09:50.485377Z",
"start_time": "2023-05-25T15:09:41.962287Z"
"end_time": "2023-07-09T19:23:19.348822Z",
"start_time": "2023-07-09T19:23:16.660130Z"
}
},
"outputs": [
@@ -269,16 +259,14 @@
"\n",
"\u001B[1m> Entering new chain...\u001B[0m\n",
"\u001B[32;1m\u001B[1;3mThought: Do I need to use a tool? No\n",
"AI: Parable of the Sower is a prescient novel that speaks to the challenges facing contemporary society, such as climate change, economic inequality, and the rise of authoritarianism. It is a cautionary tale that warns of the dangers of ignoring these issues and the importance of taking action to address them.\u001B[0m\n",
"AI: Parable of the Sower is a prescient novel that speaks to the challenges facing contemporary society, such as climate change, inequality, and violence. It is a cautionary tale that warns of the dangers of unchecked greed and the need for individuals to take responsibility for their own lives and the lives of those around them.\u001B[0m\n",
"\n",
"\u001B[1m> Finished chain.\u001B[0m\n"
]
},
{
"data": {
"text/plain": [
"'Parable of the Sower is a prescient novel that speaks to the challenges facing contemporary society, such as climate change, economic inequality, and the rise of authoritarianism. It is a cautionary tale that warns of the dangers of ignoring these issues and the importance of taking action to address them.'"
]
"text/plain": "'Parable of the Sower is a prescient novel that speaks to the challenges facing contemporary society, such as climate change, inequality, and violence. It is a cautionary tale that warns of the dangers of unchecked greed and the need for individuals to take responsibility for their own lives and the lives of those around them.'"
},
"execution_count": 6,
"metadata": {},
@@ -287,7 +275,7 @@
],
"source": [
"agent_chain.run(\n",
" input=\"WWhat is the book's relevance to the challenges facing contemporary society?\"\n",
" input=\"What is the book's relevance to the challenges facing contemporary society?\",\n",
")"
]
},
@@ -305,11 +293,11 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-25T15:09:50.493438Z",
"start_time": "2023-05-25T15:09:50.479230Z"
"end_time": "2023-07-09T19:23:41.042254Z",
"start_time": "2023-07-09T19:23:41.016815Z"
}
},
"outputs": [
@@ -317,29 +305,39 @@
"name": "stdout",
"output_type": "stream",
"text": [
"The human asks about Octavia Butler and the AI identifies her as an American science fiction author. They continue to discuss her works and the fact that the FX series Kindred is based on one of her novels. The AI also lists Ursula K. Le Guin, Samuel R. Delany, and Joanna Russ as Butler's contemporaries.\n",
"The human inquires about Octavia Butler. The AI identifies her as an American science fiction author. The human then asks which books of hers were made into movies. The AI responds by mentioning the FX series Kindred, based on her novel of the same name. The human then asks about her contemporaries, and the AI lists Ursula K. Le Guin, Samuel R. Delany, and Joanna Russ.\n",
"\n",
"\n",
"{'role': 'human', 'content': 'What awards did she win?', 'uuid': 'a4bdc592-71a5-47d0-9c64-230b882aab48', 'created_at': '2023-06-26T23:37:56.383953Z', 'token_count': 8, 'metadata': {'system': {'entities': [], 'intent': 'The subject is asking about the awards someone won, likely referring to a specific individual.'}}}\n",
"{'role': 'ai', 'content': 'Octavia Butler won the Hugo Award, the Nebula Award, and the MacArthur Fellowship.', 'uuid': '60cc6e6b-7cd4-4a81-aebc-72ef997286b4', 'created_at': '2023-06-26T23:37:56.389935Z', 'token_count': 21, 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 14, 'Start': 0, 'Text': 'Octavia Butler'}], 'Name': 'Octavia Butler'}, {'Label': 'WORK_OF_ART', 'Matches': [{'End': 33, 'Start': 19, 'Text': 'the Hugo Award'}], 'Name': 'the Hugo Award'}, {'Label': 'EVENT', 'Matches': [{'End': 81, 'Start': 57, 'Text': 'the MacArthur Fellowship'}], 'Name': 'the MacArthur Fellowship'}], 'intent': 'The subject is stating the accomplishments and awards received by Octavia Butler.'}}}\n",
"{'role': 'human', 'content': 'Which other women sci-fi writers might I want to read?', 'uuid': 'b189fc60-1510-4a4b-a503-899481d652de', 'created_at': '2023-06-26T23:37:56.395722Z', 'token_count': 14, 'metadata': {'system': {'entities': [], 'intent': 'The subject is looking for recommendations on women science fiction writers to read.'}}}\n",
"{'role': 'ai', 'content': 'You might want to read Ursula K. Le Guin or Joanna Russ.', 'uuid': '4be1ccbb-a915-45d6-9f18-7a0c1cbd9907', 'created_at': '2023-06-26T23:37:56.403596Z', 'token_count': 18, 'metadata': {'system': {'entities': [{'Label': 'ORG', 'Matches': [{'End': 40, 'Start': 23, 'Text': 'Ursula K. Le Guin'}], 'Name': 'Ursula K. Le Guin'}, {'Label': 'PERSON', 'Matches': [{'End': 55, 'Start': 44, 'Text': 'Joanna Russ'}], 'Name': 'Joanna Russ'}], 'intent': 'The subject is suggesting reading material and making a literary recommendation.'}}}\n",
"{'role': 'human', 'content': \"Write a short synopsis of Butler's book, Parable of the Sower. What is it about?\", 'uuid': 'ac3c5e3e-26a7-4f3b-aeb0-bba084e22753', 'created_at': '2023-06-26T23:37:56.410662Z', 'token_count': 23, 'metadata': {'system': {'entities': [{'Label': 'ORG', 'Matches': [{'End': 32, 'Start': 26, 'Text': 'Butler'}], 'Name': 'Butler'}, {'Label': 'WORK_OF_ART', 'Matches': [{'End': 61, 'Start': 41, 'Text': 'Parable of the Sower'}], 'Name': 'Parable of the Sower'}], 'intent': 'The subject is asking for a brief overview or summary of the book \"Parable of the Sower\" written by Butler.'}}}\n",
"{'role': 'ai', 'content': 'Parable of the Sower is a science fiction novel by Octavia Butler, published in 1993. It follows the story of Lauren Olamina, a young woman living in a dystopian future where society has collapsed due to environmental disasters, poverty, and violence.', 'uuid': '4a463b4c-bcab-473c-bed1-fc56a7a20ae2', 'created_at': '2023-06-26T23:37:56.41764Z', 'token_count': 56, 'metadata': {'system': {'entities': [{'Label': 'GPE', 'Matches': [{'End': 20, 'Start': 15, 'Text': 'Sower'}], 'Name': 'Sower'}, {'Label': 'PERSON', 'Matches': [{'End': 65, 'Start': 51, 'Text': 'Octavia Butler'}], 'Name': 'Octavia Butler'}, {'Label': 'DATE', 'Matches': [{'End': 84, 'Start': 80, 'Text': '1993'}], 'Name': '1993'}, {'Label': 'PERSON', 'Matches': [{'End': 124, 'Start': 110, 'Text': 'Lauren Olamina'}], 'Name': 'Lauren Olamina'}]}}}\n",
"{'role': 'human', 'content': \"WWhat is the book's relevance to the challenges facing contemporary society?\", 'uuid': '41bab0c7-5e20-40a4-9303-f82069977c91', 'created_at': '2023-06-26T23:38:03.559642Z', 'token_count': 16, 'metadata': {'system': {'entities': [{'Label': 'ORG', 'Matches': [{'End': 5, 'Start': 0, 'Text': 'WWhat'}], 'Name': 'WWhat'}]}}}\n",
"{'role': 'ai', 'content': 'Parable of the Sower is a prescient novel that speaks to the challenges facing contemporary society, such as climate change, economic inequality, and the rise of authoritarianism. It is a cautionary tale that warns of the dangers of ignoring these issues and the importance of taking action to address them.', 'uuid': 'bfd8146a-4632-4c8c-98b6-9468bb624339', 'created_at': '2023-06-26T23:38:03.589312Z', 'token_count': 62, 'metadata': {'system': {'entities': [{'Label': 'GPE', 'Matches': [{'End': 20, 'Start': 15, 'Text': 'Sower'}], 'Name': 'Sower'}]}}}\n"
"system :\n",
" {'content': 'The human inquires about Octavia Butler. The AI identifies her as an American science fiction author. The human then asks which books of hers were made into movies. The AI responds by mentioning the FX series Kindred, based on her novel of the same name. The human then asks about her contemporaries, and the AI lists Ursula K. Le Guin, Samuel R. Delany, and Joanna Russ.', 'additional_kwargs': {}}\n",
"human :\n",
" {'content': 'What awards did she win?', 'additional_kwargs': {'uuid': '6b733f0b-6778-49ae-b3ec-4e077c039f31', 'created_at': '2023-07-09T19:23:16.611232Z', 'token_count': 8, 'metadata': {'system': {'entities': [], 'intent': 'The subject is inquiring about the awards that someone, whose identity is not specified, has won.'}}}, 'example': False}\n",
"ai :\n",
" {'content': 'Octavia Butler won the Hugo Award, the Nebula Award, and the MacArthur Fellowship.', 'additional_kwargs': {'uuid': '2f6d80c6-3c08-4fd4-8d4e-7bbee341ac90', 'created_at': '2023-07-09T19:23:16.618947Z', 'token_count': 21, 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 14, 'Start': 0, 'Text': 'Octavia Butler'}], 'Name': 'Octavia Butler'}, {'Label': 'WORK_OF_ART', 'Matches': [{'End': 33, 'Start': 19, 'Text': 'the Hugo Award'}], 'Name': 'the Hugo Award'}, {'Label': 'EVENT', 'Matches': [{'End': 81, 'Start': 57, 'Text': 'the MacArthur Fellowship'}], 'Name': 'the MacArthur Fellowship'}], 'intent': 'The subject is stating that Octavia Butler received the Hugo Award, the Nebula Award, and the MacArthur Fellowship.'}}}, 'example': False}\n",
"human :\n",
" {'content': 'Which other women sci-fi writers might I want to read?', 'additional_kwargs': {'uuid': 'ccdcc901-ea39-4981-862f-6fe22ab9289b', 'created_at': '2023-07-09T19:23:16.62678Z', 'token_count': 14, 'metadata': {'system': {'entities': [], 'intent': 'The subject is seeking recommendations for additional women science fiction writers to explore.'}}}, 'example': False}\n",
"ai :\n",
" {'content': 'You might want to read Ursula K. Le Guin or Joanna Russ.', 'additional_kwargs': {'uuid': '7977099a-0c62-4c98-bfff-465bbab6c9c3', 'created_at': '2023-07-09T19:23:16.631721Z', 'token_count': 18, 'metadata': {'system': {'entities': [{'Label': 'ORG', 'Matches': [{'End': 40, 'Start': 23, 'Text': 'Ursula K. Le Guin'}], 'Name': 'Ursula K. Le Guin'}, {'Label': 'PERSON', 'Matches': [{'End': 55, 'Start': 44, 'Text': 'Joanna Russ'}], 'Name': 'Joanna Russ'}], 'intent': 'The subject is suggesting that the person should consider reading the works of Ursula K. Le Guin or Joanna Russ.'}}}, 'example': False}\n",
"human :\n",
" {'content': \"Write a short synopsis of Butler's book, Parable of the Sower. What is it about?\", 'additional_kwargs': {'uuid': 'e439b7e6-286a-4278-a8cb-dc260fa2e089', 'created_at': '2023-07-09T19:23:16.63623Z', 'token_count': 23, 'metadata': {'system': {'entities': [{'Label': 'ORG', 'Matches': [{'End': 32, 'Start': 26, 'Text': 'Butler'}], 'Name': 'Butler'}, {'Label': 'WORK_OF_ART', 'Matches': [{'End': 61, 'Start': 41, 'Text': 'Parable of the Sower'}], 'Name': 'Parable of the Sower'}], 'intent': 'The subject is requesting a brief summary or explanation of the book \"Parable of the Sower\" by Butler.'}}}, 'example': False}\n",
"ai :\n",
" {'content': 'Parable of the Sower is a science fiction novel by Octavia Butler, published in 1993. It follows the story of Lauren Olamina, a young woman living in a dystopian future where society has collapsed due to environmental disasters, poverty, and violence.', 'additional_kwargs': {'uuid': '6760489b-19c9-41aa-8b45-fae6cb1d7ee6', 'created_at': '2023-07-09T19:23:16.647524Z', 'token_count': 56, 'metadata': {'foo': 'bar', 'system': {'entities': [{'Label': 'GPE', 'Matches': [{'End': 20, 'Start': 15, 'Text': 'Sower'}], 'Name': 'Sower'}, {'Label': 'PERSON', 'Matches': [{'End': 65, 'Start': 51, 'Text': 'Octavia Butler'}], 'Name': 'Octavia Butler'}, {'Label': 'DATE', 'Matches': [{'End': 84, 'Start': 80, 'Text': '1993'}], 'Name': '1993'}, {'Label': 'PERSON', 'Matches': [{'End': 124, 'Start': 110, 'Text': 'Lauren Olamina'}], 'Name': 'Lauren Olamina'}], 'intent': 'The subject is providing information about the novel \"Parable of the Sower\" by Octavia Butler, including its genre, publication date, and a brief summary of the plot.'}}}, 'example': False}\n",
"human :\n",
" {'content': \"What is the book's relevance to the challenges facing contemporary society?\", 'additional_kwargs': {'uuid': '7dbbbb93-492b-4739-800f-cad2b6e0e764', 'created_at': '2023-07-09T19:23:19.315182Z', 'token_count': 15, 'metadata': {'system': {'entities': [], 'intent': 'The subject is asking about the relevance of a book to the challenges currently faced by society.'}}}, 'example': False}\n",
"ai :\n",
" {'content': 'Parable of the Sower is a prescient novel that speaks to the challenges facing contemporary society, such as climate change, inequality, and violence. It is a cautionary tale that warns of the dangers of unchecked greed and the need for individuals to take responsibility for their own lives and the lives of those around them.', 'additional_kwargs': {'uuid': '3e14ac8f-b7c1-4360-958b-9f3eae1f784f', 'created_at': '2023-07-09T19:23:19.332517Z', 'token_count': 66, 'metadata': {'system': {'entities': [{'Label': 'GPE', 'Matches': [{'End': 20, 'Start': 15, 'Text': 'Sower'}], 'Name': 'Sower'}], 'intent': 'The subject is providing an analysis and evaluation of the novel \"Parable of the Sower\" and highlighting its relevance to contemporary societal challenges.'}}}, 'example': False}\n"
]
}
],
"source": [
"def print_messages(messages):\n",
" for m in messages:\n",
" print(m.to_dict())\n",
" print(m.type, \":\\n\", m.dict())\n",
"\n",
"\n",
"print(zep_chat_history.zep_summary)\n",
"print(memory.chat_memory.zep_summary)\n",
"print(\"\\n\")\n",
"print_messages(zep_chat_history.zep_messages)"
"print_messages(memory.chat_memory.messages)"
]
},
{
@@ -349,16 +347,18 @@
"source": [
"### Vector search over the Zep memory\n",
"\n",
"Zep provides native vector search over historical conversation memory. Embedding happens automatically.\n"
"Zep provides native vector search over historical conversation memory via the `ZepRetriever`.\n",
"\n",
"You can use the `ZepRetriever` with chains that support passing in a Langchain `Retriever` object.\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 11,
"metadata": {
"ExecuteTime": {
"end_time": "2023-05-25T15:09:50.751203Z",
"start_time": "2023-05-25T15:09:50.495050Z"
"end_time": "2023-07-09T19:24:30.781893Z",
"start_time": "2023-07-09T19:24:30.595650Z"
}
},
"outputs": [
@@ -366,38 +366,36 @@
"name": "stdout",
"output_type": "stream",
"text": [
"{'uuid': 'b189fc60-1510-4a4b-a503-899481d652de', 'created_at': '2023-06-26T23:37:56.395722Z', 'role': 'human', 'content': 'Which other women sci-fi writers might I want to read?', 'metadata': {'system': {'entities': [], 'intent': 'The subject is looking for recommendations on women science fiction writers to read.'}}, 'token_count': 14} 0.9119619869747062\n",
"{'uuid': '4be1ccbb-a915-45d6-9f18-7a0c1cbd9907', 'created_at': '2023-06-26T23:37:56.403596Z', 'role': 'ai', 'content': 'You might want to read Ursula K. Le Guin or Joanna Russ.', 'metadata': {'system': {'entities': [{'Label': 'ORG', 'Matches': [{'End': 40, 'Start': 23, 'Text': 'Ursula K. Le Guin'}], 'Name': 'Ursula K. Le Guin'}, {'Label': 'PERSON', 'Matches': [{'End': 55, 'Start': 44, 'Text': 'Joanna Russ'}], 'Name': 'Joanna Russ'}], 'intent': 'The subject is suggesting reading material and making a literary recommendation.'}}, 'token_count': 18} 0.8534346954749745\n",
"{'uuid': '76ec2a3d-b908-4c23-a55d-71ff92865a7a', 'created_at': '2023-06-26T23:37:56.378345Z', 'role': 'ai', 'content': \"Octavia Butler's contemporaries included Ursula K. Le Guin, Samuel R. Delany, and Joanna Russ.\", 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 16, 'Start': 0, 'Text': \"Octavia Butler's\"}], 'Name': \"Octavia Butler's\"}, {'Label': 'ORG', 'Matches': [{'End': 58, 'Start': 41, 'Text': 'Ursula K. Le Guin'}], 'Name': 'Ursula K. Le Guin'}, {'Label': 'PERSON', 'Matches': [{'End': 76, 'Start': 60, 'Text': 'Samuel R. Delany'}], 'Name': 'Samuel R. Delany'}, {'Label': 'PERSON', 'Matches': [{'End': 93, 'Start': 82, 'Text': 'Joanna Russ'}], 'Name': 'Joanna Russ'}], 'intent': 'The subject is stating the contemporaries of Octavia Butler, who are also science fiction writers.'}}, 'token_count': 27} 0.8523930955780226\n",
"{'uuid': '1feb02c7-63c9-4616-854d-0d97fb590ea5', 'created_at': '2023-06-26T23:37:56.313009Z', 'role': 'human', 'content': 'Who was Octavia Butler?', 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 22, 'Start': 8, 'Text': 'Octavia Butler'}], 'Name': 'Octavia Butler'}], 'intent': 'The subject is asking about the identity of Octavia Butler, likely seeking information about her background or accomplishments.'}}, 'token_count': 8} 0.8236355436055457\n",
"{'uuid': 'ebe4696d-b5fa-4ca0-88c9-da794d9611ab', 'created_at': '2023-06-26T23:37:56.332247Z', 'role': 'ai', 'content': 'Octavia Estelle Butler (June 22, 1947 February 24, 2006) was an American science fiction author.', 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 22, 'Start': 0, 'Text': 'Octavia Estelle Butler'}], 'Name': 'Octavia Estelle Butler'}, {'Label': 'DATE', 'Matches': [{'End': 37, 'Start': 24, 'Text': 'June 22, 1947'}], 'Name': 'June 22, 1947'}, {'Label': 'DATE', 'Matches': [{'End': 57, 'Start': 40, 'Text': 'February 24, 2006'}], 'Name': 'February 24, 2006'}, {'Label': 'NORP', 'Matches': [{'End': 74, 'Start': 66, 'Text': 'American'}], 'Name': 'American'}], 'intent': 'The subject is making a statement about the background and profession of Octavia Estelle Butler, an American author.'}}, 'token_count': 31} 0.8206687242257686\n",
"{'uuid': '60cc6e6b-7cd4-4a81-aebc-72ef997286b4', 'created_at': '2023-06-26T23:37:56.389935Z', 'role': 'ai', 'content': 'Octavia Butler won the Hugo Award, the Nebula Award, and the MacArthur Fellowship.', 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 14, 'Start': 0, 'Text': 'Octavia Butler'}], 'Name': 'Octavia Butler'}, {'Label': 'WORK_OF_ART', 'Matches': [{'End': 33, 'Start': 19, 'Text': 'the Hugo Award'}], 'Name': 'the Hugo Award'}, {'Label': 'EVENT', 'Matches': [{'End': 81, 'Start': 57, 'Text': 'the MacArthur Fellowship'}], 'Name': 'the MacArthur Fellowship'}], 'intent': 'The subject is stating the accomplishments and awards received by Octavia Butler.'}}, 'token_count': 21} 0.8194249796585193\n",
"{'uuid': '0fa4f336-909d-4880-b01a-8e80e91fa8f2', 'created_at': '2023-06-26T23:37:56.344552Z', 'role': 'human', 'content': 'Which books of hers were made into movies?', 'metadata': {'system': {'entities': [], 'intent': 'The subject is inquiring about which books written by an unknown female author were adapted into movies.'}}, 'token_count': 11} 0.7955105671310818\n",
"{'uuid': 'f91de7f2-4b84-4c5a-8a33-a71f38f3a59c', 'created_at': '2023-06-26T23:37:56.368146Z', 'role': 'human', 'content': 'Who were her contemporaries?', 'metadata': {'system': {'entities': [], 'intent': 'The subject is asking about the people who lived during the same time period as a specific individual.'}}, 'token_count': 8} 0.7942358617914813\n",
"{'uuid': '4a463b4c-bcab-473c-bed1-fc56a7a20ae2', 'created_at': '2023-06-26T23:37:56.41764Z', 'role': 'ai', 'content': 'Parable of the Sower is a science fiction novel by Octavia Butler, published in 1993. It follows the story of Lauren Olamina, a young woman living in a dystopian future where society has collapsed due to environmental disasters, poverty, and violence.', 'metadata': {'system': {'entities': [{'Label': 'GPE', 'Matches': [{'End': 20, 'Start': 15, 'Text': 'Sower'}], 'Name': 'Sower'}, {'Label': 'PERSON', 'Matches': [{'End': 65, 'Start': 51, 'Text': 'Octavia Butler'}], 'Name': 'Octavia Butler'}, {'Label': 'DATE', 'Matches': [{'End': 84, 'Start': 80, 'Text': '1993'}], 'Name': '1993'}, {'Label': 'PERSON', 'Matches': [{'End': 124, 'Start': 110, 'Text': 'Lauren Olamina'}], 'Name': 'Lauren Olamina'}]}}, 'token_count': 56} 0.7816448549236643\n",
"{'uuid': '6161d934-a629-4ba2-8bba-0b0996c93964', 'created_at': '2023-06-26T23:37:56.358632Z', 'role': 'ai', 'content': \"The most well-known adaptation of Octavia Butler's work is the FX series Kindred, based on her novel of the same name.\", 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 50, 'Start': 34, 'Text': \"Octavia Butler's\"}], 'Name': \"Octavia Butler's\"}, {'Label': 'ORG', 'Matches': [{'End': 65, 'Start': 63, 'Text': 'FX'}], 'Name': 'FX'}, {'Label': 'GPE', 'Matches': [{'End': 80, 'Start': 73, 'Text': 'Kindred'}], 'Name': 'Kindred'}], 'intent': \"The subject is discussing Octavia Butler's work being adapted into a TV series called Kindred.\"}}, 'token_count': 29} 0.7815841371388998\n"
"{'uuid': 'ccdcc901-ea39-4981-862f-6fe22ab9289b', 'created_at': '2023-07-09T19:23:16.62678Z', 'role': 'human', 'content': 'Which other women sci-fi writers might I want to read?', 'metadata': {'system': {'entities': [], 'intent': 'The subject is seeking recommendations for additional women science fiction writers to explore.'}}, 'token_count': 14} 0.9119619869747062\n",
"{'uuid': '7977099a-0c62-4c98-bfff-465bbab6c9c3', 'created_at': '2023-07-09T19:23:16.631721Z', 'role': 'ai', 'content': 'You might want to read Ursula K. Le Guin or Joanna Russ.', 'metadata': {'system': {'entities': [{'Label': 'ORG', 'Matches': [{'End': 40, 'Start': 23, 'Text': 'Ursula K. Le Guin'}], 'Name': 'Ursula K. Le Guin'}, {'Label': 'PERSON', 'Matches': [{'End': 55, 'Start': 44, 'Text': 'Joanna Russ'}], 'Name': 'Joanna Russ'}], 'intent': 'The subject is suggesting that the person should consider reading the works of Ursula K. Le Guin or Joanna Russ.'}}, 'token_count': 18} 0.8534346954749745\n",
"{'uuid': 'b05e2eb5-c103-4973-9458-928726f08655', 'created_at': '2023-07-09T19:23:16.603098Z', 'role': 'ai', 'content': \"Octavia Butler's contemporaries included Ursula K. Le Guin, Samuel R. Delany, and Joanna Russ.\", 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 16, 'Start': 0, 'Text': \"Octavia Butler's\"}], 'Name': \"Octavia Butler's\"}, {'Label': 'ORG', 'Matches': [{'End': 58, 'Start': 41, 'Text': 'Ursula K. Le Guin'}], 'Name': 'Ursula K. Le Guin'}, {'Label': 'PERSON', 'Matches': [{'End': 76, 'Start': 60, 'Text': 'Samuel R. Delany'}], 'Name': 'Samuel R. Delany'}, {'Label': 'PERSON', 'Matches': [{'End': 93, 'Start': 82, 'Text': 'Joanna Russ'}], 'Name': 'Joanna Russ'}], 'intent': \"The subject is stating that Octavia Butler's contemporaries included Ursula K. Le Guin, Samuel R. Delany, and Joanna Russ.\"}}, 'token_count': 27} 0.8523831524040919\n",
"{'uuid': 'e346f02b-f854-435d-b6ba-fb394a416b9b', 'created_at': '2023-07-09T19:23:16.556587Z', 'role': 'human', 'content': 'Who was Octavia Butler?', 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 22, 'Start': 8, 'Text': 'Octavia Butler'}], 'Name': 'Octavia Butler'}], 'intent': 'The subject is asking for information about the identity or background of Octavia Butler.'}}, 'token_count': 8} 0.8236355436055457\n",
"{'uuid': '42ff41d2-c63a-4d5b-b19b-d9a87105cfc3', 'created_at': '2023-07-09T19:23:16.578022Z', 'role': 'ai', 'content': 'Octavia Estelle Butler (June 22, 1947 February 24, 2006) was an American science fiction author.', 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 22, 'Start': 0, 'Text': 'Octavia Estelle Butler'}], 'Name': 'Octavia Estelle Butler'}, {'Label': 'DATE', 'Matches': [{'End': 37, 'Start': 24, 'Text': 'June 22, 1947'}], 'Name': 'June 22, 1947'}, {'Label': 'DATE', 'Matches': [{'End': 57, 'Start': 40, 'Text': 'February 24, 2006'}], 'Name': 'February 24, 2006'}, {'Label': 'NORP', 'Matches': [{'End': 74, 'Start': 66, 'Text': 'American'}], 'Name': 'American'}], 'intent': 'The subject is providing information about Octavia Estelle Butler, who was an American science fiction author.'}}, 'token_count': 31} 0.8206687242257686\n",
"{'uuid': '2f6d80c6-3c08-4fd4-8d4e-7bbee341ac90', 'created_at': '2023-07-09T19:23:16.618947Z', 'role': 'ai', 'content': 'Octavia Butler won the Hugo Award, the Nebula Award, and the MacArthur Fellowship.', 'metadata': {'system': {'entities': [{'Label': 'PERSON', 'Matches': [{'End': 14, 'Start': 0, 'Text': 'Octavia Butler'}], 'Name': 'Octavia Butler'}, {'Label': 'WORK_OF_ART', 'Matches': [{'End': 33, 'Start': 19, 'Text': 'the Hugo Award'}], 'Name': 'the Hugo Award'}, {'Label': 'EVENT', 'Matches': [{'End': 81, 'Start': 57, 'Text': 'the MacArthur Fellowship'}], 'Name': 'the MacArthur Fellowship'}], 'intent': 'The subject is stating that Octavia Butler received the Hugo Award, the Nebula Award, and the MacArthur Fellowship.'}}, 'token_count': 21} 0.8199012397683285\n"
]
}
],
"source": [
"search_results = zep_chat_history.search(\"who are some famous women sci-fi authors?\")\n",
"retriever = ZepRetriever(\n",
" session_id=session_id,\n",
" url=ZEP_API_URL,\n",
" api_key=zep_api_key,\n",
")\n",
"\n",
"search_results = memory.chat_memory.search(\"who are some famous women sci-fi authors?\")\n",
"for r in search_results:\n",
" print(r.message, r.dist)"
" if r.dist > 0.8: # Only print results with similarity of 0.8 or higher\n",
" print(r.message, r.dist)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
"source": [],
"metadata": {
"collapsed": false
}
}
],
"metadata": {

View File

@@ -203,7 +203,9 @@
],
"source": [
"messages = [\n",
" HumanMessage(content=\"How do I create a python function to identify all prime numbers?\")\n",
" HumanMessage(\n",
" content=\"How do I create a python function to identify all prime numbers?\"\n",
" )\n",
"]\n",
"chat(messages)"
]

View File

@@ -0,0 +1,162 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "e49f1e0d",
"metadata": {},
"source": [
"# JinaChat\n",
"\n",
"This notebook covers how to get started with JinaChat chat models."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "522686de",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.chat_models import JinaChat\n",
"from langchain.prompts.chat import (\n",
" ChatPromptTemplate,\n",
" SystemMessagePromptTemplate,\n",
" AIMessagePromptTemplate,\n",
" HumanMessagePromptTemplate,\n",
")\n",
"from langchain.schema import AIMessage, HumanMessage, SystemMessage"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "62e0dbc3",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chat = JinaChat(temperature=0)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "ce16ad78-8e6f-48cd-954e-98be75eb5836",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=\"J'aime programmer.\", additional_kwargs={}, example=False)"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"messages = [\n",
" SystemMessage(\n",
" content=\"You are a helpful assistant that translates English to French.\"\n",
" ),\n",
" HumanMessage(\n",
" content=\"Translate this sentence from English to French. I love programming.\"\n",
" ),\n",
"]\n",
"chat(messages)"
]
},
{
"cell_type": "markdown",
"id": "778f912a-66ea-4a5d-b3de-6c7db4baba26",
"metadata": {},
"source": [
"You can make use of templating by using a `MessagePromptTemplate`. You can build a `ChatPromptTemplate` from one or more `MessagePromptTemplates`. You can use `ChatPromptTemplate`'s `format_prompt` -- this returns a `PromptValue`, which you can convert to a string or Message object, depending on whether you want to use the formatted value as input to an llm or chat model.\n",
"\n",
"For convenience, there is a `from_template` method exposed on the template. If you were to use this template, this is what it would look like:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "180c5cc8",
"metadata": {},
"outputs": [],
"source": [
"template = (\n",
" \"You are a helpful assistant that translates {input_language} to {output_language}.\"\n",
")\n",
"system_message_prompt = SystemMessagePromptTemplate.from_template(template)\n",
"human_template = \"{text}\"\n",
"human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "fbb043e6",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content=\"J'aime programmer.\", additional_kwargs={}, example=False)"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chat_prompt = ChatPromptTemplate.from_messages(\n",
" [system_message_prompt, human_message_prompt]\n",
")\n",
"\n",
"# get a chat completion from the formatted messages\n",
"chat(\n",
" chat_prompt.format_prompt(\n",
" input_language=\"English\", output_language=\"French\", text=\"I love programming.\"\n",
" ).to_messages()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c095285d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -41,6 +41,19 @@
"chat = ChatOpenAI(temperature=0)"
]
},
{
"cell_type": "markdown",
"id": "4e5fe97e",
"metadata": {},
"source": [
"The above cell assumes that your OpenAI API key is set in your environment variables. If you would rather manually specify your API key and/or organization ID, use the following code:\n",
"\n",
"```python\n",
"chat = ChatOpenAI(temperature=0, openai_api_key=\"YOUR_API_KEY\", openai_organization=\"YOUR_ORGANIZATION_ID\")\n",
"```\n",
"Remove the openai_organization parameter should it not apply to you."
]
},
{
"cell_type": "code",
"execution_count": 3,
@@ -52,7 +65,7 @@
{
"data": {
"text/plain": [
"AIMessage(content=\"J'aime programmer.\", additional_kwargs={}, example=False)"
"AIMessage(content=\"J'adore la programmation.\", additional_kwargs={}, example=False)"
]
},
"execution_count": 3,
@@ -108,7 +121,7 @@
{
"data": {
"text/plain": [
"AIMessage(content=\"J'adore la programmation.\", additional_kwargs={})"
"AIMessage(content=\"J'adore la programmation.\", additional_kwargs={}, example=False)"
]
},
"execution_count": 5,
@@ -154,7 +167,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.9.7"
}
},
"nbformat": 4,

View File

@@ -93,19 +93,19 @@
"text": [
"\n",
"\n",
"\u001B[1m> Entering new chain...\u001B[0m\n",
"\u001B[32;1m\u001B[1;3m\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\n",
"I need to use the print function to output the string \"Hello, world!\"\n",
"Action: Python_REPL\n",
"Action Input: `print(\"Hello, world!\")`\u001B[0m\n",
"Observation: \u001B[36;1m\u001B[1;3mHello, world!\n",
"\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3m\n",
"Action Input: `print(\"Hello, world!\")`\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mHello, world!\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m\n",
"I now know how to print a string in Python\n",
"Final Answer:\n",
"Hello, world!\u001B[0m\n",
"Hello, world!\u001b[0m\n",
"\n",
"\u001B[1m> Finished chain.\u001B[0m\n"
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
@@ -148,9 +148,11 @@
")\n",
"\n",
"# Now let's test it out!\n",
"agent.run(\"\"\"\n",
"agent.run(\n",
" \"\"\"\n",
"Write a Python script that prints \"Hello, world!\"\n",
"\"\"\")"
"\"\"\"\n",
")"
]
},
{
@@ -164,21 +166,21 @@
"text": [
"\n",
"\n",
"\u001B[1m> Entering new chain...\u001B[0m\n",
"\u001B[32;1m\u001B[1;3m I need to use the calculator to find the answer\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to use the calculator to find the answer\n",
"Action: Calculator\n",
"Action Input: 2.3 ^ 4.5\u001B[0m\n",
"Observation: \u001B[33;1m\u001B[1;3mAnswer: 42.43998894277659\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3m I now know the final answer\n",
"Action Input: 2.3 ^ 4.5\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mAnswer: 42.43998894277659\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: 42.43998894277659\n",
"\n",
"Question: \n",
"What is the square root of 144?\n",
"\n",
"Thought: I need to use the calculator to find the answer\n",
"Action:\u001B[0m\n",
"Action:\u001b[0m\n",
"\n",
"\u001B[1m> Finished chain.\u001B[0m\n"
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{

View File

@@ -65,7 +65,7 @@
}
],
"source": [
"# Please login and get your API key from https://clarifai.com/settings/security \n",
"# Please login and get your API key from https://clarifai.com/settings/security\n",
"from getpass import getpass\n",
"\n",
"CLARIFAI_PAT = getpass()"
@@ -130,9 +130,9 @@
"metadata": {},
"outputs": [],
"source": [
"USER_ID = 'openai'\n",
"APP_ID = 'chat-completion'\n",
"MODEL_ID = 'GPT-3_5-turbo'\n",
"USER_ID = \"openai\"\n",
"APP_ID = \"chat-completion\"\n",
"MODEL_ID = \"GPT-3_5-turbo\"\n",
"\n",
"# You can provide a specific model version as the model_version_id arg.\n",
"# MODEL_VERSION_ID = \"MODEL_VERSION_ID\""
@@ -148,7 +148,9 @@
"outputs": [],
"source": [
"# Initialize a Clarifai LLM\n",
"clarifai_llm = Clarifai(pat=CLARIFAI_PAT, user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID)"
"clarifai_llm = Clarifai(\n",
" pat=CLARIFAI_PAT, user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID\n",
")"
]
},
{

View File

@@ -165,7 +165,9 @@
}
],
"source": [
"llm = HuggingFaceHub(repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64})\n",
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"\n",
"print(llm_chain.run(question))"
@@ -211,7 +213,9 @@
}
],
"source": [
"llm = HuggingFaceHub(repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64})\n",
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
@@ -245,7 +249,9 @@
"metadata": {},
"outputs": [],
"source": [
"llm = HuggingFaceHub(repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64})\n",
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
@@ -277,7 +283,9 @@
"metadata": {},
"outputs": [],
"source": [
"llm = HuggingFaceHub(repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64})\n",
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]
@@ -309,7 +317,9 @@
"metadata": {},
"outputs": [],
"source": [
"llm = HuggingFaceHub(repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64})\n",
"llm = HuggingFaceHub(\n",
" repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64}\n",
")\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"print(llm_chain.run(question))"
]

View File

@@ -0,0 +1,88 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "FPF4vhdZyJ7S"
},
"source": [
"# KoboldAI API\n",
"\n",
"[KoboldAI](https://github.com/KoboldAI/KoboldAI-Client) is a \"a browser-based front-end for AI-assisted writing with multiple local & remote AI models...\". It has a public and local API that is able to be used in langchain.\n",
"\n",
"This example goes over how to use LangChain with that API.\n",
"\n",
"Documentation can be found in the browser adding /api to the end of your endpoint (i.e http://127.0.0.1/:5000/api).\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "lyzOsRRTf_Vr"
},
"outputs": [],
"source": [
"from langchain.llms import KoboldApiLLM"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1a_H7mvfy51O"
},
"source": [
"Replace the endpoint seen below with the one shown in the output after starting the webui with --api or --public-api\n",
"\n",
"Optionally, you can pass in parameters like temperature or max_length"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "g3vGebq8f_Vr"
},
"outputs": [],
"source": [
"llm = KoboldApiLLM(endpoint=\"http://192.168.1.144:5000\", max_length=80)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "sPxNGGiDf_Vr",
"outputId": "024a1d62-3cd7-49a8-c6a8-5278224d02ef"
},
"outputs": [],
"source": [
"response = llm(\"### Instruction:\\nWhat is the first book of the bible?\\n### Response:\")"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "venv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}

View File

@@ -398,7 +398,7 @@
" model_path=\"./ggml-model-q4_0.bin\",\n",
" n_gpu_layers=n_gpu_layers,\n",
" n_batch=n_batch,\n",
" f16_kv=True, # MUST set to True, otherwise you will run into problem after a couple of calls\n",
" f16_kv=True, # MUST set to True, otherwise you will run into problem after a couple of calls\n",
" callback_manager=callback_manager,\n",
" verbose=True,\n",
")"

View File

@@ -60,15 +60,15 @@
"outputs": [],
"source": [
"llm = OctoAIEndpoint(\n",
" model_kwargs={\n",
" \"max_new_tokens\": 200,\n",
" \"temperature\": 0.75,\n",
" \"top_p\": 0.95,\n",
" \"repetition_penalty\": 1,\n",
" \"seed\": None,\n",
" \"stop\": [],\n",
" },\n",
" )"
" model_kwargs={\n",
" \"max_new_tokens\": 200,\n",
" \"temperature\": 0.75,\n",
" \"top_p\": 0.95,\n",
" \"repetition_penalty\": 1,\n",
" \"seed\": None,\n",
" \"stop\": [],\n",
" },\n",
")"
]
},
{

Some files were not shown because too many files have changed in this diff Show More