Compare commits

..

46 Commits

Author SHA1 Message Date
Dev 2049
f585557e92 undo 2023-05-23 19:34:29 -07:00
Dev 2049
ef29276496 cr 2023-05-23 19:34:10 -07:00
Fabrizio Ruocco
13760d3bff azuresearch as optional 2023-05-23 21:12:37 +00:00
Fabrizio Ruocco
42a5300137 fix linting 2023-05-23 20:53:23 +00:00
Fabrizio Ruocco
228023cfa3 fixing poetry for older version 2023-05-23 20:52:39 +00:00
Fabrizio Ruocco
11433f6955 azure cognitive search as vector store 2023-05-23 22:14:48 +02:00
Davis Chase
753f4cfc26 bump 178 (#5130) 2023-05-23 07:43:56 -07:00
Ayan Bandyopadhyay
5c87dbf5a8 Add link to Psychic from document loaders documentation page (#5115)
# Add link to Psychic from document loaders documentation page

In my previous PR I forgot to update `document_loaders.rst` to link to
`psychic.ipynb` to make it discoverable from the main documentation.
2023-05-23 06:47:23 -07:00
Tian Wei
d7f807b71f Add AzureCognitiveServicesToolkit to call Azure Cognitive Services API (#5012)
# Add AzureCognitiveServicesToolkit to call Azure Cognitive Services
API: achieve some multimodal capabilities

This PR adds a toolkit named AzureCognitiveServicesToolkit which bundles
the following tools:
- AzureCogsImageAnalysisTool: calls Azure Cognitive Services image
analysis API to extract caption, objects, tags, and text from images.
- AzureCogsFormRecognizerTool: calls Azure Cognitive Services form
recognizer API to extract text, tables, and key-value pairs from
documents.
- AzureCogsSpeech2TextTool: calls Azure Cognitive Services speech to
text API to transcribe speech to text.
- AzureCogsText2SpeechTool: calls Azure Cognitive Services text to
speech API to synthesize text to speech.

This toolkit can be used to process image, document, and audio inputs.
---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-23 06:45:48 -07:00
Jamie Broomall
d4fd589638 WhyLabs callback (#4906)
# Add a WhyLabs callback handler

* Adds a simple WhyLabsCallbackHandler
* Adds required dependencies as optional
* Protects against missing modules with guarded imports
* Adds a basic docs/ecosystem example

based on initial prototype from @andrewelizondo

> this integration gathers privacy-preserving telemetry on text with
whylogs and sends statistical profiles to the WhyLabs platform to monitor
these metrics over time. For more information on what WhyLabs is, see:
https://whylabs.ai

After you run the notebook (if you have env variables set for the API
Keys, org_id and dataset_id) you get something like this in WhyLabs:
![Screenshot
(443)](https://github.com/hwchase17/langchain/assets/88007022/6bdb3e1c-4243-4ae8-b974-23a8bb12edac)

Co-authored-by: Andre Elizondo <andre@whylabs.ai>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 20:29:47 -07:00
Eugene Yurtsev
d56313acba Improve efficiency of TextSplitter.split_documents, iterate once (#5111)
# Improve TextSplitter.split_documents, collect page_content and
metadata in one iteration

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

@eyurtsev In the case where `documents` is a generator that can only be
iterated once, this change is a huge help. Otherwise a silent issue
occurs where metadata ends up empty for all documents whenever
`documents` is a generator. So we widen the argument type from
`List[Document]` to `Union[Iterable[Document], Sequence[Document]]`.
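
The one-pass collection looks roughly like this (written as a free
function for illustration; the real change lives on `TextSplitter`):

```python
from typing import Iterable, List

from langchain.schema import Document
from langchain.text_splitter import TextSplitter


def split_documents(splitter: TextSplitter, documents: Iterable[Document]) -> List[Document]:
    """Split documents, consuming the input iterable only once."""
    texts, metadatas = [], []
    for doc in documents:  # single pass: safe when `documents` is a generator
        texts.append(doc.page_content)
        metadatas.append(doc.metadata)
    return splitter.create_documents(texts, metadatas=metadatas)
```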

---------

Co-authored-by: Steven Tartakovsky <tartakovsky.developer@gmail.com>
2023-05-22 23:00:24 -04:00
Jettro Coenradie
b950022894 Fixes issue #5072 - adds additional support to Weaviate (#5085)
Implementation is similar to search_distance and where_filter

# adds 'additional' support to Weaviate queries
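
A hypothetical usage sketch, given an existing `weaviate_store` (the
`additional` kwarg and the field names are assumptions, mirroring how
`search_distance` is passed):

```python
# request Weaviate's _additional fields alongside the search results
docs = weaviate_store.similarity_search(
    "what is a vector store?",
    k=4,
    additional=["certainty", "distance"],
)
```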

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 18:57:10 -07:00
Zander Chase
87bba2e8d3 Pass Dataset Name by Name not Position (#5108)
Pass dataset name by name
2023-05-23 01:21:39 +00:00
Matt Rickard
de6a401a22 Add OpenLM LLM multi-provider (#4993)
OpenLM is a zero-dependency OpenAI-compatible LLM provider that can call
different inference endpoints directly via HTTP. It implements the
OpenAI Completion class so that it can be used as a drop-in replacement
for the OpenAI API. This changeset utilizes BaseOpenAI for minimal added
code.
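
A brief usage sketch (the model string is illustrative):

```python
from langchain.llms import OpenLM

# behaves as a drop-in replacement for the OpenAI completion wrapper
llm = OpenLM(model="text-davinci-003")
print(llm("What is the capital of France?"))
```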

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 18:09:53 -07:00
Gergely Imreh
69de33e024 Add Mastodon toots loader (#5036)
# Add Mastodon toots loader.

Loader works either with public toots or with Mastodon app credentials.
Toot text and user info are loaded.

I've also added an integration test for this new loader, since it works
with public data, and a notebook with example output from a run.

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 16:43:07 -07:00
mbchang
e173e032bc fix: assign current_time to datetime.now() if current_time is None (#5045)
# Assign `current_time` to `datetime.now()` if `current_time` is None
in `time_weighted_retriever`

Fixes #4825 

As implemented, `add_documents` in `TimeWeightedVectorStoreRetriever`
assigns `doc.metadata["last_accessed_at"]` and
`doc.metadata["created_at"]` to `datetime.datetime.now()` if
`current_time` is not in `kwargs`.
```python
    def add_documents(self, documents: List[Document], **kwargs: Any) -> List[str]:
        """Add documents to vectorstore."""
        current_time = kwargs.get("current_time", datetime.datetime.now())
        # Avoid mutating input documents
        dup_docs = [deepcopy(d) for d in documents]
        for i, doc in enumerate(dup_docs):
            if "last_accessed_at" not in doc.metadata:
                doc.metadata["last_accessed_at"] = current_time
            if "created_at" not in doc.metadata:
                doc.metadata["created_at"] = current_time
            doc.metadata["buffer_idx"] = len(self.memory_stream) + i
        self.memory_stream.extend(dup_docs)
        return self.vectorstore.add_documents(dup_docs, **kwargs)
``` 
However, from the way `add_documents` is being called from
`GenerativeAgentMemory`, `current_time` is set as a `kwarg`, but it is
given a value of `None`:
```python
    def add_memory(
        self, memory_content: str, now: Optional[datetime] = None
    ) -> List[str]:
        """Add an observation or memory to the agent's memory."""
        importance_score = self._score_memory_importance(memory_content)
        self.aggregate_importance += importance_score
        document = Document(
            page_content=memory_content, metadata={"importance": importance_score}
        )
        result = self.memory_retriever.add_documents([document], current_time=now)
```
The default of `now` was set in #4658 to be None. The proposed fix is
the following:
```python
    def add_documents(self, documents: List[Document], **kwargs: Any) -> List[str]:
        """Add documents to vectorstore."""
        current_time = kwargs.get("current_time", datetime.datetime.now())
        # `current_time` may exist in kwargs, but may still have the value of None.
        if current_time is None:
            current_time = datetime.datetime.now()
```
Alternatively, we could just set the default of `now` to be
`datetime.datetime.now()` everywhere instead. Thoughts @hwchase17? If we
still want to keep the default to be `None`, then this PR should fix the
above issue. If we want to set the default to be
`datetime.datetime.now()` instead, I can update this PR with that
alternative fix. EDIT: seems like from #5018 it looks like we would
prefer to keep the default to be `None`, in which case this PR should
fix the error.
2023-05-22 15:47:03 -07:00
Leonid Ganeline
c28cc0f1ac changed ValueError to ImportError (#5103)
# changed ValueError to ImportError

Code cleaning.
Fixed inconsistencies in ImportError handling: sometimes the code raised
ImportError and sometimes ValueError. I've changed all such cases to
`raise ImportError`. Also:
- added installation instructions to error messages where they were missing;
- fixed several installation instructions in error messages;
- fixed several instances of error handling related to ImportError.
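
The resulting pattern looks roughly like this (package name illustrative):

```python
try:
    import tiktoken  # noqa: F401
except ImportError:
    raise ImportError(
        "Could not import tiktoken python package. "
        "Please install it with `pip install tiktoken`."
    )
```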
2023-05-22 15:24:45 -07:00
venetisgr
5e47c648ed Update serpapi.py (#4947)
Added link option in `_process_response`

Previously, when the `links` field of the response contained the correct
answer, the `snippet` field returned non-working links. This change adds
an `elif` statement before the snippet handling so that the link is used
in that case.

@vowelparrot

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 13:34:36 -07:00
Ankit Arya
5b2b436fab Fixed import error for AutoGPT e.g. from langchain.experimental.auton… (#5101)
`from langchain.experimental.autonomous_agents.autogpt.agent import
AutoGPT` results in an import error as AutoGPT is not defined in the
__init__.py file

https://python.langchain.com/en/latest/use_cases/autonomous_agents/marathon_times.html

An alternative would be to directly update the import statement to
`from langchain.experimental import AutoGPT`

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 13:26:25 -07:00
Ankush Gola
467ca6f025 update langchainplus client and docker file to reflect port changes (#5005)
# Currently, only the dev images are updated
2023-05-22 12:53:05 -07:00
Shawn91
9e649462ce fix: add_texts method of Weaviate vector store creates wrong embeddings (#4933)
# fix a bug in the add_texts method of the Weaviate vector store that
creates wrong embeddings

The following is the original code in the `add_texts` method of the
Weaviate vector store, from line 131 to 153, which contains a bug. The
code here includes some extra explanations in the form of comments and
some omissions.

```python
            for i, doc in enumerate(texts):

                # some code omitted

                if self._embedding is not None:
                    # `texts` is a list of strings, so `doc` here is a single string.
                    # list(doc) actually breaks the string up into characters,
                    # so embeddings[0] is just the embedding of the first character.
                    embeddings = self._embedding.embed_documents(list(doc))
                    batch.add_data_object(
                        data_object=data_properties,
                        class_name=self._index_name,
                        uuid=_id,
                        vector=embeddings[0],
                    )
```

To fix this bug, I pulled the embedding operation out of the for loop
and embed all texts at once.
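
A sketch of the corrected flow, simplified from the surrounding method:

```python
# embed the full list of texts once, outside the loop
embeddings = (
    self._embedding.embed_documents(list(texts))
    if self._embedding is not None
    else None
)
for i, doc in enumerate(texts):
    # some code omitted
    batch.add_data_object(
        data_object=data_properties,
        class_name=self._index_name,
        uuid=_id,
        vector=embeddings[i] if embeddings else None,
    )
```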

Co-authored-by: Shawn91 <zyx199199@qq.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 12:35:52 -07:00
Eduard van Valkenburg
1cb04f2b26 PowerBI major refinement in working of tool and tweaks in the rest (#5090)
# PowerBI major refinement in working of tool and tweaks in the rest

I've gained some experience with more complex datasets, and the earlier
implementation needed too many tries by the agent to create DAX. So I
refactored the code to have the LLM create DAX based on a question and
then immediately run it against the dataset, with retries and a prompt
that includes the error for the retry. This works much better!

Also did some other refactoring of the inner workings, making things
clearer, more concise and faster.
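
A simplified sketch of the generate-run-retry loop described above (all
names here are illustrative, not the actual tool API):

```python
def answer_with_dax(question: str, llm_chain, dataset, max_retries: int = 2) -> str:
    """Generate DAX from a question, run it, and retry with the error in the prompt."""
    error = ""
    for _ in range(max_retries + 1):
        dax = llm_chain.run(question=question, error=error)
        try:
            return dataset.run(dax)  # execute the generated DAX immediately
        except Exception as exc:
            error = str(exc)  # fold the error into the next generation prompt
    return f"Could not produce working DAX: {error}"
```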
2023-05-22 11:58:28 -07:00
hwaking
e57ebf3922 add get_top_k_cosine_similarity method to get max top k score and index (#5059)
# Row-wise cosine similarity between two equal-width matrices, returning
the top_k scores and indices, keeping only scores greater than
threshold_score.
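
A sketch of such a helper (the function and parameter names are
assumptions):

```python
import numpy as np


def top_k_cosine_similarity(X: np.ndarray, Y: np.ndarray, top_k: int = 5,
                            threshold_score: float = 0.0):
    """For each row of X, return (index, score) pairs for the top_k most
    similar rows of Y whose cosine similarity exceeds threshold_score."""
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-10)
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-10)
    sims = X @ Y.T  # row-wise cosine similarity, shape (len(X), len(Y))
    results = []
    for row in sims:
        idxs = np.argsort(row)[::-1][:top_k]
        results.append([(int(i), float(row[i])) for i in idxs if row[i] > threshold_score])
    return results
```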

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 11:55:48 -07:00
Donger
039f8f1abb Add the usage of SSL certificates for Elasticsearch and user password authentication (#5058)
Enhance the code to support SSL authentication for Elasticsearch when
using the VectorStore module, as previous versions did not provide this
capability.
@dev2049
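
A hypothetical usage sketch (the `ssl_verify` kwarg name and shape are
assumptions):

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import ElasticVectorSearch

db = ElasticVectorSearch(
    elasticsearch_url="https://user:password@localhost:9200",
    index_name="test_index",
    embedding=OpenAIEmbeddings(),
    ssl_verify={"ca_certs": "path/to/ca.crt"},  # assumed: forwarded to the ES client
)
```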

---------

Co-authored-by: caidong <zhucaidong1992@gmail.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 11:51:32 -07:00
Andreas Liebschner
44dc959584 Improve pinecone hybrid search retriever adding metadata support (#5098)
# Improve pinecone hybrid search retriever adding metadata support

I simply remove the hardwiring of metadata to the existing
implementation allowing one to pass `metadatas` attribute to the
constructors and in `get_relevant_documents`. I also add one missing pip
install to the accompanying notebook (I am not adding dependencies, they
were pre-existing).
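
A brief sketch of the new `metadatas` support (the embedding, encoder,
and index setup are elided):

```python
from langchain.retrievers import PineconeHybridSearchRetriever

retriever = PineconeHybridSearchRetriever(
    embeddings=embeddings, sparse_encoder=bm25_encoder, index=index
)
retriever.add_texts(
    ["doc one", "doc two"],
    metadatas=[{"source": "a"}, {"source": "b"}],  # now carried through to results
)
```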

First contribution, just hoping to help, feel free to critique :) 
my twitter username is `@andreliebschner`

While looking at hybrid search I noticed #3043 and #1743. I think the
former can be closed as following the example right now (even prior to
my improvements) works just fine, the latter I think can be also closed
safely, maybe pointing out the relevant classes and example. Should I
reply those issues mentioning someone?

@dev2049, @hwchase17

---------

Co-authored-by: Andreas Liebschner <a.liebschner@shopfully.com>
2023-05-22 11:42:54 -07:00
Deepak S V
5cd12102be Improving Resilience of MRKL Agent (#5014)
This is a highly optimized update to the pull request
https://github.com/hwchase17/langchain/pull/3269

Summary:
1) Added the ability for the MRKL agent to self-correct the
ValueError(f"Could not parse LLM output: `{llm_output}`") error whenever
the LLM (especially gpt-3.5-turbo) does not follow the MRKL agent's
format when returning "Action:" & "Action Input:".
2) The error is resolved by responding back to the LLM with the message
"Invalid Format: Missing 'Action:' after 'Thought:'" or
"Invalid Format: Missing 'Action Input:' after 'Action:'" whenever
"Action:" or "Action Input:" respectively is missing from the LLM
output.

For a detailed explanation, look at the previous pull request.

New Updates:
1) Since @hwchase17 requested in the previous PR that the self-correction
(error) message be communicated using the OutputParserException, I have
added the ability for the OutputParserException class to store the
observation & previous llm_output in order to communicate them to the
next agent prompt. This is done without breaking or modifying any of the
functionality OutputParserException previously had (i.e.
OutputParserException can still be used exactly as before, without
passing any observation & previous llm_output).
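
In code, raising the exception with the new fields looks roughly like
this:

```python
from langchain.schema import OutputParserException


def check_format(llm_output: str) -> None:
    # sketch: respond to a missing "Action:" with a correction message
    if "Action:" not in llm_output:
        raise OutputParserException(
            f"Could not parse LLM output: `{llm_output}`",
            observation="Invalid Format: Missing 'Action:' after 'Thought:'",
            llm_output=llm_output,
            send_to_llm=True,  # surface the correction in the next agent prompt
        )
```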

---------

Co-authored-by: Deepak S V <svdeepak99@users.noreply.github.com>
2023-05-22 11:08:08 -07:00
Michael Landis
6eacd88ae7 fix: revert docarray explicit transitive dependencies and use extras instead (#5015)
tldr: The docarray [integration
PR](https://github.com/hwchase17/langchain/pull/4483) introduced a
pinned dependency to protobuf. This is a docarray dependency, not a
langchain dependency. Since this is handled by the docarray
dependencies, it is unnecessary here.

Further, as a pinned dependency, this quickly leads to incompatibilities
with application code that consumes the library, all the more so with a
heavily used library like protobuf.

Detail: as we see in the [docarray
integration](https://github.com/hwchase17/langchain/pull/4483/files#diff-50c86b7ed8ac2cf95bd48334961bf0530cdc77b5a56f852c5c61b89d735fd711R81-R83),
the transitive dependencies of docarray were also listed as langchain
dependencies. This is unnecessary as the docarray project has an
appropriate
[extras](a01a05542d/pyproject.toml (L70)).
The docarray project also does not require this _pinned_ version of
protobuf, rather [a minimum
version](a01a05542d/pyproject.toml (L41)).
So this pinned version was likely in error.

To fix this, this PR reverts the explicit hnswlib and protobuf
dependencies and adds the hnswlib extras install for docarray (which
installs hnswlib and protobuf, as originally intended). Because version
`0.32.0`
of the docarray hnswlib extras added protobuf, we bump the docarray
dependency from `^0.31.0` to `^0.32.0`.

# revert docarray explicit transitive dependencies and use extras
instead

## Who can review?

@dev2049 -- reviewed the original PR
@eyurtsev -- bumped the pinned protobuf dependency a few days ago

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 12:48:09 -04:00
Davis Chase
fcd88bccb3 Bump 177 (#5095) 2023-05-22 08:19:06 -07:00
Harrison Chase
10ba201d05 Harrison/neo4j (#5078)
Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-22 07:31:48 -07:00
Deepak S V
49ca02711e Improved query, print & exception handling in REPL Tool (#4997)
Update to pull request https://github.com/hwchase17/langchain/pull/3215

Summary:
1) Improved sanitization of the query (using regex) by removing a leading
python command (since gpt-3.5-turbo sometimes assumes the Python console
is a terminal and runs a `python` command first, which causes an error).
Also, one-line Python snippets sometimes arrive wrapped in single
backticks, which are stripped as well.
2) Added 7 new test cases.
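
The sanitization boils down to two regex substitutions, roughly:

```python
import re


def sanitize_input(query: str) -> str:
    # strip a leading "python" command plus surrounding backticks/whitespace
    query = re.sub(r"^(\s|`)*(?i:python)?\s*", "", query)
    # strip trailing backticks and whitespace
    query = re.sub(r"(\s|`)*$", "", query)
    return query
```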

For more details, view the previous pull request.

---------

Co-authored-by: Deepak S V <svdeepak99@users.noreply.github.com>
2023-05-22 13:43:44 +00:00
Zander Chase
785502edb3 Add 'get_token_ids' method (#4784)
Let the user inspect the token ids in addition to getting the number of tokens.
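
For example (a usage sketch):

```python
from langchain.llms import OpenAI

llm = OpenAI()
token_ids = llm.get_token_ids("Hello, world!")  # ids from the model's tokenizer
assert llm.get_num_tokens("Hello, world!") == len(token_ids)
```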

---------

Co-authored-by: Zach Schillaci <40636930+zachschillaci27@users.noreply.github.com>
2023-05-22 13:17:26 +00:00
Zander Chase
ef7d015be5 Separate Runner Functions from Client (#5079)
Extract the methods specific to running an LLM or Chain on a dataset to
separate utility functions.

This simplifies the client a bit and lets us separate concerns of LCP
details from running examples (e.g., for evals)
2023-05-22 05:28:47 +00:00
Leonid Ganeline
443ebe22f4 docs: Deployments page moved into Ecosystem/ (#4949)
# docs: `deployments` page moved into `ecosystem/`

The `Deployments` page moved into the `Ecosystem/` group

Small fixes:
- `index` page: fixed the order of items in the `Modules` list and in the
`Use Cases` list
- the `References/Installation` item was missing from the `index` page
(not from the Navbar!). Restored it.
- added the `|` marker in several places.

NOTE: I also thought about moving the `Additional Resources/Gallery`
page into the `Ecosystem` group but decided to leave it unchanged.
Please, advise on this.

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
@dev2049
2023-05-21 21:18:22 -07:00
Hans van Dam
a395ff7c90 preserve language in conversation retrieval (#4969)
Without the addition of 'in its original language', the condensing
response more often than not outputs the rephrased question in English,
even when the conversation is in another language. This English question
then carries over into the retrieval prompt, and the chatbot gets stuck
in English.
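
The condensing prompt then reads roughly as follows (a sketch of the
standard condense-question template with the new phrase added):

```python
from langchain.prompts import PromptTemplate

CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(
    "Given the following conversation and a follow up question, rephrase the "
    "follow up question to be a standalone question, in its original language.\n\n"
    "Chat History:\n{chat_history}\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)
```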

I'm sometimes surprised that this does not happen more often, but
apparently the GPT models are smart enough to understand that when the
template contains

Question: ....
Answer:

then the answer should be in the language of the question.
2023-05-21 21:16:03 -07:00
Matt Robinson
bf3f554357 feat: batch multiple files in a single Unstructured API request (#4525)
### Submit Multiple Files to the Unstructured API

Enables batching multiple files into a single Unstructured API request.
Support for requests with multiple files was added to both
`UnstructuredAPIFileLoader` and `UnstructuredAPIFileIOLoader`. Note that
if you submit multiple files in "single" mode, the result will be
concatenated into a single document. We recommend using this feature in
"elements" mode.

### Testing

The following should load both documents, using two of the example docs
from the integration tests folder.

```python
    from langchain.document_loaders import UnstructuredAPIFileLoader

    file_paths = ["examples/layout-parser-paper.pdf",  "examples/whatsapp_chat.txt"]

    loader = UnstructuredAPIFileLoader(
        file_paths=file_paths,
        api_key="FAKE_API_KEY",
        strategy="fast",
        mode="elements",
    )
    docs = loader.load()
```
2023-05-21 20:48:20 -07:00
Harrison Chase
0c3de0a0b3 Merge branch 'master' of github.com:hwchase17/langchain 2023-05-21 09:22:43 -07:00
Harrison Chase
224f73e978 move docs 2023-05-21 09:22:35 -07:00
Harrison Chase
6c25f860fd bump to 176 (#5064) 2023-05-21 09:19:25 -07:00
Harrison Chase
b0431c672b Harrison/psychic (#5063)
Co-authored-by: Ayan Bandyopadhyay <ayanb9440@gmail.com>
Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-21 09:13:20 -07:00
Harrison Chase
8c661baefb change to type checking (#5062) 2023-05-21 09:09:49 -07:00
Jeffrey Zheng
424a573266 DOC: Misspelling in agents.rst documentation (#5038)
# Corrected Misspelling in agents.rst Documentation

In the
[documentation](https://python.langchain.com/en/latest/modules/agents.html)
it says "in fact, it is often best to have an Action Agent be in
**change** of the execution for the Plan and Execute agent."

**Suggested Change:** I propose correcting "change" to "charge".

Fix for issue: #5039
2023-05-20 22:24:08 -07:00
Gengliang Wang
f9f08c4b69 Add documentation for Databricks integration (#5013)
# Add documentation for Databricks integration

This is a follow-up of https://github.com/hwchase17/langchain/pull/4702
It documents the details of how to integrate Databricks using langchain.
It also provides examples in a notebook.


## Who can review?
@dev2049 @hwchase17 since you are aware of the context. We will promote
the integration after this doc is ready. Thanks in advance!
2023-05-20 22:06:24 -07:00
tornikeo
a6ef20d7fe Fix annoying typo in docs (#5029)
# Fixes an annoying typo in docs

Fixes an annoying typo in the docs: "Therefor" -> "Therefore". It's so
annoying to read that I just had to make this PR.
2023-05-20 22:02:21 -07:00
Davis Chase
9d1280d451 bump v175 (#5041) 2023-05-20 09:24:17 -07:00
UmerHA
7388248b3e Streaming only final output of agent (#2483) (#4630)
# Streaming only final output of agent (#2483)
As requested in issue #2483, this Callback allows to stream only the
final output of an agent (ie not the intermediate steps).

Fixes #2483

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-20 09:20:17 -07:00
Davis Chase
3bc0bf0079 fix prompt saving (#4987)
will add unit tests
2023-05-20 08:21:52 -07:00
146 changed files with 5929 additions and 1072 deletions

View File

@@ -67,8 +67,8 @@ For each module LangChain provides standard, extendable interfaces. LangChain al
./modules/models.rst
./modules/prompts.rst
./modules/indexes.md
./modules/memory.md
./modules/indexes.md
./modules/chains.md
./modules/agents.md
./modules/callbacks/getting_started.ipynb
@@ -115,8 +115,8 @@ Use Cases
./use_cases/tabular.rst
./use_cases/code.md
./use_cases/apis.md
./use_cases/summarization.md
./use_cases/extraction.md
./use_cases/summarization.md
./use_cases/evaluation.rst
@@ -126,7 +126,10 @@ Reference Docs
| Full documentation on all methods, classes, installation methods, and integration setups for LangChain.
- `LangChain Installation <./reference/installation.html>`_
- `Reference Documentation <./reference.html>`_
.. toctree::
:maxdepth: 1
:caption: Reference
@@ -141,14 +144,16 @@ Ecosystem
------------
| LangChain integrates a lot of different LLMs, systems, and products.
From the other side, many systems and products depend on LangChain.
It creates a vibrant and thriving ecosystem.
| From the other side, many systems and products depend on LangChain.
| It creates a vibrant and thriving ecosystem.
- `Integrations <./integrations.html>`_: Guides for how other products can be used with LangChain.
- `Dependents <./dependents.html>`_: List of repositories that use LangChain.
- `Deployments <./ecosystem/deployments.html>`_: A collection of instructions, code snippets, and template repositories for deploying LangChain apps.
.. toctree::
:maxdepth: 2
@@ -159,6 +164,7 @@ It creates a vibrant and thriving ecosystem.
./integrations.rst
./dependents.md
./ecosystem/deployments.md
Additional Resources
@@ -170,8 +176,6 @@ Additional Resources
- `Gallery <https://github.com/kyrolabs/awesome-langchain>`_: A collection of great projects that use Langchain, compiled by the folks at `Kyrolabs <https://kyrolabs.com>`_. Useful for finding inspiration and example implementations.
- `Deployments <./additional_resources/deployments.html>`_: A collection of instructions, code snippets, and template repositories for deploying LangChain apps.
- `Tracing <./additional_resources/tracing.html>`_: A guide on using tracing in LangChain to visualize the execution of chains and agents.
- `Model Laboratory <./additional_resources/model_laboratory.html>`_: Experimenting with different prompts, models, and chains is a big part of developing the best possible application. The ModelLaboratory makes it easy to do so.
@@ -191,7 +195,6 @@ Additional Resources
LangChainHub <https://github.com/hwchase17/langchain-hub>
Gallery <https://github.com/kyrolabs/awesome-langchain>
./additional_resources/deployments.md
./additional_resources/tracing.md
./additional_resources/model_laboratory.ipynb
Discord <https://discord.gg/6adMQxSpJS>

View File

@@ -6,7 +6,7 @@ LangChain integrates with many LLMs, systems, and products.
Integrations by Module
--------------------------------
Integrations grouped by the core LangChain module they map to:
| Integrations grouped by the core LangChain module they map to:
- `LLM Providers <./modules/models/llms/integrations.html>`_
@@ -23,7 +23,7 @@ Integrations grouped by the core LangChain module they map to:
All Integrations
-------------------------------------------
A comprehensive list of LLMs, systems, and products integrated with LangChain:
| A comprehensive list of LLMs, systems, and products integrated with LangChain:
.. toctree::

View File

@@ -0,0 +1,280 @@
{
"cells": [
{
"cell_type": "markdown",
"source": [
"# Databricks\n",
"\n",
"This notebook covers how to connect to the [Databricks runtimes](https://docs.databricks.com/runtime/index.html) and [Databricks SQL](https://www.databricks.com/product/databricks-sql) using the SQLDatabase wrapper of LangChain.\n",
"It is broken into 3 parts: installation and setup, connecting to Databricks, and examples."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"## Installation and Setup"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 1,
"outputs": [],
"source": [
"!pip install databricks-sql-connector"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"## Connecting to Databricks\n",
"\n",
"You can connect to [Databricks runtimes](https://docs.databricks.com/runtime/index.html) and [Databricks SQL](https://www.databricks.com/product/databricks-sql) using the `SQLDatabase.from_databricks()` method.\n",
"\n",
"### Syntax\n",
"```python\n",
"SQLDatabase.from_databricks(\n",
" catalog: str,\n",
" schema: str,\n",
" host: Optional[str] = None,\n",
" api_token: Optional[str] = None,\n",
" warehouse_id: Optional[str] = None,\n",
" cluster_id: Optional[str] = None,\n",
" engine_args: Optional[dict] = None,\n",
" **kwargs: Any)\n",
"```\n",
"### Required Parameters\n",
"* `catalog`: The catalog name in the Databricks database.\n",
"* `schema`: The schema name in the catalog.\n",
"\n",
"### Optional Parameters\n",
"There following parameters are optional. When executing the method in a Databricks notebook, you don't need to provide them in most of the cases.\n",
"* `host`: The Databricks workspace hostname, excluding 'https://' part. Defaults to 'DATABRICKS_HOST' environment variable or current workspace if in a Databricks notebook.\n",
"* `api_token`: The Databricks personal access token for accessing the Databricks SQL warehouse or the cluster. Defaults to 'DATABRICKS_API_TOKEN' environment variable or a temporary one is generated if in a Databricks notebook.\n",
"* `warehouse_id`: The warehouse ID in the Databricks SQL.\n",
"* `cluster_id`: The cluster ID in the Databricks Runtime. If running in a Databricks notebook and both 'warehouse_id' and 'cluster_id' are None, it uses the ID of the cluster the notebook is attached to.\n",
"* `engine_args`: The arguments to be used when connecting Databricks.\n",
"* `**kwargs`: Additional keyword arguments for the `SQLDatabase.from_uri` method."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"## Examples"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 2,
"outputs": [],
"source": [
"# Connecting to Databricks with SQLDatabase wrapper\n",
"from langchain import SQLDatabase\n",
"\n",
"db = SQLDatabase.from_databricks(catalog='samples', schema='nyctaxi')"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 3,
"outputs": [],
"source": [
"# Creating a OpenAI Chat LLM wrapper\n",
"from langchain.chat_models import ChatOpenAI\n",
"\n",
"llm = ChatOpenAI(temperature=0, model_name=\"gpt-4\")"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"### SQL Chain example\n",
"\n",
"This example demonstrates the use of the [SQL Chain](https://python.langchain.com/en/latest/modules/chains/examples/sqlite.html) for answering a question over a Databricks database."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 4,
"id": "36f2270b",
"metadata": {},
"outputs": [],
"source": [
"from langchain import SQLDatabaseChain\n",
"\n",
"db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4e2b5f25",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001B[1m> Entering new SQLDatabaseChain chain...\u001B[0m\n",
"What is the average duration of taxi rides that start between midnight and 6am?\n",
"SQLQuery:\u001B[32;1m\u001B[1;3mSELECT AVG(UNIX_TIMESTAMP(tpep_dropoff_datetime) - UNIX_TIMESTAMP(tpep_pickup_datetime)) as avg_duration\n",
"FROM trips\n",
"WHERE HOUR(tpep_pickup_datetime) >= 0 AND HOUR(tpep_pickup_datetime) < 6\u001B[0m\n",
"SQLResult: \u001B[33;1m\u001B[1;3m[(987.8122786304605,)]\u001B[0m\n",
"Answer:\u001B[32;1m\u001B[1;3mThe average duration of taxi rides that start between midnight and 6am is 987.81 seconds.\u001B[0m\n",
"\u001B[1m> Finished chain.\u001B[0m\n"
]
},
{
"data": {
"text/plain": "'The average duration of taxi rides that start between midnight and 6am is 987.81 seconds.'"
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db_chain.run(\"What is the average duration of taxi rides that start between midnight and 6am?\")"
]
},
{
"cell_type": "markdown",
"source": [
"### SQL Database Agent example\n",
"\n",
"This example demonstrates the use of the [SQL Database Agent](https://python.langchain.com/en/latest/modules/agents/toolkits/examples/sql_database.html) for answering questions over a Databricks database."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 7,
"id": "9918e86a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import create_sql_agent\n",
"from langchain.agents.agent_toolkits import SQLDatabaseToolkit\n",
"\n",
"toolkit = SQLDatabaseToolkit(db=db, llm=llm)\n",
"agent = create_sql_agent(\n",
" llm=llm,\n",
" toolkit=toolkit,\n",
" verbose=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c484a76e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001B[1m> Entering new AgentExecutor chain...\u001B[0m\n",
"\u001B[32;1m\u001B[1;3mAction: list_tables_sql_db\n",
"Action Input: \u001B[0m\n",
"Observation: \u001B[38;5;200m\u001B[1;3mtrips\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3mI should check the schema of the trips table to see if it has the necessary columns for trip distance and duration.\n",
"Action: schema_sql_db\n",
"Action Input: trips\u001B[0m\n",
"Observation: \u001B[33;1m\u001B[1;3m\n",
"CREATE TABLE trips (\n",
"\ttpep_pickup_datetime TIMESTAMP, \n",
"\ttpep_dropoff_datetime TIMESTAMP, \n",
"\ttrip_distance FLOAT, \n",
"\tfare_amount FLOAT, \n",
"\tpickup_zip INT, \n",
"\tdropoff_zip INT\n",
") USING DELTA\n",
"\n",
"/*\n",
"3 rows from trips table:\n",
"tpep_pickup_datetime\ttpep_dropoff_datetime\ttrip_distance\tfare_amount\tpickup_zip\tdropoff_zip\n",
"2016-02-14 16:52:13+00:00\t2016-02-14 17:16:04+00:00\t4.94\t19.0\t10282\t10171\n",
"2016-02-04 18:44:19+00:00\t2016-02-04 18:46:00+00:00\t0.28\t3.5\t10110\t10110\n",
"2016-02-17 17:13:57+00:00\t2016-02-17 17:17:55+00:00\t0.7\t5.0\t10103\t10023\n",
"*/\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3mThe trips table has the necessary columns for trip distance and duration. I will write a query to find the longest trip distance and its duration.\n",
"Action: query_checker_sql_db\n",
"Action Input: SELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001B[0m\n",
"Observation: \u001B[31;1m\u001B[1;3mSELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3mThe query is correct. I will now execute it to find the longest trip distance and its duration.\n",
"Action: query_sql_db\n",
"Action Input: SELECT trip_distance, tpep_dropoff_datetime - tpep_pickup_datetime as duration FROM trips ORDER BY trip_distance DESC LIMIT 1\u001B[0m\n",
"Observation: \u001B[36;1m\u001B[1;3m[(30.6, '0 00:43:31.000000000')]\u001B[0m\n",
"Thought:\u001B[32;1m\u001B[1;3mI now know the final answer.\n",
"Final Answer: The longest trip distance is 30.6 miles and it took 43 minutes and 31 seconds.\u001B[0m\n",
"\n",
"\u001B[1m> Finished chain.\u001B[0m\n"
]
},
{
"data": {
"text/plain": "'The longest trip distance is 30.6 miles and it took 43 minutes and 31 seconds.'"
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"What is the longest trip distance and how long did it take?\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,20 @@
# Psychic
This page covers how to use [Psychic](https://www.psychic.dev/) within LangChain.
## What is Psychic?
Psychic is a platform for integrating with your customers' SaaS tools like Notion, Zendesk, Confluence, and Google Drive via OAuth, and for syncing documents from these applications to your SQL or vector database. You can think of it as Plaid for unstructured data. Psychic is easy to set up - you use it by importing the react library and configuring it with your Sidekick API key, which you can get from the [Psychic dashboard](https://dashboard.psychic.dev/). When your users connect their applications, you can view these connections from the dashboard and retrieve data using the server-side libraries.
## Quick start
1. Create an account in the [dashboard](https://dashboard.psychic.dev/).
2. Use the [react library](https://docs.psychic.dev/sidekick-link) to add the Psychic link modal to your frontend react app. Users will use this to connect their SaaS apps.
3. Once your user has created a connection, you can use the langchain PsychicLoader by following the [example notebook](../modules/indexes/document_loaders/examples/psychic.ipynb).
## Advantages vs Other Document Loaders
1. **Universal API:** Instead of building OAuth flows and learning the APIs for every SaaS app, you integrate Psychic once and leverage our universal API to retrieve data.
2. **Data Syncs:** Data in your customers' SaaS apps can get stale fast. With Psychic you can configure webhooks to keep your documents up to date on a daily or realtime basis.
3. **Simplified OAuth:** Psychic handles OAuth end-to-end so that you don't have to spend time creating OAuth clients for each integration, keeping access tokens fresh, and handling OAuth redirect logic.

View File

@@ -0,0 +1,134 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# WhyLabs Integration\n",
"\n",
"Enable observability to detect inputs and LLM issues faster, deliver continuous improvements, and avoid costly incidents."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install langkit -q"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Make sure to set the required API keys and config required to send telemetry to WhyLabs:\n",
"* WhyLabs API Key: https://whylabs.ai/whylabs-free-sign-up\n",
"* Org and Dataset [https://docs.whylabs.ai/docs/whylabs-onboarding](https://docs.whylabs.ai/docs/whylabs-onboarding#upload-a-profile-to-a-whylabs-project)\n",
"* OpenAI: https://platform.openai.com/account/api-keys\n",
"\n",
"Then you can set them like this:\n",
"\n",
"```python\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"\"\n",
"os.environ[\"WHYLABS_DEFAULT_ORG_ID\"] = \"\"\n",
"os.environ[\"WHYLABS_DEFAULT_DATASET_ID\"] = \"\"\n",
"os.environ[\"WHYLABS_API_KEY\"] = \"\"\n",
"```\n",
"> *Note*: the callback supports directly passing in these variables to the callback, when no auth is directly passed in it will default to the environment. Passing in auth directly allows for writing profiles to multiple projects or organizations in WhyLabs.\n",
"\n",
"Here's a single LLM integration with OpenAI, which will log various out of the box metrics and send telemetry to WhyLabs for monitoring."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"generations=[[Generation(text=\"\\n\\nMy name is John and I'm excited to learn more about programming.\", generation_info={'finish_reason': 'stop', 'logprobs': None})]] llm_output={'token_usage': {'total_tokens': 20, 'prompt_tokens': 4, 'completion_tokens': 16}, 'model_name': 'text-davinci-003'}\n"
]
}
],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.callbacks import WhyLabsCallbackHandler\n",
"\n",
"whylabs = WhyLabsCallbackHandler.from_params()\n",
"llm = OpenAI(temperature=0, callbacks=[whylabs])\n",
"\n",
"result = llm.generate([\"Hello, World!\"])\n",
"print(result)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"generations=[[Generation(text='\\n\\n1. 123-45-6789\\n2. 987-65-4321\\n3. 456-78-9012', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\\n\\n1. johndoe@example.com\\n2. janesmith@example.com\\n3. johnsmith@example.com', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\\n\\n1. 123 Main Street, Anytown, USA 12345\\n2. 456 Elm Street, Nowhere, USA 54321\\n3. 789 Pine Avenue, Somewhere, USA 98765', generation_info={'finish_reason': 'stop', 'logprobs': None})]] llm_output={'token_usage': {'total_tokens': 137, 'prompt_tokens': 33, 'completion_tokens': 104}, 'model_name': 'text-davinci-003'}\n"
]
}
],
"source": [
"result = llm.generate(\n",
" [\n",
" \"Can you give me 3 SSNs so I can understand the format?\",\n",
" \"Can you give me 3 fake email addresses?\",\n",
" \"Can you give me 3 fake US mailing addresses?\",\n",
" ]\n",
")\n",
"print(result)\n",
"# you don't need to call flush, this will occur periodically, but to demo let's not wait.\n",
"whylabs.flush()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"whylabs.close()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.11.2 64-bit",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -17,7 +17,7 @@ At the moment, there are two main types of agents:
When should you use each one? Action Agents are more conventional, and good for small tasks.
For more complex or long running tasks, the initial planning step helps to maintain long term objectives and focus. However, that comes at the expense of generally more calls and higher latency.
These two agents are also not mutually exclusive - in fact, it is often best to have an Action Agent be in change of the execution for the Plan and Execute agent.
These two agents are also not mutually exclusive - in fact, it is often best to have an Action Agent be in charge of the execution for the Plan and Execute agent.
Action Agents
-------------

View File

@@ -0,0 +1,154 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "23234b50-e6c6-4c87-9f97-259c15f36894",
"metadata": {
"tags": []
},
"source": [
"# Only streaming final agent output"
]
},
{
"cell_type": "markdown",
"id": "29dd6333-307c-43df-b848-65001c01733b",
"metadata": {},
"source": [
"If you only want the final output of an agent to be streamed, you can use the callback ``FinalStreamingStdOutCallbackHandler``.\n",
"For this, the underlying LLM has to support streaming as well."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e4592215-6604-47e2-89ff-5db3af6d1e40",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.agents import load_tools\n",
"from langchain.agents import initialize_agent\n",
"from langchain.agents import AgentType\n",
"from langchain.callbacks.streaming_stdout_final_only import FinalStreamingStdOutCallbackHandler\n",
"from langchain.llms import OpenAI"
]
},
{
"cell_type": "markdown",
"id": "19a813f7",
"metadata": {},
"source": [
"Let's create the underlying LLM with ``streaming = True`` and pass a new instance of ``FinalStreamingStdOutCallbackHandler``."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7fe81ef4",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(streaming=True, callbacks=[FinalStreamingStdOutCallbackHandler()], temperature=0)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ff45b85d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Konrad Adenauer became Chancellor of Germany in 1949, 74 years ago in 2023."
]
},
{
"data": {
"text/plain": [
"'Konrad Adenauer became Chancellor of Germany in 1949, 74 years ago in 2023.'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tools = load_tools([\"wikipedia\", \"llm-math\"], llm=llm)\n",
"agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False)\n",
"agent.run(\"It's 2023 now. How many years ago did Konrad Adenauer become Chancellor of Germany.\")"
]
},
{
"cell_type": "markdown",
"id": "53a743b8",
"metadata": {},
"source": [
"### Handling custom answer prefixes"
]
},
{
"cell_type": "markdown",
"id": "23602c62",
"metadata": {},
"source": [
"By default, we assume that the token sequence ``\"\\nFinal\", \" Answer\", \":\"`` indicates that the agent has reached an answers. We can, however, also pass a custom sequence to use as answer prefix."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "5662a638",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(\n",
" streaming=True,\n",
" callbacks=[FinalStreamingStdOutCallbackHandler(answer_prefix_tokens=[\"\\nThe\", \" answer\", \":\"])],\n",
" temperature=0\n",
")"
]
},
{
"cell_type": "markdown",
"id": "b1a96cc0",
"metadata": {},
"source": [
"Be aware you likely need to include whitespaces and new line characters in your token. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9278b522",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,270 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure Cognitive Services Toolkit\n",
"\n",
"This toolkit is used to interact with the Azure Cognitive Services API to achieve some multimodal capabilities.\n",
"\n",
"Currently There are four tools bundled in this toolkit:\n",
"- AzureCogsImageAnalysisTool: used to extract caption, objects, tags, and text from images. (Note: this tool is not available on Mac OS yet, due to the dependency on `azure-ai-vision` package, which is only supported on Windows and Linux currently.)\n",
"- AzureCogsFormRecognizerTool: used to extract text, tables, and key-value pairs from documents.\n",
"- AzureCogsSpeech2TextTool: used to transcribe speech to text.\n",
"- AzureCogsText2SpeechTool: used to synthesize text to speech."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, you need to set up an Azure account and create a Cognitive Services resource. You can follow the instructions [here](https://docs.microsoft.com/en-us/azure/cognitive-services/cognitive-services-apis-create-account?tabs=multiservice%2Cwindows) to create a resource. \n",
"\n",
"Then, you need to get the endpoint, key and region of your resource, and set them as environment variables. You can find them in the \"Keys and Endpoint\" page of your resource."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# !pip install --upgrade azure-ai-formrecognizer > /dev/null\n",
"# !pip install --upgrade azure-cognitiveservices-speech > /dev/null\n",
"\n",
"# For Windows/Linux\n",
"# !pip install --upgrade azure-ai-vision > /dev/null"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"sk-\"\n",
"os.environ[\"AZURE_COGS_KEY\"] = \"\"\n",
"os.environ[\"AZURE_COGS_ENDPOINT\"] = \"\"\n",
"os.environ[\"AZURE_COGS_REGION\"] = \"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create the Toolkit"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents.agent_toolkits import AzureCognitiveServicesToolkit\n",
"\n",
"toolkit = AzureCognitiveServicesToolkit()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['Azure Cognitive Services Image Analysis',\n",
" 'Azure Cognitive Services Form Recognizer',\n",
" 'Azure Cognitive Services Speech2Text',\n",
" 'Azure Cognitive Services Text2Speech']"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[tool.name for tool in toolkit.get_tools()]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use within an Agent"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain import OpenAI\n",
"from langchain.agents import initialize_agent, AgentType"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(temperature=0)\n",
"agent = initialize_agent(\n",
" tools=toolkit.get_tools(),\n",
" llm=llm,\n",
" agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,\n",
" verbose=True,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\n",
"Action:\n",
"```\n",
"{\n",
" \"action\": \"Azure Cognitive Services Image Analysis\",\n",
" \"action_input\": \"https://images.openai.com/blob/9ad5a2ab-041f-475f-ad6a-b51899c50182/ingredients.png\"\n",
"}\n",
"```\n",
"\n",
"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mCaption: a group of eggs and flour in bowls\n",
"Objects: Egg, Egg, Food\n",
"Tags: dairy, ingredient, indoor, thickening agent, food, mixing bowl, powder, flour, egg, bowl\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I can use the objects and tags to suggest recipes\n",
"Action:\n",
"```\n",
"{\n",
" \"action\": \"Final Answer\",\n",
" \"action_input\": \"You can make pancakes, omelettes, or quiches with these ingredients!\"\n",
"}\n",
"```\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'You can make pancakes, omelettes, or quiches with these ingredients!'"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"What can I make with these ingredients?\"\n",
" \"https://images.openai.com/blob/9ad5a2ab-041f-475f-ad6a-b51899c50182/ingredients.png\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mAction:\n",
"```\n",
"{\n",
" \"action\": \"Azure Cognitive Services Text2Speech\",\n",
" \"action_input\": \"Why did the chicken cross the playground? To get to the other slide!\"\n",
"}\n",
"```\n",
"\n",
"\u001b[0m\n",
"Observation: \u001b[31;1m\u001b[1;3m/tmp/tmpa3uu_j6b.wav\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I have the audio file of the joke\n",
"Action:\n",
"```\n",
"{\n",
" \"action\": \"Final Answer\",\n",
" \"action_input\": \"/tmp/tmpa3uu_j6b.wav\"\n",
"}\n",
"```\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'/tmp/tmpa3uu_j6b.wav'"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"audio_file = agent.run(\"Tell me a joke and read it out for me.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython import display\n",
"\n",
"audio = display.Audio(audio_file)\n",
"display.display(audio)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,6 +1,7 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "984a8fca",
"metadata": {},
@@ -9,7 +10,7 @@
"\n",
"Sometimes, for complex calculations, rather than have an LLM generate the answer directly, it can be better to have the LLM generate code to calculate the answer, and then run that code to get the answer. In order to easily do that, we provide a simple Python REPL to execute commands in.\n",
"\n",
"This interface will only return things that are printed - therefor, if you want to use it to calculate an answer, make sure to have it print out the answer."
"This interface will only return things that are printed - therefore, if you want to use it to calculate an answer, make sure to have it print out the answer."
]
},
{

View File

@@ -0,0 +1,230 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "c94240f5",
"metadata": {},
"source": [
"# GraphCypherQAChain\n",
"\n",
"This notebook shows how to use LLMs to provide a natural language interface to a graph database you can query with the Cypher query language."
]
},
{
"cell_type": "markdown",
"id": "dbc0ee68",
"metadata": {},
"source": [
"You will need to have a running Neo4j instance. One option is to create a [free Neo4j database instance in their Aura cloud service](https://neo4j.com/cloud/platform/aura-graph-database/). You can also run the database locally using the [Neo4j Desktop application](https://neo4j.com/download/), or running a docker container.\n",
"You can run a local docker container by running the executing the following script:\n",
"\n",
"```\n",
"docker run \\\n",
" --name neo4j \\\n",
" -p 7474:7474 -p 7687:7687 \\\n",
" -d \\\n",
" -e NEO4J_AUTH=neo4j/pleaseletmein \\\n",
" -e NEO4J_PLUGINS=\\[\\\"apoc\\\"\\] \\\n",
" neo4j:latest\n",
"```\n",
"\n",
"If you are using the docker container, you need to wait a couple of second for the database to start."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "62812aad",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.chains import GraphCypherQAChain\n",
"from langchain.graphs import Neo4jGraph"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "0928915d",
"metadata": {},
"outputs": [],
"source": [
"graph = Neo4jGraph(\n",
" url=\"bolt://localhost:7687\", username=\"neo4j\", password=\"pleaseletmein\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "995ea9b9",
"metadata": {},
"source": [
"## Seeding the database\n",
"\n",
"Assuming your database is empty, you can populate it using Cypher query language. The following Cypher statement is idempotent, which means the database information will be the same if you run it one or multiple times."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "fedd26b9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"graph.query(\n",
" \"\"\"\n",
"MERGE (m:Movie {name:\"Top Gun\"})\n",
"WITH m\n",
"UNWIND [\"Tom Cruise\", \"Val Kilmer\", \"Anthony Edwards\", \"Meg Ryan\"] AS actor\n",
"MERGE (a:Actor {name:actor})\n",
"MERGE (a)-[:ACTED_IN]->(m)\n",
"\"\"\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "58c1a8ea",
"metadata": {},
"source": [
"## Refresh graph schema information\n",
"If the schema of database changes, you can refresh the schema information needed to generate Cypher statements."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4e3de44f",
"metadata": {},
"outputs": [],
"source": [
"graph.refresh_schema()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "1fe76ccd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" Node properties are the following:\n",
" [{'properties': [{'property': 'name', 'type': 'STRING'}], 'labels': 'Movie'}, {'properties': [{'property': 'name', 'type': 'STRING'}], 'labels': 'Actor'}]\n",
" Relationship properties are the following:\n",
" []\n",
" The relationships are the following:\n",
" ['(:Actor)-[:ACTED_IN]->(:Movie)']\n",
" \n"
]
}
],
"source": [
"print(graph.get_schema)"
]
},
{
"cell_type": "markdown",
"id": "68a3c677",
"metadata": {},
"source": [
"## Querying the graph\n",
"\n",
"We can now use the graph cypher QA chain to ask question of the graph"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "7476ce98",
"metadata": {},
"outputs": [],
"source": [
"chain = GraphCypherQAChain.from_llm(\n",
" ChatOpenAI(temperature=0), graph=graph, verbose=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ef8ee27b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie {name: 'Top Gun'})\n",
"RETURN a.name\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Tom Cruise, Val Kilmer, Anthony Edwards, and Meg Ryan played in Top Gun.'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\"Who played in Top Gun?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b4825316",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
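
For reference, a minimal sketch of customizing this chain, assuming only the `cypher_prompt` keyword argument that `GraphCypherQAChain.from_llm` exposes (see the chain implementation further down this page); the template text itself is hypothetical:

```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import GraphCypherQAChain
from langchain.prompts.prompt import PromptTemplate

# Hypothetical template; {schema} and {question} mirror the variables
# used by the default CYPHER_GENERATION_PROMPT.
CUSTOM_CYPHER_TEMPLATE = """Generate a Cypher statement for the question below.
Use only the relationship types and properties in this schema:
{schema}

Question: {question}"""

custom_cypher_prompt = PromptTemplate(
    input_variables=["schema", "question"], template=CUSTOM_CYPHER_TEMPLATE
)

# `graph` is the Neo4jGraph created in the notebook above.
chain = GraphCypherQAChain.from_llm(
    ChatOpenAI(temperature=0),
    graph=graph,
    cypher_prompt=custom_cypher_prompt,
    verbose=True,
)
```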

View File

@@ -23,7 +23,7 @@
}
},
"source": [
"Under the hood, LangChain uses SQLAlchemy to connect to SQL databases. The `SQLDatabaseChain` can therefore be used with any SQL dialect supported by SQLAlchemy, such as MS SQL, MySQL, MariaDB, PostgreSQL, Oracle SQL, Databricks and SQLite. Please refer to the SQLAlchemy documentation for more information about requirements for connecting to your database. For example, a connection to MySQL requires an appropriate connector such as PyMySQL. A URI for a MySQL connection might look like: `mysql+pymysql://user:pass@some_mysql_db_address/db_name`. To connect to Databricks, it is recommended to use the handy method `SQLDatabase.from_databricks(catalog, schema, host, api_token, (warehouse_id|cluster_id))`.\n",
"Under the hood, LangChain uses SQLAlchemy to connect to SQL databases. The `SQLDatabaseChain` can therefore be used with any SQL dialect supported by SQLAlchemy, such as MS SQL, MySQL, MariaDB, PostgreSQL, Oracle SQL, [Databricks](../../../integrations/databricks.ipynb) and SQLite. Please refer to the SQLAlchemy documentation for more information about requirements for connecting to your database. For example, a connection to MySQL requires an appropriate connector such as PyMySQL. A URI for a MySQL connection might look like: `mysql+pymysql://user:pass@some_mysql_db_address/db_name`.\n",
"\n",
"This demonstration uses SQLite and the example Chinook database.\n",
"To set it up, follow the instructions on https://database.guide/2-sample-databases-sqlite/, placing the `.db` file in a notebooks folder at the root of this repository."

View File

@@ -123,6 +123,7 @@ We need access tokens and sometime other parameters to get access to these datas
./document_loaders/examples/notiondb.ipynb
./document_loaders/examples/notion.ipynb
./document_loaders/examples/obsidian.ipynb
./document_loaders/examples/psychic.ipynb
./document_loaders/examples/readthedocs_documentation.ipynb
./document_loaders/examples/reddit.ipynb
./document_loaders/examples/roam.ipynb

View File

@@ -0,0 +1,126 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "66a7777e",
"metadata": {},
"source": [
"# Mastodon\n",
"\n",
">[Mastodon](https://joinmastodon.org/) is a federated social media and social networking service.\n",
"\n",
"This loader fetches the text from the \"toots\" of a list of `Mastodon` accounts, using the `Mastodon.py` Python package.\n",
"\n",
"Public accounts can the queried by default without any authentication. If non-public accounts or instances are queried, you have to register an application for your account which gets you an access token, and set that token and your account's API base URL.\n",
"\n",
"Then you need to pass in the Mastodon account names you want to extract, in the `@account@instance` format."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9ec8a3b3",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import MastodonTootsLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "43128d8d",
"metadata": {},
"outputs": [],
"source": [
"#!pip install Mastodon.py"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "35d6809a",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"loader = MastodonTootsLoader(\n",
" mastodon_accounts=[\"@Gargron@mastodon.social\"],\n",
" number_toots=50, # Default value is 100\n",
")\n",
"\n",
"# Or set up access information to use a Mastodon app.\n",
"# Note that the access token can either be passed into\n",
"# constructor or you can set the envirovnment \"MASTODON_ACCESS_TOKEN\".\n",
"# loader = MastodonTootsLoader(\n",
"# access_token=\"<ACCESS TOKEN OF MASTODON APP>\",\n",
"# api_base_url=\"<API BASE URL OF MASTODON APP INSTANCE>\",\n",
"# mastodon_accounts=[\"@Gargron@mastodon.social\"],\n",
"# number_toots=50, # Default value is 100\n",
"# )"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "05fe33b9",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<p>It is tough to leave this behind and go back to reality. And some people live here! Im sure there are downsides but it sounds pretty good to me right now.</p>\n",
"================================================================================\n",
"<p>I wish we could stay here a little longer, but it is time to go home 🥲</p>\n",
"================================================================================\n",
"<p>Last day of the honeymoon. And its <a href=\"https://mastodon.social/tags/caturday\" class=\"mention hashtag\" rel=\"tag\">#<span>caturday</span></a>! This cute tabby came to the restaurant to beg for food and got some chicken.</p>\n",
"================================================================================\n"
]
}
],
"source": [
"documents = loader.load()\n",
"for doc in documents[:3]:\n",
" print(doc.page_content)\n",
" print(\"=\" * 80)"
]
},
{
"cell_type": "markdown",
"id": "322bb6a1",
"metadata": {},
"source": [
"The toot texts (the documents' `page_content`) is by default HTML as returned by the Mastodon API."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,134 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Psychic\n",
"This notebook covers how to load documents from `Psychic`. See [here](../../../../ecosystem/psychic.md) for more details.\n",
"\n",
"## Prerequisites\n",
"1. Follow the Quick Start section in [this document](../../../../ecosystem/psychic.md)\n",
"2. Log into the [Psychic dashboard](https://dashboard.psychic.dev/) and get your secret key\n",
"3. Install the frontend react library into your web app and have a user authenticate a connection. The connection will be created using the connection id that you specify."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading documents\n",
"\n",
"Use the `PsychicLoader` class to load in documents from a connection. Each connection has a connector id (corresponding to the SaaS app that was connected) and a connection id (which you passed in to the frontend library)."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.0.1\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.1.2\u001b[0m\n",
"\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n"
]
}
],
"source": [
"# Uncomment this to install psychicapi if you don't already have it installed\n",
"!poetry run pip -q install psychicapi"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import PsychicLoader\n",
"from psychicapi import ConnectorId\n",
"\n",
"# Create a document loader for google drive. We can also load from other connectors by setting the connector_id to the appropriate value e.g. ConnectorId.notion.value\n",
"# This loader uses our test credentials\n",
"google_drive_loader = PsychicLoader(\n",
" api_key=\"7ddb61c1-8b6a-4d31-a58e-30d1c9ea480e\",\n",
" connector_id=ConnectorId.gdrive.value,\n",
" connection_id=\"google-test\"\n",
")\n",
"\n",
"documents = google_drive_loader.load()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Converting the docs to embeddings \n",
"\n",
"We can now convert these documents into embeddings and store them in a vector database like Chroma"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.llms import OpenAI\n",
"from langchain.chains import RetrievalQAWithSourcesChain\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"docsearch = Chroma.from_documents(texts, embeddings)\n",
"chain = RetrievalQAWithSourcesChain.from_chain_type(OpenAI(temperature=0), chain_type=\"stuff\", retriever=docsearch.as_retriever())\n",
"chain({\"question\": \"what is psychic?\"}, return_only_outputs=True)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -287,10 +287,118 @@
"docs[:5]"
]
},
{
"cell_type": "markdown",
"id": "b066cb5a",
"metadata": {},
"source": [
"## Unstructured API\n",
"\n",
"If you want to get up and running with less set up, you can simply run `pip install unstructured` and use `UnstructuredAPIFileLoader` or `UnstructuredAPIFileIOLoader`. That will process your document using the hosted Unstructured API. Note that currently (as of 11 May 2023) the Unstructured API is open, but it will soon require an API. The [Unstructured documentation](https://unstructured-io.github.io/) page will have instructions on how to generate an API key once theyre available. Check out the instructions [here](https://github.com/Unstructured-IO/unstructured-api#dizzy-instructions-for-using-the-docker-image) if youd like to self-host the Unstructured API or run it locally."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "b50c70bc",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import UnstructuredAPIFileLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "12b6d2cf",
"metadata": {},
"outputs": [],
"source": [
"filenames = [\"example_data/fake.docx\", \"example_data/fake-email.eml\"]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "39a9894d",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredAPIFileLoader(\n",
" file_path=filenames[0],\n",
" api_key=\"FAKE_API_KEY\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "386eb63c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='Lorem ipsum dolor sit amet.', metadata={'source': 'example_data/fake.docx'})"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "markdown",
"id": "94158999",
"metadata": {},
"source": [
"You can also batch multiple files through the Unstructured API in a single API using `UnstructuredAPIFileLoader`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "79a18e7e",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredAPIFileLoader(\n",
" file_path=filenames,\n",
" api_key=\"FAKE_API_KEY\",\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "a3d7c846",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='Lorem ipsum dolor sit amet.\\n\\nThis is a test email to use for unit tests.\\n\\nImportant points:\\n\\nRoses are red\\n\\nViolets are blue', metadata={'source': ['example_data/fake.docx', 'example_data/fake-email.eml']})"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs = loader.load()\n",
"docs[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f52b04cb",
"id": "0e510495",
"metadata": {},
"outputs": [],
"source": []
@@ -312,7 +420,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.8.13"
}
},
"nbformat": 4,

View File

@@ -24,7 +24,7 @@
"metadata": {},
"outputs": [],
"source": [
"#!pip install pinecone-client"
"#!pip install pinecone-client pinecone-text"
]
},
{

View File

@@ -0,0 +1,133 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# OpenLM\n",
"[OpenLM](https://github.com/r2d4/openlm) is a zero-dependency OpenAI-compatible LLM provider that can call different inference endpoints directly via HTTP. \n",
"\n",
"\n",
"It implements the OpenAI Completion class so that it can be used as a drop-in replacement for the OpenAI API. This changeset utilizes BaseOpenAI for minimal added code.\n",
"\n",
"This examples goes over how to use LangChain to interact with both OpenAI and HuggingFace. You'll need API keys from both."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setup\n",
"Install dependencies and set API keys."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Uncomment to install openlm and openai if you haven't already\n",
"\n",
"# !pip install openlm\n",
"# !pip install openai"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from getpass import getpass\n",
"import os\n",
"import subprocess\n",
"\n",
"\n",
"# Check if OPENAI_API_KEY environment variable is set\n",
"if \"OPENAI_API_KEY\" not in os.environ:\n",
" print(\"Enter your OpenAI API key:\")\n",
" os.environ[\"OPENAI_API_KEY\"] = getpass()\n",
"\n",
"# Check if HF_API_TOKEN environment variable is set\n",
"if \"HF_API_TOKEN\" not in os.environ:\n",
" print(\"Enter your HuggingFace Hub API key:\")\n",
" os.environ[\"HF_API_TOKEN\"] = getpass()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using LangChain with OpenLM\n",
"\n",
"Here we're going to call two models in an LLMChain, `text-davinci-003` from OpenAI and `gpt2` on HuggingFace."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import OpenLM\n",
"from langchain import PromptTemplate, LLMChain"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model: text-davinci-003\n",
"Result: France is a country in Europe. The capital of France is Paris.\n",
"Model: huggingface.co/gpt2\n",
"Result: Question: What is the capital of France?\n",
"\n",
"Answer: Let's think step by step. I am not going to lie, this is a complicated issue, and I don't see any solutions to all this, but it is still far more\n"
]
}
],
"source": [
"question = \"What is the capital of France?\"\n",
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])\n",
"\n",
"for model in [\"text-davinci-003\", \"huggingface.co/gpt2\"]:\n",
" llm = OpenLM(model=model)\n",
" llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
" result = llm_chain.run(question)\n",
" print(\"\"\"Model: {}\n",
"Result: {}\"\"\".format(model, result))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -559,7 +559,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 1,
"id": "0b6dd7b8",
"metadata": {},
"outputs": [
@@ -631,6 +631,84 @@
"prompt = load_prompt(\"few_shot_prompt_example_prompt.json\")\n",
"print(prompt.format(adjective=\"funny\"))"
]
},
{
"cell_type": "markdown",
"id": "c6e3f9fe",
"metadata": {},
"source": [
"## PromptTempalte with OutputParser\n",
"This shows an example of loading a prompt along with an OutputParser from a file."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "500dab26",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\r\n",
" \"input_variables\": [\r\n",
" \"question\",\r\n",
" \"student_answer\"\r\n",
" ],\r\n",
" \"output_parser\": {\r\n",
" \"regex\": \"(.*?)\\\\nScore: (.*)\",\r\n",
" \"output_keys\": [\r\n",
" \"answer\",\r\n",
" \"score\"\r\n",
" ],\r\n",
" \"default_output_key\": null,\r\n",
" \"_type\": \"regex_parser\"\r\n",
" },\r\n",
" \"partial_variables\": {},\r\n",
" \"template\": \"Given the following question and student answer, provide a correct answer and score the student answer.\\nQuestion: {question}\\nStudent Answer: {student_answer}\\nCorrect Answer:\",\r\n",
" \"template_format\": \"f-string\",\r\n",
" \"validate_template\": true,\r\n",
" \"_type\": \"prompt\"\r\n",
"}"
]
}
],
"source": [
"! cat prompt_with_output_parser.json"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "d267a736",
"metadata": {},
"outputs": [],
"source": [
"prompt = load_prompt(\"prompt_with_output_parser.json\")"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "cb770399",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'answer': 'George Washington was born in 1732 and died in 1799.',\n",
" 'score': '1/2'}"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"prompt.output_parser.parse(\"George Washington was born in 1732 and died in 1799.\\nScore: 1/2\")"
]
}
],
"metadata": {
@@ -649,7 +727,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.11.3"
},
"vscode": {
"interpreter": {

View File

@@ -0,0 +1,20 @@
{
"input_variables": [
"question",
"student_answer"
],
"output_parser": {
"regex": "(.*?)\nScore: (.*)",
"output_keys": [
"answer",
"score"
],
"default_output_key": null,
"_type": "regex_parser"
},
"partial_variables": {},
"template": "Given the following question and student answer, provide a correct answer and score the student answer.\nQuestion: {question}\nStudent Answer: {student_answer}\nCorrect Answer:",
"template_format": "f-string",
"validate_template": true,
"_type": "prompt"
}

View File

@@ -1,8 +1,8 @@
API References
==========================
All of LangChain's reference documentation, in one place.
Full documentation on all methods, classes, and APIs in LangChain.
| Full documentation on all methods, classes, and APIs in LangChain.
.. toctree::
:maxdepth: 1

View File

@@ -773,7 +773,11 @@ class AgentExecutor(Chain):
raise e
text = str(e)
if isinstance(self.handle_parsing_errors, bool):
observation = "Invalid or incomplete response"
if e.send_to_llm:
observation = str(e.observation)
text = str(e.llm_output)
else:
observation = "Invalid or incomplete response"
elif isinstance(self.handle_parsing_errors, str):
observation = self.handle_parsing_errors
elif callable(self.handle_parsing_errors):
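
A minimal sketch of the new path, assuming an `OutputParserException` raised with the fields shown in the MRKL output parser change further down this page, and an `AgentExecutor` constructed with `handle_parsing_errors=True`:

```python
from langchain.schema import OutputParserException

try:
    # A parser can now attach a diagnosis for the LLM instead of relying on
    # the generic "Invalid or incomplete response" observation.
    raise OutputParserException(
        "Could not parse LLM output: `Thought: I know the answer`",
        observation="Invalid Format: Missing 'Action:' after 'Thought:'",
        llm_output="Thought: I know the answer",
        send_to_llm=True,
    )
except OutputParserException as e:
    # Mirrors the branch above: these values become the next observation
    # and scratchpad text fed back to the agent.
    if e.send_to_llm:
        observation, text = str(e.observation), str(e.llm_output)
```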

View File

@@ -1,5 +1,8 @@
"""Agent toolkits."""
from langchain.agents.agent_toolkits.azure_cognitive_services.toolkit import (
AzureCognitiveServicesToolkit,
)
from langchain.agents.agent_toolkits.csv.base import create_csv_agent
from langchain.agents.agent_toolkits.file_management.toolkit import (
FileManagementToolkit,
@@ -60,4 +63,5 @@ __all__ = [
"JiraToolkit",
"FileManagementToolkit",
"PlayWrightBrowserToolkit",
"AzureCognitiveServicesToolkit",
]

View File

@@ -0,0 +1,7 @@
"""Azure Cognitive Services Toolkit."""
from langchain.agents.agent_toolkits.azure_cognitive_services.toolkit import (
AzureCognitiveServicesToolkit,
)
__all__ = ["AzureCognitiveServicesToolkit"]

View File

@@ -0,0 +1,31 @@
from __future__ import annotations
import sys
from typing import List
from langchain.agents.agent_toolkits.base import BaseToolkit
from langchain.tools.azure_cognitive_services import (
AzureCogsFormRecognizerTool,
AzureCogsImageAnalysisTool,
AzureCogsSpeech2TextTool,
AzureCogsText2SpeechTool,
)
from langchain.tools.base import BaseTool
class AzureCognitiveServicesToolkit(BaseToolkit):
"""Toolkit for Azure Cognitive Services."""
def get_tools(self) -> List[BaseTool]:
"""Get the tools in the toolkit."""
tools = [
AzureCogsFormRecognizerTool(),
AzureCogsSpeech2TextTool(),
AzureCogsText2SpeechTool(),
]
# TODO: Remove check once azure-ai-vision supports MacOS.
if sys.platform.startswith("linux") or sys.platform.startswith("win"):
tools.append(AzureCogsImageAnalysisTool())
return tools
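
A minimal usage sketch, assuming the Azure Cognitive Services key and endpoint environment variables that the individual tools validate at construction time are already set:

```python
from langchain.agents.agent_toolkits import AzureCognitiveServicesToolkit

toolkit = AzureCognitiveServicesToolkit()
tools = toolkit.get_tools()  # image analysis is included only on Linux/Windows for now
print([tool.name for tool in tools])
```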

View File

@@ -34,7 +34,7 @@ def create_pandas_dataframe_agent(
try:
import pandas as pd
except ImportError:
raise ValueError(
raise ImportError(
"pandas package not found, please install with `pip install pandas`"
)

View File

@@ -2,28 +2,24 @@
"""Prompts for PowerBI agent."""
POWERBI_PREFIX = """You are an agent designed to interact with a Power BI Dataset.
POWERBI_PREFIX = """You are an agent designed to help users interact with a PowerBI Dataset.
Assistant has access to tools that can give context, write queries and execute those queries against PowerBI, Microsoft's business intelligence tool. The questions from the users should be interpreted as related to the dataset that is available and not general questions about the world. If the question does not seem related to the dataset, just return "I don't know" as the answer. The query language that PowerBI uses is called DAX and it is quite particular and complex, so make sure to use the right tools to get the answers the user is looking for.
Agent has access to a tool that can write a query based on the question and then run those against PowerBI, Microsoft's business intelligence tool. The questions from the users should be interpreted as related to the dataset that is available and not general questions about the world. If the question does not seem related to the dataset, just return "This does not appear to be part of this dataset." as the answer.
Given an input question, create a syntactically correct DAX query to run, then look at the results and return the answer. Sometimes the results indicate something is wrong with the query, or there were errors in the json serialization. Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.
Assistant never just starts querying, assistant should first find out which tables there are, then how each table is defined and then ask the query tool to create a query and then ask the query tool to execute it, finally create a complete sentence that answers the question, if multiple rows are asked find a way to write that in an easily readable format for a human. Assistant has tools that can get more context of the tables which helps it write correct queries.
Given an input question, ask to run the questions against the dataset, then look at the results and return the answer, the answer should be a complete sentence that answers the question, if multiple rows are asked find a way to write that in an easily readable format for a human, also make sure to represent numbers in readable ways, like 1M instead of 1000000. Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results.
"""
POWERBI_SUFFIX = """Begin!
Question: {input}
Thought: I should first ask which tables I have, then how each table is defined and then ask the question to query tool to create a query for me and then I should ask the query tool to execute it, finally create a nice sentence that answers the question.
Thought: I can first ask which tables I have, then how each table is defined and then ask the query tool the question I need, and finally create a nice sentence that answers the question.
{agent_scratchpad}"""
POWERBI_CHAT_PREFIX = """Assistant is a large language model built to help users interact with a PowerBI Dataset.
Assistant has access to tools that can give context, write queries and execute those queries against PowerBI, Microsoft's business intelligence tool. The questions from the users should be interpreted as related to the dataset that is available and not general questions about the world. If the question does not seem related to the dataset, just return "I don't know" as the answer. The query language that PowerBI uses is called DAX and it is quite particular and complex, so make sure to use the right tools to get the answers the user is looking for.
Assistant has access to a tool that can write a query based on the question and then run those against PowerBI, Microsoft's business intelligence tool. The questions from the users should be interpreted as related to the dataset that is available and not general questions about the world. If the question does not seem related to the dataset, just return "This does not appear to be part of this dataset." as the answer.
Given an input question, create a syntactically correct DAX query to run, then look at the results and return the answer. Sometimes the results indicate something is wrong with the query, or there were errors in the json serialization. Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.
Assistant never just starts querying, assistant should first find out which tables there are, then how each table is defined and then ask the query tool to create a query and then ask the query tool to execute it, finally create a complete sentence that answers the question, if multiple rows are asked find a way to write that in an easily readable format for a human. Assistant has tools that can get more context of the tables which helps it write correct queries.
Given an input question, ask to run the questions against the dataset, then look at the results and return the answer, the answer should be a complete sentence that answers the question, if multiple rows are asked find a way to write that in an easily readable format for a human, also make sure to represent numbers in readable ways, like 1M instead of 1000000. Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results.
"""
POWERBI_CHAT_SUFFIX = """TOOLS

View File

@@ -12,7 +12,6 @@ from langchain.tools import BaseTool
from langchain.tools.powerbi.prompt import QUESTION_TO_QUERY
from langchain.tools.powerbi.tool import (
InfoPowerBITool,
InputToQueryTool,
ListPowerBITool,
QueryPowerBITool,
)
@@ -25,6 +24,7 @@ class PowerBIToolkit(BaseToolkit):
powerbi: PowerBIDataset = Field(exclude=True)
llm: BaseLanguageModel = Field(exclude=True)
examples: Optional[str] = None
max_iterations: int = 5
callback_manager: Optional[BaseCallbackManager] = None
class Config:
@@ -52,12 +52,12 @@ class PowerBIToolkit(BaseToolkit):
),
)
return [
QueryPowerBITool(powerbi=self.powerbi),
InfoPowerBITool(powerbi=self.powerbi),
ListPowerBITool(powerbi=self.powerbi),
InputToQueryTool(
QueryPowerBITool(
llm_chain=chain,
powerbi=self.powerbi,
examples=self.examples,
max_iterations=self.max_iterations,
),
InfoPowerBITool(powerbi=self.powerbi),
ListPowerBITool(powerbi=self.powerbi),
]
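
A minimal sketch of the reworked wiring, assuming an already-configured `PowerBIDataset` wrapper; note that `QueryPowerBITool` now owns the LLM chain, the examples, and the new `max_iterations` retry budget:

```python
from langchain.agents.agent_toolkits import PowerBIToolkit
from langchain.chat_models import ChatOpenAI

# `powerbi` is an existing PowerBIDataset instance (construction omitted here).
toolkit = PowerBIToolkit(
    powerbi=powerbi,
    llm=ChatOpenAI(temperature=0),
    max_iterations=5,  # default shown above
)
tools = toolkit.get_tools()  # Query (with the LLM chain), Info and List tools
```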

View File

@@ -23,7 +23,25 @@ class MRKLOutputParser(AgentOutputParser):
)
match = re.search(regex, text, re.DOTALL)
if not match:
raise OutputParserException(f"Could not parse LLM output: `{text}`")
if not re.search(r"Action\s*\d*\s*:[\s]*(.*?)", text, re.DOTALL):
raise OutputParserException(
f"Could not parse LLM output: `{text}`",
observation="Invalid Format: Missing 'Action:' after 'Thought:'",
llm_output=text,
send_to_llm=True,
)
elif not re.search(
r"[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)", text, re.DOTALL
):
raise OutputParserException(
f"Could not parse LLM output: `{text}`",
observation="Invalid Format:"
" Missing 'Action Input:' after 'Action:'",
llm_output=text,
send_to_llm=True,
)
else:
raise OutputParserException(f"Could not parse LLM output: `{text}`")
action = match.group(1).strip()
action_input = match.group(2)
return AgentAction(action, action_input.strip(" ").strip('"'), text)
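
A quick illustration of the failure modes the parser now distinguishes (a sketch, assuming the module path `langchain.agents.mrkl.output_parser`):

```python
from langchain.agents.mrkl.output_parser import MRKLOutputParser
from langchain.schema import OutputParserException

parser = MRKLOutputParser()
try:
    parser.parse("Thought: I should look this up")  # no 'Action:' line at all
except OutputParserException as e:
    print(e.observation)  # Invalid Format: Missing 'Action:' after 'Thought:'
```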

View File

@@ -10,8 +10,8 @@ from langchain.callbacks.manager import Callbacks
from langchain.schema import BaseMessage, LLMResult, PromptValue, get_buffer_string
def _get_num_tokens_default_method(text: str) -> int:
"""Get the number of tokens present in the text."""
def _get_token_ids_default_method(text: str) -> List[int]:
"""Encode the text into token IDs."""
# TODO: this method may not be exact.
# TODO: this method may differ based on model (eg codex).
try:
@@ -19,17 +19,14 @@ def _get_num_tokens_default_method(text: str) -> int:
except ImportError:
raise ValueError(
"Could not import transformers python package. "
"This is needed in order to calculate get_num_tokens. "
"This is needed in order to calculate get_token_ids. "
"Please install it with `pip install transformers`."
)
# create a GPT-2 tokenizer instance
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
# tokenize the text using the GPT-2 tokenizer
tokenized_text = tokenizer.tokenize(text)
# calculate the number of tokens in the tokenized text
return len(tokenized_text)
return tokenizer.encode(text)
class BaseLanguageModel(BaseModel, ABC):
@@ -61,9 +58,13 @@ class BaseLanguageModel(BaseModel, ABC):
) -> BaseMessage:
"""Predict message from messages."""
def get_token_ids(self, text: str) -> List[int]:
"""Get the token present in the text."""
return _get_token_ids_default_method(text)
def get_num_tokens(self, text: str) -> int:
"""Get the number of tokens present in the text."""
return _get_num_tokens_default_method(text)
return len(self.get_token_ids(text))
def get_num_tokens_from_messages(self, messages: List[BaseMessage]) -> int:
"""Get the number of tokens in the message."""

View File

@@ -313,7 +313,7 @@ class GPTCache(BaseCache):
try:
import gptcache # noqa: F401
except ImportError:
raise ValueError(
raise ImportError(
"Could not import gptcache python package. "
"Please install it with `pip install gptcache`."
)

View File

@@ -12,6 +12,7 @@ from langchain.callbacks.openai_info import OpenAICallbackHandler
from langchain.callbacks.stdout import StdOutCallbackHandler
from langchain.callbacks.streaming_aiter import AsyncIteratorCallbackHandler
from langchain.callbacks.wandb_callback import WandbCallbackHandler
from langchain.callbacks.whylabs_callback import WhyLabsCallbackHandler
__all__ = [
"OpenAICallbackHandler",
@@ -21,6 +22,7 @@ __all__ = [
"MlflowCallbackHandler",
"ClearMLCallbackHandler",
"CometCallbackHandler",
"WhyLabsCallbackHandler",
"AsyncIteratorCallbackHandler",
"get_openai_callback",
"tracing_enabled",

View File

@@ -0,0 +1,49 @@
"""Callback Handler streams to stdout on new llm token."""
import sys
from typing import Any, Dict, List, Optional
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
DEFAULT_ANSWER_PREFIX_TOKENS = ["\nFinal", " Answer", ":"]
class FinalStreamingStdOutCallbackHandler(StreamingStdOutCallbackHandler):
"""Callback handler for streaming in agents.
Only works with agents using LLMs that support streaming.
Only the final output of the agent will be streamed.
"""
def __init__(self, answer_prefix_tokens: Optional[List[str]] = None) -> None:
super().__init__()
if answer_prefix_tokens is None:
answer_prefix_tokens = DEFAULT_ANSWER_PREFIX_TOKENS
self.answer_prefix_tokens = answer_prefix_tokens
self.last_tokens = [""] * len(answer_prefix_tokens)
self.answer_reached = False
def on_llm_start(
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> None:
"""Run when LLM starts running."""
self.answer_reached = False
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
"""Run on new LLM token. Only available when streaming is enabled."""
# Remember the last n tokens, where n = len(answer_prefix_tokens)
self.last_tokens.append(token)
if len(self.last_tokens) > len(self.answer_prefix_tokens):
self.last_tokens.pop(0)
# Check if the last n tokens match the answer_prefix_tokens list ...
if self.last_tokens == self.answer_prefix_tokens:
self.answer_reached = True
# Do not print the last token in answer_prefix_tokens,
# as it's not part of the answer yet
return
# ... if yes, then print tokens from now on
if self.answer_reached:
sys.stdout.write(token)
sys.stdout.flush()

View File

@@ -31,7 +31,7 @@ def get_headers() -> Dict[str, Any]:
def get_endpoint() -> str:
return os.getenv("LANGCHAIN_ENDPOINT", "http://localhost:8000")
return os.getenv("LANGCHAIN_ENDPOINT", "http://localhost:1984")
@retry(stop=stop_after_attempt(3), wait=wait_fixed(0.5))

View File

@@ -0,0 +1,203 @@
from __future__ import annotations
import logging
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Union
from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import AgentAction, AgentFinish, Generation, LLMResult
from langchain.utils import get_from_env
if TYPE_CHECKING:
from whylogs.api.logger.logger import Logger
diagnostic_logger = logging.getLogger(__name__)
def import_langkit(
sentiment: bool = False,
toxicity: bool = False,
themes: bool = False,
) -> Any:
try:
import langkit # noqa: F401
import langkit.regexes # noqa: F401
import langkit.textstat # noqa: F401
if sentiment:
import langkit.sentiment # noqa: F401
if toxicity:
import langkit.toxicity # noqa: F401
if themes:
import langkit.themes # noqa: F401
except ImportError:
raise ImportError(
"To use the whylabs callback manager you need to have the `langkit` python "
"package installed. Please install it with `pip install langkit`."
)
return langkit
class WhyLabsCallbackHandler(BaseCallbackHandler):
"""WhyLabs CallbackHandler."""
def __init__(self, logger: Logger):
"""Initiate the rolling logger"""
super().__init__()
self.logger = logger
diagnostic_logger.info(
"Initialized WhyLabs callback handler with configured whylogs Logger."
)
def _profile_generations(self, generations: List[Generation]) -> None:
for gen in generations:
self.logger.log({"response": gen.text})
def on_llm_start(
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> None:
"""Pass the input prompts to the logger"""
for prompt in prompts:
self.logger.log({"prompt": prompt})
def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
"""Pass the generated response to the logger."""
for generations in response.generations:
self._profile_generations(generations)
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
"""Do nothing."""
pass
def on_llm_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Do nothing."""
pass
def on_chain_start(
self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
) -> None:
"""Do nothing."""
def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:
"""Do nothing."""
def on_chain_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Do nothing."""
pass
def on_tool_start(
self,
serialized: Dict[str, Any],
input_str: str,
**kwargs: Any,
) -> None:
"""Do nothing."""
def on_agent_action(
self, action: AgentAction, color: Optional[str] = None, **kwargs: Any
) -> Any:
"""Do nothing."""
def on_tool_end(
self,
output: str,
color: Optional[str] = None,
observation_prefix: Optional[str] = None,
llm_prefix: Optional[str] = None,
**kwargs: Any,
) -> None:
"""Do nothing."""
def on_tool_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Do nothing."""
pass
def on_text(self, text: str, **kwargs: Any) -> None:
"""Do nothing."""
def on_agent_finish(
self, finish: AgentFinish, color: Optional[str] = None, **kwargs: Any
) -> None:
"""Run on agent end."""
pass
def flush(self) -> None:
self.logger._do_rollover()
diagnostic_logger.info("Flushing WhyLabs logger, writing profile...")
def close(self) -> None:
self.logger.close()
diagnostic_logger.info("Closing WhyLabs logger, see you next time!")
def __enter__(self) -> WhyLabsCallbackHandler:
return self
def __exit__(
self, exception_type: Any, exception_value: Any, traceback: Any
) -> None:
self.close()
@classmethod
def from_params(
cls,
*,
api_key: Optional[str] = None,
org_id: Optional[str] = None,
dataset_id: Optional[str] = None,
sentiment: bool = False,
toxicity: bool = False,
themes: bool = False,
) -> WhyLabsCallbackHandler:
"""Instantiate a WhyLabsCallbackHandler, building its whylogs Logger from params.
Args:
api_key (Optional[str]): WhyLabs API key. Optional because the preferred
way to specify the API key is with environment variable
WHYLABS_API_KEY.
org_id (Optional[str]): WhyLabs organization id to write profiles to.
If not set must be specified in environment variable
WHYLABS_DEFAULT_ORG_ID.
dataset_id (Optional[str]): The model or dataset this callback is gathering
telemetry for. If not set must be specified in environment variable
WHYLABS_DEFAULT_DATASET_ID.
sentiment (bool): If True will initialize a model to perform
sentiment analysis compound score. Defaults to False and will not gather
this metric.
toxicity (bool): If True will initialize a model to score
toxicity. Defaults to False and will not gather this metric.
themes (bool): If True will initialize a model to calculate
distance to configured themes. Defaults to False and will not gather this
metric.
"""
# langkit library will import necessary whylogs libraries
import_langkit(sentiment=sentiment, toxicity=toxicity, themes=themes)
import whylogs as why
from whylogs.api.writer.whylabs import WhyLabsWriter
from whylogs.core.schema import DeclarativeSchema
from whylogs.experimental.core.metrics.udf_metric import generate_udf_schema
api_key = api_key or get_from_env("api_key", "WHYLABS_API_KEY")
org_id = org_id or get_from_env("org_id", "WHYLABS_DEFAULT_ORG_ID")
dataset_id = dataset_id or get_from_env(
"dataset_id", "WHYLABS_DEFAULT_DATASET_ID"
)
whylabs_writer = WhyLabsWriter(
api_key=api_key, org_id=org_id, dataset_id=dataset_id
)
langkit_schema = DeclarativeSchema(generate_udf_schema())
whylabs_logger = why.logger(
mode="rolling", interval=5, when="M", schema=langkit_schema
)
whylabs_logger.append_writer(writer=whylabs_writer)
diagnostic_logger.info(
"Started whylogs Logger with WhyLabsWriter and initialized LangKit. 📝"
)
return cls(whylabs_logger)
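
A minimal end-to-end sketch, assuming `pip install langkit` and that `WHYLABS_API_KEY`, `WHYLABS_DEFAULT_ORG_ID` and `WHYLABS_DEFAULT_DATASET_ID` are set in the environment:

```python
from langchain.callbacks import WhyLabsCallbackHandler
from langchain.llms import OpenAI

whylabs = WhyLabsCallbackHandler.from_params()  # reads the env vars listed above
llm = OpenAI(temperature=0, callbacks=[whylabs])

llm.generate(["Tell me a joke about statistics."])
whylabs.close()  # roll over the whylogs logger and write the profile to WhyLabs
```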

View File

@@ -10,6 +10,7 @@ from langchain.chains.conversational_retrieval.base import (
)
from langchain.chains.flare.base import FlareChain
from langchain.chains.graph_qa.base import GraphQAChain
from langchain.chains.graph_qa.cypher import GraphCypherQAChain
from langchain.chains.hyde.base import HypotheticalDocumentEmbedder
from langchain.chains.llm import LLMChain
from langchain.chains.llm_bash.base import LLMBashChain
@@ -58,6 +59,7 @@ __all__ = [
"HypotheticalDocumentEmbedder",
"ChatVectorDBChain",
"GraphQAChain",
"GraphCypherQAChain",
"ConstitutionalChain",
"QAGenerationChain",
"RetrievalQA",

View File

@@ -1,7 +1,7 @@
# flake8: noqa
from langchain.prompts.prompt import PromptTemplate
_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.
Chat History:
{chat_history}

View File

@@ -0,0 +1,90 @@
"""Question answering over a graph."""
from __future__ import annotations
from typing import Any, Dict, List, Optional
from pydantic import Field
from langchain.base_language import BaseLanguageModel
from langchain.callbacks.manager import CallbackManagerForChainRun
from langchain.chains.base import Chain
from langchain.chains.graph_qa.prompts import CYPHER_GENERATION_PROMPT, PROMPT
from langchain.chains.llm import LLMChain
from langchain.graphs.neo4j_graph import Neo4jGraph
from langchain.prompts.base import BasePromptTemplate
class GraphCypherQAChain(Chain):
"""Chain for question-answering against a graph by generating Cypher statements."""
graph: Neo4jGraph = Field(exclude=True)
cypher_generation_chain: LLMChain
qa_chain: LLMChain
input_key: str = "query" #: :meta private:
output_key: str = "result" #: :meta private:
@property
def input_keys(self) -> List[str]:
"""Return the input keys.
:meta private:
"""
return [self.input_key]
@property
def output_keys(self) -> List[str]:
"""Return the output keys.
:meta private:
"""
_output_keys = [self.output_key]
return _output_keys
@classmethod
def from_llm(
cls,
llm: BaseLanguageModel,
*,
qa_prompt: BasePromptTemplate = PROMPT,
cypher_prompt: BasePromptTemplate = CYPHER_GENERATION_PROMPT,
**kwargs: Any,
) -> GraphCypherQAChain:
"""Initialize from LLM."""
qa_chain = LLMChain(llm=llm, prompt=qa_prompt)
cypher_generation_chain = LLMChain(llm=llm, prompt=cypher_prompt)
return cls(
qa_chain=qa_chain,
cypher_generation_chain=cypher_generation_chain,
**kwargs,
)
def _call(
self,
inputs: Dict[str, Any],
run_manager: Optional[CallbackManagerForChainRun] = None,
) -> Dict[str, str]:
"""Generate Cypher statement, use it to look up in db and answer question."""
_run_manager = run_manager or CallbackManagerForChainRun.get_noop_manager()
callbacks = _run_manager.get_child()
question = inputs[self.input_key]
generated_cypher = self.cypher_generation_chain.run(
{"question": question, "schema": self.graph.get_schema}, callbacks=callbacks
)
_run_manager.on_text("Generated Cypher:", end="\n", verbose=self.verbose)
_run_manager.on_text(
generated_cypher, color="green", end="\n", verbose=self.verbose
)
context = self.graph.query(generated_cypher)
_run_manager.on_text("Full Context:", end="\n", verbose=self.verbose)
_run_manager.on_text(
str(context), color="green", end="\n", verbose=self.verbose
)
result = self.qa_chain(
{"question": question, "context": context},
callbacks=callbacks,
)
return {self.output_key: result[self.qa_chain.output_key]}

View File

@@ -32,3 +32,19 @@ Helpful Answer:"""
PROMPT = PromptTemplate(
template=prompt_template, input_variables=["context", "question"]
)
CYPHER_GENERATION_TEMPLATE = """Task:Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Schema:
{schema}
Note: Do not include any explanations or apologies in your responses.
Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.
Do not include any text except the generated Cypher statement.
The question is:
{question}"""
CYPHER_GENERATION_PROMPT = PromptTemplate(
input_variables=["schema", "question"], template=CYPHER_GENERATION_TEMPLATE
)
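
For instance, a sketch of the exact prompt the chain feeds the LLM, using an abbreviated version of the schema printed in the notebook at the top of this page:

```python
print(
    CYPHER_GENERATION_PROMPT.format(
        schema="['(:Actor)-[:ACTED_IN]->(:Movie)']",  # abbreviated example schema
        question="Who played in Top Gun?",
    )
)
```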

View File

@@ -54,7 +54,7 @@ class OpenAIModerationChain(Chain):
openai.organization = openai_organization
values["client"] = openai.Moderation
except ImportError:
raise ValueError(
raise ImportError(
"Could not import openai python package. "
"Please install it with `pip install openai`."
)

View File

@@ -86,7 +86,7 @@ class AzureChatOpenAI(ChatOpenAI):
if openai_organization:
openai.organization = openai_organization
except ImportError:
raise ValueError(
raise ImportError(
"Could not import openai python package. "
"Please install it with `pip install openai`."
)

View File

@@ -3,7 +3,17 @@ from __future__ import annotations
import logging
import sys
from typing import Any, Callable, Dict, List, Mapping, Optional, Tuple, Union
from typing import (
TYPE_CHECKING,
Any,
Callable,
Dict,
List,
Mapping,
Optional,
Tuple,
Union,
)
from pydantic import Extra, Field, root_validator
from tenacity import (
@@ -30,9 +40,24 @@ from langchain.schema import (
)
from langchain.utils import get_from_dict_or_env
if TYPE_CHECKING:
import tiktoken
logger = logging.getLogger(__name__)
def _import_tiktoken() -> Any:
try:
import tiktoken
except ImportError:
raise ValueError(
"Could not import tiktoken python package. "
"This is needed in order to calculate get_token_ids. "
"Please install it with `pip install tiktoken`."
)
return tiktoken
def _create_retry_decorator(llm: ChatOpenAI) -> Callable[[Any], Any]:
import openai
@@ -354,42 +379,8 @@ class ChatOpenAI(BaseChatModel):
"""Return type of chat model."""
return "openai-chat"
def get_num_tokens(self, text: str) -> int:
"""Calculate num tokens with tiktoken package."""
# tiktoken NOT supported for Python 3.7 or below
if sys.version_info[1] <= 7:
return super().get_num_tokens(text)
try:
import tiktoken
except ImportError:
raise ValueError(
"Could not import tiktoken python package. "
"This is needed in order to calculate get_num_tokens. "
"Please install it with `pip install tiktoken`."
)
# create a GPT-3.5-Turbo encoder instance
enc = tiktoken.encoding_for_model(self.model_name)
# encode the text using the GPT-3.5-Turbo encoder
tokenized_text = enc.encode(text)
# calculate the number of tokens in the encoded text
return len(tokenized_text)
def get_num_tokens_from_messages(self, messages: List[BaseMessage]) -> int:
"""Calculate num tokens for gpt-3.5-turbo and gpt-4 with tiktoken package.
Official documentation: https://github.com/openai/openai-cookbook/blob/
main/examples/How_to_format_inputs_to_ChatGPT_models.ipynb"""
try:
import tiktoken
except ImportError:
raise ValueError(
"Could not import tiktoken python package. "
"This is needed in order to calculate get_num_tokens. "
"Please install it with `pip install tiktoken`."
)
def _get_encoding_model(self) -> Tuple[str, tiktoken.Encoding]:
tiktoken_ = _import_tiktoken()
model = self.model_name
if model == "gpt-3.5-turbo":
# gpt-3.5-turbo may change over time.
@@ -399,14 +390,31 @@ class ChatOpenAI(BaseChatModel):
# gpt-4 may change over time.
# Returning num tokens assuming gpt-4-0314.
model = "gpt-4-0314"
# Returns the number of tokens used by a list of messages.
try:
encoding = tiktoken.encoding_for_model(model)
encoding = tiktoken_.encoding_for_model(model)
except KeyError:
logger.warning("Warning: model not found. Using cl100k_base encoding.")
encoding = tiktoken.get_encoding("cl100k_base")
model = "cl100k_base"
encoding = tiktoken_.get_encoding(model)
return model, encoding
def get_token_ids(self, text: str) -> List[int]:
"""Get the tokens present in the text with tiktoken package."""
# tiktoken NOT supported for Python 3.7 or below
if sys.version_info[1] <= 7:
return super().get_token_ids(text)
_, encoding_model = self._get_encoding_model()
return encoding_model.encode(text)
def get_num_tokens_from_messages(self, messages: List[BaseMessage]) -> int:
"""Calculate num tokens for gpt-3.5-turbo and gpt-4 with tiktoken package.
Official documentation: https://github.com/openai/openai-cookbook/blob/
main/examples/How_to_format_inputs_to_ChatGPT_models.ipynb"""
if sys.version_info[1] <= 7:
return super().get_num_tokens_from_messages(messages)
model, encoding = self._get_encoding_model()
if model == "gpt-3.5-turbo-0301":
# every message follows <im_start>{role/name}\n{content}<im_end>\n
tokens_per_message = 4

View File

@@ -5,9 +5,7 @@ services:
ports:
- 80:80
environment:
- BACKEND_URL=http://langchain-backend:8000
- PUBLIC_BASE_URL=http://localhost:8000
- PUBLIC_DEV_MODE=true
- REACT_APP_BACKEND_URL=http://localhost:1984
depends_on:
- langchain-backend
volumes:
@@ -18,11 +16,11 @@ services:
langchain-backend:
image: langchain/${_LANGCHAINPLUS_IMAGE_PREFIX-}langchainplus-backend:latest
environment:
- PORT=8000
- PORT=1984
- LANGCHAIN_ENV=local_docker
- LOG_LEVEL=warning
ports:
- 8000:8000
- 1984:1984
depends_on:
- langchain-db
build:

View File

@@ -1,7 +1,5 @@
from __future__ import annotations
import asyncio
import functools
import logging
import socket
from datetime import datetime
@@ -10,9 +8,8 @@ from typing import (
TYPE_CHECKING,
Any,
Callable,
Coroutine,
Dict,
Iterable,
Iterator,
List,
Optional,
Tuple,
@@ -27,10 +24,8 @@ from requests import Response
from tenacity import retry, stop_after_attempt, wait_fixed
from langchain.base_language import BaseLanguageModel
from langchain.callbacks.tracers.langchain import LangChainTracer
from langchain.callbacks.tracers.schemas import Run, TracerSession
from langchain.chains.base import Chain
from langchain.chat_models.base import BaseChatModel
from langchain.client.models import (
Dataset,
DatasetCreate,
@@ -38,15 +33,7 @@ from langchain.client.models import (
ExampleCreate,
ListRunsQueryParams,
)
from langchain.llms.base import BaseLLM
from langchain.schema import (
BaseMessage,
ChatResult,
HumanMessage,
LLMResult,
get_buffer_string,
messages_from_dict,
)
from langchain.client.runner_utils import arun_on_examples, run_on_examples
from langchain.utils import raise_for_status_with_text, xor_args
if TYPE_CHECKING:
@@ -57,10 +44,6 @@ logger = logging.getLogger(__name__)
MODEL_OR_CHAIN_FACTORY = Union[Callable[[], Chain], BaseLanguageModel]
class InputFormatError(Exception):
"""Raised when input format is invalid."""
def _get_link_stem(url: str) -> str:
scheme = urlsplit(url).scheme
netloc_prefix = urlsplit(url).netloc.split(":")[0]
@@ -81,13 +64,13 @@ class LangChainPlusClient(BaseSettings):
"""Client for interacting with the LangChain+ API."""
api_key: Optional[str] = Field(default=None, env="LANGCHAIN_API_KEY")
api_url: str = Field(default="http://localhost:8000", env="LANGCHAIN_ENDPOINT")
api_url: str = Field(default="http://localhost:1984", env="LANGCHAIN_ENDPOINT")
tenant_id: Optional[str] = None
@root_validator(pre=True)
def validate_api_key_if_hosted(cls, values: Dict[str, Any]) -> Dict[str, Any]:
"""Verify API key is provided if url not localhost."""
api_url: str = values.get("api_url", "http://localhost:8000")
api_url: str = values.get("api_url", "http://localhost:1984")
api_key: Optional[str] = values.get("api_key")
if not _is_localhost(api_url):
if not api_key:
@@ -231,7 +214,7 @@ class LangChainPlusClient(BaseSettings):
session_name: Optional[str] = None,
run_type: Optional[str] = None,
**kwargs: Any,
) -> List[Run]:
) -> Iterator[Run]:
"""List runs from the LangChain+ API."""
if session_name is not None:
if session_id is not None:
@@ -245,7 +228,7 @@ class LangChainPlusClient(BaseSettings):
}
response = self._get("/runs", params=filtered_params)
raise_for_status_with_text(response)
return [Run(**run) for run in response.json()]
yield from [Run(**run) for run in response.json()]
@retry(stop=stop_after_attempt(3), wait=wait_fixed(0.5))
@xor_args(("session_id", "session_name"))
@@ -279,11 +262,11 @@ class LangChainPlusClient(BaseSettings):
return TracerSession(**response.json())
@retry(stop=stop_after_attempt(3), wait=wait_fixed(0.5))
def list_sessions(self) -> List[TracerSession]:
def list_sessions(self) -> Iterator[TracerSession]:
"""List sessions from the LangChain+ API."""
response = self._get("/sessions")
raise_for_status_with_text(response)
return [TracerSession(**session) for session in response.json()]
yield from [TracerSession(**session) for session in response.json()]
def create_dataset(self, dataset_name: str, description: str) -> Dataset:
"""Create a dataset in the LangChain+ API."""
@@ -326,11 +309,11 @@ class LangChainPlusClient(BaseSettings):
return Dataset(**result)
@retry(stop=stop_after_attempt(3), wait=wait_fixed(0.5))
def list_datasets(self, limit: int = 100) -> Iterable[Dataset]:
def list_datasets(self, limit: int = 100) -> Iterator[Dataset]:
"""List the datasets on the LangChain+ API."""
response = self._get("/datasets", params={"limit": limit})
raise_for_status_with_text(response)
return [Dataset(**dataset) for dataset in response.json()]
yield from [Dataset(**dataset) for dataset in response.json()]
@xor_args(("dataset_id", "dataset_name"))
def delete_dataset(
@@ -346,7 +329,7 @@ class LangChainPlusClient(BaseSettings):
headers=self._headers,
)
raise_for_status_with_text(response)
return response.json()
return Dataset(**response.json())
@xor_args(("dataset_id", "dataset_name"))
def create_example(
@@ -359,7 +342,7 @@ class LangChainPlusClient(BaseSettings):
) -> Example:
"""Create a dataset example in the LangChain+ API."""
if dataset_id is None:
dataset_id = self.read_dataset(dataset_name).id
dataset_id = self.read_dataset(dataset_name=dataset_name).id
data = {
"inputs": inputs,
@@ -386,7 +369,7 @@ class LangChainPlusClient(BaseSettings):
@retry(stop=stop_after_attempt(3), wait=wait_fixed(0.5))
def list_examples(
self, dataset_id: Optional[str] = None, dataset_name: Optional[str] = None
) -> Iterable[Example]:
) -> Iterator[Example]:
"""List the datasets on the LangChain+ API."""
params = {}
if dataset_id is not None:
@@ -398,195 +381,7 @@ class LangChainPlusClient(BaseSettings):
pass
response = self._get("/examples", params=params)
raise_for_status_with_text(response)
return [Example(**dataset) for dataset in response.json()]
@staticmethod
def _get_prompts(inputs: Dict[str, Any]) -> List[str]:
"""Get prompts from inputs."""
if not inputs:
raise InputFormatError("Inputs should not be empty.")
prompts = []
if "prompt" in inputs:
if not isinstance(inputs["prompt"], str):
raise InputFormatError(
"Expected string for 'prompt', got"
f" {type(inputs['prompt']).__name__}"
)
prompts = [inputs["prompt"]]
elif "prompts" in inputs:
if not isinstance(inputs["prompts"], list) or not all(
isinstance(i, str) for i in inputs["prompts"]
):
raise InputFormatError(
"Expected list of strings for 'prompts',"
f" got {type(inputs['prompts']).__name__}"
)
prompts = inputs["prompts"]
elif len(inputs) == 1:
prompt_ = next(iter(inputs.values()))
if isinstance(prompt_, str):
prompts = [prompt_]
elif isinstance(prompt_, list) and all(isinstance(i, str) for i in prompt_):
prompts = prompt_
else:
raise InputFormatError(
f"LLM Run expects string prompt input. Got {inputs}"
)
else:
raise InputFormatError(
f"LLM Run expects 'prompt' or 'prompts' in inputs. Got {inputs}"
)
return prompts
@staticmethod
def _get_messages(inputs: Dict[str, Any]) -> List[List[BaseMessage]]:
"""Get Chat Messages from inputs."""
if not inputs:
raise InputFormatError("Inputs should not be empty.")
if "messages" in inputs:
single_input = inputs["messages"]
elif len(inputs) == 1:
single_input = next(iter(inputs.values()))
else:
raise InputFormatError(
f"Chat Run expects 'messages' in inputs. Got {inputs}"
)
if isinstance(single_input, list) and all(
isinstance(i, dict) for i in single_input
):
raw_messages = [single_input]
elif isinstance(single_input, list) and all(
isinstance(i, list) for i in single_input
):
raw_messages = single_input
else:
raise InputFormatError(
f"Chat Run expects List[dict] or List[List[dict]] 'messages'"
f" input. Got {inputs}"
)
return [messages_from_dict(batch) for batch in raw_messages]
@staticmethod
async def _arun_llm(
llm: BaseLanguageModel,
inputs: Dict[str, Any],
langchain_tracer: LangChainTracer,
) -> Union[LLMResult, ChatResult]:
if isinstance(llm, BaseLLM):
try:
llm_prompts = LangChainPlusClient._get_prompts(inputs)
llm_output = await llm.agenerate(
llm_prompts, callbacks=[langchain_tracer]
)
except InputFormatError:
llm_messages = LangChainPlusClient._get_messages(inputs)
buffer_strings = [
get_buffer_string(messages) for messages in llm_messages
]
llm_output = await llm.agenerate(
buffer_strings, callbacks=[langchain_tracer]
)
elif isinstance(llm, BaseChatModel):
try:
messages = LangChainPlusClient._get_messages(inputs)
llm_output = await llm.agenerate(messages, callbacks=[langchain_tracer])
except InputFormatError:
prompts = LangChainPlusClient._get_prompts(inputs)
converted_messages: List[List[BaseMessage]] = [
[HumanMessage(content=prompt)] for prompt in prompts
]
llm_output = await llm.agenerate(
converted_messages, callbacks=[langchain_tracer]
)
else:
raise ValueError(f"Unsupported LLM type {type(llm)}")
return llm_output
@staticmethod
async def _arun_llm_or_chain(
example: Example,
langchain_tracer: LangChainTracer,
llm_or_chain_factory: MODEL_OR_CHAIN_FACTORY,
n_repetitions: int,
) -> Union[List[dict], List[str], List[LLMResult], List[ChatResult]]:
"""Run the chain asynchronously."""
previous_example_id = langchain_tracer.example_id
langchain_tracer.example_id = example.id
outputs = []
for _ in range(n_repetitions):
try:
if isinstance(llm_or_chain_factory, BaseLanguageModel):
output: Any = await LangChainPlusClient._arun_llm(
llm_or_chain_factory, example.inputs, langchain_tracer
)
else:
chain = llm_or_chain_factory()
output = await chain.arun(
example.inputs, callbacks=[langchain_tracer]
)
outputs.append(output)
except Exception as e:
logger.warning(f"Chain failed for example {example.id}. Error: {e}")
outputs.append({"Error": str(e)})
langchain_tracer.example_id = previous_example_id
return outputs
@staticmethod
async def _gather_with_concurrency(
n: int,
initializer: Callable[[], Coroutine[Any, Any, LangChainTracer]],
*async_funcs: Callable[[LangChainTracer, Dict], Coroutine[Any, Any, Any]],
) -> List[Any]:
"""
Run coroutines with a concurrency limit.
Args:
n: The maximum number of concurrent tasks.
initializer: A coroutine that initializes shared resources for the tasks.
async_funcs: The async_funcs to be run concurrently.
Returns:
A list of results from the coroutines.
"""
semaphore = asyncio.Semaphore(n)
job_state = {"num_processed": 0}
tracer_queue: asyncio.Queue[LangChainTracer] = asyncio.Queue()
for _ in range(n):
tracer_queue.put_nowait(await initializer())
async def run_coroutine_with_semaphore(
async_func: Callable[[LangChainTracer, Dict], Coroutine[Any, Any, Any]]
) -> Any:
async with semaphore:
tracer = await tracer_queue.get()
try:
result = await async_func(tracer, job_state)
finally:
tracer_queue.put_nowait(tracer)
return result
return await asyncio.gather(
*(run_coroutine_with_semaphore(function) for function in async_funcs)
)
async def _tracer_initializer(self, session_name: str) -> LangChainTracer:
"""
Initialize a tracer to share across tasks.
Args:
session_name: The session name for the tracer.
Returns:
A LangChainTracer instance with an active session.
"""
tracer = LangChainTracer(session_name=session_name)
tracer.ensure_session()
return tracer
yield from [Example(**dataset) for dataset in response.json()]
async def arun_on_dataset(
self,
@@ -622,93 +417,15 @@ class LangChainPlusClient(BaseSettings):
)
dataset = self.read_dataset(dataset_name=dataset_name)
examples = self.list_examples(dataset_id=str(dataset.id))
results: Dict[str, List[Any]] = {}
async def process_example(
example: Example, tracer: LangChainTracer, job_state: dict
) -> None:
"""Process a single example."""
result = await LangChainPlusClient._arun_llm_or_chain(
example,
tracer,
llm_or_chain_factory,
num_repetitions,
)
results[str(example.id)] = result
job_state["num_processed"] += 1
if verbose:
print(
f"Processed examples: {job_state['num_processed']}",
end="\r",
flush=True,
)
await self._gather_with_concurrency(
concurrency_level,
functools.partial(self._tracer_initializer, session_name),
*(functools.partial(process_example, e) for e in examples),
return await arun_on_examples(
examples,
llm_or_chain_factory,
concurrency_level=concurrency_level,
num_repetitions=num_repetitions,
session_name=session_name,
verbose=verbose,
)
return results
@staticmethod
def run_llm(
llm: BaseLanguageModel,
inputs: Dict[str, Any],
langchain_tracer: LangChainTracer,
) -> Union[LLMResult, ChatResult]:
"""Run the language model on the example."""
if isinstance(llm, BaseLLM):
try:
llm_prompts = LangChainPlusClient._get_prompts(inputs)
llm_output = llm.generate(llm_prompts, callbacks=[langchain_tracer])
except InputFormatError:
llm_messages = LangChainPlusClient._get_messages(inputs)
buffer_strings = [
get_buffer_string(messages) for messages in llm_messages
]
llm_output = llm.generate(buffer_strings, callbacks=[langchain_tracer])
elif isinstance(llm, BaseChatModel):
try:
messages = LangChainPlusClient._get_messages(inputs)
llm_output = llm.generate(messages, callbacks=[langchain_tracer])
except InputFormatError:
prompts = LangChainPlusClient._get_prompts(inputs)
converted_messages: List[List[BaseMessage]] = [
[HumanMessage(content=prompt)] for prompt in prompts
]
llm_output = llm.generate(
converted_messages, callbacks=[langchain_tracer]
)
else:
raise ValueError(f"Unsupported LLM type {type(llm)}")
return llm_output
@staticmethod
def run_llm_or_chain(
example: Example,
langchain_tracer: LangChainTracer,
llm_or_chain_factory: MODEL_OR_CHAIN_FACTORY,
n_repetitions: int,
) -> Union[List[dict], List[str], List[LLMResult], List[ChatResult]]:
"""Run the chain synchronously."""
previous_example_id = langchain_tracer.example_id
langchain_tracer.example_id = example.id
outputs = []
for _ in range(n_repetitions):
try:
if isinstance(llm_or_chain_factory, BaseLanguageModel):
output: Any = LangChainPlusClient.run_llm(
llm_or_chain_factory, example.inputs, langchain_tracer
)
else:
chain = llm_or_chain_factory()
output = chain.run(example.inputs, callbacks=[langchain_tracer])
outputs.append(output)
except Exception as e:
logger.warning(f"Chain failed for example {example.id}. Error: {e}")
outputs.append({"Error": str(e)})
langchain_tracer.example_id = previous_example_id
return outputs
def run_on_dataset(
self,
@@ -741,18 +458,11 @@ class LangChainPlusClient(BaseSettings):
session_name, llm_or_chain_factory, dataset_name
)
dataset = self.read_dataset(dataset_name=dataset_name)
examples = list(self.list_examples(dataset_id=str(dataset.id)))
results: Dict[str, Any] = {}
tracer = LangChainTracer(session_name=session_name)
tracer.ensure_session()
for i, example in enumerate(examples):
result = self.run_llm_or_chain(
example,
tracer,
llm_or_chain_factory,
num_repetitions,
)
if verbose:
print(f"{i+1} processed", flush=True, end="\r")
results[str(example.id)] = result
return results
examples = self.list_examples(dataset_id=str(dataset.id))
return run_on_examples(
examples,
llm_or_chain_factory,
num_repetitions=num_repetitions,
session_name=session_name,
verbose=verbose,
)
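With this refactor, `run_on_dataset` and `arun_on_dataset` only resolve the dataset and delegate to the shared `run_on_examples`/`arun_on_examples` helpers. A minimal usage sketch, assuming a configured client, valid OpenAI credentials, and an existing dataset named "my-dataset" (all hypothetical; parameter names taken from the hunks above):

```python
# Hypothetical sketch: evaluate a model over a dataset via the refactored client.
from langchain.client import LangChainPlusClient  # import path assumed
from langchain.llms import OpenAI

client = LangChainPlusClient()
results = client.run_on_dataset(
    dataset_name="my-dataset",      # assumed to already exist
    llm_or_chain_factory=OpenAI(),  # a BaseLanguageModel, or a zero-arg Chain factory
    num_repetitions=1,
    session_name="eval-session",    # traces for each example land in this session
    verbose=True,
)
# results maps str(example.id) -> list of outputs (one per repetition)
```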


@@ -0,0 +1,375 @@
"""Utilities for running LLMs/Chains over datasets."""
from __future__ import annotations
import asyncio
import functools
import logging
from typing import Any, Callable, Coroutine, Dict, Iterator, List, Optional, Union
from langchain.base_language import BaseLanguageModel
from langchain.callbacks.base import BaseCallbackHandler
from langchain.callbacks.manager import Callbacks
from langchain.callbacks.tracers.langchain import LangChainTracer
from langchain.chains.base import Chain
from langchain.chat_models.base import BaseChatModel
from langchain.client.models import Example
from langchain.llms.base import BaseLLM
from langchain.schema import (
BaseMessage,
ChatResult,
HumanMessage,
LLMResult,
get_buffer_string,
messages_from_dict,
)
logger = logging.getLogger(__name__)
MODEL_OR_CHAIN_FACTORY = Union[Callable[[], Chain], BaseLanguageModel]
class InputFormatError(Exception):
"""Raised when input format is invalid."""
def _get_prompts(inputs: Dict[str, Any]) -> List[str]:
"""Get prompts from inputs."""
if not inputs:
raise InputFormatError("Inputs should not be empty.")
prompts = []
if "prompt" in inputs:
if not isinstance(inputs["prompt"], str):
raise InputFormatError(
"Expected string for 'prompt', got"
f" {type(inputs['prompt']).__name__}"
)
prompts = [inputs["prompt"]]
elif "prompts" in inputs:
if not isinstance(inputs["prompts"], list) or not all(
isinstance(i, str) for i in inputs["prompts"]
):
raise InputFormatError(
"Expected list of strings for 'prompts',"
f" got {type(inputs['prompts']).__name__}"
)
prompts = inputs["prompts"]
elif len(inputs) == 1:
prompt_ = next(iter(inputs.values()))
if isinstance(prompt_, str):
prompts = [prompt_]
elif isinstance(prompt_, list) and all(isinstance(i, str) for i in prompt_):
prompts = prompt_
else:
raise InputFormatError(f"LLM Run expects string prompt input. Got {inputs}")
else:
raise InputFormatError(
f"LLM Run expects 'prompt' or 'prompts' in inputs. Got {inputs}"
)
return prompts
def _get_messages(inputs: Dict[str, Any]) -> List[List[BaseMessage]]:
"""Get Chat Messages from inputs."""
if not inputs:
raise InputFormatError("Inputs should not be empty.")
if "messages" in inputs:
single_input = inputs["messages"]
elif len(inputs) == 1:
single_input = next(iter(inputs.values()))
else:
raise InputFormatError(f"Chat Run expects 'messages' in inputs. Got {inputs}")
if isinstance(single_input, list) and all(
isinstance(i, dict) for i in single_input
):
raw_messages = [single_input]
elif isinstance(single_input, list) and all(
isinstance(i, list) for i in single_input
):
raw_messages = single_input
else:
raise InputFormatError(
f"Chat Run expects List[dict] or List[List[dict]] 'messages'"
f" input. Got {inputs}"
)
return [messages_from_dict(batch) for batch in raw_messages]
async def _arun_llm(
llm: BaseLanguageModel,
inputs: Dict[str, Any],
langchain_tracer: Optional[LangChainTracer],
) -> Union[LLMResult, ChatResult]:
callbacks: Optional[List[BaseCallbackHandler]] = (
[langchain_tracer] if langchain_tracer else None
)
if isinstance(llm, BaseLLM):
try:
llm_prompts = _get_prompts(inputs)
llm_output = await llm.agenerate(llm_prompts, callbacks=callbacks)
except InputFormatError:
llm_messages = _get_messages(inputs)
buffer_strings = [get_buffer_string(messages) for messages in llm_messages]
llm_output = await llm.agenerate(buffer_strings, callbacks=callbacks)
elif isinstance(llm, BaseChatModel):
try:
messages = _get_messages(inputs)
llm_output = await llm.agenerate(messages, callbacks=callbacks)
except InputFormatError:
prompts = _get_prompts(inputs)
converted_messages: List[List[BaseMessage]] = [
[HumanMessage(content=prompt)] for prompt in prompts
]
llm_output = await llm.agenerate(converted_messages, callbacks=callbacks)
else:
raise ValueError(f"Unsupported LLM type {type(llm)}")
return llm_output
async def _arun_llm_or_chain(
example: Example,
llm_or_chain_factory: MODEL_OR_CHAIN_FACTORY,
n_repetitions: int,
langchain_tracer: Optional[LangChainTracer],
) -> Union[List[dict], List[str], List[LLMResult], List[ChatResult]]:
"""Run the chain asynchronously."""
if langchain_tracer is not None:
previous_example_id = langchain_tracer.example_id
langchain_tracer.example_id = example.id
callbacks: Optional[List[BaseCallbackHandler]] = [langchain_tracer]
else:
previous_example_id = None
callbacks = None
outputs = []
for _ in range(n_repetitions):
try:
if isinstance(llm_or_chain_factory, BaseLanguageModel):
output: Any = await _arun_llm(
llm_or_chain_factory, example.inputs, langchain_tracer
)
else:
chain = llm_or_chain_factory()
output = await chain.arun(example.inputs, callbacks=callbacks)
outputs.append(output)
except Exception as e:
logger.warning(f"Chain failed for example {example.id}. Error: {e}")
outputs.append({"Error": str(e)})
if langchain_tracer is not None:
langchain_tracer.example_id = previous_example_id
return outputs
async def _gather_with_concurrency(
n: int,
initializer: Callable[[], Coroutine[Any, Any, Optional[LangChainTracer]]],
*async_funcs: Callable[[Optional[LangChainTracer], Dict], Coroutine[Any, Any, Any]],
) -> List[Any]:
"""
Run coroutines with a concurrency limit.
Args:
n: The maximum number of concurrent tasks.
initializer: A coroutine that initializes shared resources for the tasks.
async_funcs: The async functions to run concurrently.
Returns:
A list of results from the coroutines.
"""
semaphore = asyncio.Semaphore(n)
job_state = {"num_processed": 0}
tracer_queue: asyncio.Queue[Optional[LangChainTracer]] = asyncio.Queue()
for _ in range(n):
tracer_queue.put_nowait(await initializer())
async def run_coroutine_with_semaphore(
async_func: Callable[
[Optional[LangChainTracer], Dict], Coroutine[Any, Any, Any]
]
) -> Any:
async with semaphore:
tracer = await tracer_queue.get()
try:
result = await async_func(tracer, job_state)
finally:
tracer_queue.put_nowait(tracer)
return result
return await asyncio.gather(
*(run_coroutine_with_semaphore(function) for function in async_funcs)
)
async def _tracer_initializer(session_name: Optional[str]) -> Optional[LangChainTracer]:
"""
Initialize a tracer to share across tasks.
Args:
session_name: The session name for the tracer.
Returns:
A LangChainTracer instance with an active session if session_name is provided; otherwise None.
"""
if session_name:
tracer = LangChainTracer(session_name=session_name)
tracer.ensure_session()
return tracer
else:
return None
async def arun_on_examples(
examples: Iterator[Example],
llm_or_chain_factory: MODEL_OR_CHAIN_FACTORY,
*,
concurrency_level: int = 5,
num_repetitions: int = 1,
session_name: Optional[str] = None,
verbose: bool = False,
) -> Dict[str, Any]:
"""
Run the chain on examples and store traces to the specified session name.
Args:
examples: Examples to run the model or chain over
llm_or_chain_factory: Language model or Chain constructor to run
over the dataset. The Chain constructor is used to permit
independent calls on each example without carrying over state.
concurrency_level: The number of async tasks to run concurrently.
num_repetitions: Number of times to run the model on each example.
This is useful when testing success rates or generating confidence
intervals.
session_name: Session name to use when tracing runs.
verbose: Whether to print progress.
Returns:
A dictionary mapping example ids to the model outputs.
"""
results: Dict[str, List[Any]] = {}
async def process_example(
example: Example, tracer: LangChainTracer, job_state: dict
) -> None:
"""Process a single example."""
result = await _arun_llm_or_chain(
example,
llm_or_chain_factory,
num_repetitions,
tracer,
)
results[str(example.id)] = result
job_state["num_processed"] += 1
if verbose:
print(
f"Processed examples: {job_state['num_processed']}",
end="\r",
flush=True,
)
await _gather_with_concurrency(
concurrency_level,
functools.partial(_tracer_initializer, session_name),
*(functools.partial(process_example, e) for e in examples),
)
return results
def run_llm(
llm: BaseLanguageModel,
inputs: Dict[str, Any],
callbacks: Callbacks,
) -> Union[LLMResult, ChatResult]:
"""Run the language model on the example."""
if isinstance(llm, BaseLLM):
try:
llm_prompts = _get_prompts(inputs)
llm_output = llm.generate(llm_prompts, callbacks=callbacks)
except InputFormatError:
llm_messages = _get_messages(inputs)
buffer_strings = [get_buffer_string(messages) for messages in llm_messages]
llm_output = llm.generate(buffer_strings, callbacks=callbacks)
elif isinstance(llm, BaseChatModel):
try:
messages = _get_messages(inputs)
llm_output = llm.generate(messages, callbacks=callbacks)
except InputFormatError:
prompts = _get_prompts(inputs)
converted_messages: List[List[BaseMessage]] = [
[HumanMessage(content=prompt)] for prompt in prompts
]
llm_output = llm.generate(converted_messages, callbacks=callbacks)
else:
raise ValueError(f"Unsupported LLM type {type(llm)}")
return llm_output
def run_llm_or_chain(
example: Example,
llm_or_chain_factory: MODEL_OR_CHAIN_FACTORY,
n_repetitions: int,
langchain_tracer: Optional[LangChainTracer] = None,
) -> Union[List[dict], List[str], List[LLMResult], List[ChatResult]]:
"""Run the chain synchronously."""
if langchain_tracer is not None:
previous_example_id = langchain_tracer.example_id
langchain_tracer.example_id = example.id
callbacks: Optional[List[BaseCallbackHandler]] = [langchain_tracer]
else:
previous_example_id = None
callbacks = None
outputs = []
for _ in range(n_repetitions):
try:
if isinstance(llm_or_chain_factory, BaseLanguageModel):
output: Any = run_llm(llm_or_chain_factory, example.inputs, callbacks)
else:
chain = llm_or_chain_factory()
output = chain.run(example.inputs, callbacks=callbacks)
outputs.append(output)
except Exception as e:
logger.warning(f"Chain failed for example {example.id}. Error: {e}")
outputs.append({"Error": str(e)})
if langchain_tracer is not None:
langchain_tracer.example_id = previous_example_id
return outputs
def run_on_examples(
examples: Iterator[Example],
llm_or_chain_factory: MODEL_OR_CHAIN_FACTORY,
*,
num_repetitions: int = 1,
session_name: Optional[str] = None,
verbose: bool = False,
) -> Dict[str, Any]:
"""Run the chain on examples and store traces to the specified session name.
Args:
examples: Examples to run model or chain over.
llm_or_chain_factory: Language model or Chain constructor to run
over the dataset. The Chain constructor is used to permit
independent calls on each example without carrying over state.
num_repetitions: Number of times to run the model on each example.
This is useful when testing success rates or generating confidence
intervals.
session_name: Session name to use when tracing runs.
verbose: Whether to print progress.
Returns:
A dictionary mapping example ids to the model outputs.
"""
results: Dict[str, Any] = {}
tracer = LangChainTracer(session_name=session_name) if session_name else None
for i, example in enumerate(examples):
result = run_llm_or_chain(
example,
llm_or_chain_factory,
num_repetitions,
langchain_tracer=tracer,
)
if verbose:
print(f"{i+1} processed", flush=True, end="\r")
results[str(example.id)] = result
return results
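The helpers in this new module normalize several input shapes before dispatch. A small sketch of the formats `_get_prompts` and `_get_messages` accept, with the module path assumed to be `langchain.client.runner_utils`:

```python
from langchain.client.runner_utils import _get_prompts, _get_messages  # path assumed

# LLM runs: a "prompt" string, a "prompts" list of strings, or a single
# unnamed field holding either shape.
assert _get_prompts({"prompt": "Hello"}) == ["Hello"]
assert _get_prompts({"prompts": ["a", "b"]}) == ["a", "b"]
assert _get_prompts({"anything": "one unnamed string"}) == ["one unnamed string"]

# Chat runs: "messages" as one conversation (List[dict]) or a batch
# (List[List[dict]]); dicts follow the messages_from_dict schema.
batch = _get_messages({"messages": [{"type": "human", "data": {"content": "Hi"}}]})
# -> [[HumanMessage(content="Hi")]]
```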


@@ -48,6 +48,7 @@ from langchain.document_loaders.image_captions import ImageCaptionLoader
from langchain.document_loaders.imsdb import IMSDbLoader
from langchain.document_loaders.json_loader import JSONLoader
from langchain.document_loaders.markdown import UnstructuredMarkdownLoader
from langchain.document_loaders.mastodon import MastodonTootsLoader
from langchain.document_loaders.mediawikidump import MWDumpLoader
from langchain.document_loaders.modern_treasury import ModernTreasuryLoader
from langchain.document_loaders.notebook import NotebookLoader
@@ -69,6 +70,7 @@ from langchain.document_loaders.pdf import (
UnstructuredPDFLoader,
)
from langchain.document_loaders.powerpoint import UnstructuredPowerPointLoader
from langchain.document_loaders.psychic import PsychicLoader
from langchain.document_loaders.python import PythonLoader
from langchain.document_loaders.readthedocs import ReadTheDocsLoader
from langchain.document_loaders.reddit import RedditPostsLoader
@@ -159,6 +161,7 @@ __all__ = [
"ImageCaptionLoader",
"JSONLoader",
"MWDumpLoader",
"MastodonTootsLoader",
"MathpixPDFLoader",
"ModernTreasuryLoader",
"NotebookLoader",
@@ -215,4 +218,5 @@ __all__ = [
"YoutubeLoader",
"TelegramChatLoader",
"ToMarkdownLoader",
"PsychicLoader",
]
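With the registrations above, the two new loaders resolve from the package root:

```python
from langchain.document_loaders import MastodonTootsLoader, PsychicLoader
```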


@@ -41,7 +41,7 @@ class ApifyDatasetLoader(BaseLoader, BaseModel):
values["apify_client"] = ApifyClient()
except ImportError:
raise ValueError(
raise ImportError(
"Could not import apify-client Python package. "
"Please install it with `pip install apify-client`."
)
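This is the first of many identical swaps in this diff: optional-dependency guards now raise `ImportError` instead of `ValueError`, so callers can catch the conventional exception type. The recurring pattern, sketched with a placeholder package name:

```python
def _require_optional_package():
    """Guarded optional import with an actionable install hint."""
    try:
        import some_package  # placeholder for any optional dependency
    except ImportError:
        # ImportError (not ValueError) matches what `import` itself raises.
        raise ImportError(
            "Could not import some_package python package. "
            "Please install it with `pip install some-package`."
        )
    return some_package
```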


@@ -63,7 +63,7 @@ class DocugamiLoader(BaseLoader, BaseModel):
try:
from lxml import etree
except ImportError:
raise ValueError(
raise ImportError(
"Could not import lxml python package. "
"Please install it with `pip install lxml`."
)
@@ -259,7 +259,7 @@ class DocugamiLoader(BaseLoader, BaseModel):
try:
from lxml import etree
except ImportError:
raise ValueError(
raise ImportError(
"Could not import lxml python package. "
"Please install it with `pip install lxml`."
)


@@ -33,7 +33,7 @@ class DuckDBLoader(BaseLoader):
try:
import duckdb
except ImportError:
raise ValueError(
raise ImportError(
"Could not import duckdb python package. "
"Please install it with `pip install duckdb`."
)


@@ -39,7 +39,7 @@ class ImageCaptionLoader(BaseLoader):
try:
from transformers import BlipForConditionalGeneration, BlipProcessor
except ImportError:
raise ValueError(
raise ImportError(
"`transformers` package not found, please install with "
"`pip install transformers`."
)
@@ -66,7 +66,7 @@ class ImageCaptionLoader(BaseLoader):
try:
from PIL import Image
except ImportError:
raise ValueError(
raise ImportError(
"`PIL` package not found, please install with `pip install pillow`"
)


@@ -42,7 +42,7 @@ class JSONLoader(BaseLoader):
try:
import jq # noqa:F401
except ImportError:
raise ValueError(
raise ImportError(
"jq package not found, please install it with `pip install jq`"
)


@@ -0,0 +1,88 @@
"""Mastodon document loader."""
from __future__ import annotations
import os
from typing import TYPE_CHECKING, Any, Dict, Iterable, List, Optional, Sequence
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
if TYPE_CHECKING:
import mastodon
def _dependable_mastodon_import() -> mastodon:
try:
import mastodon
except ImportError:
raise ValueError(
"Mastodon.py package not found, "
"please install it with `pip install Mastodon.py`"
)
return mastodon
class MastodonTootsLoader(BaseLoader):
"""Mastodon toots loader."""
def __init__(
self,
mastodon_accounts: Sequence[str],
number_toots: Optional[int] = 100,
exclude_replies: bool = False,
access_token: Optional[str] = None,
api_base_url: str = "https://mastodon.social",
):
"""Instantiate Mastodon toots loader.
Args:
mastodon_accounts: The list of Mastodon accounts to query.
number_toots: How many toots to pull for each account.
exclude_replies: Whether to exclude reply toots from the load.
access_token: An access token if toots are loaded as a Mastodon app. Can
also be specified via the environment variable "MASTODON_ACCESS_TOKEN".
api_base_url: A Mastodon API base URL to talk to, if not using the default.
"""
mastodon = _dependable_mastodon_import()
access_token = access_token or os.environ.get("MASTODON_ACCESS_TOKEN")
self.api = mastodon.Mastodon(
access_token=access_token, api_base_url=api_base_url
)
self.mastodon_accounts = mastodon_accounts
self.number_toots = number_toots
self.exclude_replies = exclude_replies
def load(self) -> List[Document]:
"""Load toots into documents."""
results: List[Document] = []
for account in self.mastodon_accounts:
user = self.api.account_lookup(account)
toots = self.api.account_statuses(
user.id,
only_media=False,
pinned=False,
exclude_replies=self.exclude_replies,
exclude_reblogs=True,
limit=self.number_toots,
)
docs = self._format_toots(toots, user)
results.extend(docs)
return results
def _format_toots(
self, toots: List[Dict[str, Any]], user_info: dict
) -> Iterable[Document]:
"""Format toots into documents.
Adding user info, and selected toot fields into the metadata.
"""
for toot in toots:
metadata = {
"created_at": toot["created_at"],
"user_info": user_info,
"is_reply": toot["in_reply_to_id"] is not None,
}
yield Document(
page_content=toot["content"],
metadata=metadata,
)
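A minimal usage sketch for the loader above; the account handle is a placeholder, and the token can come from the `MASTODON_ACCESS_TOKEN` environment variable instead of the constructor:

```python
from langchain.document_loaders import MastodonTootsLoader

loader = MastodonTootsLoader(
    mastodon_accounts=["@Gargron@mastodon.social"],  # placeholder account
    number_toots=50,
    exclude_replies=True,
)
docs = loader.load()
for doc in docs[:3]:
    print(doc.metadata["created_at"], doc.metadata["is_reply"])
```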


@@ -83,7 +83,7 @@ class NotebookLoader(BaseLoader):
try:
import pandas as pd
except ImportError:
raise ValueError(
raise ImportError(
"pandas is needed for Notebook Loader, "
"please install with `pip install pandas`"
)


@@ -77,7 +77,7 @@ class OneDriveLoader(BaseLoader, BaseModel):
try:
from O365 import FileSystemTokenBackend
except ImportError:
raise ValueError(
raise ImportError(
"O365 package not found, please install it with `pip install o365`"
)
if self.auth_with_token:


@@ -103,7 +103,7 @@ class PyPDFLoader(BasePDFLoader):
try:
import pypdf # noqa:F401
except ImportError:
raise ValueError(
raise ImportError(
"pypdf package not found, please install it with " "`pip install pypdf`"
)
self.parser = PyPDFParser()
@@ -194,7 +194,7 @@ class PDFMinerLoader(BasePDFLoader):
try:
from pdfminer.high_level import extract_text # noqa:F401
except ImportError:
raise ValueError(
raise ImportError(
"`pdfminer` package not found, please install it with "
"`pip install pdfminer.six`"
)
@@ -222,7 +222,7 @@ class PDFMinerPDFasHTMLLoader(BasePDFLoader):
try:
from pdfminer.high_level import extract_text_to_fp # noqa:F401
except ImportError:
raise ValueError(
raise ImportError(
"`pdfminer` package not found, please install it with "
"`pip install pdfminer.six`"
)
@@ -256,7 +256,7 @@ class PyMuPDFLoader(BasePDFLoader):
try:
import fitz # noqa:F401
except ImportError:
raise ValueError(
raise ImportError(
"`PyMuPDF` package not found, please install it with "
"`pip install pymupdf`"
)
@@ -375,7 +375,7 @@ class PDFPlumberLoader(BasePDFLoader):
try:
import pdfplumber # noqa:F401
except ImportError:
raise ValueError(
raise ImportError(
"pdfplumber package not found, please install it with "
"`pip install pdfplumber`"
)


@@ -23,7 +23,7 @@ class UnstructuredPowerPointLoader(UnstructuredFileLoader):
is_ppt = detect_filetype(self.file_path) == FileType.PPT
except ImportError:
_, extension = os.path.splitext(self.file_path)
_, extension = os.path.splitext(str(self.file_path))
is_ppt = extension == ".ppt"
if is_ppt and unstructured_version < (0, 4, 11):


@@ -0,0 +1,34 @@
"""Loader that loads documents from Psychic.dev."""
from typing import List
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
class PsychicLoader(BaseLoader):
"""Loader that loads documents from Psychic.dev."""
def __init__(self, api_key: str, connector_id: str, connection_id: str):
"""Initialize with API key, connector id, and connection id."""
try:
from psychicapi import ConnectorId, Psychic # noqa: F401
except ImportError:
raise ImportError(
"`psychicapi` package not found, please run `pip install psychicapi`"
)
self.psychic = Psychic(secret_key=api_key)
self.connector_id = ConnectorId(connector_id)
self.connection_id = connection_id
def load(self) -> List[Document]:
"""Load documents."""
psychic_docs = self.psychic.get_documents(self.connector_id, self.connection_id)
return [
Document(
page_content=doc["content"],
metadata={"title": doc["title"], "source": doc["uri"]},
)
for doc in psychic_docs
]
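A usage sketch for `PsychicLoader`; every identifier below is a placeholder, and `connector_id` must name a valid `psychicapi.ConnectorId` value:

```python
from langchain.document_loaders import PsychicLoader

loader = PsychicLoader(
    api_key="psychic-secret-key",      # placeholder
    connector_id="notion",             # placeholder ConnectorId value
    connection_id="my-connection-id",  # placeholder
)
docs = loader.load()  # Documents carry {"title": ..., "source": uri} metadata
```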


@@ -19,9 +19,8 @@ class ReadTheDocsLoader(BaseLoader):
"""Initialize path."""
try:
from bs4 import BeautifulSoup
except ImportError:
raise ValueError(
raise ImportError(
"Could not import python packages. "
"Please install it with `pip install beautifulsoup4`. "
)


@@ -19,7 +19,7 @@ class S3DirectoryLoader(BaseLoader):
try:
import boto3
except ImportError:
raise ValueError(
raise ImportError(
"Could not import boto3 python package. "
"Please install it with `pip install boto3`."
)


@@ -21,7 +21,7 @@ class S3FileLoader(BaseLoader):
try:
import boto3
except ImportError:
raise ValueError(
raise ImportError(
"Could not import `boto3` python package. "
"Please install it with `pip install boto3`."
)


@@ -58,7 +58,7 @@ class SitemapLoader(WebBaseLoader):
try:
import lxml # noqa:F401
except ImportError:
raise ValueError(
raise ImportError(
"lxml package not found, please install it with " "`pip install lxml`"
)
@@ -107,8 +107,9 @@ class SitemapLoader(WebBaseLoader):
try:
import bs4
except ImportError:
raise ValueError(
"bs4 package not found, please install it with " "`pip install bs4`"
raise ImportError(
"beautifulsoup4 package not found, please install it"
" with `pip install beautifulsoup4`"
)
fp = open(self.web_path)
soup = bs4.BeautifulSoup(fp, "xml")


@@ -13,8 +13,8 @@ class SRTLoader(BaseLoader):
try:
import pysrt # noqa:F401
except ImportError:
raise ValueError(
"package `pysrt` not found, please install it with `pysrt`"
raise ImportError(
"package `pysrt` not found, please install it with `pip install pysrt`"
)
self.file_path = file_path


@@ -226,7 +226,7 @@ class TelegramChatApiLoader(BaseLoader):
nest_asyncio.apply()
asyncio.run(self.fetch_data_from_telegram())
except ImportError:
raise ValueError(
raise ImportError(
"""`nest_asyncio` package not found.
please install with `pip install nest_asyncio`
"""
@@ -239,7 +239,7 @@ class TelegramChatApiLoader(BaseLoader):
try:
import pandas as pd
except ImportError:
raise ValueError(
raise ImportError(
"""`pandas` package not found.
please install with `pip install pandas`
"""


@@ -15,7 +15,7 @@ def _dependable_tweepy_import() -> tweepy:
try:
import tweepy
except ImportError:
raise ValueError(
raise ImportError(
"tweepy package not found, please install it with `pip install tweepy`"
)
return tweepy


@@ -1,6 +1,7 @@
"""Loader that uses unstructured to load files."""
import collections
from abc import ABC, abstractmethod
from typing import IO, Any, List
from typing import IO, Any, List, Sequence, Union
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
@@ -92,7 +93,10 @@ class UnstructuredFileLoader(UnstructuredBaseLoader):
"""Loader that uses unstructured to load files."""
def __init__(
self, file_path: str, mode: str = "single", **unstructured_kwargs: Any
self,
file_path: Union[str, List[str]],
mode: str = "single",
**unstructured_kwargs: Any,
):
"""Initialize with file path."""
self.file_path = file_path
@@ -107,12 +111,48 @@ class UnstructuredFileLoader(UnstructuredBaseLoader):
return {"source": self.file_path}
def get_elements_from_api(
file_path: Union[str, List[str], None] = None,
file: Union[IO, Sequence[IO], None] = None,
api_url: str = "https://api.unstructured.io/general/v0/general",
api_key: str = "",
**unstructured_kwargs: Any,
) -> List:
"""Retrieves a list of elements from the Unstructured API."""
if isinstance(file, collections.abc.Sequence) or isinstance(file_path, list):
from unstructured.partition.api import partition_multiple_via_api
_doc_elements = partition_multiple_via_api(
filenames=file_path,
files=file,
api_key=api_key,
api_url=api_url,
**unstructured_kwargs,
)
elements = []
for _elements in _doc_elements:
elements.extend(_elements)
return elements
else:
from unstructured.partition.api import partition_via_api
return partition_via_api(
filename=file_path,
file=file,
api_key=api_key,
api_url=api_url,
**unstructured_kwargs,
)
class UnstructuredAPIFileLoader(UnstructuredFileLoader):
"""Loader that uses the unstructured web API to load files."""
def __init__(
self,
file_path: str,
file_path: Union[str, List[str]] = "",
mode: str = "single",
url: str = "https://api.unstructured.io/general/v0/general",
api_key: str = "",
@@ -120,23 +160,22 @@ class UnstructuredAPIFileLoader(UnstructuredFileLoader):
):
"""Initialize with file path."""
min_unstructured_version = "0.6.2"
if not satisfies_min_unstructured_version(min_unstructured_version):
raise ValueError(
"Partitioning via API is only supported in "
f"unstructured>={min_unstructured_version}."
)
if isinstance(file_path, str):
validate_unstructured_version(min_unstructured_version="0.6.2")
else:
validate_unstructured_version(min_unstructured_version="0.6.3")
self.url = url
self.api_key = api_key
super().__init__(file_path=file_path, mode=mode, **unstructured_kwargs)
def _get_elements(self) -> List:
from unstructured.partition.api import partition_via_api
def _get_metadata(self) -> dict:
return {"source": self.file_path}
return partition_via_api(
filename=self.file_path,
def _get_elements(self) -> List:
return get_elements_from_api(
file_path=self.file_path,
api_key=self.api_key,
api_url=self.url,
**self.unstructured_kwargs,
@@ -146,7 +185,12 @@ class UnstructuredAPIFileLoader(UnstructuredFileLoader):
class UnstructuredFileIOLoader(UnstructuredBaseLoader):
"""Loader that uses unstructured to load file IO objects."""
def __init__(self, file: IO, mode: str = "single", **unstructured_kwargs: Any):
def __init__(
self,
file: Union[IO, Sequence[IO]],
mode: str = "single",
**unstructured_kwargs: Any,
):
"""Initialize with file path."""
self.file = file
super().__init__(mode=mode, **unstructured_kwargs)
@@ -165,7 +209,7 @@ class UnstructuredAPIFileIOLoader(UnstructuredFileIOLoader):
def __init__(
self,
file: IO,
file: Union[IO, Sequence[IO]],
mode: str = "single",
url: str = "https://api.unstructured.io/general/v0/general",
api_key: str = "",
@@ -173,21 +217,18 @@ class UnstructuredAPIFileIOLoader(UnstructuredFileIOLoader):
):
"""Initialize with file path."""
min_unstructured_version = "0.6.2"
if not satisfies_min_unstructured_version(min_unstructured_version):
raise ValueError(
"Partitioning via API is only supported in "
f"unstructured>={min_unstructured_version}."
)
if isinstance(file, collections.abc.Sequence):
validate_unstructured_version(min_unstructured_version="0.6.3")
if file:
validate_unstructured_version(min_unstructured_version="0.6.2")
self.url = url
self.api_key = api_key
super().__init__(file=file, mode=mode, **unstructured_kwargs)
def _get_elements(self) -> List:
from unstructured.partition.api import partition_via_api
return partition_via_api(
return get_elements_from_api(
file=self.file,
api_key=self.api_key,
api_url=self.url,
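Passing a list of paths (or a sequence of file objects) now routes through `partition_multiple_via_api`, which per the version check above requires `unstructured>=0.6.3`. A sketch with placeholder file paths and key:

```python
from langchain.document_loaders import UnstructuredAPIFileLoader

loader = UnstructuredAPIFileLoader(
    file_path=["report-a.pdf", "report-b.pdf"],  # a list triggers the multi-file path
    mode="single",
    api_key="unstructured-api-key",              # placeholder
)
docs = loader.load()
```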


@@ -30,7 +30,7 @@ class PlaywrightURLLoader(BaseLoader):
try:
import playwright # noqa:F401
except ImportError:
raise ValueError(
raise ImportError(
"playwright package not found, please install it with "
"`pip install playwright`"
)


@@ -40,7 +40,7 @@ class SeleniumURLLoader(BaseLoader):
try:
import selenium # noqa:F401
except ImportError:
raise ValueError(
raise ImportError(
"selenium package not found, please install it with "
"`pip install selenium`"
)
@@ -48,7 +48,7 @@ class SeleniumURLLoader(BaseLoader):
try:
import unstructured # noqa:F401
except ImportError:
raise ValueError(
raise ImportError(
"unstructured package not found, please install it with "
"`pip install unstructured`"
)


@@ -82,7 +82,7 @@ class UnstructuredWordDocumentLoader(UnstructuredFileLoader):
is_doc = detect_filetype(self.file_path) == FileType.DOC
except ImportError:
_, extension = os.path.splitext(self.file_path)
_, extension = os.path.splitext(str(self.file_path))
is_doc = extension == ".doc"
if is_doc and unstructured_version < (0, 4, 11):


@@ -48,7 +48,7 @@ class CohereEmbeddings(BaseModel, Embeddings):
values["client"] = cohere.Client(cohere_api_key)
except ImportError:
raise ValueError(
raise ImportError(
"Could not import cohere python package. "
"Please install it with `pip install cohere`."
)


@@ -46,7 +46,7 @@ class HuggingFaceEmbeddings(BaseModel, Embeddings):
import sentence_transformers
except ImportError as exc:
raise ValueError(
raise ImportError(
"Could not import sentence_transformers python package. "
"Please install it with `pip install sentence_transformers`."
) from exc


@@ -34,7 +34,7 @@ class JinaEmbeddings(BaseModel, Embeddings):
try:
import jina
except ImportError:
raise ValueError(
raise ImportError(
"Could not import `jina` python package. "
"Please install it with `pip install jina`."
)


@@ -178,7 +178,7 @@ class OpenAIEmbeddings(BaseModel, Embeddings):
openai.api_type = openai_api_type
values["client"] = openai.Embedding
except ImportError:
raise ValueError(
raise ImportError(
"Could not import openai python package. "
"Please install it with `pip install openai`."
)
@@ -192,67 +192,64 @@ class OpenAIEmbeddings(BaseModel, Embeddings):
embeddings: List[List[float]] = [[] for _ in range(len(texts))]
try:
import tiktoken
tokens = []
indices = []
encoding = tiktoken.model.encoding_for_model(self.model)
for i, text in enumerate(texts):
if self.model.endswith("001"):
# See: https://github.com/openai/openai-python/issues/418#issuecomment-1525939500
# replace newlines, which can negatively affect performance.
text = text.replace("\n", " ")
token = encoding.encode(
text,
allowed_special=self.allowed_special,
disallowed_special=self.disallowed_special,
)
for j in range(0, len(token), self.embedding_ctx_length):
tokens += [token[j : j + self.embedding_ctx_length]]
indices += [i]
batched_embeddings = []
_chunk_size = chunk_size or self.chunk_size
for i in range(0, len(tokens), _chunk_size):
response = embed_with_retry(
self,
input=tokens[i : i + _chunk_size],
engine=self.deployment,
request_timeout=self.request_timeout,
headers=self.headers,
)
batched_embeddings += [r["embedding"] for r in response["data"]]
results: List[List[List[float]]] = [[] for _ in range(len(texts))]
num_tokens_in_batch: List[List[int]] = [[] for _ in range(len(texts))]
for i in range(len(indices)):
results[indices[i]].append(batched_embeddings[i])
num_tokens_in_batch[indices[i]].append(len(tokens[i]))
for i in range(len(texts)):
_result = results[i]
if len(_result) == 0:
average = embed_with_retry(
self,
input="",
engine=self.deployment,
request_timeout=self.request_timeout,
headers=self.headers,
)["data"][0]["embedding"]
else:
average = np.average(
_result, axis=0, weights=num_tokens_in_batch[i]
)
embeddings[i] = (average / np.linalg.norm(average)).tolist()
return embeddings
except ImportError:
raise ValueError(
raise ImportError(
"Could not import tiktoken python package. "
"This is needed in order to for OpenAIEmbeddings. "
"Please install it with `pip install tiktoken`."
)
tokens = []
indices = []
encoding = tiktoken.model.encoding_for_model(self.model)
for i, text in enumerate(texts):
if self.model.endswith("001"):
# See: https://github.com/openai/openai-python/issues/418#issuecomment-1525939500
# replace newlines, which can negatively affect performance.
text = text.replace("\n", " ")
token = encoding.encode(
text,
allowed_special=self.allowed_special,
disallowed_special=self.disallowed_special,
)
for j in range(0, len(token), self.embedding_ctx_length):
tokens += [token[j : j + self.embedding_ctx_length]]
indices += [i]
batched_embeddings = []
_chunk_size = chunk_size or self.chunk_size
for i in range(0, len(tokens), _chunk_size):
response = embed_with_retry(
self,
input=tokens[i : i + _chunk_size],
engine=self.deployment,
request_timeout=self.request_timeout,
headers=self.headers,
)
batched_embeddings += [r["embedding"] for r in response["data"]]
results: List[List[List[float]]] = [[] for _ in range(len(texts))]
num_tokens_in_batch: List[List[int]] = [[] for _ in range(len(texts))]
for i in range(len(indices)):
results[indices[i]].append(batched_embeddings[i])
num_tokens_in_batch[indices[i]].append(len(tokens[i]))
for i in range(len(texts)):
_result = results[i]
if len(_result) == 0:
average = embed_with_retry(
self,
input="",
engine=self.deployment,
request_timeout=self.request_timeout,
headers=self.headers,
)["data"][0]["embedding"]
else:
average = np.average(_result, axis=0, weights=num_tokens_in_batch[i])
embeddings[i] = (average / np.linalg.norm(average)).tolist()
return embeddings
def _embedding_func(self, text: str, *, engine: str) -> List[float]:
"""Call out to OpenAI's embedding endpoint."""
# handle large input text
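The restructured block above implements the long-input strategy: split each text's tokens into windows of `embedding_ctx_length`, embed every window, then combine a text's window embeddings with a token-count-weighted average and L2-normalize. A self-contained numeric sketch of the combining step (toy 3-d vectors, hand-picked token counts):

```python
import numpy as np

# Suppose one input text was split into three token windows of these lengths,
# and the API returned one embedding vector per window.
window_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
num_tokens_per_window = [512, 512, 76]  # last window is shorter

# Token-count-weighted average, then L2 normalization, mirroring the logic above.
average = np.average(window_embeddings, axis=0, weights=num_tokens_per_window)
embedding = (average / np.linalg.norm(average)).tolist()
print(embedding)
```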


@@ -30,13 +30,20 @@ class TensorflowHubEmbeddings(BaseModel, Embeddings):
super().__init__(**kwargs)
try:
import tensorflow_hub
except ImportError:
raise ImportError(
"Could not import tensorflow-hub python package. "
"Please install it with `pip install tensorflow-hub``."
)
try:
import tensorflow_text # noqa
except ImportError:
raise ImportError(
"Could not import tensorflow_text python package. "
"Please install it with `pip install tensorflow_text``."
)
self.embed = tensorflow_hub.load(self.model_url)
except ImportError as e:
raise ValueError(
"Could not import some python packages." "Please install them."
) from e
self.embed = tensorflow_hub.load(self.model_url)
class Config:
"""Configuration for this pydantic object."""


@@ -1,3 +1,4 @@
from langchain.experimental.autonomous_agents.autogpt.agent import AutoGPT
from langchain.experimental.autonomous_agents.baby_agi.baby_agi import BabyAGI
__all__ = ["BabyAGI"]
__all__ = ["BabyAGI", "AutoGPT"]


@@ -1,4 +1,5 @@
"""Graph implementations."""
from langchain.graphs.neo4j_graph import Neo4jGraph
from langchain.graphs.networkx_graph import NetworkxEntityGraph
__all__ = ["NetworkxEntityGraph"]
__all__ = ["NetworkxEntityGraph", "Neo4jGraph"]


@@ -0,0 +1,100 @@
from typing import Any, Dict, List
node_properties_query = """
CALL apoc.meta.data()
YIELD label, other, elementType, type, property
WHERE NOT type = "RELATIONSHIP" AND elementType = "node"
WITH label AS nodeLabels, collect({property:property, type:type}) AS properties
RETURN {labels: nodeLabels, properties: properties} AS output
"""
rel_properties_query = """
CALL apoc.meta.data()
YIELD label, other, elementType, type, property
WHERE NOT type = "RELATIONSHIP" AND elementType = "relationship"
WITH label AS nodeLabels, collect({property:property, type:type}) AS properties
RETURN {type: nodeLabels, properties: properties} AS output
"""
rel_query = """
CALL apoc.meta.data()
YIELD label, other, elementType, type, property
WHERE type = "RELATIONSHIP" AND elementType = "node"
RETURN "(:" + label + ")-[:" + property + "]->(:" + toString(other[0]) + ")" AS output
"""
class Neo4jGraph:
"""Neo4j wrapper for graph operations."""
def __init__(
self, url: str, username: str, password: str, database: str = "neo4j"
) -> None:
"""Create a new Neo4j graph wrapper instance."""
try:
import neo4j
except ImportError:
raise ValueError(
"Could not import neo4j python package. "
"Please install it with `pip install neo4j`."
)
self._driver = neo4j.GraphDatabase.driver(url, auth=(username, password))
self._database = database
self.schema = ""
# Verify connection
try:
self._driver.verify_connectivity()
except neo4j.exceptions.ServiceUnavailable:
raise ValueError(
"Could not connect to Neo4j database. "
"Please ensure that the url is correct"
)
except neo4j.exceptions.AuthError:
raise ValueError(
"Could not connect to Neo4j database. "
"Please ensure that the username and password are correct"
)
# Set schema
try:
self.refresh_schema()
except neo4j.exceptions.ClientError:
raise ValueError(
"Could not use APOC procedures. "
"Please install the APOC plugin in Neo4j."
)
@property
def get_schema(self) -> str:
"""Returns the schema of the Neo4j database"""
return self.schema
def query(self, query: str, params: dict = {}) -> List[Dict[str, Any]]:
"""Query Neo4j database."""
from neo4j.exceptions import CypherSyntaxError
with self._driver.session(database=self._database) as session:
try:
data = session.run(query, params)
# Hard limit of 50 results
return [r.data() for r in data][:50]
except CypherSyntaxError as e:
raise ValueError("Generated Cypher Statement is not valid\n" f"{e}")
def refresh_schema(self) -> None:
"""
Refreshes the Neo4j graph schema information.
"""
node_properties = self.query(node_properties_query)
relationships_properties = self.query(rel_properties_query)
relationships = self.query(rel_query)
self.schema = f"""
Node properties are the following:
{[el['output'] for el in node_properties]}
Relationship properties are the following:
{[el['output'] for el in relationships_properties]}
The relationships are the following:
{[el['output'] for el in relationships]}
"""


@@ -54,7 +54,7 @@ class NetworkxEntityGraph:
try:
import networkx as nx
except ImportError:
raise ValueError(
raise ImportError(
"Could not import networkx python package. "
"Please install it with `pip install networkx`."
)
@@ -70,7 +70,7 @@ class NetworkxEntityGraph:
try:
import networkx as nx
except ImportError:
raise ValueError(
raise ImportError(
"Could not import networkx python package. "
"Please install it with `pip install networkx`."
)


@@ -24,6 +24,7 @@ from langchain.llms.llamacpp import LlamaCpp
from langchain.llms.modal import Modal
from langchain.llms.nlpcloud import NLPCloud
from langchain.llms.openai import AzureOpenAI, OpenAI, OpenAIChat
from langchain.llms.openlm import OpenLM
from langchain.llms.petals import Petals
from langchain.llms.pipelineai import PipelineAI
from langchain.llms.predictionguard import PredictionGuard
@@ -53,6 +54,7 @@ __all__ = [
"NLPCloud",
"OpenAI",
"OpenAIChat",
"OpenLM",
"Petals",
"PipelineAI",
"HuggingFaceEndpoint",
@@ -96,6 +98,7 @@ type_to_cls_dict: Dict[str, Type[BaseLLM]] = {
"nlpcloud": NLPCloud,
"human-input": HumanInputLLM,
"openai": OpenAI,
"openlm": OpenLM,
"petals": Petals,
"pipelineai": PipelineAI,
"huggingface_pipeline": HuggingFacePipeline,


@@ -148,7 +148,7 @@ class AlephAlpha(LLM):
values["client"] = aleph_alpha_client.Client(token=aleph_alpha_api_key)
except ImportError:
raise ValueError(
raise ImportError(
"Could not import aleph_alpha_client python package. "
"Please install it with `pip install aleph_alpha_client`."
)


@@ -59,7 +59,7 @@ class _AnthropicCommon(BaseModel):
values["AI_PROMPT"] = anthropic.AI_PROMPT
values["count_tokens"] = anthropic.count_tokens
except ImportError:
raise ValueError(
raise ImportError(
"Could not import anthropic python package. "
"Please it install it with `pip install anthropic`."
)


@@ -91,7 +91,7 @@ class Banana(LLM):
try:
import banana_dev as banana
except ImportError:
raise ValueError(
raise ImportError(
"Could not import banana-dev python package. "
"Please install it with `pip install banana-dev`."
)


@@ -72,7 +72,7 @@ class Cohere(LLM):
values["client"] = cohere.Client(cohere_api_key)
except ImportError:
raise ValueError(
raise ImportError(
"Could not import cohere python package. "
"Please install it with `pip install cohere`."
)


@@ -29,7 +29,10 @@ def _create_retry_decorator() -> Callable[[Any], Any]:
try:
import google.api_core.exceptions
except ImportError:
raise ImportError()
raise ImportError(
"Could not import google-api-core python package. "
"Please install it with `pip install google-api-core`."
)
multiplier = 2
min_seconds = 1
@@ -105,7 +108,10 @@ class GooglePalm(BaseLLM, BaseModel):
genai.configure(api_key=google_api_key)
except ImportError:
raise ImportError("Could not import google.generativeai python package.")
raise ImportError(
"Could not import google-generativeai python package. "
"Please install it with `pip install google-generativeai`."
)
values["client"] = genai


@@ -100,7 +100,7 @@ class GooseAI(LLM):
openai.api_base = "https://api.goose.ai/v1"
values["client"] = openai.Completion
except ImportError:
raise ValueError(
raise ImportError(
"Could not import openai python package. "
"Please install it with `pip install openai`."
)


@@ -97,7 +97,7 @@ class HuggingFaceTextGenInference(LLM):
values["inference_server_url"], timeout=values["timeout"]
)
except ImportError:
raise ValueError(
raise ImportError(
"Could not import text_generation python package. "
"Please install it with `pip install text_generation`."
)


@@ -75,7 +75,7 @@ class NLPCloud(LLM):
values["model_name"], nlpcloud_api_key, gpu=True, lang="en"
)
except ImportError:
raise ValueError(
raise ImportError(
"Could not import nlpcloud python package. "
"Please install it with `pip install nlpcloud`."
)


@@ -234,7 +234,7 @@ class BaseOpenAI(BaseLLM):
openai.organization = openai_organization
values["client"] = openai.Completion
except ImportError:
raise ValueError(
raise ImportError(
"Could not import openai python package. "
"Please install it with `pip install openai`."
)
@@ -454,15 +454,15 @@ class BaseOpenAI(BaseLLM):
"""Return type of llm."""
return "openai"
def get_num_tokens(self, text: str) -> int:
"""Calculate num tokens with tiktoken package."""
def get_token_ids(self, text: str) -> List[int]:
"""Get the token IDs using the tiktoken package."""
# tiktoken NOT supported for Python < 3.8
if sys.version_info[1] < 8:
return super().get_num_tokens(text)
try:
import tiktoken
except ImportError:
raise ValueError(
raise ImportError(
"Could not import tiktoken python package. "
"This is needed in order to calculate get_num_tokens. "
"Please install it with `pip install tiktoken`."
@@ -470,15 +470,12 @@ class BaseOpenAI(BaseLLM):
enc = tiktoken.encoding_for_model(self.model_name)
tokenized_text = enc.encode(
return enc.encode(
text,
allowed_special=self.allowed_special,
disallowed_special=self.disallowed_special,
)
# calculate the number of tokens in the encoded text
return len(tokenized_text)
def modelname_to_contextsize(self, modelname: str) -> int:
"""Calculate the maximum number of tokens possible to generate for a model.
@@ -680,7 +677,7 @@ class OpenAIChat(BaseLLM):
if openai_organization:
openai.organization = openai_organization
except ImportError:
raise ValueError(
raise ImportError(
"Could not import openai python package. "
"Please install it with `pip install openai`."
)
@@ -802,28 +799,23 @@ class OpenAIChat(BaseLLM):
"""Return type of llm."""
return "openai-chat"
def get_num_tokens(self, text: str) -> int:
"""Calculate num tokens with tiktoken package."""
def get_token_ids(self, text: str) -> List[int]:
"""Get the token IDs using the tiktoken package."""
# tiktoken NOT supported for Python < 3.8
if sys.version_info[1] < 8:
return super().get_num_tokens(text)
return super().get_token_ids(text)
try:
import tiktoken
except ImportError:
raise ValueError(
raise ImportError(
"Could not import tiktoken python package. "
"This is needed in order to calculate get_num_tokens. "
"Please install it with `pip install tiktoken`."
)
# create a GPT-3.5-Turbo encoder instance
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
# encode the text using the GPT-3.5-Turbo encoder
tokenized_text = enc.encode(
enc = tiktoken.encoding_for_model(self.model_name)
return enc.encode(
text,
allowed_special=self.allowed_special,
disallowed_special=self.disallowed_special,
)
# calculate the number of tokens in the encoded text
return len(tokenized_text)
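With this refactor, the model-specific classes return raw token IDs and the shared base class can derive the count as `len(get_token_ids(text))`. A sketch of the equivalent direct computation with tiktoken, for intuition:

```python
import tiktoken

# Equivalent of get_token_ids for an OpenAI chat model, done directly with tiktoken.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
token_ids = enc.encode("How many tokens is this sentence?")
print(token_ids)       # the raw IDs now returned by get_token_ids
print(len(token_ids))  # what get_num_tokens reports
```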

langchain/llms/openlm.py (new file, 26 lines)

@@ -0,0 +1,26 @@
from typing import Any, Dict
from pydantic import root_validator
from langchain.llms.openai import BaseOpenAI
class OpenLM(BaseOpenAI):
@property
def _invocation_params(self) -> Dict[str, Any]:
return {**{"model": self.model_name}, **super()._invocation_params}
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
try:
import openlm
values["client"] = openlm.Completion
except ImportError:
raise ValueError(
"Could not import openlm python package. "
"Please install it with `pip install openlm`."
)
if values["streaming"]:
raise ValueError("Streaming not supported with openlm")
return values
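A usage sketch for the new wrapper: `OpenLM` reuses the `BaseOpenAI` request plumbing but routes completions through the `openlm` client. The model string below is a placeholder in whatever naming scheme `openlm` accepts:

```python
from langchain.llms import OpenLM

llm = OpenLM(model_name="huggingface.co/gpt2")  # placeholder model identifier
print(llm("The capital of France is"))
```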


@@ -50,7 +50,7 @@ class PredictionGuard(LLM):
values["client"] = pg.Client(token=token)
except ImportError:
raise ValueError(
raise ImportError(
"Could not import predictionguard python package. "
"Please install it with `pip install predictionguard`."
)


@@ -89,7 +89,7 @@ class Replicate(LLM):
try:
import replicate as replicate_python
except ImportError:
raise ValueError(
raise ImportError(
"Could not import replicate python package. "
"Please install it with `pip install replicate`."
)


@@ -103,7 +103,7 @@ class RWKV(LLM, BaseModel):
try:
import tokenizers
except ImportError:
raise ValueError(
raise ImportError(
"Could not import tokenizers python package. "
"Please install it with `pip install tokenizers`."
)


@@ -182,7 +182,7 @@ class SagemakerEndpoint(LLM):
) from e
except ImportError:
raise ValueError(
raise ImportError(
"Could not import boto3 python package. "
"Please install it with `pip install boto3`."
)


@@ -155,7 +155,7 @@ class SelfHostedPipeline(LLM):
import runhouse as rh
except ImportError:
raise ValueError(
raise ImportError(
"Could not import runhouse python package. "
"Please install it with `pip install runhouse`."
)


@@ -1,5 +1,5 @@
"""Math utils."""
from typing import List, Union
from typing import List, Optional, Tuple, Union
import numpy as np
@@ -23,3 +23,34 @@ def cosine_similarity(X: Matrix, Y: Matrix) -> np.ndarray:
similarity = np.dot(X, Y.T) / np.outer(X_norm, Y_norm)
similarity[np.isnan(similarity) | np.isinf(similarity)] = 0.0
return similarity
def cosine_similarity_top_k(
X: Matrix,
Y: Matrix,
top_k: Optional[int] = 5,
score_threshold: Optional[float] = None,
) -> Tuple[List[Tuple[int, int]], List[float]]:
"""Row-wise cosine similarity with optional top-k and score threshold filtering.
Args:
X: Matrix.
Y: Matrix, same width as X.
top_k: Max number of results to return.
score_threshold: Minimum cosine similarity of results.
Returns:
Tuple of two lists. First contains two-tuples of indices (X_idx, Y_idx),
second contains corresponding cosine similarities.
"""
if len(X) == 0 or len(Y) == 0:
return [], []
score_array = cosine_similarity(X, Y)
sorted_idxs = score_array.flatten().argsort()[::-1]
top_k = top_k or len(sorted_idxs)
top_idxs = sorted_idxs[:top_k]
score_threshold = score_threshold or -1.0
top_idxs = top_idxs[score_array.flatten()[top_idxs] > score_threshold]
ret_idxs = [(x // score_array.shape[1], x % score_array.shape[1]) for x in top_idxs]
scores = score_array.flatten()[top_idxs].tolist()
return ret_idxs, scores
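A tiny worked example of the new helper (values chosen by hand; import path assumed):

```python
import numpy as np

from langchain.math_utils import cosine_similarity_top_k  # path assumed

X = np.array([[1.0, 0.0], [0.0, 1.0]])
Y = np.array([[1.0, 0.0], [1.0, 1.0]])

# Candidate scores are 1.0 (exact match) and 1/sqrt(2) ~= 0.707 against [1, 1];
# with top_k=2 and a 0.9 threshold, only the perfect match survives.
idxs, scores = cosine_similarity_top_k(X, Y, top_k=2, score_threshold=0.9)
print(idxs)    # [(0, 0)]  i.e. X row 0 vs Y row 0
print(scores)  # [1.0]
```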


@@ -52,13 +52,11 @@ class FirestoreChatMessageHistory(BaseChatMessageHistory):
try:
import firebase_admin
from firebase_admin import firestore
except ImportError as e:
logger.error(
"Failed to import Firebase and Firestore: %s. "
"Make sure to install the 'firebase-admin' module.",
e,
except ImportError:
raise ImportError(
"Could not import firebase-admin python package. "
"Please install it with `pip install firebase-admin`."
)
raise e
# For multiple instances, only initialize the app once.
try:


@@ -25,7 +25,7 @@ class RedisChatMessageHistory(BaseChatMessageHistory):
try:
import redis
except ImportError:
raise ValueError(
raise ImportError(
"Could not import redis python package. "
"Please install it with `pip install redis`."
)


@@ -91,7 +91,7 @@ class RedisEntityStore(BaseEntityStore):
try:
import redis
except ImportError:
raise ValueError(
raise ImportError(
"Could not import redis python package. "
"Please install it with `pip install redis`."
)


@@ -74,15 +74,14 @@ def _load_examples(config: dict) -> dict:
def _load_output_parser(config: dict) -> dict:
"""Load output parser."""
if "output_parsers" in config:
if config["output_parsers"] is not None:
_config = config["output_parsers"]
output_parser_type = _config["_type"]
if output_parser_type == "regex_parser":
output_parser = RegexParser(**_config)
else:
raise ValueError(f"Unsupported output parser {output_parser_type}")
config["output_parsers"] = output_parser
if "output_parser" in config and config["output_parser"]:
_config = config.pop("output_parser")
output_parser_type = _config.pop("_type")
if output_parser_type == "regex_parser":
output_parser = RegexParser(**_config)
else:
raise ValueError(f"Unsupported output parser {output_parser_type}")
config["output_parser"] = output_parser
return config
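The fix reads the singular "output_parser" key (and pops the consumed fields) instead of the mismatched "output_parsers". A sketch of a config dict that now deserializes correctly (hypothetical values; module path assumed):

```python
from langchain.prompts.loading import _load_output_parser  # path assumed

config = {
    "_type": "prompt",
    "input_variables": ["question"],
    "template": "Q: {question}\nA:",
    "output_parser": {  # singular key, matching the fixed lookup
        "_type": "regex_parser",
        "regex": "A: (.*)",
        "output_keys": ["answer"],
    },
}
config = _load_output_parser(config)
print(type(config["output_parser"]).__name__)  # -> RegexParser
```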


@@ -23,30 +23,6 @@ class PromptTemplate(StringPromptTemplate):
from langchain import PromptTemplate
prompt = PromptTemplate(input_variables=["foo"], template="Say {foo}")
# Instantiate from a string template
prompt = PromptTemplate.from_string(
"Say {foo}"
)
# With partial variables
prompt = PromptTemplate.from_string(
"Say {foo} {bar}",
partial_variables={"foo": "hello world"}
)
prompt.format(bar="!!")
# Alternatively, from_template class method
prompt = PromptTemplate.from_template(
"This is a {foo}, {bar} test."
partial_variables={"bar": "!!"}
)
# Which also supports jinja2 templates
prompt = PromptTemplate.from_template(
jinja2_template, template_format="jinja2"
)
"""
input_variables: List[str]
@@ -140,7 +116,6 @@ class PromptTemplate(StringPromptTemplate):
template_file: The path to the file containing the prompt template.
input_variables: A list of variable names the final prompt template
will expect.
Returns:
The prompt loaded from the file.
"""
@@ -170,22 +145,6 @@ class PromptTemplate(StringPromptTemplate):
input_variables=list(sorted(input_variables)), template=template, **kwargs
)
@classmethod
def from_string(cls, string: str, **partial_variables: Any) -> PromptTemplate:
"""Load a prompt template from a python string.
A helper method around from_template that is streamlined specifically
for python string templates (e.g., "say {foo}").
Args:
string: a python string to use as the template (e.g., "say {foo}")
partial_variables: named arguments to use as partials (e.g., {"foo": "bar"})
Returns:
PromptTemplate instance
"""
return cls.from_template(string, partial_variables=partial_variables)
# For backwards compatibility.
Prompt = PromptTemplate
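With `from_string` removed, plain string templates go through `from_template`; a minimal sketch (with `.partial()` covering the removed partial-variables convenience, assuming the base class's `partial` method as it exists here):

```python
from langchain.prompts import PromptTemplate

prompt = PromptTemplate.from_template("Tell me a {adjective} joke about {content}.")
print(prompt.format(adjective="funny", content="chickens"))

# Partials still work via .partial(), replacing the removed from_string path:
partial_prompt = prompt.partial(adjective="funny")
print(partial_prompt.format(content="chickens"))
```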


@@ -41,7 +41,7 @@ class CohereRerank(BaseDocumentCompressor):
values["client"] = cohere.Client(cohere_api_key)
except ImportError:
raise ValueError(
raise ImportError(
"Could not import cohere python package. "
"Please install it with `pip install cohere`."
)

Some files were not shown because too many files have changed in this diff.