Compare commits

..

103 Commits

Author SHA1 Message Date
Bagatur
cdfe2c96c5 bump 263 (#9156) 2023-08-12 12:36:44 -07:00
Leonid Ganeline
19f504790e docstrings: document_loaders consistency 2 (#9148)
This is Part 2. See #9139 (Part 1).
2023-08-11 16:25:40 -07:00
Harrison Chase
1b58460fe3 update keys for chain (#5164)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-11 16:25:13 -07:00
Eugene Yurtsev
aca8cb5fba API Reference: Do not document private modules (#9042)
This PR prevents documentation of private modules in the API reference
2023-08-11 15:58:14 -07:00
胡亮
7edf4ca396 Support multi gpu inference for HuggingFaceEmbeddings (#4732)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-11 15:55:44 -07:00
UmerHA
8aab39e3ce Added SmartGPT workflow (issue #4463) (#4816)
# Added SmartGPT workflow by providing SmartLLM wrapper around LLMs
Edit:
As @hwchase17 suggested, this should be a chain, not an LLM. I have
adapted the PR.

It is used like this:
```
from langchain.prompts import PromptTemplate
from langchain.chains import SmartLLMChain
from langchain.chat_models import ChatOpenAI

hard_question = "I have a 12 liter jug and a 6 liter jug. I want to measure 6 liters. How do I do it?"
prompt = PromptTemplate.from_template(hard_question)

llm = ChatOpenAI(model_name="gpt-4")
chain = SmartLLMChain(llm=llm, prompt=prompt, verbose=True)

chain.run({})
```


Original text: 
Added a SmartLLM wrapper around LLMs to allow for the SmartGPT workflow (as
in https://youtu.be/wVzuvf9D9BU). SmartLLM can be used wherever an LLM can
be used. E.g.:

```
smart_llm = SmartLLM(llm=OpenAI())
smart_llm("What would be a good company name for a company that makes colorful socks?")
```
or
```
smart_llm = SmartLLM(llm=OpenAI())
prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
)
chain = LLMChain(llm=smart_llm, prompt=prompt)
chain.run("colorful socks")
```

SmartGPT consists of 3 steps:

1. Ideate - generate n possible solutions ("ideas") to user prompt
2. Critique - find flaws in every idea & select best one
3. Resolve - improve upon best idea & return it

Fixes #4463

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:

- @hwchase17
- @agola11

Twitter: [@UmerHAdil](https://twitter.com/@UmerHAdil) | Discord:
RicChilligerDude#7589

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-11 15:44:27 -07:00
Lucas Pickup
1d3735a84c Ensure deployment_id is set to provided deployment, required for Azure OpenAI. (#5002)
# Ensure deployment_id is set to provided deployment, required for Azure
OpenAI.
---------

Co-authored-by: Lucas Pickup <lupickup@microsoft.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-11 15:43:01 -07:00
Bagatur
45741bcc1b Bagatur/vectara nit (#9140)
Co-authored-by: Ofer Mendelevitch <ofer@vectara.com>
2023-08-11 15:32:03 -07:00
Dominick DEV
9b64932e55 Add LangChain utility for real-time crypto exchange prices (#4501)
This commit adds a LangChain utility which allows for the real-time
retrieval of cryptocurrency exchange prices. With LangChain, users can
easily access up-to-date pricing information by running the command
`.run(from_currency, to_currency)`. This new feature provides a
convenient way to stay informed on the latest exchange rates and make
informed decisions when trading crypto.
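
A minimal usage sketch based on the description above; the class name and
import path below are placeholders, not the PR's actual API:

```
# Hypothetical sketch -- the wrapper name and import path are assumptions:
from langchain.utilities import CryptoExchangeRateAPIWrapper  # assumed name

rates = CryptoExchangeRateAPIWrapper()
print(rates.run("BTC", "USD"))  # latest BTC -> USD exchange rate
```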


---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-11 14:45:06 -07:00
Joshua Sundance Bailey
eaa505fb09 Create ArcGISLoader & example notebook (#8873)
- Description: Adds the ArcGISLoader class to
`langchain.document_loaders`
  - Allows users to load data from ArcGIS Online, Portal, and similar
- Users can authenticate with `arcgis.gis.GIS` or retrieve public data
anonymously
  - Uses the `arcgis.features.FeatureLayer` class to retrieve the data
  - Defines the most relevant keyword arguments and accepts `**kwargs` (see the sketch below)
- Dependencies: Using this class requires `arcgis` and, optionally,
`bs4.BeautifulSoup`.
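
A minimal sketch of anonymous loading from a public layer (requires
`arcgis`; the layer URL is a placeholder):

```
# Minimal sketch: anonymous access to a public feature layer
from langchain.document_loaders import ArcGISLoader

url = "https://services.arcgis.com/<org>/arcgis/rest/services/<layer>/FeatureServer/0"
loader = ArcGISLoader(url)  # or pass an arcgis.features.FeatureLayer
docs = loader.load()
print(docs[0].metadata)
```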

Tagging maintainers:
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-11 14:33:40 -07:00
Bagatur
e21152358a fix (#9145) 2023-08-11 13:58:23 -07:00
Leonid Ganeline
edb585228d docstrings: document_loaders consistency (#9139)
Formatted docstrings from different formats into a consistent format, like:
>Loads processed docs from Docugami.
"Load from `Docugami`."

>Loader that uses Unstructured to load HTML files.
"Load `HTML` files using `Unstructured`."

>Load documents from a directory.
"Load from a directory."

- `Load` - not `Loads`
- DocumentLoader always loads Documents, so no more
"documents/docs/texts/etc."
- integrated systems and APIs enclosed in backticks.
2023-08-11 13:09:31 -07:00
Aashish Saini
0aabded97f Updating interactive walkthrough link in index.md to resolve 404 error (#9063)
Updated the interactive walkthrough link in index.md to resolve a 404 error.
Also, expressing deep gratitude to the LangChain library developers for
their exceptional efforts 🥇.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-11 13:08:56 -07:00
Markus Schiffer
00bf472265 Fix for SVM retriever discarding document metadata (#9141)
As stated in the title, the SVM retriever discarded the metadata of
passed-in docs. This code fixes that. I also added a unit test covering
it.
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-11 13:08:17 -07:00
Bagatur
bace17e0aa rm integration deps (#9142) 2023-08-11 12:43:08 -07:00
Eugene Yurtsev
44bc89b7bf Support a few list like operations on ChatPromptTemplate (#9077)
Make it easier to work with chat prompt template
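
A sketch of the kind of usage this enables; the exact operation set is
assumed from the PR title, not confirmed here:

```
# Sketch -- list-like operations on a chat prompt (assumed from the PR title):
from langchain.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([("system", "You are terse.")])
prompt.append(("human", "{question}"))  # append a message like a list element
print(len(prompt))                      # 2
```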
2023-08-11 14:49:51 -04:00
Hai The Dude
e4418d1b7e Added new use case docs for Web Scraping, Chromium loader, BS4 transformer (#8732)
- Description: Added a new use case category called "Web Scraping", and a
tutorial on scraping websites using the OpenAI Functions extraction chain
to the docs.
  - Tag maintainer: @baskaryan @hwchase17
- Twitter handle: https://www.linkedin.com/in/haiphunghiem/ (I'm mostly on
LinkedIn)

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
2023-08-11 11:46:59 -07:00
sseide
6cb763507c add basic support for redis cluster server (#9128)
This change updates the central utility class to recognize a Redis
cluster server after connection and to return a new cluster-aware Redis
client. The "normal" Redis client would not be able to talk to a cluster
node because keys might be stored on other shards of the Redis cluster
and would therefore not be readable or writable.

With this patch, clients do not need to know which kind of Redis server
they are talking to; they connect through the same API calls for both
standalone and cluster servers.
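
A sketch of the resulting developer experience (helper location assumed
from the PR description):

```
# The same call now serves standalone and cluster deployments:
from langchain.utilities.redis import get_client

client = get_client(redis_url="redis://localhost:6379")  # or a cluster node
client.set("greeting", "hello")  # works even when keys land on other shards
```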

There are no dependencies added due to this MR.

Remark - with the current redis-py client library (4.6.0) a cluster cannot
be used as a VectorStore. It can be used for other use cases. There is a
bug / missing feature(?) in the Redis client breaking the VectorStore
implementation. I opened an issue at the client library too
(redis/redis-py#2888) to fix this. As soon as this is fixed in the
`redis-py` library, it should be usable there too.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-11 11:37:44 -07:00
David Duong
6d03f8b5d8 Add serialisable support for Replicate (#8525) 2023-08-11 11:35:21 -07:00
niklub
16af5f8690 Add LabelStudio integration (#8880)
This PR introduces [Label Studio](https://labelstud.io/) integration
with LangChain via `LabelStudioCallbackHandler`:

- sending data to the Label Studio instance
- labeling dataset for supervised LLM finetuning
- rating model responses
- tracking and displaying chat history
- support for custom data labeling workflow

### Example

```
chat_llm = ChatOpenAI(callbacks=[LabelStudioCallbackHandler(mode="chat")])
chat_llm([
    SystemMessage(content="Always use emojis in your responses."),
    HumanMessage(content="Hey AI, how's your day going?"),
    AIMessage(content="🤖 I don't have feelings, but I'm running smoothly! How can I help you today?"),
    HumanMessage(content="I'm feeling a bit down. Any advice?"),
    AIMessage(content="🤗 I'm sorry to hear that. Remember, it's okay to seek help or talk to someone if you need to. 💬"),
    HumanMessage(content="Can you tell me a joke to lighten the mood?"),
    AIMessage(content="Of course! 🎭 Why did the scarecrow win an award? Because he was outstanding in his field! 🌾"),
    HumanMessage(content="Haha, that was a good one! Thanks for cheering me up."),
    AIMessage(content="Always here to help! 😊 If you need anything else, just let me know."),
    HumanMessage(content="Will do! By the way, can you recommend a good movie?"),
])
```

<img width="906" alt="image"
src="https://github.com/langchain-ai/langchain/assets/6087484/0a1cf559-0bd3-4250-ad96-6e71dbb1d2f3">


### Dependencies
- [label-studio](https://pypi.org/project/label-studio/)
- [label-studio-sdk](https://pypi.org/project/label-studio-sdk/)

https://twitter.com/labelstudiohq

---------

Co-authored-by: nik <nik@heartex.net>
2023-08-11 11:24:10 -07:00
Bagatur
8cb2594562 Bagatur/dingo (#9079)
Co-authored-by: gary <1625721671@qq.com>
2023-08-11 10:54:45 -07:00
Jacques Arnoux
926c64da60 Fix web research retriever for unknown links in results (#9115)
Fixes an issue with the web research retriever for unknown links in
results, which currently makes the retriever crash sometimes.

@rlancemartin
2023-08-11 10:50:37 -07:00
Manuel Soria
31cfc00845 Code understanding use case (#8801)
Code understanding docs

---------

Co-authored-by: Manuel Soria <manuel.soria@greyscaleai.com>
Co-authored-by: Lance Martin <lance@langchain.dev>
2023-08-11 10:16:05 -07:00
Alvaro Bartolome
f7ae183f40 ArgillaCallbackHandler to properly use default values for api_url and api_key (#9113)
As of the recent PR at #9043, after some testing we've realised that the
default values were not being used for `api_key` and `api_url`. Besides
that, the default for `api_key` was set to `argilla.apikey`, but since
the default values are intended for people using the Argilla Quickstart
(easy to run and set up), the defaults should instead be `owner.apikey`
if using Argilla 1.11.0 or higher, or `admin.apikey` if using a lower
version of Argilla.

Additionally, we've removed the f-string replacements from the
docstrings.

---------

Co-authored-by: Gabriel Martin <gabriel@argilla.io>
2023-08-11 09:37:06 -07:00
Bagatur
0e5d09d0da dalle nb fix (#9125) 2023-08-11 08:21:48 -07:00
Francisco Ingham
9249d305af tagging docs refactor (#8722)
refactor of tagging use case according to new format

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
2023-08-11 08:06:07 -07:00
Bagatur
01ef786e7e bump 262 (#9108) 2023-08-11 01:29:07 -07:00
Bagatur
3b754b5461 Bagatur/filter metadata (#9015)
Co-authored-by: Matt Robinson <mrobinson@unstructuredai.io>
2023-08-11 01:10:00 -07:00
Aayush Shah
a429145420 Minor grammatical error (#9102)
I have corrected a grammatical error in the
https://python.langchain.com/docs/modules/model_io/models/llms/ document
😄
2023-08-11 01:01:40 -07:00
Kim Minjong
7f0e847c13 Update pydantic format instruction prompt (#9095)
- remove unopened bracket
2023-08-11 00:22:13 -07:00
Ashutosh Sanzgiri
991b448dfc minor edits (#9093)
Description:

Minor edit to PR#845

Thanks!
2023-08-10 23:40:36 -07:00
Bagatur
3ab4e21579 fix json tool (#9096) 2023-08-10 23:39:25 -07:00
Sam Groenjes
2184e3a400 Fix IndexError when input_list is Empty in prep_prompts (#5769)
This MR corrects the IndexError arising in the prep_prompts method when no
documents are returned from a similarity search.

Fixes #1733 
Co-authored-by: Sam Groenjes <sam.groenjes@darkwolfsolutions.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-10 22:50:39 -07:00
Chenyu Zhao
c0acbdca1b Update Fireworks model names (#9085) 2023-08-10 19:23:42 -07:00
Charles Lanahan
a2588d6c57 Update openai embeddings notebook with correct embedding model in section 2 (#5831)
The second section looked like a copy/paste from the first and didn't
include the specific embedding model mentioned in the example, so I added
it for clarity.
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-10 19:02:10 -07:00
Bagatur
b80e3825a6 Bagatur/pinecone by vector (#9087)
Co-authored-by: joseph <joe@outverse.com>
2023-08-10 18:28:55 -07:00
Nikhil Kumar
6abb2c2c08 Buffer method of ConversationTokenBufferMemory should be able to return messages as string (#7057)
### Description:
`ConversationTokenBufferMemory` should have a simple way of returning
the conversation messages as a string.

Previously to complete this, you would only have the option to return
memory as an array through the buffer method and call
`get_buffer_string` by importing it from `langchain.schema`, or use the
`load_memory_variables` method and key into `self.memory_key`.
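
A minimal sketch of the behavior this PR enables:

```
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationTokenBufferMemory

memory = ConversationTokenBufferMemory(llm=ChatOpenAI(), return_messages=False)
memory.save_context({"input": "hi"}, {"output": "hello!"})
print(memory.buffer)  # the conversation as one string, no get_buffer_string needed
```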

### Maintainer
@hwchase17

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-10 18:17:22 -07:00
William FH
57dd4daa9a Add string example mapper (#9086)
Now that we accept any runnable or arbitrary function to evaluate, we
don't always look up the input keys. If an evaluator requires references,
we should try to infer whether there's one key present. We only have
delayed validation here, but it's better than nothing.
2023-08-10 17:07:02 -07:00
Josh Phillips
5fc07fa524 change id column type to uuid to match function (#7456)
The table creation commands in these examples do not match what the
recently updated functions in these examples expect. This change updates
the id column type in the table creation command.
Issue number for my report of the doc problem: #7446
@rlancemartin and @eyurtsev I believe this is your area
Twitter: @j1philli

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-10 16:57:19 -07:00
Bidhan Roy
02430e25b6 BagelDB (bageldb.ai), VectorStore integration. (#8971)
- **Description**: [BagelDB](bageldb.ai) is a collaborative vector
database. Integrated the bageldb PyPI package with LangChain, with
related tests and code.

  - **Issue**: Not applicable.
  - **Dependencies**: `betabageldb` PyPi package.
  - **Tag maintainer**: @rlancemartin, @eyurtsev, @baskaryan
  - **Twitter handle**: bageldb_ai (https://twitter.com/BagelDB_ai)
  
We ran `make format`, `make lint` and `make test` locally.

Followed the contribution guideline thoroughly
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md

---------

Co-authored-by: Towhid1 <nurulaktertowhid@gmail.com>
2023-08-10 16:48:36 -07:00
DJ Atha
ee52482db8 Fix issue 7445 (#7635)
Description: updated the BabyAGI examples and experimental to append the
iteration to the result id, fixing an error when storing data to the
vectorstore.
Issue: 7445
Dependencies: no
Tag maintainer: @eyurtsev
This fix worked for me locally. Happy to take some feedback and iterate
on a better solution. I was considering appending a uuid instead but
didn't want to over complicate the example.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-10 16:29:31 -07:00
Harrison Chase
bb6fbf4c71 openai adapters (#8988)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Nuno Campos <nuno@boringbits.io>
2023-08-10 16:08:50 -07:00
Harrison Chase
45f0f9460a add async for python repl (#9080) 2023-08-10 16:07:06 -07:00
Neil Murphy
105c787e5a Add convenience methods to ConversationBufferMemory and ConversationB… (#8981)
Add convenience methods to `ConversationBufferMemory` and
`ConversationBufferWindowMemory` to get buffer either as messages or as
string.

Helps when `return_messages` is set to `True` but you want access to the
messages as a string, and vice versa.
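
A sketch of the convenience this adds (property names assumed from the PR
description):

```
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages=True)
memory.save_context({"input": "hi"}, {"output": "hello!"})

print(memory.buffer_as_messages)  # List[BaseMessage]
print(memory.buffer_as_str)       # "Human: hi\nAI: hello!"
```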

@hwchase17

One use case: Using a `MultiPromptRouter` where `default_chain` is
`ConversationChain`, but destination chains are `LLMChains`. Injecting
chat memory into prompts for destination chains prints a stringified
`List[Messages]` in the prompt, which creates a lot of noise. These
convenience methods allow caller to choose either as needed.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-10 15:45:30 -07:00
Zend
6221eb5974 Recursive url loader w/ test (#8813)
Description: Due to an issue with the test, this is a separate PR with
the test for #8502.

Tag maintainer: @rlancemartin

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-10 14:50:31 -07:00
Junlin Zhou
cb5fb751e9 Enhance regex of structured_chat agents' output parser (#8965)
The current regex only extracts the agent's action between '` ``` ``` `';
this commit extracts the action between both '` ```json ``` `' and
'` ``` ``` `', as illustrated below.
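
An illustration of the broadened matching behavior (the pattern shown is
for explanation; the parser's exact regex may differ):

```
import re

# Optional "json" language tag after the opening fence:
pattern = re.compile(r"```(?:json)?\n(.*?)```", re.DOTALL)

with_lang = 'Action:\n```json\n{"action": "Final Answer"}\n```'
without_lang = 'Action:\n```\n{"action": "Final Answer"}\n```'
assert pattern.search(with_lang) and pattern.search(without_lang)
```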

This is very similar to #7511 
Co-authored-by: zjl <junlinzhou@yzbigdata.com>
2023-08-10 14:26:07 -07:00
Bagatur
16bd328aab Use Embeddings in pinecone (#8982)
cc @eyurtsev @olivier-lacroix @jamescalam 

redo of #2741
2023-08-10 14:22:41 -07:00
Piyush Jain
8eea46ed0e Bedrock embeddings async methods (#9024)
## Description
This PR adds the `aembed_query` and `aembed_documents` async methods to
improve embeddings generation for large sets of documents. The
implementation uses asyncio tasks and gather to achieve concurrency, as
there is no async Bedrock API in boto3.
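
A minimal sketch (assumes AWS credentials and Bedrock access are already
configured):

```
import asyncio
from langchain.embeddings import BedrockEmbeddings

embeddings = BedrockEmbeddings()

async def main():
    # fans out one Bedrock call per document via asyncio tasks + gather
    return await embeddings.aembed_documents(["first doc", "second doc"])

vectors = asyncio.run(main())
```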

### Maintainers
@agola11 
@aarora79  

### Open questions
To avoid throttling from the Bedrock API, should there be an option to
limit the concurrency of the calls?
2023-08-10 14:21:03 -07:00
Eugene Yurtsev
67ca187560 Fix incorrect code blocks in documentation (#9060)
Fixes incorrect code block syntax in doc strings.
2023-08-10 14:13:42 -07:00
Eugene Yurtsev
46f3428cb3 Fix more incorrect code blocks in doc strings (#9073)
Fix 2 more incorrect code blocks in strings
2023-08-10 13:49:15 -07:00
Nicolas
e3fb11bc10 docs: (Mendable Search) Fixes stuck when tabbing out issue (#9074)
This fixes Mendable not completing when tabbing out and fixes the
duplicate message issue as well.
2023-08-10 13:46:06 -07:00
Bagatur
1edead28b8 Add docs community page (#8992)
Co-authored-by: briannawolfson <brianna.wolfson@gmail.com>
2023-08-10 13:41:35 -07:00
Eugene Yurtsev
a5a4c53280 RedisStore: Update init and Documentation updates (#9044)
* Update the Redis store to support init from parameters
* Update the notebook to show how to use the Redis store, plus some
documentation fixes
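
A sketch of init-from-parameters (argument name assumed; a local Redis is
assumed to be running):

```
from langchain.storage import RedisStore

store = RedisStore(redis_url="redis://localhost:6379")
store.mset([("user:1", b"alice")])
print(store.mget(["user:1"]))  # [b'alice']
```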
2023-08-10 15:30:29 -04:00
Bagatur
80b98812e1 Update README.md 2023-08-10 12:01:20 -07:00
Leonid Ganeline
fcbbddedae ArxivLoader fix for issue 9046 (#9061)
Fixed #9046 
Added ut-s for this fix.
 @eyurtsev
2023-08-10 14:59:39 -04:00
Mike Lambert
e94a5d753f Move from test to supported claude-instant-1 model (#9066)
Moves from "test" model to "claude-instant-1" model which is supported
and has actual capacity
2023-08-10 11:57:28 -07:00
Eugene Yurtsev
b7bc8ec87f Add excludes to FileSystemBlobLoader (#9064)
Add option to specify exclude patterns.
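
A sketch of the new option (the path and patterns are placeholders):

```
from langchain.document_loaders.blob_loaders import FileSystemBlobLoader

loader = FileSystemBlobLoader(
    "./data",
    glob="**/*.py",
    exclude=["**/.venv/**", "**/tests/**"],  # new: patterns to skip
)
for blob in loader.yield_blobs():
    print(blob.path)
```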

https://github.com/langchain-ai/langchain/discussions/9059
2023-08-10 14:56:58 -04:00
Eugene Yurtsev
6c70f491ba ChatPromptTemplate pending deprecation proposal (#9004)
Pending deprecations for ChatPromptTemplate proposals
2023-08-10 14:40:55 -04:00
Bagatur
f3f5853e9f update api ref exampels (#9065)
manually update for now
2023-08-10 11:28:24 -07:00
TRY-ER
2431eca700 Agent vector store tool doc (#9029)
I was initially confused whether to use create_vectorstore_agent or
create_vectorstore_router_agent due to a lack of documentation, so I
created simple documentation for each function about its use case.
- Description: Added docstrings in create_vectorstore_agent and
create_vectorstore_router_agent to point out the difference in their use
cases
  - Tag maintainer: @rlancemartin, @eyurtsev

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-10 11:13:12 -07:00
Bagatur
641cb80c9d update pr temp (#9062) 2023-08-10 11:10:06 -07:00
Alvaro Bartolome
08a0741d82 Update ArgillaCallbackHandler as of latest argilla release (#9043)
Hi @agola11, or whoever is reviewing this PR 😄 

## What's in this PR?

As of the latest Argilla release, we'll change and refactor some things
to make some workflows easier. One of those is how everything is pushed
to Argilla, so that now there's no need to call `push_to_argilla` on a
`FeedbackDataset` after either `push_to_argilla` has been called for the
first time or `from_argilla` has been called, among other changes.

We also add some class variables to make sure those are easy to update
in case we change them internally in the future, and to make the
`warnings.warn` message lighter from the code view.

P.S. Regarding the Twitter/X mention feel free to do so at either
https://twitter.com/argilla_io or https://twitter.com/alvarobartt, or
both if applicable, otherwise, just the first Twitter/X handle.
2023-08-10 10:59:46 -07:00
Blake (Yung Cher Ho)
8d351bfc20 Takeoff integration (#9045)
## Description:
This PR adds the Titan Takeoff Server to the available LLMs in
LangChain.

Titan Takeoff is an inference server created by
[TitanML](https://www.titanml.co/) that allows you to deploy large
language models locally on your hardware in a single command. Most
generative model architectures are included, such as Falcon, Llama 2,
GPT2, T5 and many more.

Read more about Titan Takeoff here:
-
[Blog](https://medium.com/@TitanML/introducing-titan-takeoff-6c30e55a8e1e)
- [Docs](https://docs.titanml.co/docs/titan-takeoff/getting-started)

#### Testing
As Titan Takeoff runs locally on port 8000 by default, no network access
is needed. Responses are mocked for testing.

- [x] Make Lint
- [x] Make Format
- [x] Make Test

#### Dependencies
No new dependencies are introduced. However, users will need to install
the titan-iris package in their local environment and start the Titan
Takeoff inferencing server in order to use the Titan Takeoff
integration.
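
A minimal usage sketch (assumes the Takeoff server is already running; the
parameter name is an assumption):

```
from langchain.llms import TitanTakeoff

llm = TitanTakeoff(base_url="http://localhost:8000")  # default Takeoff port
print(llm("What architectures does Takeoff support?"))
```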

Thanks for your help and please let me know if you have any questions.

cc: @hwchase17 @baskaryan
2023-08-10 10:56:06 -07:00
Nuno Campos
3bdc273ab3 Implement .transform() in RunnablePassthrough() (#9032)
- This ensures passthrough doesn't break streaming
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-10 10:41:19 -07:00
Bagatur
206f809366 fix sched ci (more) (#9056) 2023-08-10 10:39:29 -07:00
Aashish Saini
8a320e55a0 Corrected grammatical errors and spelling mistakes in the index.mdx file. (#9026)
Expressing gratitude to the creators for crafting this remarkable
application 🙌. I would like to enhance the grammar and spelling in the
documentation for a polished reader experience.

Your feedback is valuable as always 

@baskaryan , @hwchase17 , @eyurtsev
2023-08-10 10:17:09 -07:00
Bagatur
e5db8a16c0 Bagatur/fix sched (#9054) 2023-08-10 09:34:44 -07:00
Bagatur
e162fd418a fix sched ci (#9053) 2023-08-10 09:29:46 -07:00
Ismail Pelaseyed
abb1264edf Fix issue with Metaphor Search Tool throwing error on missing keys in API response (#9051)
- Description: Fixes an issue with the Metaphor Search Tool throwing an
error on missing keys in the API response, as sketched below.
  - Issue: #9048 
  - Tag maintainer: @hinthornw @hwchase17 
  - Twitter handle: @pelaseyed
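
Illustrative only; the field names are assumptions, not Metaphor's actual
schema:

```
results = [{"title": "Example", "url": "https://example.com"}]  # "author" missing

parsed = [
    {
        "title": r.get("title"),
        "url": r.get("url"),
        "author": r.get("author"),  # .get() yields None instead of a KeyError
    }
    for r in results
]
```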
2023-08-10 09:07:00 -07:00
Eugene Yurtsev
5e05ba2140 Add embeddings cache (#8976)
This PR adds the ability to temporarily cache or persistently store
embeddings. 

A notebook has been included showing how to set up the cache and how to
use it with a vectorstore.
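
A sketch of the flow from the included notebook (assumes an OpenAI API
key; treat the exact names as per that notebook):

```
from langchain.embeddings import CacheBackedEmbeddings, OpenAIEmbeddings
from langchain.storage import LocalFileStore

store = LocalFileStore("./embedding_cache/")
underlying = OpenAIEmbeddings()
cached = CacheBackedEmbeddings.from_bytes_store(
    underlying, store, namespace=underlying.model
)
cached.embed_documents(["hello", "world"])  # computed, then cached
cached.embed_documents(["hello", "world"])  # served from the cache
```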
2023-08-10 11:15:30 -04:00
Bagatur
6e14f9548b bump 261 (#9041) 2023-08-10 07:59:27 -07:00
Lance Martin
2380492c8e API use case (#8546)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-10 07:52:54 -07:00
Eugene Yurtsev
d21333d710 Add redis storage (#8980)
Add a redis implementation of a BaseStore
2023-08-10 10:48:35 -04:00
Luca Foppiano
dfb93dd2b5 Improved grobid documentation (#9025)
- Description: Improvements to the Grobid loader documentation: typo fixes
and a suggestion to use the Docker image instead of installing Grobid
locally (the documentation was also limited to Mac, while Docker allows
running on any platform)
  - Tag maintainer: @rlancemartin, @eyurtsev
  - Twitter handle: @whitenoise
2023-08-10 10:47:22 -04:00
Hiroshige Umino
2c7297d243 Fix a broken code block display (#9034)
- Description: Fix a broken code block in this page:
https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/
- Issue: N/A
- Dependencies: None
- Tag maintainer: @baskaryan
- Twitter handle: yaotti
2023-08-10 10:39:01 -04:00
Bagatur
434a96415b make runnable dir (#9016)
Co-authored-by: Nuno Campos <nuno@boringbits.io>
2023-08-10 08:56:37 +01:00
Nuno Campos
c7a489ae0d Small improvements for tracer and debug output of runnables (#8683)
2023-08-10 07:24:12 +01:00
EricFan
618cf5241e Open file in UTF-8 encoding (#6919) (#8943)
FileCallbackHandler cannot handle some languages, for example Chinese.
Opening the file with UTF-8 encoding fixes it.
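
A minimal sketch of the affected handler:

```
from langchain.callbacks import FileCallbackHandler

handler = FileCallbackHandler("output.log")  # file is now opened as UTF-8
# pass callbacks=[handler] to a chain; Chinese output no longer garbles
```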
@agola11
  
**Issue**: #6919 
**Dependencies**: NO dependencies,

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-09 17:54:21 -07:00
colegottdank
f4a47ec717 Add optional model kwargs to ChatAnthropic to allow overrides (#9013)
---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-09 17:34:00 -07:00
Piyush Jain
3b51817706 Updating port and ssl use in sample notebook (#8995)
## Description
This PR updates the sample notebook to use the default port (8182) and
the ssl for the Neptune database connection.
2023-08-09 17:08:48 -07:00
Kaizen
bbbd2b076f DirectoryLoader slicing (#8994)
DirectoryLoader can now return a random sample of files in a directory.
The parameters added are (see the sketch below):
- sample_size
- randomize_sample
- sample_seed
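
A sketch: load a reproducible random sample of ten markdown files (path
and glob are placeholders):

```
from langchain.document_loaders import DirectoryLoader

loader = DirectoryLoader(
    "./docs",
    glob="**/*.md",
    sample_size=10,         # cap the number of files loaded
    randomize_sample=True,  # shuffle before slicing
    sample_seed=42,         # make the shuffle reproducible
)
docs = loader.load()
```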


@rlancemartin, @eyurtsev

---------

Co-authored-by: Andrew Oseen <amovfx@protonmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-09 16:05:16 -07:00
IanRogers-101Ways
d248481f13 skip over empty google spreadsheets (#8974)
- Description: Allow GoogleDriveLoader to handle empty spreadsheets  
- Issue: Currently GoogleDriveLoader will crash if it tries to load a
spreadsheet with an empty sheet
  - Dependencies: n/a
  - Tag maintainer: @rlancemartin, @eyurtsev
2023-08-09 16:05:02 -07:00
Eugene Yurtsev
efa02ed768 Suppress divide by zero warnings for cosine similarity (#9006)
Suppress runtime warnings for divide by zero, as the downstream code
handles the scenario (handling inf and nan).
2023-08-09 15:56:51 -07:00
Leonid Ganeline
5454591b0a docstrings cleanup (#8993)
Added/Updated docstrings

 @baskaryan
2023-08-09 15:49:06 -07:00
Massimiliano Pronesti
c72da53c10 Add logprobs to SamplingParameters in vllm (#9010)
This PR amends #8806, which I opened a few days ago, adding the extra
`logprobs` parameter that I accidentally left out.
2023-08-09 15:48:29 -07:00
Bagatur
8dd071ad08 import airbyte loaders (#9009) 2023-08-09 14:51:15 -07:00
Bagatur
96d064e305 bump 260 (#9002) 2023-08-09 13:40:49 -07:00
Michael Shen
c2f46b2cdb Fixed wrong paper reference (#8970)
The ReAct reference references to MRKL paper. Corrected so that it
points to the actual ReAct paper #8964.
2023-08-09 16:17:46 -04:00
Nuno Campos
808248049d Implement a router for openai functions (#8589) 2023-08-09 21:17:04 +01:00
Eugene Yurtsev
a6e6e9bb86 Fix airbyte loader (#8998)
Fix airbyte loader

https://github.com/langchain-ai/langchain/issues/8996
2023-08-09 16:13:06 -04:00
William FH
90579021f8 Update Key Check (#8948)
In the eval loop. The key check needn't be done unless you are creating
the corresponding evaluators.
2023-08-09 12:33:00 -07:00
Jerzy Czopek
539672a7fd Feature/fix azureopenai model mappings (#8621)
This pull request aims to ensure that the `OpenAICallbackHandler` can
properly calculate the total cost for Azure OpenAI chat models. The
following changes have resolved this issue:

- The `model_name` has been added to the ChatResult llm_output. Without
this, the default values of `gpt-35-turbo` were applied. This was
causing the total cost for Azure OpenAI's GPT-4 to be significantly
inaccurate.
- A new parameter `model_version` has been added to `AzureChatOpenAI`.
Azure does not include the model version in the response. With the
addition of `model_name`, this is not a significant issue for GPT-4
models, but it's an issue for GPT-3.5-Turbo. Version 0301 (default) of
GPT-3.5-Turbo on Azure has a flat rate of 0.002 per 1k tokens for both
prompt and completion. However, version 0613 introduced a split in
pricing for prompt and completion tokens.
- The `OpenAICallbackHandler` implementation has been updated with the
proper model names, versions, and cost per 1k tokens.

Unit tests have been added to ensure the functionality works as
expected; the Azure ChatOpenAI notebook has been updated with examples.
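
A sketch of the new parameter in use (assumes Azure OpenAI credentials are
configured via environment variables; the deployment name is a
placeholder):

```
from langchain.chat_models import AzureChatOpenAI
from langchain.callbacks import get_openai_callback

llm = AzureChatOpenAI(
    deployment_name="gpt-35-turbo-deployment",  # placeholder
    model_version="0613",  # Azure omits the version in responses, so pass it here
)
with get_openai_callback() as cb:
    llm.predict("Hello!")
    print(cb.total_cost)  # now computed with version-specific pricing
```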

Maintainers: @hwchase17, @baskaryan

Twitter handle: @jjczopek

---------

Co-authored-by: Jerzy Czopek <jerzy.czopek@avanade.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-09 10:56:15 -07:00
Bagatur
269f85b7b7 scheduled gha fix (#8977) 2023-08-09 09:44:25 -07:00
shibuiwilliam
3adb1e12ca make trajectory eval chain stricter and add unit tests (#8909)
- update trajectory eval logic to be stricter
- add tests to trajectory eval chain
2023-08-09 10:57:18 -04:00
Nuno Campos
b8df15cd64 Adds transform support for runnables (#8762)

---------

Co-authored-by: jacoblee93 <jacoblee93@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-08-09 12:34:23 +01:00
Harrison Chase
4d72288487 async output parser (#8894)
Co-authored-by: Nuno Campos <nuno@boringbits.io>
2023-08-09 08:25:38 +01:00
Bagatur
3c6eccd701 bump 259 (#8951) 2023-08-09 00:07:47 -07:00
Harrison Chase
7de6a1b78e parent document retriever (#8941) 2023-08-08 22:39:08 -07:00
arjunbansal
a2681f950d add instructions on integrating Log10 (#8938)
- Description: Instructions for integrating with Log10: an [open
source](https://github.com/log10-io/log10), proxy-less LLM data management
and application development platform that lets you log, debug and tag
your LangChain calls
  - Tag maintainer: @baskaryan
  - Twitter handle: @log10io @coffeephoenix

Several examples showing the integration included
[here](https://github.com/log10-io/log10/tree/main/examples/logging) and
in the PR
2023-08-08 19:15:31 -07:00
Aarav Borthakur
3f64b8a761 Integrate Rockset as a chat history store (#8940)
Description: Adds Rockset as a chat history store
Dependencies: no changes
Tag maintainer: @hwchase17

This PR passes linting and testing. 

I added a test for the integration and an example notebook showing its
use.
2023-08-08 18:54:07 -07:00
Bagatur
0a1be1d501 document lcel fallbacks (#8942) 2023-08-08 18:49:33 -07:00
William FH
e3056340da Add id in error in tracer (#8944) 2023-08-08 18:25:27 -07:00
Molly Cantillon
99b5a7226c Weaviate: adding auth example + fixing spelling in README (#8939)
Added a basic auth example to the Weaviate notebook @baskaryan
2023-08-08 16:24:17 -07:00
355 changed files with 13375 additions and 6720 deletions

View File

@@ -1,28 +1,20 @@
<!-- Thank you for contributing to LangChain!
Replace this comment with:
Replace this entire comment with:
- Description: a description of the change,
- Issue: the issue # it fixes (if applicable),
- Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer (see below),
- Twitter handle: we announce bigger features on Twitter. If your PR gets announced and you'd like a mention, we'll gladly shout you out!
Please make sure you're PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` to check this locally.
Please make sure your PR is passing linting and testing before submitting. Run `make format`, `make lint` and `make test` to check this locally.
See contribution guidelines for more information on how to write/run tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on network access,
2. an example notebook showing its use.
2. an example notebook showing its use. These live is docs/extras directory.
Maintainer responsibilities:
- General / Misc / if you don't know who to tag: @baskaryan
- DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
- Models / Prompts: @hwchase17, @baskaryan
- Memory: @hwchase17
- Agents / Tools / Toolkits: @hinthornw
- Tracing / Callbacks: @agola11
- Async: @agola11
If no one reviews your PR within a few days, feel free to @-mention the same people again.
See contribution guidelines for more information on how to write/run tests, lint, etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
If no one reviews your PR within a few days, please @-mention one of @baskaryan, @eyurtsev, @hwchase17, @rlancemartin.
-->

View File

@@ -1,7 +1,8 @@
name: Scheduled tests
on:
scheduled:
workflow_dispatch: # Allows to trigger the workflow manually in GitHub UI
schedule:
- cron: '0 13 * * *'
env:
@@ -9,6 +10,9 @@ env:
jobs:
build:
defaults:
run:
working-directory: libs/langchain
runs-on: ubuntu-latest
environment: Scheduled testing
strategy:
@@ -26,13 +30,13 @@ jobs:
with:
python-version: ${{ matrix.python-version }}
poetry-version: "1.4.2"
working-directory: libs/langchain
install-command: |
echo "Running scheduled tests, installing dependencies with poetry..."
poetry install -E scheduled_testing
poetry install --with=test_integration
- name: Run tests
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: |
make scheduled_tests
shell: bash
secrets: inherit

View File

@@ -21,7 +21,7 @@ Looking for the JS/TS version? Check out [LangChain.js](https://github.com/hwcha
**Production Support:** As you move your LangChains into production, we'd love to offer more hands-on support.
Fill out [this form](https://airtable.com/appwQzlErAS2qiP0L/shrGtGaVBVAz7NcV2) to share more about what you're building, and our team will get in touch.
## 🚨Breaking Changes for select chains (SQLDatabase) on 7/28
## 🚨Breaking Changes for select chains (SQLDatabase) on 7/28/23
In an effort to make `langchain` leaner and safer, we are moving select chains to `langchain_experimental`.
This migration has already started, but we are remaining backwards compatible until 7/28.

View File

@@ -145,30 +145,36 @@ def _load_package_modules(
package_name = package_path.name
for file_path in package_path.rglob("*.py"):
if not file_path.name.startswith("__"):
relative_module_name = file_path.relative_to(package_path)
# Get the full namespace of the module
namespace = str(relative_module_name).replace(".py", "").replace("/", ".")
# Keep only the top level namespace
top_namespace = namespace.split(".")[0]
if file_path.name.startswith("_"):
continue
try:
module_members = _load_module_members(
f"{package_name}.{namespace}", namespace
relative_module_name = file_path.relative_to(package_path)
if relative_module_name.name.startswith("_"):
continue
# Get the full namespace of the module
namespace = str(relative_module_name).replace(".py", "").replace("/", ".")
# Keep only the top level namespace
top_namespace = namespace.split(".")[0]
try:
module_members = _load_module_members(
f"{package_name}.{namespace}", namespace
)
# Merge module members if the namespace already exists
if top_namespace in modules_by_namespace:
existing_module_members = modules_by_namespace[top_namespace]
_module_members = _merge_module_members(
[existing_module_members, module_members]
)
# Merge module members if the namespace already exists
if top_namespace in modules_by_namespace:
existing_module_members = modules_by_namespace[top_namespace]
_module_members = _merge_module_members(
[existing_module_members, module_members]
)
else:
_module_members = module_members
else:
_module_members = module_members
modules_by_namespace[top_namespace] = _module_members
modules_by_namespace[top_namespace] = _module_members
except ImportError as e:
print(f"Error: Unable to import module '{namespace}' with error: {e}")
except ImportError as e:
print(f"Error: Unable to import module '{namespace}' with error: {e}")
return modules_by_namespace
@@ -222,9 +228,9 @@ Classes
"""
for class_ in classes:
if not class_['is_public']:
if not class_["is_public"]:
continue
if class_["kind"] == "TypedDict":
template = "typeddict.rst"
elif class_["kind"] == "enum":

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,54 @@
# Community Navigator
Hi! Thanks for being here. We're lucky to have a community of so many passionate developers building with LangChain; we have so much to teach and learn from each other. Community members contribute code, host meetups, write blog posts, amplify each other's work, become each other's customers and collaborators, and so much more.
Whether you're new to LangChain, looking to go deeper, or just want to get more exposure to the world of building with LLMs, this page can point you in the right direction.
- **🦜 Contribute to LangChain**
- **🌍 Meetups, Events, and Hackathons**
- **📣 Help Us Amplify Your Work**
- **💬 Stay in the loop**
# 🦜 Contribute to LangChain
LangChain is the product of over 5,000+ contributions by 1,500+ contributors, and there is *still* so much to do together. Here are some ways to get involved:
- **[Open a pull request](https://github.com/langchain-ai/langchain/issues):** we'd appreciate all forms of contributions: new features, infrastructure improvements, better documentation, bug fixes, etc. If you have an improvement or an idea, we'd love to work on it with you.
- **[Read our contributor guidelines:](https://github.com/langchain-ai/langchain/blob/bbd22b9b761389a5e40fc45b0570e1830aabb707/.github/CONTRIBUTING.md)** We ask contributors to follow a ["fork and pull request"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) workflow, run a few local checks for formatting, linting, and testing before submitting, and follow certain documentation and testing conventions.
- **First time contributor?** [Try one of these PRs with the "good first issue" tag](https://github.com/langchain-ai/langchain/contribute).
- **Become an expert:** our experts help the community by answering product questions in Discord. If that's a role you'd like to play, we'd be so grateful! (And we have some special experts-only goodies/perks we can tell you more about.) Send us an email to introduce yourself at hello@langchain.dev and we'll take it from there!
- **Integrate with LangChain:** if your product integrates with LangChain, or aspires to, we want to help make sure the experience is as smooth as possible for you and end users. Send us an email at hello@langchain.dev and tell us what you're working on.
- **Become an Integration Maintainer:** Partner with our team to ensure your integration stays up-to-date and talk directly with users (and answer their inquiries) in our Discord. Introduce yourself at hello@langchain.dev if you'd like to explore this role.
# 🌍 Meetups, Events, and Hackathons
One of our favorite things about working in AI is how much enthusiasm there is for building together. We want to help make that as easy and impactful for you as possible!
- **Find a meetup, hackathon, or webinar:** you can find the one for you on our [global events calendar](https://mirror-feeling-d80.notion.site/0bc81da76a184297b86ca8fc782ee9a3?v=0d80342540df465396546976a50cfb3f).
- **Submit an event to our calendar:** email us at events@langchain.dev with a link to your event page! We can also help you spread the word with our local communities.
- **Host a meetup:** If you want to bring a group of builders together, we want to help! We can publicize your event on our event calendar/Twitter, share it with our local communities in Discord, send swag, or potentially hook you up with a sponsor. Email us at events@langchain.dev to tell us about your event!
- **Become a meetup sponsor:** we often hear from groups of builders that want to get together, but are blocked or limited on some dimension (space to host, budget for snacks, prizes to distribute, etc.). If you'd like to help, send us an email at events@langchain.dev and we can share more about how it works!
- **Speak at an event:** meetup hosts are always looking for great speakers, presenters, and panelists. If you'd like to do that at an event, send us an email to hello@langchain.dev with more information about yourself, what you want to talk about, and what city you're based in, and we'll try to match you with an upcoming event!
- **Tell us about your LLM community:** If you host or participate in a community that would welcome support from LangChain and/or our team, send us an email at hello@langchain.dev and let us know how we can help.
# 📣 Help Us Amplify Your Work
If you're working on something you're proud of, and think the LangChain community would benefit from knowing about it, we want to help you show it off.
- **Post about your work and mention us:** we love hanging out on Twitter to see what people in the space are talking about and working on. If you tag [@langchainai](https://twitter.com/LangChainAI), we'll almost certainly see it and can show you some love.
- **Publish something on our blog:** if you're writing about your experience building with LangChain, we'd love to post (or crosspost) it on our blog! E-mail hello@langchain.dev with a draft of your post, or even an idea for something you want to write about.
- **Get your product onto our [integrations hub](https://integrations.langchain.com/):** Many developers take advantage of our seamless integrations with other products, and come to our integrations hub to find out who those are. If you want to get your product up there, tell us about it (and how it works with LangChain) at hello@langchain.dev.
# ☀️ Stay in the loop
Here's where our team hangs out, talks shop, spotlights cool work, and shares what we're up to. We'd love to see you there too.
- **[Twitter](https://twitter.com/LangChainAI):** we post about what we're working on and what cool things we're seeing in the space. If you tag @langchainai in your post, we'll almost certainly see it, and can show you some love!
- **[Discord](https://discord.gg/6adMQxSpJS):** connect with >30k developers who are building with LangChain.
- **[GitHub](https://github.com/langchain-ai/langchain):** open pull requests, contribute to a discussion, and/or contribute.
- **[Subscribe to our bi-weekly Release Notes](https://6w1pwbss0py.typeform.com/to/KjZB1auB):** a twice-a-month email roundup of the coolest things going on in our orbit.
- **Slack:** if you're building an application in production at your company, we'd love to get into a Slack channel together. Fill out [this form](https://airtable.com/appwQzlErAS2qiP0L/shrGtGaVBVAz7NcV2) and we'll get in touch about setting one up.

View File

@@ -8,9 +8,9 @@ import DocCardList from "@theme/DocCardList";
Building applications with language models involves many moving parts. One of the most critical components is ensuring that the outcomes produced by your models are reliable and useful across a broad array of inputs, and that they work well with your application's other software components. Ensuring reliability usually boils down to some combination of application design, testing & evaluation, and runtime checks.
The guides in this section review the APIs and functionality LangChain provides to help yous better evaluate your applications. Evaluation and testing are both critical when thinking about deploying LLM applications, since production environments require repeatable and useful outcomes.
The guides in this section review the APIs and functionality LangChain provides to help you better evaluate your applications. Evaluation and testing are both critical when thinking about deploying LLM applications, since production environments require repeatable and useful outcomes.
LangChain offers various types of evaluators to help you measure performance and integrity on diverse data, and we hope to encourage the the community to create and share other useful evaluators so everyone can improve. These docs will introduce the evaluator types, how to use them, and provide some examples of their use in real-world scenarios.
LangChain offers various types of evaluators to help you measure performance and integrity on diverse data, and we hope to encourage the community to create and share other useful evaluators so everyone can improve. These docs will introduce the evaluator types, how to use them, and provide some examples of their use in real-world scenarios.
Each evaluator type in LangChain comes with ready-to-use implementations and an extensible API that allows for customization according to your unique requirements. Here are some of the types of evaluators we offer:

View File

@@ -5,8 +5,8 @@ import DocCardList from "@theme/DocCardList";
LangSmith helps you trace and evaluate your language model applications and intelligent agents to help you
move from prototype to production.
Check out the [interactive walkthrough](walkthrough) below to get started.
Check out the [interactive walkthrough](/docs/guides/langsmith/walkthrough) below to get started.
For more information, please refer to the [LangSmith documentation](https://docs.smith.langchain.com/)
<DocCardList />
<DocCardList />

View File

@@ -12,7 +12,7 @@ Here are the agents available in LangChain.
### [Zero-shot ReAct](/docs/modules/agents/agent_types/react.html)
This agent uses the [ReAct](https://arxiv.org/pdf/2205.00445.pdf) framework to determine which tool to use
This agent uses the [ReAct](https://arxiv.org/pdf/2210.03629) framework to determine which tool to use
based solely on the tool's description. Any number of tools can be provided.
This agent requires that a description is provided for each tool.

View File

@@ -1,9 +0,0 @@
---
sidebar_position: 0
---
# API chains
APIChain enables using LLMs to interact with APIs to retrieve relevant information. Construct the chain by providing a question relevant to the provided API documentation.
import Example from "@snippets/modules/chains/popular/api.mdx"
<Example/>

View File

@@ -0,0 +1,9 @@
---
sidebar_position: 3
---
# Web Scraping
Web scraping has historically been a challenging endeavor due to the ever-changing nature of website structures, making it tedious for developers to maintain their scraping scripts. Traditional methods often rely on specific HTML tags and patterns which, when altered, can disrupt data extraction processes.
Enter the LLM-based method for parsing HTML: by leveraging the capabilities of LLMs, and especially OpenAI Functions in LangChain's extraction chain, developers can instruct the model to extract only the desired data in a specified format. This method not only streamlines the extraction process but also significantly reduces the time spent on manual debugging and script modifications. Its adaptability means that even if websites undergo significant design changes, the extraction remains consistent and robust. This level of resilience translates to reduced maintenance efforts and cost savings, and ensures a higher quality of extracted data. Compared to its predecessors, the LLM-based approach wins out in the web scraping domain by transforming a historically cumbersome task into a more automated and efficient process.

View File

@@ -12,7 +12,7 @@
"@docusaurus/preset-classic": "2.4.0",
"@docusaurus/remark-plugin-npm2yarn": "^2.4.0",
"@mdx-js/react": "^1.6.22",
"@mendable/search": "^0.0.137",
"@mendable/search": "^0.0.150",
"clsx": "^1.2.1",
"json-loader": "^0.5.7",
"process": "^0.11.10",
@@ -3212,9 +3212,9 @@
}
},
"node_modules/@mendable/search": {
"version": "0.0.137",
"resolved": "https://registry.npmjs.org/@mendable/search/-/search-0.0.137.tgz",
"integrity": "sha512-2J2fd5eqToK+mLzrSDA6NAr4F1kfql7QRiHpD7AUJJX0nqpvInhr/mMJKBCUSCv2z76UKCmF5wLuPSw+C90Qdg==",
"version": "0.0.150",
"resolved": "https://registry.npmjs.org/@mendable/search/-/search-0.0.150.tgz",
"integrity": "sha512-Eb5SeAWlMxzEim/8eJ/Ysn01Pyh39xlPBzRBw/5OyOBhti0HVLXk4wd1Fq2TKgJC2ppQIvhEKO98PUcj9dNDFw==",
"dependencies": {
"html-react-parser": "^4.2.0",
"posthog-js": "^1.45.1"

View File

@@ -23,7 +23,7 @@
"@docusaurus/preset-classic": "2.4.0",
"@docusaurus/remark-plugin-npm2yarn": "^2.4.0",
"@mdx-js/react": "^1.6.22",
"@mendable/search": "^0.0.137",
"@mendable/search": "^0.0.150",
"clsx": "^1.2.1",
"json-loader": "^0.5.7",
"process": "^0.11.10",

View File

@@ -75,6 +75,7 @@ module.exports = {
slug: "additional_resources",
},
},
'community'
],
integrations: [
{

(11 binary image files added; not shown. Sizes range from 98 KiB to 716 KiB.)

View File

@@ -556,6 +556,14 @@
"source": "/docs/integrations/llamacpp",
"destination": "/docs/integrations/providers/llamacpp"
},
{
"source": "/en/latest/integrations/log10.html",
"destination": "/docs/integrations/providers/log10"
},
{
"source": "/docs/integrations/log10",
"destination": "/docs/integrations/providers/log10"
},
{
"source": "/en/latest/integrations/mediawikidump.html",
"destination": "/docs/integrations/providers/mediawikidump"

View File

@@ -0,0 +1,323 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "700a516b",
"metadata": {},
"source": [
"# OpenAI Adapter\n",
"\n",
"A lot of people get started with OpenAI but want to explore other models. LangChain's integrations with many model providers make this easy to do so. While LangChain has it's own message and model APIs, we've also made it as easy as possible to explore other models by exposing an adapter to adapt LangChain models to the OpenAI api.\n",
"\n",
"At the moment this only deals with output and does not return other information (token counts, stop reasons, etc)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6017f26a",
"metadata": {},
"outputs": [],
"source": [
"import openai\n",
"from langchain.adapters import openai as lc_openai"
]
},
{
"cell_type": "markdown",
"id": "b522ceda",
"metadata": {},
"source": [
"## ChatCompletion.create"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "1d22eb61",
"metadata": {},
"outputs": [],
"source": [
"messages = [{\"role\": \"user\", \"content\": \"hi\"}]"
]
},
{
"cell_type": "markdown",
"id": "d550d3ad",
"metadata": {},
"source": [
"Original OpenAI call"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "e1d27dfa",
"metadata": {},
"outputs": [],
"source": [
"result = openai.ChatCompletion.create(\n",
" messages=messages, \n",
" model=\"gpt-3.5-turbo\", \n",
" temperature=0\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "012d81ae",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'role': 'assistant', 'content': 'Hello! How can I assist you today?'}"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result[\"choices\"][0]['message'].to_dict_recursive()"
]
},
{
"cell_type": "markdown",
"id": "db5b5500",
"metadata": {},
"source": [
"LangChain OpenAI wrapper call"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "87c2d515",
"metadata": {},
"outputs": [],
"source": [
"lc_result = lc_openai.ChatCompletion.create(\n",
" messages=messages, \n",
" model=\"gpt-3.5-turbo\", \n",
" temperature=0\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "c67a5ac8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'role': 'assistant', 'content': 'Hello! How can I assist you today?'}"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lc_result[\"choices\"][0]['message']"
]
},
{
"cell_type": "markdown",
"id": "034ba845",
"metadata": {},
"source": [
"Swapping out model providers"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "7a2c011c",
"metadata": {},
"outputs": [],
"source": [
"lc_result = lc_openai.ChatCompletion.create(\n",
" messages=messages, \n",
" model=\"claude-2\", \n",
" temperature=0, \n",
" provider=\"ChatAnthropic\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "f7c94827",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'role': 'assistant', 'content': ' Hello!'}"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lc_result[\"choices\"][0]['message']"
]
},
{
"cell_type": "markdown",
"id": "cb3f181d",
"metadata": {},
"source": [
"## ChatCompletion.stream"
]
},
{
"cell_type": "markdown",
"id": "f7b8cd18",
"metadata": {},
"source": [
"Original OpenAI call"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "fd8cb1ea",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'role': 'assistant', 'content': ''}\n",
"{'content': 'Hello'}\n",
"{'content': '!'}\n",
"{'content': ' How'}\n",
"{'content': ' can'}\n",
"{'content': ' I'}\n",
"{'content': ' assist'}\n",
"{'content': ' you'}\n",
"{'content': ' today'}\n",
"{'content': '?'}\n",
"{}\n"
]
}
],
"source": [
"for c in openai.ChatCompletion.create(\n",
" messages = messages,\n",
" model=\"gpt-3.5-turbo\", \n",
" temperature=0,\n",
" stream=True\n",
"):\n",
" print(c[\"choices\"][0]['delta'].to_dict_recursive())"
]
},
{
"cell_type": "markdown",
"id": "0b2a076b",
"metadata": {},
"source": [
"LangChain OpenAI wrapper call"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "9521218c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'role': 'assistant', 'content': ''}\n",
"{'content': 'Hello'}\n",
"{'content': '!'}\n",
"{'content': ' How'}\n",
"{'content': ' can'}\n",
"{'content': ' I'}\n",
"{'content': ' assist'}\n",
"{'content': ' you'}\n",
"{'content': ' today'}\n",
"{'content': '?'}\n",
"{}\n"
]
}
],
"source": [
"for c in lc_openai.ChatCompletion.create(\n",
" messages = messages,\n",
" model=\"gpt-3.5-turbo\", \n",
" temperature=0,\n",
" stream=True\n",
"):\n",
" print(c[\"choices\"][0]['delta'])"
]
},
{
"cell_type": "markdown",
"id": "0fc39750",
"metadata": {},
"source": [
"Swapping out model providers"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "68f0214e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'role': 'assistant', 'content': ' Hello'}\n",
"{'content': '!'}\n",
"{}\n"
]
}
],
"source": [
"for c in lc_openai.ChatCompletion.create(\n",
" messages = messages,\n",
" model=\"claude-2\", \n",
" temperature=0,\n",
" stream=True,\n",
" provider=\"ChatAnthropic\",\n",
"):\n",
" print(c[\"choices\"][0]['delta'])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1648,6 +1648,186 @@
"source": [
"## Conversational Retrieval With Memory"
]
},
{
"cell_type": "markdown",
"id": "92c87dd8-bb6f-4f32-a30d-8f5459ce6265",
"metadata": {},
"source": [
"## Fallbacks\n",
"\n",
"With LCEL you can easily introduce fallbacks for any Runnable component, like an LLM."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "1b1cb744-31fc-4261-ab25-65fe1fcad559",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='To get to the other side.', additional_kwargs={}, example=False)"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"\n",
"bad_llm = ChatOpenAI(model_name=\"gpt-fake\")\n",
"good_llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\")\n",
"llm = bad_llm.with_fallbacks([good_llm])\n",
"\n",
"llm.invoke(\"Why did the the chicken cross the road?\")"
]
},
{
"cell_type": "markdown",
"id": "b8cf3982-03f6-49b3-8ff5-7cd12444f19c",
"metadata": {},
"source": [
"Looking at the trace, we can see that the first model failed but the second succeeded, so we still got an output: https://smith.langchain.com/public/dfaf0bf6-d86d-43e9-b084-dd16a56df15c/r\n",
"\n",
"We can add an arbitrary sequence of fallbacks, which will be executed in order:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "31819be0-7f40-4e67-b5ab-61340027b948",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='To get to the other side.', additional_kwargs={}, example=False)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm = bad_llm.with_fallbacks([bad_llm, bad_llm, good_llm])\n",
"\n",
"llm.invoke(\"Why did the the chicken cross the road?\")"
]
},
{
"cell_type": "markdown",
"id": "acad6e88-8046-450e-b005-db7e50f33b80",
"metadata": {},
"source": [
"Trace: https://smith.langchain.com/public/c09efd01-3184-4369-a225-c9da8efcaf47/r\n",
"\n",
"We can continue to use our Runnable with fallbacks the same way we use any Runnable, mean we can include it in sequences:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "bab114a1-bb93-4b7e-a639-e7e00f21aebc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AIMessage(content='To show off its incredible jumping skills! Kangaroos are truly amazing creatures.', additional_kwargs={}, example=False)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.prompts import ChatPromptTemplate\n",
"\n",
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", \"You're a nice assistant who always includes a compliment in your response\"),\n",
" (\"human\", \"Why did the {animal} cross the road\"),\n",
" ]\n",
")\n",
"chain = prompt | llm\n",
"chain.invoke({\"animal\": \"kangaroo\"})"
]
},
{
"cell_type": "markdown",
"id": "58340afa-8187-4ffe-9bd2-7912fb733a15",
"metadata": {},
"source": [
"Trace: https://smith.langchain.com/public/ba03895f-f8bd-4c70-81b7-8b930353eabd/r\n",
"\n",
"Note, since every sequence of Runnables is itself a Runnable, we can create fallbacks for whole Sequences. We can also continue using the full interface, including asynchronous calls, batched calls, and streams:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "45aa3170-b2e6-430d-887b-bd879048060a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[\"\\n\\nAnswer: The rabbit crossed the road to get to the other side. That's quite clever of him!\",\n",
" '\\n\\nAnswer: The turtle crossed the road to get to the other side. You must be pretty clever to come up with that riddle!']"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.prompts import PromptTemplate\n",
"\n",
"chat_prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", \"You're a nice assistant who always includes a compliment in your response\"),\n",
" (\"human\", \"Why did the {animal} cross the road\"),\n",
" ]\n",
")\n",
"chat_model = ChatOpenAI(model_name=\"gpt-fake\")\n",
"\n",
"prompt_template = \"\"\"Instructions: You should always include a compliment in your response.\n",
"\n",
"Question: Why did the {animal} cross the road?\"\"\"\n",
"prompt = PromptTemplate.from_template(prompt_template)\n",
"llm = OpenAI()\n",
"\n",
"bad_chain = chat_prompt | chat_model\n",
"good_chain = prompt | llm\n",
"chain = bad_chain.with_fallbacks([good_chain])\n",
"await chain.abatch([{\"animal\": \"rabbit\"}, {\"animal\": \"turtle\"}])"
]
},
{
"cell_type": "markdown",
"id": "af6731c6-0c73-4b1d-a433-6e8f6ecce2bb",
"metadata": {},
"source": [
"Traces: \n",
"1. https://smith.langchain.com/public/ccd73236-9ae5-48a6-94b5-41210be18a46/r\n",
"2. https://smith.langchain.com/public/f43f608e-075c-45c7-bf73-b64e4d3f3082/r"
]
},
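{
"cell_type": "markdown",
"id": "c1d2e3f4-a5b6-4c7d-8e9f-0a1b2c3d4e5f",
"metadata": {},
"source": [
"We can stream through the fallback chain as well. Here's a minimal sketch; note that, depending on the version, the fallback wrapper may emit the output as a single chunk rather than token by token:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d2e3f4a5-b6c7-4d8e-9f0a-1b2c3d4e5f6a",
"metadata": {},
"outputs": [],
"source": [
"# Chunks streamed from the fallback LLM chain are plain strings\n",
"for chunk in chain.stream({\"animal\": \"rabbit\"}):\n",
"    print(chunk, end=\"\", flush=True)"
]
},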
{
"cell_type": "code",
"execution_count": null,
"id": "3d2fe1fe-506b-4ee5-8056-8b9df801765f",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -1666,7 +1846,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.1"
"version": "3.9.1"
}
},
"nbformat": 4,

View File

@@ -147,7 +147,7 @@
" api_key=os.environ[\"ARGILLA_API_KEY\"],\n",
")\n",
"\n",
"dataset.push_to_argilla(\"langchain-dataset\")"
"dataset.push_to_argilla(\"langchain-dataset\");"
]
},
{

View File

@@ -0,0 +1,382 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": true,
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"# Label Studio\n",
"\n",
"<div>\n",
"<img src=\"https://labelstudio-pub.s3.amazonaws.com/lc/open-source-data-labeling-platform.png\" width=\"400\"/>\n",
"</div>\n",
"\n",
"Label Studio is an open-source data labeling platform that provides LangChain with flexibility when it comes to labeling data for fine-tuning large language models (LLMs). It also enables the preparation of custom training data and the collection and evaluation of responses through human feedback.\n",
"\n",
"In this guide, you will learn how to connect a LangChain pipeline to Label Studio to:\n",
"\n",
"- Aggregate all input prompts, conversations, and responses in a single LabelStudio project. This consolidates all the data in one place for easier labeling and analysis.\n",
"- Refine prompts and responses to create a dataset for supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) scenarios. The labeled data can be used to further train the LLM to improve its performance.\n",
"- Evaluate model responses through human feedback. LabelStudio provides an interface for humans to review and provide feedback on model responses, allowing evaluation and iteration."
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"## Installation and setup"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"First install latest versions of Label Studio and Label Studio API client:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"!pip install -U label-studio label-studio-sdk openai"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Next, run `label-studio` on the command line to start the local LabelStudio instance at `http://localhost:8080`. See the [Label Studio installation guide](https://labelstud.io/guide/install) for more options."
]
},
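{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"Alternatively, you can launch it from the notebook, though this blocks the cell until the server is stopped, so a separate terminal is usually preferable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# Starts a local Label Studio server at http://localhost:8080 (blocks until interrupted)\n",
"!label-studio"
]
},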
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"You'll need a token to make API calls.\n",
"\n",
"Open your LabelStudio instance in your browser, go to `Account & Settings > Access Token` and copy the key.\n",
"\n",
"Set environment variables with your LabelStudio URL, API key and OpenAI API key:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ['LABEL_STUDIO_URL'] = '<YOUR-LABEL-STUDIO-URL>' # e.g. http://localhost:8080\n",
"os.environ['LABEL_STUDIO_API_KEY'] = '<YOUR-LABEL-STUDIO-API-KEY>'\n",
"os.environ['OPENAI_API_KEY'] = '<YOUR-OPENAI-API-KEY>'"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"## Collecting LLMs prompts and responses"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data used for labeling is stored in projects within Label Studio. Every project is identified by an XML configuration that details the specifications for input and output data. \n",
"\n",
"Create a project that takes human input in text format and outputs an editable LLM response in a text area:\n",
"\n",
"```xml\n",
"<View>\n",
"<Style>\n",
" .prompt-box {\n",
" background-color: white;\n",
" border-radius: 10px;\n",
" box-shadow: 0px 4px 6px rgba(0, 0, 0, 0.1);\n",
" padding: 20px;\n",
" }\n",
"</Style>\n",
"<View className=\"root\">\n",
" <View className=\"prompt-box\">\n",
" <Text name=\"prompt\" value=\"$prompt\"/>\n",
" </View>\n",
" <TextArea name=\"response\" toName=\"prompt\"\n",
" maxSubmissions=\"1\" editable=\"true\"\n",
" required=\"true\"/>\n",
"</View>\n",
"<Header value=\"Rate the response:\"/>\n",
"<Rating name=\"rating\" toName=\"prompt\"/>\n",
"</View>\n",
"```\n",
"\n",
"1. To create a project in Label Studio, click on the \"Create\" button. \n",
"2. Enter a name for your project in the \"Project Name\" field, such as `My Project`.\n",
"3. Navigate to `Labeling Setup > Custom Template` and paste the XML configuration provided above."
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"You can collect input LLM prompts and output responses in a LabelStudio project, connecting it via `LabelStudioCallbackHandler`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.callbacks import LabelStudioCallbackHandler\n",
"\n",
"llm = OpenAI(\n",
" temperature=0,\n",
" callbacks=[\n",
" LabelStudioCallbackHandler(\n",
" project_name=\"My Project\"\n",
" )]\n",
")\n",
"print(llm(\"Tell me a joke\"))"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"In the Label Studio, open `My Project`. You will see the prompts, responses, and metadata like the model name. "
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"## Collecting Chat model Dialogues"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also track and display full chat dialogues in LabelStudio, with the ability to rate and modify the last response:\n",
"\n",
"1. Open Label Studio and click on the \"Create\" button.\n",
"2. Enter a name for your project in the \"Project Name\" field, such as `New Project with Chat`.\n",
"3. Navigate to Labeling Setup > Custom Template and paste the following XML configuration:\n",
"\n",
"```xml\n",
"<View>\n",
"<View className=\"root\">\n",
" <Paragraphs name=\"dialogue\"\n",
" value=\"$prompt\"\n",
" layout=\"dialogue\"\n",
" textKey=\"content\"\n",
" nameKey=\"role\"\n",
" granularity=\"sentence\"/>\n",
" <Header value=\"Final response:\"/>\n",
" <TextArea name=\"response\" toName=\"dialogue\"\n",
" maxSubmissions=\"1\" editable=\"true\"\n",
" required=\"true\"/>\n",
"</View>\n",
"<Header value=\"Rate the response:\"/>\n",
"<Rating name=\"rating\" toName=\"dialogue\"/>\n",
"</View>\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.schema import HumanMessage, SystemMessage\n",
"from langchain.callbacks import LabelStudioCallbackHandler\n",
"\n",
"chat_llm = ChatOpenAI(callbacks=[\n",
" LabelStudioCallbackHandler(\n",
" mode=\"chat\",\n",
" project_name=\"New Project with Chat\",\n",
" )\n",
"])\n",
"llm_results = chat_llm([\n",
" SystemMessage(content=\"Always use a lot of emojis\"),\n",
" HumanMessage(content=\"Tell me a joke\")\n",
"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In Label Studio, open \"New Project with Chat\". Click on a created task to view dialog history and edit/annotate responses."
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"## Custom Labeling Configuration"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"You can modify the default labeling configuration in LabelStudio to add more target labels like response sentiment, relevance, and many [other types annotator's feedback](https://labelstud.io/tags/).\n",
"\n",
"New labeling configuration can be added from UI: go to `Settings > Labeling Interface` and set up a custom configuration with additional tags like `Choices` for sentiment or `Rating` for relevance. Keep in mind that [`TextArea` tag](https://labelstud.io/tags/textarea) should be presented in any configuration to display the LLM responses.\n",
"\n",
"Alternatively, you can specify the labeling configuration on the initial call before project creation:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"ls = LabelStudioCallbackHandler(project_config='''\n",
"<View>\n",
"<Text name=\"prompt\" value=\"$prompt\"/>\n",
"<TextArea name=\"response\" toName=\"prompt\"/>\n",
"<TextArea name=\"user_feedback\" toName=\"prompt\"/>\n",
"<Rating name=\"rating\" toName=\"prompt\"/>\n",
"<Choices name=\"sentiment\" toName=\"prompt\">\n",
" <Choice value=\"Positive\"/>\n",
" <Choice value=\"Negative\"/>\n",
"</Choices>\n",
"</View>\n",
"''')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that if the project doesn't exist, it will be created with the specified labeling configuration."
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"## Other parameters"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"The `LabelStudioCallbackHandler` accepts several optional parameters:\n",
"\n",
"- **api_key** - Label Studio API key. Overrides environmental variable `LABEL_STUDIO_API_KEY`.\n",
"- **url** - Label Studio URL. Overrides `LABEL_STUDIO_URL`, default `http://localhost:8080`.\n",
"- **project_id** - Existing Label Studio project ID. Overrides `LABEL_STUDIO_PROJECT_ID`. Stores data in this project.\n",
"- **project_name** - Project name if project ID not specified. Creates a new project. Default is `\"LangChain-%Y-%m-%d\"` formatted with the current date.\n",
"- **project_config** - [custom labeling configuration](#custom-labeling-configuration)\n",
"- **mode**: use this shortcut to create target configuration from scratch:\n",
" - `\"prompt\"` - Single prompt, single response. Default.\n",
" - `\"chat\"` - Multi-turn chat mode.\n",
"\n"
]
}
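,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For illustration, here is a minimal sketch combining several of these parameters (the values shown are placeholders, not defaults):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ls = LabelStudioCallbackHandler(\n",
"    url=\"http://localhost:8080\",  # overrides LABEL_STUDIO_URL\n",
"    api_key=\"<YOUR-LABEL-STUDIO-API-KEY>\",  # overrides LABEL_STUDIO_API_KEY\n",
"    project_name=\"My Chat Project\",  # placeholder; only used when no project_id is given\n",
"    mode=\"chat\",  # multi-turn chat mode\n",
")"
]
}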
],
"metadata": {
"kernelspec": {
"display_name": "labelops",
"language": "python",
"name": "labelops"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 1
}

View File

@@ -74,6 +74,124 @@
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "f27fa24d",
"metadata": {},
"source": [
"## Model Version\n",
"Azure OpenAI responses contain `model` property, which is name of the model used to generate the response. However unlike native OpenAI responses, it does not contain the version of the model, which is set on the deplyoment in Azure. This makes it tricky to know which version of the model was used to generate the response, which as result can lead to e.g. wrong total cost calculation with `OpenAICallbackHandler`.\n",
"\n",
"To solve this problem, you can pass `model_version` parameter to `AzureChatOpenAI` class, which will be added to the model name in the llm output. This way you can easily distinguish between different versions of the model."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0531798a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.callbacks import get_openai_callback"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "3fd97dfc",
"metadata": {},
"outputs": [],
"source": [
"BASE_URL = \"https://{endpoint}.openai.azure.com\"\n",
"API_KEY = \"...\"\n",
"DEPLOYMENT_NAME = \"gpt-35-turbo\" # in Azure, this deployment has version 0613 - input and output tokens are counted separately"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "aceddb72",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total Cost (USD): $0.000054\n"
]
}
],
"source": [
"model = AzureChatOpenAI(\n",
" openai_api_base=BASE_URL,\n",
" openai_api_version=\"2023-05-15\",\n",
" deployment_name=DEPLOYMENT_NAME,\n",
" openai_api_key=API_KEY,\n",
" openai_api_type=\"azure\",\n",
")\n",
"with get_openai_callback() as cb:\n",
" model(\n",
" [\n",
" HumanMessage(\n",
" content=\"Translate this sentence from English to French. I love programming.\"\n",
" )\n",
" ]\n",
" )\n",
" print(f\"Total Cost (USD): ${format(cb.total_cost, '.6f')}\") # without specifying the model version, flat-rate 0.002 USD per 1k input and output tokens is used\n"
]
},
{
"cell_type": "markdown",
"id": "2e61eefd",
"metadata": {},
"source": [
"We can provide the model version to `AzureChatOpenAI` constructor. It will get appended to the model name returned by Azure OpenAI and cost will be counted correctly."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "8d5e54e9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total Cost (USD): $0.000044\n"
]
}
],
"source": [
"model0613 = AzureChatOpenAI(\n",
" openai_api_base=BASE_URL,\n",
" openai_api_version=\"2023-05-15\",\n",
" deployment_name=DEPLOYMENT_NAME,\n",
" openai_api_key=API_KEY,\n",
" openai_api_type=\"azure\",\n",
" model_version=\"0613\"\n",
")\n",
"with get_openai_callback() as cb:\n",
" model0613(\n",
" [\n",
" HumanMessage(\n",
" content=\"Translate this sentence from English to French. I love programming.\"\n",
" )\n",
" ]\n",
" )\n",
" print(f\"Total Cost (USD): ${format(cb.total_cost, '.6f')}\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "99682534",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -92,7 +210,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.8.10"
}
},
"nbformat": 4,

View File

@@ -0,0 +1,325 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "62359e08-cf80-4210-a30c-f450000e65b9",
"metadata": {},
"source": [
"# ArcGISLoader\n",
"\n",
"This notebook demonstrates the use of the `langchain.document_loaders.ArcGISLoader` class.\n",
"\n",
"You will need to install the ArcGIS API for Python `arcgis` and, optionally, `bs4.BeautifulSoup`.\n",
"\n",
"You can use an `arcgis.gis.GIS` object for authenticated data loading, or leave it blank to access public data."
]
},
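{
"cell_type": "markdown",
"id": "7e9f2a4c-6b8d-4f1e-a3c5-9d0b2e4f6a8c",
"metadata": {},
"source": [
"As a minimal sketch, authenticated loading might look like the following (the portal URL, credentials, and layer URL are placeholders, and the `gis` keyword is assumed):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a1b3c5d-7e9f-4a2b-8c0d-1e3f5a7b9c1d",
"metadata": {},
"outputs": [],
"source": [
"from arcgis.gis import GIS\n",
"from langchain.document_loaders import ArcGISLoader\n",
"\n",
"gis = GIS(\"https://www.arcgis.com\", \"<username>\", \"<password>\")  # placeholder credentials\n",
"private_loader = ArcGISLoader(\"<your-private-layer-url>\", gis=gis)  # assumed `gis` keyword"
]
},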
{
"cell_type": "code",
"execution_count": 1,
"id": "b782cab5-0584-4e2a-9073-009fb8dc93a3",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import ArcGISLoader\n",
"\n",
"\n",
"url = \"https://maps1.vcgov.org/arcgis/rest/services/Beaches/MapServer/7\"\n",
"\n",
"loader = ArcGISLoader(url)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "aa3053cf-4127-43ea-bf56-e378b348091f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 4.04 ms, sys: 1.63 ms, total: 5.67 ms\n",
"Wall time: 644 ms\n"
]
}
],
"source": [
"%%time\n",
"\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a2444519-9117-4feb-8bb9-8931ce286fa5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['url', 'layer_description', 'item_description', 'layer_properties'])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0].metadata.keys()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "6b6e9107-6a80-4ef7-8149-3013faa2de76",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"KeysView({\n",
" \"currentVersion\": 10.81,\n",
" \"id\": 7,\n",
" \"name\": \"Beach Ramps\",\n",
" \"type\": \"Feature Layer\",\n",
" \"description\": \"\",\n",
" \"geometryType\": \"esriGeometryPoint\",\n",
" \"sourceSpatialReference\": {\n",
" \"wkid\": 2881,\n",
" \"latestWkid\": 2881\n",
" },\n",
" \"copyrightText\": \"\",\n",
" \"parentLayer\": null,\n",
" \"subLayers\": [],\n",
" \"minScale\": 750000,\n",
" \"maxScale\": 0,\n",
" \"drawingInfo\": {\n",
" \"renderer\": {\n",
" \"type\": \"simple\",\n",
" \"symbol\": {\n",
" \"type\": \"esriPMS\",\n",
" \"url\": \"9bb2e5ca499bb68aa3ee0d4e1ecc3849\",\n",
" \"imageData\": \"iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAAAXNSR0IB2cksfwAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAJJJREFUOI3NkDEKg0AQRZ9kkSnSGBshR7DJqdJYeg7BMpcS0uQWQsqoCLExkcUJzGqT38zw2fcY1rEzbp7vjXz0EXC7gBxs1ABcG/8CYkCcDqwyLqsV+RlV0I/w7PzuJBArr1VB20H58Ls6h+xoFITkTwWpQJX7XSIBAnFwVj7MLAjJV/AC6G3QoAmK+74Lom04THTBEp/HCSc6AAAAAElFTkSuQmCC\",\n",
" \"contentType\": \"image/png\",\n",
" \"width\": 12,\n",
" \"height\": 12,\n",
" \"angle\": 0,\n",
" \"xoffset\": 0,\n",
" \"yoffset\": 0\n",
" },\n",
" \"label\": \"\",\n",
" \"description\": \"\"\n",
" },\n",
" \"transparency\": 0,\n",
" \"labelingInfo\": null\n",
" },\n",
" \"defaultVisibility\": true,\n",
" \"extent\": {\n",
" \"xmin\": -81.09480168806815,\n",
" \"ymin\": 28.858349245353473,\n",
" \"xmax\": -80.77512908572814,\n",
" \"ymax\": 29.41078388840041,\n",
" \"spatialReference\": {\n",
" \"wkid\": 4326,\n",
" \"latestWkid\": 4326\n",
" }\n",
" },\n",
" \"hasAttachments\": false,\n",
" \"htmlPopupType\": \"esriServerHTMLPopupTypeNone\",\n",
" \"displayField\": \"AccessName\",\n",
" \"typeIdField\": null,\n",
" \"subtypeFieldName\": null,\n",
" \"subtypeField\": null,\n",
" \"defaultSubtypeCode\": null,\n",
" \"fields\": [\n",
" {\n",
" \"name\": \"OBJECTID\",\n",
" \"type\": \"esriFieldTypeOID\",\n",
" \"alias\": \"OBJECTID\",\n",
" \"domain\": null\n",
" },\n",
" {\n",
" \"name\": \"Shape\",\n",
" \"type\": \"esriFieldTypeGeometry\",\n",
" \"alias\": \"Shape\",\n",
" \"domain\": null\n",
" },\n",
" {\n",
" \"name\": \"AccessName\",\n",
" \"type\": \"esriFieldTypeString\",\n",
" \"alias\": \"AccessName\",\n",
" \"length\": 40,\n",
" \"domain\": null\n",
" },\n",
" {\n",
" \"name\": \"AccessID\",\n",
" \"type\": \"esriFieldTypeString\",\n",
" \"alias\": \"AccessID\",\n",
" \"length\": 50,\n",
" \"domain\": null\n",
" },\n",
" {\n",
" \"name\": \"AccessType\",\n",
" \"type\": \"esriFieldTypeString\",\n",
" \"alias\": \"AccessType\",\n",
" \"length\": 25,\n",
" \"domain\": null\n",
" },\n",
" {\n",
" \"name\": \"GeneralLoc\",\n",
" \"type\": \"esriFieldTypeString\",\n",
" \"alias\": \"GeneralLoc\",\n",
" \"length\": 100,\n",
" \"domain\": null\n",
" },\n",
" {\n",
" \"name\": \"MilePost\",\n",
" \"type\": \"esriFieldTypeDouble\",\n",
" \"alias\": \"MilePost\",\n",
" \"domain\": null\n",
" },\n",
" {\n",
" \"name\": \"City\",\n",
" \"type\": \"esriFieldTypeString\",\n",
" \"alias\": \"City\",\n",
" \"length\": 50,\n",
" \"domain\": null\n",
" },\n",
" {\n",
" \"name\": \"AccessStatus\",\n",
" \"type\": \"esriFieldTypeString\",\n",
" \"alias\": \"AccessStatus\",\n",
" \"length\": 50,\n",
" \"domain\": null\n",
" },\n",
" {\n",
" \"name\": \"Entry_Date_Time\",\n",
" \"type\": \"esriFieldTypeDate\",\n",
" \"alias\": \"Entry_Date_Time\",\n",
" \"length\": 8,\n",
" \"domain\": null\n",
" },\n",
" {\n",
" \"name\": \"DrivingZone\",\n",
" \"type\": \"esriFieldTypeString\",\n",
" \"alias\": \"DrivingZone\",\n",
" \"length\": 50,\n",
" \"domain\": null\n",
" }\n",
" ],\n",
" \"geometryField\": {\n",
" \"name\": \"Shape\",\n",
" \"type\": \"esriFieldTypeGeometry\",\n",
" \"alias\": \"Shape\"\n",
" },\n",
" \"indexes\": null,\n",
" \"subtypes\": [],\n",
" \"relationships\": [],\n",
" \"canModifyLayer\": true,\n",
" \"canScaleSymbols\": false,\n",
" \"hasLabels\": false,\n",
" \"capabilities\": \"Map,Query,Data\",\n",
" \"maxRecordCount\": 1000,\n",
" \"supportsStatistics\": true,\n",
" \"supportsAdvancedQueries\": true,\n",
" \"supportedQueryFormats\": \"JSON, geoJSON\",\n",
" \"isDataVersioned\": false,\n",
" \"ownershipBasedAccessControlForFeatures\": {\n",
" \"allowOthersToQuery\": true\n",
" },\n",
" \"useStandardizedQueries\": true,\n",
" \"advancedQueryCapabilities\": {\n",
" \"useStandardizedQueries\": true,\n",
" \"supportsStatistics\": true,\n",
" \"supportsHavingClause\": true,\n",
" \"supportsCountDistinct\": true,\n",
" \"supportsOrderBy\": true,\n",
" \"supportsDistinct\": true,\n",
" \"supportsPagination\": true,\n",
" \"supportsTrueCurve\": true,\n",
" \"supportsReturningQueryExtent\": true,\n",
" \"supportsQueryWithDistance\": true,\n",
" \"supportsSqlExpression\": true\n",
" },\n",
" \"supportsDatumTransformation\": true,\n",
" \"dateFieldsTimeReference\": null,\n",
" \"supportsCoordinatesQuantization\": true\n",
"})"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0].metadata['layer_properties'].keys()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "1d132b7d-5a13-4d66-98e8-785ffdf87af0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\"OBJECTID\": 2, \"AccessName\": \"27TH AV\", \"AccessID\": \"NS-141\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3600 BLK S ATLANTIC AV\", \"MilePost\": 4.83, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"BOTH\"}\n",
"{\"OBJECTID\": 7, \"AccessName\": \"BEACHWAY AV\", \"AccessID\": \"NS-106\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1400 N ATLANTIC AV\", \"MilePost\": 1.57, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
"{\"OBJECTID\": 10, \"AccessName\": \"SEABREEZE BLVD\", \"AccessID\": \"DB-051\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"500 BLK N ATLANTIC AV\", \"MilePost\": 14.24, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691394892000, \"DrivingZone\": \"BOTH\"}\n",
"{\"OBJECTID\": 13, \"AccessName\": \"GRANADA BLVD\", \"AccessID\": \"OB-030\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"20 BLK OCEAN SHORE BLVD\", \"MilePost\": 10.02, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"BOTH\"}\n",
"{\"OBJECTID\": 16, \"AccessName\": \"INTERNATIONAL SPEEDWAY BLVD\", \"AccessID\": \"DB-059\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"300 BLK S ATLANTIC AV\", \"MilePost\": 15.27, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691395174000, \"DrivingZone\": \"BOTH\"}\n",
"{\"OBJECTID\": 26, \"AccessName\": \"UNIVERSITY BLVD\", \"AccessID\": \"DB-048\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"900 BLK N ATLANTIC AV\", \"MilePost\": 13.74, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691394892000, \"DrivingZone\": \"BOTH\"}\n",
"{\"OBJECTID\": 36, \"AccessName\": \"BEACH ST\", \"AccessID\": \"PI-097\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"4890 BLK S ATLANTIC AV\", \"MilePost\": 25.85, \"City\": \"PONCE INLET\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"BOTH\"}\n",
"{\"OBJECTID\": 40, \"AccessName\": \"BOTEFUHR AV\", \"AccessID\": \"DBS-067\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1900 BLK S ATLANTIC AV\", \"MilePost\": 16.68, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691395124000, \"DrivingZone\": \"YES\"}\n",
"{\"OBJECTID\": 41, \"AccessName\": \"SILVER BEACH AV\", \"AccessID\": \"DB-064\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1000 BLK S ATLANTIC AV\", \"MilePost\": 15.98, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691395174000, \"DrivingZone\": \"YES\"}\n",
"{\"OBJECTID\": 50, \"AccessName\": \"3RD AV\", \"AccessID\": \"NS-118\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1200 BLK HILL ST\", \"MilePost\": 3.25, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
"{\"OBJECTID\": 58, \"AccessName\": \"DUNLAWTON BLVD\", \"AccessID\": \"DBS-078\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3400 BLK S ATLANTIC AV\", \"MilePost\": 20.61, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
"{\"OBJECTID\": 63, \"AccessName\": \"MILSAP RD\", \"AccessID\": \"OB-037\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"700 BLK S ATLANTIC AV\", \"MilePost\": 11.52, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"YES\"}\n",
"{\"OBJECTID\": 68, \"AccessName\": \"EMILIA AV\", \"AccessID\": \"DBS-082\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3790 BLK S ATLANTIC AV\", \"MilePost\": 21.38, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"BOTH\"}\n",
"{\"OBJECTID\": 92, \"AccessName\": \"FLAGLER AV\", \"AccessID\": \"NS-110\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"500 BLK FLAGLER AV\", \"MilePost\": 2.57, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
"{\"OBJECTID\": 94, \"AccessName\": \"CRAWFORD RD\", \"AccessID\": \"NS-108\", \"AccessType\": \"OPEN VEHICLE RAMP - PASS\", \"GeneralLoc\": \"800 BLK N ATLANTIC AV\", \"MilePost\": 2.19, \"City\": \"NEW SMYRNA BEACH\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
"{\"OBJECTID\": 122, \"AccessName\": \"HARTFORD AV\", \"AccessID\": \"DB-043\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"1890 BLK N ATLANTIC AV\", \"MilePost\": 12.76, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"CLOSED - SEASONAL\", \"Entry_Date_Time\": 1691394832000, \"DrivingZone\": \"YES\"}\n",
"{\"OBJECTID\": 125, \"AccessName\": \"WILLIAMS AV\", \"AccessID\": \"DB-042\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"2200 BLK N ATLANTIC AV\", \"MilePost\": 12.5, \"City\": \"DAYTONA BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"YES\"}\n",
"{\"OBJECTID\": 134, \"AccessName\": \"CARDINAL DR\", \"AccessID\": \"OB-036\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"600 BLK S ATLANTIC AV\", \"MilePost\": 11.27, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"YES\"}\n",
"{\"OBJECTID\": 229, \"AccessName\": \"EL PORTAL ST\", \"AccessID\": \"DBS-076\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3200 BLK S ATLANTIC AV\", \"MilePost\": 20.04, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
"{\"OBJECTID\": 230, \"AccessName\": \"HARVARD DR\", \"AccessID\": \"OB-038\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"900 BLK S ATLANTIC AV\", \"MilePost\": 11.72, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691394952000, \"DrivingZone\": \"YES\"}\n",
"{\"OBJECTID\": 232, \"AccessName\": \"VAN AV\", \"AccessID\": \"DBS-075\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"3100 BLK S ATLANTIC AV\", \"MilePost\": 19.6, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"OPEN\", \"Entry_Date_Time\": 1691397348000, \"DrivingZone\": \"YES\"}\n",
"{\"OBJECTID\": 233, \"AccessName\": \"ROCKEFELLER DR\", \"AccessID\": \"OB-034\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"400 BLK S ATLANTIC AV\", \"MilePost\": 10.9, \"City\": \"ORMOND BEACH\", \"AccessStatus\": \"CLOSED - SEASONAL\", \"Entry_Date_Time\": 1691394832000, \"DrivingZone\": \"YES\"}\n",
"{\"OBJECTID\": 235, \"AccessName\": \"MINERVA RD\", \"AccessID\": \"DBS-069\", \"AccessType\": \"OPEN VEHICLE RAMP\", \"GeneralLoc\": \"2300 BLK S ATLANTIC AV\", \"MilePost\": 17.52, \"City\": \"DAYTONA BEACH SHORES\", \"AccessStatus\": \"4X4 ONLY\", \"Entry_Date_Time\": 1691395124000, \"DrivingZone\": \"YES\"}\n"
]
}
],
"source": [
"for doc in docs:\n",
" print(doc.page_content)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,101 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ad553e51",
"metadata": {},
"source": [
"# Async Chromium\n",
"\n",
"Chromium is one of the browsers supported by Playwright, a library used to control browser automation. \n",
"\n",
"By running `p.chromium.launch(headless=True)`, we are launching a headless instance of Chromium. \n",
"\n",
"Headless mode means that the browser is running without a graphical user interface.\n",
"\n",
"`AsyncChromiumLoader` load the page, and then we use `Html2TextTransformer` to trasnform to text."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1c3a4c19",
"metadata": {},
"outputs": [],
"source": [
"! pip install -q playwright beautifulsoup4\n",
"! playwright install"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "dd2cdea7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'<!DOCTYPE html><html lang=\"en\"><head><script src=\"https://s0.2mdn.net/instream/video/client.js\" asyn'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.document_loaders import AsyncChromiumLoader\n",
"urls = [\"https://www.wsj.com\"]\n",
"loader = AsyncChromiumLoader(urls)\n",
"docs = loader.load()\n",
"docs[0].page_content[0:100]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "013caa7e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Skip to Main ContentSkip to SearchSkip to... Select * Top News * What's News *\\nFeatured Stories * Retirement * Life & Arts * Hip-Hop * Sports * Video *\\nEconomy * Real Estate * Sports * CMO * CIO * CFO * Risk & Compliance *\\nLogistics Report * Sustainable Business * Heard on the Street * Barrons *\\nMarketWatch * Mansion Global * Penta * Opinion * Journal Reports * Sponsored\\nOffers Explore Our Brands * WSJ * * * * * Barron's * * * * * MarketWatch * * *\\n* * IBD # The Wall Street Journal SubscribeSig\""
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.document_transformers import Html2TextTransformer\n",
"html2text = Html2TextTransformer()\n",
"docs_transformed = html2text.transform_documents(docs)\n",
"docs_transformed[0].page_content[0:500]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -9,66 +9,16 @@
"\n",
"GROBID is a machine learning library for extracting, parsing, and re-structuring raw documents.\n",
"\n",
"It is particularly good for sturctured PDFs, like academic papers.\n",
"It is designed and expected to be used to parse academic papers, where it works particularly well. Note: if the articles supplied to Grobid are large documents (e.g. dissertations) exceeding a certain number of elements, they might not be processed. \n",
"\n",
"This loader uses GROBIB to parse PDFs into `Documents` that retain metadata associated with the section of text.\n",
"This loader uses Grobid to parse PDFs into `Documents` that retain metadata associated with the section of text.\n",
"\n",
"---\n",
"The best approach is to install Grobid via docker, see https://grobid.readthedocs.io/en/latest/Grobid-docker/. \n",
"\n",
"For users on `Mac` - \n",
"(Note: additional instructions can be found [here](https://python.langchain.com/docs/extras/integrations/providers/grobid.mdx).)\n",
"\n",
"(Note: additional instructions can be found [here](https://python.langchain.com/docs/ecosystem/integrations/grobid.mdx).)\n",
"\n",
"Install Java (Apple Silicon):\n",
"```\n",
"$ arch -arm64 brew install openjdk@11\n",
"$ brew --prefix openjdk@11\n",
"/opt/homebrew/opt/openjdk@ 11\n",
"```\n",
"\n",
"In `~/.zshrc`:\n",
"```\n",
"export JAVA_HOME=/opt/homebrew/opt/openjdk@11\n",
"export PATH=$JAVA_HOME/bin:$PATH\n",
"```\n",
"\n",
"Then, in Terminal:\n",
"```\n",
"$ source ~/.zshrc\n",
"```\n",
"\n",
"Confirm install:\n",
"```\n",
"$ which java\n",
"/opt/homebrew/opt/openjdk@11/bin/java\n",
"$ java -version \n",
"openjdk version \"11.0.19\" 2023-04-18\n",
"OpenJDK Runtime Environment Homebrew (build 11.0.19+0)\n",
"OpenJDK 64-Bit Server VM Homebrew (build 11.0.19+0, mixed mode)\n",
"```\n",
"\n",
"Then, get [Grobid](https://grobid.readthedocs.io/en/latest/Install-Grobid/#getting-grobid):\n",
"```\n",
"$ curl -LO https://github.com/kermitt2/grobid/archive/0.7.3.zip\n",
"$ unzip 0.7.3.zip\n",
"```\n",
" \n",
"Build\n",
"```\n",
"$ ./gradlew clean install\n",
"```\n",
"\n",
"Then, run the server:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "2d8992fc",
"metadata": {},
"outputs": [],
"source": [
"! get_ipython().system_raw('nohup ./gradlew run > grobid.log 2>&1 &')"
"Once grobid is up-and-running you can interact as described below. \n"
]
},
{

View File

@@ -0,0 +1,95 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "2ed9a4c2",
"metadata": {},
"source": [
"# Beautiful Soup\n",
"\n",
"Beautiful Soup offers fine-grained control over HTML content, enabling specific tag extraction, removal, and content cleaning. \n",
"\n",
"It's suited for cases where you want to extract specific information and clean up the HTML content according to your needs.\n",
"\n",
"For example, we can scrape text content within `<p>, <li>, <div>, and <a>` tags from the HTML content:\n",
"\n",
"* `<p>`: The paragraph tag. It defines a paragraph in HTML and is used to group together related sentences and/or phrases.\n",
" \n",
"* `<li>`: The list item tag. It is used within ordered (`<ol>`) and unordered (`<ul>`) lists to define individual items within the list.\n",
" \n",
"* `<div>`: The division tag. It is a block-level element used to group other inline or block-level elements.\n",
" \n",
"* `<a>`: The anchor tag. It is used to define hyperlinks."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "dd710e5b",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import AsyncChromiumLoader\n",
"from langchain.document_transformers import BeautifulSoupTransformer\n",
"\n",
"# Load HTML\n",
"loader = AsyncChromiumLoader([\"https://www.wsj.com\"])\n",
"html = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "052b64dd",
"metadata": {},
"outputs": [],
"source": [
"# Transform\n",
"bs_transformer = BeautifulSoupTransformer()\n",
"docs_transformed = bs_transformer.transform_documents(html,tags_to_extract=[\"p\", \"li\", \"div\", \"a\"])"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b53a5307",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Conservative legal activists are challenging Amazon, Comcast and others using many of the same tools that helped kill affirmative-action programs in colleges.1,2099 min read U.S. stock indexes fell and government-bond prices climbed, after Moodys lowered credit ratings for 10 smaller U.S. banks and said it was reviewing ratings for six larger ones. The Dow industrials dropped more than 150 points.3 min read Penn Entertainments Barstool Sportsbook app will be rebranded as ESPN Bet this fall as '"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs_transformed[0].page_content[0:500]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -14,7 +14,7 @@
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": 1,
"id": "60b6dbb2",
"metadata": {},
"outputs": [],
@@ -42,25 +42,14 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 2,
"id": "9ca87a2e",
"metadata": {},
"outputs": [],
"source": [
"# Initialize a Fireworks LLM\n",
"os.environ['FIREWORKS_API_KEY'] = \"\" #change this to your own API KEY\n",
"llm = Fireworks(model_id=\"accounts/fireworks/models/fireworks-llama-v2-13b-chat\")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "43a11ba8",
"metadata": {},
"outputs": [],
"source": [
"# Create LLM chain\n",
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
"os.environ['FIREWORKS_API_KEY'] = \"<YOUR_API_KEY>\" # Change this to your own API key\n",
"llm = Fireworks(model_id=\"accounts/fireworks/models/llama-v2-13b-chat\")"
]
},
{
@@ -72,16 +61,33 @@
"\n",
"You can use the LLMs to call the model for specified prompt(s). \n",
"\n",
"Current Specified Models: \n",
"* accounts/fireworks/models/fireworks-falcon-7b, accounts/fireworks/models/fireworks-falcon-40b-w8a16\n",
"* accounts/fireworks/models/fireworks-starcoder-1b-w8a16-1gpu, accounts/fireworks/models/fireworks-starcoder-3b-w8a16-1gpu, accounts/fireworks/models/fireworks-starcoder-7b-w8a16-1gpu, accounts/fireworks/models/fireworks-starcoder-16b-w8a16 \n",
"* accounts/fireworks/models/fireworks-llama-v2-13b, accounts/fireworks/models/fireworks-llama-v2-13b-chat, accounts/fireworks/models/fireworks-llama-v2-13b-w8a16, accounts/fireworks/models/fireworks-llama-v2-13b-chat-w8a16\n",
"* accounts/fireworks/models/fireworks-llama-v2-7b, accounts/fireworks/models/fireworks-llama-v2-7b-chat, accounts/fireworks/models/fireworks-llama-v2-7b-w8a16, accounts/fireworks/models/fireworks-llama-v2-7b-chat-w8a16, accounts/fireworks/models/fireworks-llama-v2-70b-chat-4gpu"
"Currently supported models: \n",
"\n",
"* Falcon\n",
" * `accounts/fireworks/models/falcon-7b`\n",
" * `accounts/fireworks/models/falcon-40b-w8a16`\n",
"* Llama 2\n",
" * `accounts/fireworks/models/llama-v2-7b`\n",
" * `accounts/fireworks/models/llama-v2-7b-w8a16`\n",
" * `accounts/fireworks/models/llama-v2-7b-chat`\n",
" * `accounts/fireworks/models/llama-v2-7b-chat-w8a16`\n",
" * `accounts/fireworks/models/llama-v2-13b`\n",
" * `accounts/fireworks/models/llama-v2-13b-w8a16`\n",
" * `accounts/fireworks/models/llama-v2-13b-chat`\n",
" * `accounts/fireworks/models/llama-v2-13b-chat-w8a16`\n",
" * `accounts/fireworks/models/llama-v2-70b-chat-4gpu`\n",
"* StarCoder\n",
" * `accounts/fireworks/models/starcoder-1b-w8a16-1gpu`\n",
" * `accounts/fireworks/models/starcoder-3b-w8a16-1gpu`\n",
" * `accounts/fireworks/models/starcoder-7b-w8a16-1gpu`\n",
" * `accounts/fireworks/models/starcoder-16b-w8a16`\n",
"\n",
"See the full, most up-to-date list on [app.fireworks.ai](https://app.fireworks.ai)."
]
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 3,
"id": "bf0a425c",
"metadata": {},
"outputs": [
@@ -89,29 +95,41 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Is it Tom Brady, Aaron Rodgers, or someone else? It's a tough question to answer, and there are strong arguments for each of these quarterbacks. Here are some of the reasons why each of these quarterbacks could be considered the best:\n",
"\n",
"Tom Brady:\n",
"\n",
"It's a question that has been debated for years, with different analysts and fans making their cases for various signal-callers. Here are some of the top contenders for the title of best quarterback in the NFL:\n",
"* He has the most Super Bowl wins (6) of any quarterback in NFL history.\n",
"* He has been named Super Bowl MVP four times, more than any other player.\n",
"* He has led the New England Patriots to 18 playoff victories, the most in NFL history.\n",
"* He has thrown for over 70,000 yards in his career, the most of any quarterback in NFL history.\n",
"* He has thrown for 50 or more touchdowns in a season four times, the most of any quarterback in NFL history.\n",
"\n",
"1. Tom Brady: The New England Patriots legend has won six Super Bowls and has been named Super Bowl MVP four times. He's known for his precision passing, pocket presence, and ability to read defenses.\n",
"2. Aaron Rodgers: The Green Bay Packers quarterback has won two Super Bowls and has been named NFL MVP twice. He's known for his quick release, accuracy, and ability to extend plays with his feet.\n",
"3. Drew Brees: The New Orleans Saints quarterback has won a Super Bowl and has been named NFL MVP once. He's known for his accuracy, pocket presence, and ability to read defenses.\n",
"4. Patrick Mahomes: The Kansas City Chiefs quarterback has won a Super Bowl and has been named NFL MVP twice. He's known for his arm strength, athleticism, and ability to make plays outside of the pocket.\n",
"5. Russell Wilson: The Seattle Seahawks quarterback has won a Super Bowl and has been named NFL MVP once. He's known for his mobility, accuracy, and ability to extend plays with his feet.\n",
"Aaron Rodgers:\n",
"\n",
"Of course, there are other talented quarterbacks in the league, such as Lamar Jackson, Deshaun Watson, and Carson Wentz, who could also be considered among the best. Ultimately, the answer to the question of who's the best quarterback in the NFL is subjective and can vary depending on individual perspectives and criteria.\n"
"* He has led the Green Bay Packers to a Super Bowl victory in 2010.\n",
"* He has been named Super Bowl MVP once.\n",
"* He has thrown for over 40,000 yards in his career, the most of any quarterback in NFL history.\n",
"* He has thrown for 40 or more touchdowns in a season three times, the most of any quarterback in NFL history.\n",
"* He has a career passer rating of 103.1, the highest of any quarterback in NFL history.\n",
"\n",
"So, who's the best quarterback in the NFL? It's a tough call, but here's my opinion:\n",
"\n",
"I think Aaron Rodgers is the best quarterback in the NFL right now. He has led the Packers to a Super Bowl victory and has had some incredible seasons, including the 2011 season when he threw for 45 touchdowns and just 6 interceptions. He has a strong arm, great accuracy, and is incredibly mobile for a quarterback of his size. He also has a great sense of timing and knows when to take risks and when to play it safe.\n",
"\n",
"Tom Brady is a close second, though. He has an incredible track record of success, including six Super Bowl victories, and has been one of the most consistent quarterbacks in the league for the past two decades. He has a strong arm and is incredibly accurate\n"
]
}
],
"source": [
"#single prompt\n",
"# Single prompt\n",
"output = llm(\"Who's the best quarterback in the NFL?\")\n",
"print(output)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 4,
"id": "afc7de6f",
"metadata": {},
"outputs": [
@@ -119,19 +137,22 @@
"name": "stdout",
"output_type": "stream",
"text": [
"generations=[[Generation(text=\"\\nWho is the best cricket player in the world in 2016?\\nThe best cricket player in the world in 2016 is Virat Kohli. The Indian captain has had a fabulous year, scoring heavily in all formats of the game, leading India to several victories, and breaking several records. In Test cricket, Kohli has scored 1215 runs at an average of 75.33 with 6 centuries and 4 fifties, which is the highest number of runs scored by any player in a calendar year. In ODI cricket, he has scored 1143 runs at an average of 83.42 with 7 centuries and 6 fifties, which is also the highest number of runs scored by any player in a calendar year. Additionally, Kohli has led India to the number one ranking in Test cricket, and has been named the ICC Test Player of the Year and the ICC ODI Player of the Year.\\nVirat Kohli has been in incredible form in 2016, and his performances have made him the standout player of the year. Other players who have had a great year include Steve Smith, Joe Root, and Kane Williamson, but Kohli's consistency and dominance in all formats of the game make him the best cricket player in the world in 2016.\", generation_info=None)], [Generation(text=\"\\n\\nA: LeBron James.\\n\\nB: Kevin Durant.\\n\\nC: Steph Curry.\\n\\nD: James Harden.\\n\\nE: Other (please specify).\\n\\nWhat's your answer?\", generation_info=None)]] llm_output={'token_usage': {}, 'model_id': 'fireworks-llama-v2-13b-chat'} run=[RunInfo(run_id=UUID('d14b6bee-7692-46ad-8798-acb6f72fc7fb')), RunInfo(run_id=UUID('b9f5b3b5-9e62-4eaf-b269-ecf0cbbcfb82'))]\n"
"[[Generation(text='\\nThe best cricket player in 2016 is a matter of opinion, but some of the top contenders for the title include:\\n\\n1. Virat Kohli (India): Kohli had a phenomenal year in 2016, scoring over 1,000 runs in Test cricket, including four centuries, and averaging over 70. He also scored heavily in ODI cricket, with an average of over 80.\\n2. Steve Smith (Australia): Smith had a remarkable year in 2016, leading Australia to a Test series victory in India and scoring over 1,000 runs in the format, including five centuries. He also averaged over 60 in ODI cricket.\\n3. KL Rahul (India): Rahul had a breakout year in 2016, scoring over 1,000 runs in Test cricket, including four centuries, and averaging over 60. He also scored heavily in ODI cricket, with an average of over 70.\\n4. Joe Root (England): Root had a solid year in 2016, scoring over 1,000 runs in Test cricket, including four centuries, and averaging over 50. He also scored heavily in ODI cricket, with an average of over 80.\\n5. Quinton de Kock (South Africa): De Kock had a remarkable year in 2016, scoring over 1,000 runs in ODI cricket, including six centuries, and averaging over 80. He also scored heavily in Test cricket, with an average of over 50.\\n\\nThese are just a few of the top contenders for the title of best cricket player in 2016, but there were many other talented players who also had impressive years. Ultimately, the answer to this question is subjective and depends on individual opinions and criteria for evaluation.', generation_info=None)], [Generation(text=\"\\nThis is a tough one, as there are so many great players in the league right now. But if I had to choose one, I'd say LeBron James is the best basketball player in the league. He's a once-in-a-generation talent who can dominate the game in so many ways. He's got incredible speed, strength, and court vision, and he's always finding new ways to improve his game. Plus, he's been doing it at an elite level for over a decade now, which is just amazing.\\n\\nBut don't just take my word for it - there are plenty of other great players in the league who could make a strong case for being the best. Guys like Kevin Durant, Steph Curry, James Harden, and Giannis Antetokounmpo are all having incredible seasons, and they've all got their own unique skills and strengths that make them special. So ultimately, it's up to you to decide who you think is the best basketball player in the league.\", generation_info=None)]]\n"
]
}
],
"source": [
"#calling multiple prompts\n",
"output = llm.generate([\"Who's the best cricket player in 2016?\", \"Who's the best basketball player in the league?\"])\n",
"print(output)"
"# Calling multiple prompts\n",
"output = llm.generate([\n",
" \"Who's the best cricket player in 2016?\",\n",
" \"Who's the best basketball player in the league?\",\n",
"])\n",
"print(output.generations)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 5,
"id": "b801c20d",
"metadata": {},
"outputs": [
@@ -140,13 +161,13 @@
"output_type": "stream",
"text": [
"\n",
"Kansas City in December can be quite chilly, with average high\n"
"Kansas City in December is quite cold, with temperatures typically r\n"
]
}
],
"source": [
"#setting parameters: model_id, temperature, max_tokens, top_p\n",
"llm = Fireworks(model_id=\"accounts/fireworks/models/fireworks-llama-v2-13b-chat\", temperature=0.7, max_tokens=15, top_p=1.0)\n",
"# Setting additional parameters: temperature, max_tokens, top_p\n",
"llm = Fireworks(model_id=\"accounts/fireworks/models/llama-v2-13b-chat\", temperature=0.7, max_tokens=15, top_p=1.0)\n",
"print(llm(\"What's the weather like in Kansas City in December?\"))"
]
},
@@ -162,7 +183,7 @@
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": 6,
"id": "fd2c6bc1",
"metadata": {},
"outputs": [
@@ -171,34 +192,25 @@
"output_type": "stream",
"text": [
"\n",
"(Note: I'm just an AI and not a branding expert, so take this as a starting point for your own research and brainstorming.)\n",
"A good name for a company that makes football helmets could be:\n",
"Naming a company can be a fun and creative process! Here are a few name ideas for a company that makes football helmets:\n",
"\n",
"1. Helix Pro: This name plays off the idea of a helix, or spiral, shape that is commonly associated with football helmets. \"Pro\" implies a professional-level product.\n",
"2. Gridiron Gear: \"Gridiron\" is a term used to describe a football field, and \"gear\" highlights the company's focus on producing high-quality football helmets.\n",
"3. Linebacker Lab: \"Linebacker\" is a position on the football field, and \"Lab\" suggests a focus on research and development.\n",
"4. Helmet Hut: This name is simple and easy to remember, and it immediately conveys the company's focus on football helmets.\n",
"5. Tackle Tech: \"Tackle\" is a term used in football to describe a hit or collision, and \"Tech\" implies a focus on advanced technology and innovation.\n",
"6. Victory Vest: \"Victory\" implies a focus on winning and success, and \"Vest\" could suggest a protective or armored design.\n",
"7. Pigskin Pro: \"Pigskin\" is a term used to describe a football, and \"Pro\" implies a professional-level product.\n",
"8. Football Fusion: This name could suggest a combination of different materials or technologies to create a high-quality football helmet.\n",
"9. Endzone Edge: \"Endzone\" is the area of the football field where a team scores a touchdown, and \"Edge\" implies a competitive advantage.\n",
"10. MVP Masks: \"MVP\" stands for \"Most Valuable Player,\" and \"Masks\" highlights the protective nature of the company's football helmets.\n",
"1. Helix Headgear: This name plays off the idea of the helix shape of a football helmet and could be a memorable and catchy name for a company.\n",
"2. Gridiron Gear: \"Gridiron\" is a term used to describe a football field, and \"gear\" refers to the products the company sells. This name is straightforward and easy to understand.\n",
"3. Cushion Crusaders: This name emphasizes the protective qualities of football helmets and could appeal to customers looking for safety-conscious products.\n",
"4. Helmet Heroes: This name has a fun, heroic tone and could appeal to customers looking for high-quality products.\n",
"5. Tackle Tech: \"Tackle\" is a term used in football to describe a player's attempt to stop an opponent, and \"tech\" refers to the technology used in the helmets. This name could appeal to customers interested in innovative products.\n",
"6. Padded Protection: This name emphasizes the protective qualities of football helmets and could appeal to customers looking for products that prioritize safety.\n",
"7. Gridiron Gear Co.: This name is simple and straightforward, and it clearly conveys the company's focus on football-related products.\n",
"8. Helmet Haven: This name has a soothing, protective tone and could appeal to customers looking for a reliable brand.\n",
"\n",
"Remember, the name you choose for your company should be memorable, easy to pronounce and spell, and convey a sense of quality and professionalism. It's also important to check that the name isn't already in use by another company, and to consider any potential trademark issues.\n"
"Remember to choose a name that reflects your company's values and mission, and that resonates with your target market. Good luck with your company!\n"
]
}
],
"source": [
"human_message_prompt = HumanMessagePromptTemplate(\n",
" prompt=PromptTemplate(\n",
" template=\"What is a good name for a company that makes {product}?\",\n",
" input_variables=[\"product\"],\n",
" )\n",
")\n",
"\n",
"human_message_prompt = HumanMessagePromptTemplate.from_template(\"What is a good name for a company that makes {product}?\")\n",
"chat_prompt_template = ChatPromptTemplate.from_messages([human_message_prompt])\n",
"chat = Fireworks()\n",
"chat = FireworksChat()\n",
"chain = LLMChain(llm=chat, prompt=chat_prompt_template)\n",
"output = chain.run(\"football helmets\")\n",
"\n",
@@ -222,7 +234,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.11.4"
}
},
"nbformat": 4,

View File

@@ -31,12 +31,11 @@
},
"outputs": [],
"source": [
"from langchain.llms.bedrock import Bedrock\n",
"from langchain.llms import Bedrock\n",
"\n",
"llm = Bedrock(\n",
" credentials_profile_name=\"bedrock-admin\",\n",
" model_id=\"amazon.titan-tg1-large\",\n",
" endpoint_url=\"custom_endpoint_url\",\n",
" model_id=\"amazon.titan-tg1-large\"\n",
")"
]
},

View File

@@ -0,0 +1,169 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Titan Takeoff\n",
"\n",
"TitanML helps businesses build and deploy better, smaller, cheaper, and faster NLP models through our training, compression, and inference optimization platform. \n",
"\n",
"Our inference server, [Titan Takeoff](https://docs.titanml.co/docs/titan-takeoff/getting-started) enables deployment of LLMs locally on your hardware in a single command. Most generative model architectures are supported, such as Falcon, Llama 2, GPT2, T5 and many more."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Installation\n",
"\n",
"To get started with Iris Takeoff, all you need is to have docker and python installed on your local system. If you wish to use the server with gpu suport, then you will need to install docker with cuda support.\n",
"\n",
"For Mac and Windows users, make sure you have the docker daemon running! You can check this by running docker ps in your terminal. To start the daemon, open the docker desktop app.\n",
"\n",
"Run the following command to install the Iris CLI that will enable you to run the takeoff server:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"source": [
"pip install titan-iris"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Choose a Model\n",
"Iris Takeoff supports many of the most powerful generative text models, such as Falcon, MPT, and Llama. See the [supported models](https://docs.titanml.co/docs/titan-takeoff/supported-models) for more information. For information about using your own models, see the [custom models](https://docs.titanml.co/docs/titan-takeoff/Advanced/custom-models).\n",
"\n",
"Going forward in this demo we will be using the falcon 7B instruct model. This is a good open source model that is trained to follow instructions, and is small enough to easily inference even on CPUs.\n",
"\n",
"## Taking off\n",
"Models are referred to by their model id on HuggingFace. Takeoff uses port 8000 by default, but can be configured to use another port. There is also support to use a Nvidia GPU by specifing cuda for the device flag.\n",
"\n",
"To start the takeoff server, run:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "shellscript"
}
},
"outputs": [],
"source": [
"iris takeoff --model tiiuae/falcon-7b-instruct --device cpu\n",
"iris takeoff --model tiiuae/falcon-7b-instruct --device cuda # Nvidia GPU required\n",
"iris takeoff --model tiiuae/falcon-7b-instruct --device cpu --port 5000 # run on port 5000 (default: 8000)\n",
"```"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"You will then be directed to a login page, where you will need to create an account to proceed.\n",
"After logging in, run the command onscreen to check whether the server is ready. When it is ready, you can start using the Takeoff integration\n",
"\n",
"## Inferencing your model\n",
"To access your LLM, use the TitanTakeoff LLM wrapper:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import TitanTakeoff\n",
"\n",
"llm = TitanTakeoff(\n",
" port=8000,\n",
" generate_max_length=128,\n",
" temperature=1.0\n",
")\n",
"\n",
"prompt = \"What is the largest planet in the solar system?\"\n",
"\n",
"llm(prompt)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"No parameters are needed by default, but a port can be specified and [generation parameters](https://docs.titanml.co/docs/titan-takeoff/Advanced/generation-parameters) can be supplied.\n",
"\n",
"### Streaming\n",
"Streaming is also supported via the streaming flag:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
"from langchain.callbacks.manager import CallbackManager\n",
"\n",
"llm = TitanTakeoff(port=8000, callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]), streaming=True)\n",
"\n",
"prompt = \"What is the capital of France?\"\n",
"\n",
"llm(prompt)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Integration with LLMChain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain import PromptTemplate, LLMChain\n",
"\n",
"llm = TitanTakeoff()\n",
"\n",
"template = \"What is the capital of {country}\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"country\"])\n",
"\n",
"llm_chain = LLMChain(llm=llm, prompt=prompt)\n",
"\n",
"generated = llm_chain.run(country=\"Belgium\")\n",
"print(generated)"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,67 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Rockset Chat Message History\n",
"\n",
"This notebook goes over how to use [Rockset](https://rockset.com/docs) to store chat message history. \n",
"\n",
"To begin, with get your API key from the [Rockset console](https://console.rockset.com/apikeys). Find your API region for the Rockset [API reference](https://rockset.com/docs/rest-api#introduction)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "plaintext"
}
},
"outputs": [],
"source": [
"from langchain.memory.chat_message_histories import RocksetChatMessageHistory\n",
"from rockset import RocksetClient, Regions\n",
"\n",
"history = RocksetChatMessageHistory(\n",
" session_id=\"MySession\",\n",
" client=RocksetClient(\n",
" api_key=\"YOUR API KEY\", \n",
" host=Regions.usw2a1 # us-west-2 Oregon\n",
" ),\n",
" collection=\"langchain_demo\",\n",
" sync=True\n",
")\n",
"history.add_user_message(\"hi!\")\n",
"history.add_ai_message(\"whats up?\")\n",
"print(history.messages)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The output should be something like:\n",
"\n",
"```python\n",
"[\n",
" HumanMessage(content='hi!', additional_kwargs={'id': '2e62f1c2-e9f7-465e-b551-49bae07fe9f0'}, example=False), \n",
" AIMessage(content='whats up?', additional_kwargs={'id': 'b9be8eda-4c18-4cf8-81c3-e91e876927d0'}, example=False)\n",
"]\n",
"\n",
"```"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,21 @@
# BagelDB
> [BagelDB](https://www.bageldb.ai/) (`Open Vector Database for AI`) is like GitHub for AI data.
It is a collaborative platform where users can create,
share, and manage vector datasets. It can support private projects for independent developers,
internal collaborations for enterprises, and public contributions for data DAOs.
## Installation and Setup
```bash
pip install betabageldb
```
## VectorStore
See a [usage example](/docs/integrations/vectorstores/bageldb).
```python
from langchain.vectorstores import Bagel
```
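A minimal end-to-end sketch, based on the usage example linked above (the cluster name and texts are placeholders):
```python
from langchain.vectorstores import Bagel

# Placeholder data: create a cluster from texts, search it, then clean up
texts = ["hello bagel", "hello langchain"]
cluster = Bagel.from_texts(cluster_name="testing", texts=texts)
print(cluster.similarity_search("bagel", k=1))
cluster.delete_cluster()
```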

View File

@@ -0,0 +1,19 @@
# Dingo
This page covers how to use the Dingo ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Dingo wrappers.
## Installation and Setup
- Install the Python SDK with `pip install dingodb`
## VectorStore
There exists a wrapper around Dingo indexes, allowing you to use it as a vectorstore,
whether for semantic search or example selection.
To import this vectorstore:
```python
from langchain.vectorstores import Dingo
```
For a more detailed walkthrough of the Dingo wrapper, see [this notebook](/docs/integrations/vectorstores/dingo.html).
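As a minimal sketch (assuming a DingoDB instance is up and running; the host, credentials, and index name below are placeholders):
```python
from dingodb import DingoDB
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Dingo

# Placeholder connection details for a local DingoDB deployment
dingo_client = DingoDB(user="", password="", host=["127.0.0.1:13000"])

# Embed a few texts and index them (requires an OpenAI API key)
docsearch = Dingo.from_texts(
    ["hello dingo"],
    OpenAIEmbeddings(),
    client=dingo_client,
    index_name="langchain-demo",
)
print(docsearch.similarity_search("hello"))
```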

View File

@@ -4,14 +4,15 @@ This page covers how to use the Fireworks models within Langchain.
## Installation and Setup
- To use the Fireworks model, you need to have a Fireworks API key. To generate one, sign up at platform.fireworks.ai
- To use the Fireworks model, you need to have a Fireworks API key. To generate one, sign up at [app.fireworks.ai](https://app.fireworks.ai).
- Authenticate by setting the FIREWORKS_API_KEY environment variable.
## LLM
Fireworks integrates with Langchain through the LLM module, which allows for standardized usage of any model deployed on the Fireworks platform.
In this example, we'll work the llama-v2-13b.
In this example, we'll work with the llama-v2-13b-chat model.
```python
from langchain.llms.fireworks import Fireworks

View File

@@ -1,22 +1,23 @@
# Grobid
GROBID is a machine learning library for extracting, parsing, and re-structuring raw documents.
It is designed for parsing academic papers, where it works particularly well.
*Note*: if the articles supplied to Grobid are large documents (e.g. dissertations) exceeding a certain number
of elements, they might not be processed.
This page covers how to use Grobid to parse articles for LangChain.
It is separated into two parts: installation and running the server.
## Installation and Setup
#Ensure You have Java installed
!apt-get install -y openjdk-11-jdk -q
!update-alternatives --set java /usr/lib/jvm/java-11-openjdk-amd64/bin/java
## Installation
The Grobid installation is described in detail at https://grobid.readthedocs.io/en/latest/Install-Grobid/.
However, it is probably easier and less troublesome to run Grobid through a Docker container,
as documented [here](https://grobid.readthedocs.io/en/latest/Grobid-docker/).
#Clone and install the Grobid Repo
import os
!git clone https://github.com/kermitt2/grobid.git
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"
os.chdir('grobid')
!./gradlew clean install
## Use Grobid with LangChain
#Run the server,
get_ipython().system_raw('nohup ./gradlew run > grobid.log 2>&1 &')
Once Grobid is installed and up and running (you can check by accessing it at http://localhost:8070),
you're ready to go.
You can now use the GrobidParser to produce documents:
```python
@@ -41,4 +42,5 @@ loader = GenericLoader.from_filesystem(
)
docs = loader.load()
```
Chunk metadata will include bboxes although these are a bit funky to parse, see https://grobid.readthedocs.io/en/latest/Coordinates-in-PDF/
Chunk metadata will include bounding boxes. Although these are a bit funky to parse,
they are explained in https://grobid.readthedocs.io/en/latest/Coordinates-in-PDF/.

View File

@@ -0,0 +1,104 @@
# Log10
This page covers how to use [Log10](https://log10.io) within LangChain.
## What is Log10?
Log10 is an [open source](https://github.com/log10-io/log10) proxiless LLM data management and application development platform that lets you log, debug and tag your Langchain calls.
## Quick start
1. Create your free account at [log10.io](https://log10.io)
2. Add your `LOG10_TOKEN` and `LOG10_ORG_ID` from the Settings and Organization tabs respectively as environment variables.
3. Also add `LOG10_URL=https://log10.io` and your usual LLM API key (e.g. `OPENAI_API_KEY` or `ANTHROPIC_API_KEY`) to your environment, as sketched below.
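A minimal sketch of that environment setup in Python (all values below are placeholders):
```python
import os

# Placeholder credentials; copy yours from the log10.io Settings and Organization tabs
os.environ["LOG10_TOKEN"] = "<your-log10-token>"
os.environ["LOG10_ORG_ID"] = "<your-org-id>"
os.environ["LOG10_URL"] = "https://log10.io"
os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"
```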
## How to enable Log10 data management for Langchain
Integrating with Log10 is as simple as adding the `log10_callback` callback, as shown below:
```python
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage
from log10.langchain import Log10Callback
from log10.llm import Log10Config
log10_callback = Log10Callback(log10_config=Log10Config())
messages = [
HumanMessage(content="You are a ping pong machine"),
HumanMessage(content="Ping?"),
]
llm = ChatOpenAI(model_name="gpt-3.5-turbo", callbacks=[log10_callback])
```
[Log10 + Langchain + Logs docs](https://github.com/log10-io/log10/blob/main/logging.md#langchain-logger)
[More details + screenshots](https://log10.io/docs/logs) including instructions for self-hosting logs
## How to use tags with Log10
```python
from langchain import OpenAI
from langchain.chat_models import ChatAnthropic
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage
from log10.langchain import Log10Callback
from log10.llm import Log10Config
log10_callback = Log10Callback(log10_config=Log10Config())
messages = [
HumanMessage(content="You are a ping pong machine"),
HumanMessage(content="Ping?"),
]
llm = ChatOpenAI(model_name="gpt-3.5-turbo", callbacks=[log10_callback], temperature=0.5, tags=["test"])
completion = llm.predict_messages(messages, tags=["foobar"])
print(completion)
llm = ChatAnthropic(model="claude-2", callbacks=[log10_callback], temperature=0.7, tags=["baz"])
llm.predict_messages(messages)
print(completion)
llm = OpenAI(model_name="text-davinci-003", callbacks=[log10_callback], temperature=0.5)
completion = llm.predict("You are a ping pong machine.\nPing?\n")
print(completion)
```
You can also intermix direct OpenAI calls and Langchain LLM calls:
```python
import os
from log10.load import log10, log10_session
import openai
from langchain import OpenAI
log10(openai)
with log10_session(tags=["foo", "bar"]):
# Log a direct OpenAI call
response = openai.Completion.create(
model="text-ada-001",
prompt="Where is the Eiffel Tower?",
temperature=0,
max_tokens=1024,
top_p=1,
frequency_penalty=0,
presence_penalty=0,
)
print(response)
# Log a call via Langchain
llm = OpenAI(model_name="text-ada-001", temperature=0.5)
response = llm.predict("You are a ping pong machine.\nPing?\n")
print(response)
```
## How to debug Langchain calls
[Example of debugging](https://log10.io/docs/prompt_chain_debugging)
[More Langchain examples](https://github.com/log10-io/log10/tree/main/examples#langchain)

View File

@@ -23,4 +23,11 @@ from langchain.vectorstores import Rockset
See a [usage example](/docs/integrations/document_loaders/rockset).
```python
from langchain.document_loaders import RocksetLoader
```
## Chat Message History
See a [usage example](/docs/integrations/memory/rockset_chat_message_history).
```python
from langchain.memory.chat_message_histories import RocksetChatMessageHistory
```
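A minimal sketch, mirroring the usage example linked above (the API key and region are placeholders):
```python
from rockset import RocksetClient, Regions
from langchain.memory.chat_message_histories import RocksetChatMessageHistory

# Placeholder credentials; find yours in the Rockset console
history = RocksetChatMessageHistory(
    session_id="MySession",
    client=RocksetClient(api_key="YOUR API KEY", host=Regions.usw2a1),
    collection="langchain_demo",
)
history.add_user_message("hi!")
print(history.messages)
```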

View File

@@ -63,7 +63,7 @@ results = vectara.similarity_score("what is LangChain?")
- `k`: number of results to return (defaults to 5)
- `lambda_val`: the [lexical matching](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) factor for hybrid search (defaults to 0.025)
- `filter`: a [filter](https://docs.vectara.com/docs/common-use-cases/filtering-by-metadata/filter-overview) to apply to the results (default None)
- `n_sentence_context`: number of sentences to include before/after the actual matching segment when returning results. This defaults to 0 so as to return the exact text segment that matches, but can be used with other values e.g. 2 or 3 to return adjacent text segments.
- `n_sentence_context`: number of sentences to include before/after the actual matching segment when returning results. This defaults to 2.
The results are returned as a list of relevant documents along with a relevance score for each document.
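For example, a search call using these parameters might look like the following sketch (`vectara` is assumed to be an already-initialized Vectara vectorstore):
```python
# Sketch only; the parameter values shown are the documented defaults
results = vectara.similarity_search_with_score(
    "what is LangChain?",
    k=5,                   # number of results to return
    lambda_val=0.025,      # lexical matching factor for hybrid search
    filter=None,           # optional metadata filter
    n_sentence_context=2,  # sentences of context before/after each match
)
```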

View File

@@ -20,7 +20,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 5,
"id": "282239c8-e03a-4abc-86c1-ca6120231a20",
"metadata": {},
"outputs": [],
@@ -28,7 +28,7 @@
"from langchain.embeddings import BedrockEmbeddings\n",
"\n",
"embeddings = BedrockEmbeddings(\n",
" credentials_profile_name=\"bedrock-admin\", endpoint_url=\"custom_endpoint_url\"\n",
" credentials_profile_name=\"bedrock-admin\", region_name=\"us-east-1\"\n",
")"
]
},
@@ -49,7 +49,29 @@
"metadata": {},
"outputs": [],
"source": [
"embeddings.embed_documents([\"This is a content of the document\"])"
"embeddings.embed_documents([\"This is a content of the document\", \"This is another document\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f6b364d",
"metadata": {},
"outputs": [],
"source": [
"# async embed query\n",
"await embeddings.aembed_query(\"This is a content of the document\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c9240a5a",
"metadata": {},
"outputs": [],
"source": [
"# async embed documents\n",
"await embeddings.aembed_documents([\"This is a content of the document\", \"This is another document\"])"
]
}
],
@@ -69,7 +91,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
"version": "3.9.13"
}
},
"nbformat": 4,

View File

@@ -12,7 +12,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 6,
"id": "0be1af71",
"metadata": {},
"outputs": [],
@@ -22,7 +22,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 29,
"id": "2c66e5da",
"metadata": {},
"outputs": [],
@@ -32,7 +32,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 30,
"id": "01370375",
"metadata": {},
"outputs": [],
@@ -42,7 +42,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 31,
"id": "bfb6142c",
"metadata": {},
"outputs": [],
@@ -52,7 +52,32 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 32,
"id": "91bc875d-829b-4c3d-8e6f-fc2dda30a3bd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[-0.003186025367556387,\n",
" 0.011071979803637493,\n",
" -0.004020420763285827,\n",
" -0.011658221276953042,\n",
" -0.0010534035786864363]"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query_result[:5]"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "0356c3b7",
"metadata": {},
"outputs": [],
@@ -60,6 +85,31 @@
"doc_result = embeddings.embed_documents([text])"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "a4b0d49e-0c73-44b6-aed5-5b426564e085",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[-0.003186025367556387,\n",
" 0.011071979803637493,\n",
" -0.004020420763285827,\n",
" -0.011658221276953042,\n",
" -0.0010534035786864363]"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"doc_result[0][:5]"
]
},
{
"cell_type": "markdown",
"id": "bb61bbeb",
@@ -70,7 +120,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"id": "c0b072cc",
"metadata": {},
"outputs": [],
@@ -80,17 +130,17 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 23,
"id": "a56b70f5",
"metadata": {},
"outputs": [],
"source": [
"embeddings = OpenAIEmbeddings()"
"embeddings = OpenAIEmbeddings(model=\"text-search-ada-doc-001\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 24,
"id": "14aefb64",
"metadata": {},
"outputs": [],
@@ -100,7 +150,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 25,
"id": "3c39ed33",
"metadata": {},
"outputs": [],
@@ -110,7 +160,32 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 26,
"id": "2ee7ce9f-d506-4810-8897-e44334412714",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[0.004452846988523035,\n",
" 0.034550655976098514,\n",
" -0.015029939040690051,\n",
" 0.03827273883655212,\n",
" 0.005785414075152477]"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query_result[:5]"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "e3221db6",
"metadata": {},
"outputs": [],
@@ -118,6 +193,31 @@
"doc_result = embeddings.embed_documents([text])"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "a0865409-3a6d-468f-939f-abde17c7cac3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[0.004452846988523035,\n",
" 0.034550655976098514,\n",
" -0.015029939040690051,\n",
" 0.03827273883655212,\n",
" 0.005785414075152477]"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"doc_result[0][:5]"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -132,7 +232,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.11.1 64-bit",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -146,7 +246,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
"version": "3.9.1"
},
"vscode": {
"interpreter": {

View File

@@ -0,0 +1,134 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "245a954a",
"metadata": {
"id": "245a954a"
},
"source": [
"# Alpha Vantage\n",
"\n",
">[Alpha Vantage](https://www.alphavantage.co) Alpha Vantage provides realtime and historical financial market data through a set of powerful and developer-friendly data APIs and spreadsheets. \n",
"\n",
"Use the ``AlphaVantageAPIWrapper`` to get currency exchange rates."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "34bb5968",
"metadata": {
"id": "34bb5968"
},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"import getpass\n",
"import os\n",
"\n",
"os.environ[\"ALPHAVANTAGE_API_KEY\"] = getpass.getpass()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "ac4910f8",
"metadata": {
"id": "ac4910f8"
},
"outputs": [],
"source": [
"from langchain.utilities.alpha_vantage import AlphaVantageAPIWrapper"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "84b8f773",
"metadata": {
"id": "84b8f773"
},
"outputs": [],
"source": [
"alpha_vantage = AlphaVantageAPIWrapper()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "068991a6",
"metadata": {
"id": "068991a6",
"outputId": "c5cdc6ec-03cf-4084-cc6f-6ae792d91d39"
},
"outputs": [
{
"data": {
"text/plain": [
"{'1. From_Currency Code': 'USD',\n",
" '2. From_Currency Name': 'United States Dollar',\n",
" '3. To_Currency Code': 'JPY',\n",
" '4. To_Currency Name': 'Japanese Yen',\n",
" '5. Exchange Rate': '144.93000000',\n",
" '6. Last Refreshed': '2023-08-11 21:31:01',\n",
" '7. Time Zone': 'UTC',\n",
" '8. Bid Price': '144.92600000',\n",
" '9. Ask Price': '144.93400000'}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alpha_vantage.run(\"USD\", \"JPY\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "84fc2b66-c08f-4cd3-ae13-494c54789c09",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
},
"vscode": {
"interpreter": {
"hash": "53f3bc57609c7a84333bb558594977aa5b4026b1d6070b93987956689e367341"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,181 +1,194 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Dall-E Image Generator\n",
"\n",
"This notebook shows how you can generate images from a prompt synthesized using an OpenAI LLM. The images are generated using Dall-E, which uses the same OpenAI API key as the LLM."
]
},
"nbformat": 4,
"nbformat_minor": 0
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Needed if you would like to display images in the notebook\n",
"!pip install opencv-python scikit-image"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "q-k8wmp0zquh"
},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
"import os\n",
"os.environ[\"OPENAI_API_KEY\"] = \"<your-key-here>\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run as a chain"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from langchain.utilities.dalle_image_generator import DallEAPIWrapper\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.chains import LLMChain\n",
"from langchain.llms import OpenAI\n",
"\n",
"llm = OpenAI(temperature=0.9)\n",
"prompt = PromptTemplate(\n",
" input_variables=[\"image_desc\"],\n",
" template=\"Generate a detailed prompt to generate an image based on the following description: {image_desc}\",\n",
")\n",
"chain = LLMChain(llm=llm, prompt=prompt)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"image_url = DallEAPIWrapper().run(chain.run(\"halloween night at a haunted museum\"))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'https://oaidalleapiprodscus.blob.core.windows.net/private/org-i0zjYONU3PemzJ222esBaAzZ/user-f6uEIOFxoiUZivy567cDSWni/img-i7Z2ZxvJ4IbbdAiO6OXJgS3v.png?st=2023-08-11T14%3A03%3A14Z&se=2023-08-11T16%3A03%3A14Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-08-10T20%3A58%3A32Z&ske=2023-08-11T20%3A58%3A32Z&sks=b&skv=2021-08-06&sig=/sECe7C0EAq37ssgBm7g7JkVIM/Q1W3xOstd0Go6slA%3D'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"image_url"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# You can click on the link above to display the image \n",
"# Or you can try the options below to display the image inline in this notebook\n",
"\n",
"try:\n",
" import google.colab\n",
" IN_COLAB = True\n",
"except:\n",
" IN_COLAB = False\n",
"\n",
"if IN_COLAB:\n",
" from google.colab.patches import cv2_imshow # for image display\n",
" from skimage import io\n",
"\n",
" image = io.imread(image_url) \n",
" cv2_imshow(image)\n",
"else:\n",
" import cv2\n",
" from skimage import io\n",
"\n",
" image = io.imread(image_url) \n",
" cv2.imshow('image', image)\n",
" cv2.waitKey(0) #wait for a keyboard input\n",
" cv2.destroyAllWindows()\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run as a tool with an agent"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m What is the best way to turn this description into an image?\n",
"Action: Dall-E Image Generator\n",
"Action Input: A spooky Halloween night at a haunted museum\u001b[0mhttps://oaidalleapiprodscus.blob.core.windows.net/private/org-rocrupyvzgcl4yf25rqq6d1v/user-WsxrbKyP2c8rfhCKWDyMfe8N/img-ogKfqxxOS5KWVSj4gYySR6FY.png?st=2023-01-31T07%3A38%3A25Z&se=2023-01-31T09%3A38%3A25Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-30T22%3A19%3A36Z&ske=2023-01-31T22%3A19%3A36Z&sks=b&skv=2021-08-06&sig=XsomxxBfu2CP78SzR9lrWUlbask4wBNnaMsHamy4VvU%3D\n",
"\n",
"Observation: \u001b[36;1m\u001b[1;3mhttps://oaidalleapiprodscus.blob.core.windows.net/private/org-rocrupyvzgcl4yf25rqq6d1v/user-WsxrbKyP2c8rfhCKWDyMfe8N/img-ogKfqxxOS5KWVSj4gYySR6FY.png?st=2023-01-31T07%3A38%3A25Z&se=2023-01-31T09%3A38%3A25Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-30T22%3A19%3A36Z&ske=2023-01-31T22%3A19%3A36Z&sks=b&skv=2021-08-06&sig=XsomxxBfu2CP78SzR9lrWUlbask4wBNnaMsHamy4VvU%3D\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m With the image generated, I can now make my final answer.\n",
"Final Answer: An image of a Halloween night at a haunted museum can be seen here: https://oaidalleapiprodscus.blob.core.windows.net/private/org-rocrupyvzgcl4yf25rqq6d1v/user-WsxrbKyP2c8rfhCKWDyMfe8N/img-ogKfqxxOS5KWVSj4gYySR6FY.png?st=2023-01-31T07%3A38%3A25Z&se=2023-01-31T09%3A38%3A25Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-30T22\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
}
],
"source": [
"from langchain.agents import load_tools\n",
"from langchain.agents import initialize_agent\n",
"\n",
"tools = load_tools(['dalle-image-generator'])\n",
"agent = initialize_agent(tools, llm, agent=\"zero-shot-react-description\", verbose=True)\n",
"output = agent.run(\"Create an image of a halloween night at a haunted museum\")"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "poetry-venv",
"language": "python",
"name": "poetry-venv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
},
"vscode": {
"interpreter": {
"hash": "3570c8892273ffbeee7ead61dc7c022b73551d9f55fb2584ac0e8e8920b18a89"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,300 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# BagelDB\n",
"\n",
"> [BagelDB](https://www.bageldb.ai/) (`Open Vector Database for AI`), is like GitHub for AI data.\n",
"It is a collaborative platform where users can create,\n",
"share, and manage vector datasets. It can support private projects for independent developers,\n",
"internal collaborations for enterprises, and public contributions for data DAOs.\n",
"\n",
"### Installation and Setup\n",
"\n",
"```bash\n",
"pip install betabageldb\n",
"```\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create VectorStore from texts"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"from langchain.vectorstores import Bagel\n",
"\n",
"texts = [\"hello bagel\", \"hello langchain\", \"I love salad\", \"my car\", \"a dog\"]\n",
"# create cluster and add texts\n",
"cluster = Bagel.from_texts(cluster_name=\"testing\", texts=texts)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='hello bagel', metadata={}),\n",
" Document(page_content='my car', metadata={}),\n",
" Document(page_content='I love salad', metadata={})]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# similarity search\n",
"cluster.similarity_search(\"bagel\", k=3)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(Document(page_content='hello bagel', metadata={}), 0.27392977476119995),\n",
" (Document(page_content='my car', metadata={}), 1.4783176183700562),\n",
" (Document(page_content='I love salad', metadata={}), 1.5342965126037598)]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# the score is a distance metric, so lower is better\n",
"cluster.similarity_search_with_score(\"bagel\", k=3)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# delete the cluster\n",
"cluster.delete_cluster()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create VectorStore from docs"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)[:10]"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"# create cluster with docs\n",
"cluster = Bagel.from_documents(cluster_name=\"testing_with_docs\", documents=docs)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the \n"
]
}
],
"source": [
"# similarity search\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = cluster.similarity_search(query)\n",
"print(docs[0].page_content[:102])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get all text/doc from Cluster"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"texts = [\"hello bagel\", \"this is langchain\"]\n",
"cluster = Bagel.from_texts(cluster_name=\"testing\", texts=texts)\n",
"cluster_data = cluster.get()"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['ids', 'embeddings', 'metadatas', 'documents'])"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# all keys\n",
"cluster_data.keys()"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'ids': ['578c6d24-3763-11ee-a8ab-b7b7b34f99ba',\n",
" '578c6d25-3763-11ee-a8ab-b7b7b34f99ba',\n",
" 'fb2fc7d8-3762-11ee-a8ab-b7b7b34f99ba',\n",
" 'fb2fc7d9-3762-11ee-a8ab-b7b7b34f99ba',\n",
" '6b40881a-3762-11ee-a8ab-b7b7b34f99ba',\n",
" '6b40881b-3762-11ee-a8ab-b7b7b34f99ba',\n",
" '581e691e-3762-11ee-a8ab-b7b7b34f99ba',\n",
" '581e691f-3762-11ee-a8ab-b7b7b34f99ba'],\n",
" 'embeddings': None,\n",
" 'metadatas': [{}, {}, {}, {}, {}, {}, {}, {}],\n",
" 'documents': ['hello bagel',\n",
" 'this is langchain',\n",
" 'hello bagel',\n",
" 'this is langchain',\n",
" 'hello bagel',\n",
" 'this is langchain',\n",
" 'hello bagel',\n",
" 'this is langchain']}"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# all values and keys\n",
"cluster_data"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
"cluster.delete_cluster()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create cluster with metadata & filter using metadata"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[(Document(page_content='hello bagel', metadata={'source': 'notion'}), 0.0)]"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"texts = [\"hello bagel\", \"this is langchain\"]\n",
"metadatas = [{\"source\": \"notion\"}, {\"source\": \"google\"}]\n",
"\n",
"cluster = Bagel.from_texts(cluster_name=\"testing\", texts=texts, metadatas=metadatas)\n",
"cluster.similarity_search_with_score(\"hello bagel\", where={\"source\": \"notion\"})"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"# delete the cluster\n",
"cluster.delete_cluster()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,244 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
"source": [
"# Dingo\n",
"\n",
">[Dingo](https://dingodb.readthedocs.io/en/latest/) is a distributed multi-mode vector database, which combines the characteristics of data lakes and vector databases, and can store data of any type and size (Key-Value, PDF, audio, video, etc.). It has real-time low-latency processing capabilities to achieve rapid insight and response, and can efficiently conduct instant analysis and process multi-modal data.\n",
"\n",
"This notebook shows how to use functionality related to the DingoDB vector database.\n",
"\n",
"To run, you should have a [DingoDB instance up and running](https://github.com/dingodb/dingo-deploy/blob/main/README.md)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a62cff8a-bcf7-4e33-bbbc-76999c2e3e20",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!pip install dingodb"
]
},
{
"cell_type": "markdown",
"id": "7a0f9e02-8eb0-4aef-b11f-8861360472ee",
"metadata": {},
"source": [
"We want to use OpenAIEmbeddings so we have to get the OpenAI API Key."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8b6ed9cd-81b9-46e5-9c20-5aafca2844d0",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"OpenAI API Key:········\n"
]
}
],
"source": [
"import os\n",
"import getpass\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "aac9563e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import Dingo\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a3c3999a",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "dcf88bdf",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from dingodb import DingoDB\n",
"\n",
"index_name = \"langchain-demo\"\n",
"\n",
"dingo_client = DingoDB(user=\"\", password=\"\", host=[\"127.0.0.1:13000\"])\n",
"# First, check if our index already exists. If it doesn't, we create it\n",
"if index_name not in dingo_client.get_index():\n",
" # we create a new index\n",
" dingo_client.create_index(\n",
" index_name=index_name,\n",
" dimension=1536,\n",
" metric_type='cosine',\n",
" auto_id=False\n",
")\n",
"\n",
"# The OpenAI embedding model `text-embedding-ada-002 uses 1536 dimensions`\n",
"docsearch = Dingo.from_documents(docs, embeddings, client=dingo_client, index_name=index_name)\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "a8c513ab",
"metadata": {},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "fc516993",
"metadata": {},
"outputs": [],
"source": [
"print(docs[0][1])"
]
},
{
"cell_type": "markdown",
"id": "1eca81e4",
"metadata": {},
"source": [
"### Adding More Text to an Existing Index\n",
"\n",
"More text can embedded and upserted to an existing Dingo index using the `add_texts` function"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e40d558b",
"metadata": {},
"outputs": [],
"source": [
"vectorstore = Dingo(client, embeddings.embed_query, \"text\")\n",
"\n",
"vectorstore.add_texts(\"More text!\")"
]
},
{
"cell_type": "markdown",
"id": "bcb858a8",
"metadata": {},
"source": [
"### Maximal Marginal Relevance Searches\n",
"\n",
"In addition to using similarity search in the retriever object, you can also use `mmr` as retriever."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "649083ab",
"metadata": {},
"outputs": [],
"source": [
"retriever = docsearch.as_retriever(search_type=\"mmr\")\n",
"matched_docs = retriever.get_relevant_documents(query)\n",
"for i, d in enumerate(matched_docs):\n",
" print(f\"\\n## Document {i}\\n\")\n",
" print(d.page_content)"
]
},
{
"cell_type": "markdown",
"id": "7d3831ad",
"metadata": {},
"source": [
"Or use `max_marginal_relevance_search` directly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "732f58b1",
"metadata": {},
"outputs": [],
"source": [
"found_docs = docsearch.max_marginal_relevance_search(query, k=2, fetch_k=10)\n",
"for i, doc in enumerate(found_docs):\n",
" print(f\"{i + 1}.\", doc.page_content, \"\\n\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -35,7 +35,7 @@
"\n",
" -- Create a table to store your documents\n",
" create table documents (\n",
" id bigserial primary key,\n",
" id uuid primary key,\n",
" content text, -- corresponds to Document.pageContent\n",
" metadata jsonb, -- corresponds to Document.metadata\n",
" embedding vector(1536) -- 1536 works for OpenAI embeddings, change if needed\n",

View File

@@ -81,17 +81,18 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 21,
"id": "53b7ce2d-3c09-4d1c-b66b-5769ce6746ae",
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"WEAVIATE_API_KEY\"] = getpass.getpass(\"WEAVIATE_API_KEY:\")"
"os.environ[\"WEAVIATE_API_KEY\"] = getpass.getpass(\"WEAVIATE_API_KEY:\")\n",
"WEAVIATE_API_KEY = os.environ[\"WEAVIATE_API_KEY\"]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 7,
"id": "aac9563e",
"metadata": {
"tags": []
@@ -106,7 +107,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 12,
"id": "a3c3999a",
"metadata": {},
"outputs": [],
@@ -123,7 +124,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 14,
"id": "21e9e528",
"metadata": {},
"outputs": [],
@@ -133,7 +134,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 15,
"id": "b4170176",
"metadata": {},
"outputs": [],
@@ -144,7 +145,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 16,
"id": "ecf3b890",
"metadata": {},
"outputs": [
@@ -166,6 +167,53 @@
"print(docs[0].page_content)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "7826d0ea",
"metadata": {},
"source": [
"## Authentication"
]
},
{
"cell_type": "markdown",
"id": "13989a7c",
"metadata": {},
"source": [
"Weaviate instances have authentication enabled by default. You can use either a username/password combination or API key. "
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "f6604f1d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<langchain.vectorstores.weaviate.Weaviate object at 0x107f46550>\n"
]
}
],
"source": [
"import weaviate\n",
"\n",
"client = weaviate.Client(url=WEAVIATE_URL, auth_client_secret=weaviate.AuthApiKey(WEAVIATE_API_KEY))\n",
"\n",
"# client = weaviate.Client(\n",
"# url=WEAVIATE_URL,\n",
"# auth_client_secret=weaviate.AuthClientPassword(\n",
"# username = \"WCS_USERNAME\", # Replace w/ your WCS username\n",
"# password = \"WCS_PASSWORD\", # Replace w/ your WCS password\n",
"# ),\n",
"# )\n",
"\n",
"vectorstore = Weaviate.from_documents(documents, embeddings, client=client, by_text=False)"
]
},
{
"attachments": {},
"cell_type": "markdown",
@@ -187,7 +235,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 17,
"id": "102105a1",
"metadata": {},
"outputs": [
@@ -213,7 +261,7 @@
"id": "8fc3487b",
"metadata": {},
"source": [
"# Persistance"
"# Persistence"
]
},
{
@@ -249,7 +297,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"id": "8b7df7ae",
"metadata": {},
"outputs": [
@@ -287,7 +335,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"id": "5e824f3b",
"metadata": {},
"outputs": [],
@@ -298,7 +346,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"id": "61209cc3",
"metadata": {},
"outputs": [],
@@ -311,7 +359,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": null,
"id": "4abc3d37",
"metadata": {},
"outputs": [],
@@ -327,7 +375,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": null,
"id": "c7062393",
"metadata": {},
"outputs": [],
@@ -339,7 +387,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"id": "7e41b773",
"metadata": {},
"outputs": [

View File

@@ -0,0 +1,594 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "bf4061ce",
"metadata": {},
"source": [
"# Caching Embeddings\n",
"\n",
"Embeddings can be stored or temporarily cached to avoid needing to recompute them.\n",
"\n",
"Caching embeddings can be done using a `CacheBackedEmbeddings`.\n",
"\n",
"The cache backed embedder is a wrapper around an embedder that caches\n",
"embeddings in a key-value store. \n",
"\n",
"The text is hashed and the hash is used as the key in the cache.\n",
"\n",
"\n",
"The main supported way to initialized a `CacheBackedEmbeddings` is `from_bytes_store`. This takes in the following parameters:\n",
"\n",
"- underlying_embedder: The embedder to use for embedding.\n",
"- document_embedding_cache: The cache to use for storing document embeddings.\n",
"- namespace: (optional, defaults to `\"\"`) The namespace to use for document cache. This namespace is used to avoid collisions with other caches. For example, set it to the name of the embedding model used.\n",
"\n",
"**Attention**: Be sure to set the `namespace` parameter to avoid collisions of the same text embedded using different embeddings models."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a463c3c2-749b-40d1-a433-84f68a1cd1c7",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.storage import InMemoryStore, LocalFileStore, RedisStore\n",
"from langchain.embeddings import OpenAIEmbeddings, CacheBackedEmbeddings"
]
},
{
"cell_type": "markdown",
"id": "9ddf07dd-3e72-41de-99d4-78e9521e272f",
"metadata": {},
"source": [
"## Using with a vectorstore\n",
"\n",
"First, let's see an example that uses the local file system for storing embeddings and uses FAISS vectorstore for retrieval."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "9e4314d8-88ef-4f52-81ae-0be771168bb6",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import FAISS"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "3e751f26-9b5b-4c10-843a-d784b5ea8538",
"metadata": {},
"outputs": [],
"source": [
"underlying_embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "30743664-38f5-425d-8216-772b64e7f348",
"metadata": {},
"outputs": [],
"source": [
"fs = LocalFileStore(\"./cache/\")\n",
"\n",
"cached_embedder = CacheBackedEmbeddings.from_bytes_store(\n",
" underlying_embeddings, fs, namespace=underlying_embeddings.model\n",
")"
]
},
{
"cell_type": "markdown",
"id": "f8cdf33c-321d-4d2c-b76b-d6f5f8b42a92",
"metadata": {},
"source": [
"The cache is empty prior to embedding"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "f9ad627f-ced2-4277-b336-2434f22f2c8a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(fs.yield_keys())"
]
},
{
"cell_type": "markdown",
"id": "a4effe04-b40f-42f8-a449-72fe6991cf20",
"metadata": {},
"source": [
"Load the document, split it into chunks, embed each chunk and load it into the vector store."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "cf958ac2-e60e-4668-b32c-8bb2d78b3c61",
"metadata": {},
"outputs": [],
"source": [
"raw_documents = TextLoader(\"../state_of_the_union.txt\").load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"documents = text_splitter.split_documents(raw_documents)"
]
},
{
"cell_type": "markdown",
"id": "f526444b-93f8-423f-b6d1-dab539450921",
"metadata": {},
"source": [
"create the vectorstore"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "3a1d7bb8-3b72-4bb5-9013-cf7729caca61",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 608 ms, sys: 58.9 ms, total: 667 ms\n",
"Wall time: 1.3 s\n"
]
}
],
"source": [
"%%time\n",
"db = FAISS.from_documents(documents, cached_embedder)"
]
},
{
"cell_type": "markdown",
"id": "64fc53f5-d559-467f-bf62-5daef32ffbc0",
"metadata": {},
"source": [
"If we try to create the vectostore again, it'll be much faster since it does not need to re-compute any embeddings."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "714cb2e2-77ba-41a8-bb83-84e75342af2d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 33.6 ms, sys: 3.96 ms, total: 37.6 ms\n",
"Wall time: 36.8 ms\n"
]
}
],
"source": [
"%%time\n",
"db2 = FAISS.from_documents(documents, cached_embedder)"
]
},
{
"cell_type": "markdown",
"id": "1acc76b9-9c70-4160-b593-5f932c75e2b6",
"metadata": {},
"source": [
"And here are some of the embeddings that got created:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "f2ca32dd-3712-4093-942b-4122f3dc8a8e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['text-embedding-ada-002614d7cf6-46f1-52fa-9d3a-740c39e7a20e',\n",
" 'text-embedding-ada-0020fc1ede2-407a-5e14-8f8f-5642214263f5',\n",
" 'text-embedding-ada-002e4ad20ef-dfaa-5916-9459-f90c6d8e8159',\n",
" 'text-embedding-ada-002a5ef11e4-0474-5725-8d80-81c91943b37f',\n",
" 'text-embedding-ada-00281426526-23fe-58be-9e84-6c7c72c8ca9a']"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(fs.yield_keys())[:5]"
]
},
{
"cell_type": "markdown",
"id": "564c9801-29f0-4452-aeac-527382e2c0e8",
"metadata": {},
"source": [
"## In Memory\n",
"\n",
"This section shows how to set up an in memory cache for embeddings. This type of cache is primarily \n",
"useful for unit tests or prototyping. Do **not** use this cache if you need to actually store the embeddings."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "13bd1c5b-b7ba-4394-957c-7d5b5a841972",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"store = InMemoryStore()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "9d99885f-99e1-498c-904d-6db539ac9466",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"underlying_embeddings = OpenAIEmbeddings()\n",
"embedder = CacheBackedEmbeddings.from_bytes_store(\n",
" underlying_embeddings, store, namespace=underlying_embeddings.model\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "682eb5d4-0b7a-4dac-b8fb-3de4ca6e421c",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 10.9 ms, sys: 916 µs, total: 11.8 ms\n",
"Wall time: 159 ms\n"
]
}
],
"source": [
"%%time\n",
"embeddings = embedder.embed_documents([\"hello\", \"goodbye\"])"
]
},
{
"cell_type": "markdown",
"id": "95233026-147f-49d1-bd87-e1e8b88ebdbc",
"metadata": {},
"source": [
"The second time we try to embed the embedding time is only 2 ms because the embeddings are looked up in the cache."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "f819c3ff-a212-4d06-a5f7-5eb1435c1feb",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.67 ms, sys: 342 µs, total: 2.01 ms\n",
"Wall time: 2.01 ms\n"
]
}
],
"source": [
"%%time\n",
"embeddings_from_cache = embedder.embed_documents([\"hello\", \"goodbye\"])"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "ec38fb72-90a9-4687-a483-c62c87d1f4dd",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"embeddings == embeddings_from_cache"
]
},
{
"cell_type": "markdown",
"id": "f6cbe100-8587-4830-b207-fb8b524a9854",
"metadata": {},
"source": [
"## File system\n",
"\n",
"This section covers how to use a file system store."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "a0070271-0809-4528-97e0-2a88216846f3",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"fs = LocalFileStore(\"./test_cache/\")"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "0b20e9fe-f57f-4d7c-9f81-105c5f8726f4",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"embedder2 = CacheBackedEmbeddings.from_bytes_store(\n",
" underlying_embeddings, fs, namespace=underlying_embeddings.model\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "630515fd-bf5c-4d9c-a404-9705308f3a2c",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 6.89 ms, sys: 4.89 ms, total: 11.8 ms\n",
"Wall time: 184 ms\n"
]
}
],
"source": [
"%%time\n",
"embeddings = embedder2.embed_documents([\"hello\", \"goodbye\"])"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "30e6bb87-42c9-4d08-88ac-0d22c9c449a1",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 0 ns, sys: 3.24 ms, total: 3.24 ms\n",
"Wall time: 2.84 ms\n"
]
}
],
"source": [
"%%time\n",
"embeddings = embedder2.embed_documents([\"hello\", \"goodbye\"])"
]
},
{
"cell_type": "markdown",
"id": "12ed5a45-8352-4e0f-8583-5537397f53c0",
"metadata": {},
"source": [
"Here are the embeddings that have been persisted to the directory `./test_cache`. \n",
"\n",
"Notice that the embedder takes a namespace parameter."
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "658e2914-05e9-44a3-a8fe-3fe17ca84039",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['text-embedding-ada-002e885db5b-c0bd-5fbc-88b1-4d1da6020aa5',\n",
" 'text-embedding-ada-0026ba52e44-59c9-5cc9-a084-284061b13c80']"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(fs.yield_keys())"
]
},
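{
"cell_type": "markdown",
"id": "7b3c9e1a-52aa-4f2e-9d10-2c1d4e8f0a11",
"metadata": {},
"source": [
"Namespaces keep cached entries from colliding. As a minimal sketch (reusing the `fs` store and `underlying_embeddings` from above; the namespace strings are purely illustrative), two embedders with different namespaces write separate keys for the same text:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8c4d0f2b-63bb-4a3f-ae21-3d2e5f9a1b22",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: two embedders share one byte store but use different namespaces,\n",
"# so the same text is cached under two distinct keys.\n",
"embedder_a = CacheBackedEmbeddings.from_bytes_store(\n",
"    underlying_embeddings, fs, namespace=\"embeddings-v1\"\n",
")\n",
"embedder_b = CacheBackedEmbeddings.from_bytes_store(\n",
"    underlying_embeddings, fs, namespace=\"embeddings-v2\"\n",
")\n",
"embedder_a.embed_documents([\"hello\"])\n",
"embedder_b.embed_documents([\"hello\"])\n",
"# Each key starts with its embedder's namespace prefix.\n",
"list(fs.yield_keys())"
]
},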
{
"cell_type": "markdown",
"id": "cd5f5a96-6ffa-429d-aa82-00b3f6532871",
"metadata": {},
"source": [
"## Redis Store\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "4879c134-141f-48a0-acfe-7d6f30253af0",
"metadata": {},
"outputs": [],
"source": [
"from langchain.storage import RedisStore"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "8b2bb9a0-6549-4487-8532-29ab4ab7336f",
"metadata": {},
"outputs": [],
"source": [
"# For cache isolation can use a separate DB\n",
"# Or additional namepace\n",
"store = RedisStore(redis_url=\"redis://localhost:6379\", client_kwargs={'db': 2}, namespace='embedding_caches')\n",
"\n",
"underlying_embeddings = OpenAIEmbeddings()\n",
"embedder = CacheBackedEmbeddings.from_bytes_store(\n",
" underlying_embeddings, store, namespace=underlying_embeddings.model\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "eca3cb99-2bb3-49d5-81f9-1dee03da4b8c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 3.99 ms, sys: 0 ns, total: 3.99 ms\n",
"Wall time: 3.5 ms\n"
]
}
],
"source": [
"%%time\n",
"embeddings = embedder.embed_documents([\"hello\", \"goodbye\"])"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "317ba5d8-89f9-462c-b807-ad4ef26e518b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 2.47 ms, sys: 767 µs, total: 3.24 ms\n",
"Wall time: 2.75 ms\n"
]
}
],
"source": [
"%%time\n",
"embeddings = embedder.embed_documents([\"hello\", \"goodbye\"])"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "8a540317-5142-4491-9062-a097932b56e3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['text-embedding-ada-002e885db5b-c0bd-5fbc-88b1-4d1da6020aa5',\n",
" 'text-embedding-ada-0026ba52e44-59c9-5cc9-a084-284061b13c80']"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(store.yield_keys())"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "cd9b0d4a-f816-4dce-9dde-cde1ad9a65fb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[b'embedding_caches/text-embedding-ada-002e885db5b-c0bd-5fbc-88b1-4d1da6020aa5',\n",
" b'embedding_caches/text-embedding-ada-0026ba52e44-59c9-5cc9-a084-284061b13c80']"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(store.client.scan_iter())"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,440 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "34883374",
"metadata": {},
"source": [
"# Parent Document Retriever\n",
"\n",
"When splitting documents for retrieval, there are often conflicting desires:\n",
"\n",
"1. You may want to have small documents, so that their embeddings can most\n",
" accurately reflect their meaning. If too long, then the embeddings can\n",
" lose meaning.\n",
"2. You want to have long enough documents that the context of each chunk is\n",
" retained.\n",
"\n",
"The ParentDocumentRetriever strikes that balance by splitting and storing\n",
"small chunks of data. During retrieval, it first fetches the small chunks\n",
"but then looks up the parent ids for those chunks and returns those larger\n",
"documents.\n",
"\n",
"Note that \"parent document\" refers to the document that a small chunk\n",
"originated from. This can either be the whole raw document OR a larger\n",
"chunk."
]
},
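{
"cell_type": "markdown",
"id": "1e2f3a4b-5c6d-4e7f-8a9b-0c1d2e3f4a5b",
"metadata": {},
"source": [
"Before using the real implementation, here is a rough, hand-rolled sketch of the idea. This is not the actual `ParentDocumentRetriever` code: the dict docstore and the substring \"search\" below are stand-ins for illustration only."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2f3a4b5c-6d7e-4f8a-9b0c-1d2e3f4a5b6c",
"metadata": {},
"outputs": [],
"source": [
"import uuid\n",
"\n",
"# Illustrative sketch only: small chunks are indexed with their parent's id,\n",
"# and whole parents live in a plain dict standing in for the docstore.\n",
"docstore = {}  # parent_id -> full document text\n",
"index = []  # (chunk_text, parent_id) pairs standing in for a vectorstore\n",
"\n",
"def add_parent(text, chunk_size=20):\n",
"    parent_id = str(uuid.uuid4())\n",
"    docstore[parent_id] = text\n",
"    for i in range(0, len(text), chunk_size):\n",
"        index.append((text[i:i + chunk_size], parent_id))\n",
"\n",
"def retrieve(query):\n",
"    # A real retriever does similarity search; substring match stands in here.\n",
"    parent_ids = {pid for chunk, pid in index if query in chunk}\n",
"    return [docstore[pid] for pid in parent_ids]\n",
"\n",
"add_parent(\"justice breyer served on the supreme court for decades\")\n",
"retrieve(\"breyer\")  # returns the full parent document, not the small chunk"
]
},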
{
"cell_type": "code",
"execution_count": 1,
"id": "8b6e74b2",
"metadata": {},
"outputs": [],
"source": [
"from langchain.retrievers import ParentDocumentRetriever"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "1d17af96",
"metadata": {},
"outputs": [],
"source": [
"from langchain.vectorstores import Chroma\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain.storage import InMemoryStore\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "604ff981",
"metadata": {},
"outputs": [],
"source": [
"loaders = [\n",
" TextLoader('../../paul_graham_essay.txt'),\n",
" TextLoader('../../state_of_the_union.txt'),\n",
"]\n",
"docs = []\n",
"for l in loaders:\n",
" docs.extend(l.load())"
]
},
{
"cell_type": "markdown",
"id": "d3943f72",
"metadata": {},
"source": [
"## Retrieving Full Documents\n",
"\n",
"In this mode, we want to retrieve the full documents. Therefor, we only specify a child splitter."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "1a8b2e5f",
"metadata": {},
"outputs": [],
"source": [
"# This text splitter is used to create the child documents\n",
"\n",
"child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)\n",
"# The vectorstore to use to index the child chunks\n",
"vectorstore = Chroma(\n",
" collection_name=\"full_documents\",\n",
" embedding_function=OpenAIEmbeddings()\n",
")\n",
"# The storage layer for the parent documents\n",
"store = InMemoryStore()\n",
"retriever = ParentDocumentRetriever(\n",
" vectorstore=vectorstore, \n",
" docstore=store, \n",
" child_splitter=child_splitter,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "2b107935",
"metadata": {},
"outputs": [],
"source": [
"retriever.add_documents(docs)"
]
},
{
"cell_type": "markdown",
"id": "d05b97b7",
"metadata": {},
"source": [
"This should yield two keys, because we added two documents."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "30e3812b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['05fe8d8a-bf60-4f87-b576-4351b23df266',\n",
" '571cc9e5-9ef7-4f6c-b800-835c83a1858b']"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(store.yield_keys())"
]
},
{
"cell_type": "markdown",
"id": "f895d62b",
"metadata": {},
"source": [
"Let's now call the vectorstore search functionality - we should see that it returns small chunks (since we're storing the small chunks"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "b261c02c",
"metadata": {},
"outputs": [],
"source": [
"sub_docs = vectorstore.similarity_search(\"justice breyer\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "5108222f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.\n"
]
}
],
"source": [
"print(sub_docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "bda8ed5a",
"metadata": {},
"source": [
"Let's now retrieve from the overall retriever. This should return large documents - since it returns the documents where the smaller chunks are located."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "419a91c4",
"metadata": {},
"outputs": [],
"source": [
"retrieved_docs = retriever.get_relevant_documents(\"justice breyer\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "cf10d250",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"38539"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(retrieved_docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "14f813a5",
"metadata": {},
"source": [
"## Retrieving Larger Chunks\n",
"\n",
"Sometimes, the full documents can be too big to want to retrieve them as is. In that case, what we really want to do is to first split the raw documents into larger chunks, and then split it into smaller chunks. We then index the smaller chunks, but on retrieval we retrieve the larger chunks (but still not the full documents)."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "b6f9a4f0",
"metadata": {},
"outputs": [],
"source": [
"# This text splitter is used to create the parent documents\n",
"parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)\n",
"# This text splitter is used to create the child documents\n",
"# It should create documents smaller than the parent\n",
"child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)\n",
"# The vectorstore to use to index the child chunks\n",
"vectorstore = Chroma(collection_name=\"split_parents\", embedding_function=OpenAIEmbeddings())\n",
"# The storage layer for the parent documents\n",
"store = InMemoryStore()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "19478ff3",
"metadata": {},
"outputs": [],
"source": [
"retriever = ParentDocumentRetriever(\n",
" vectorstore=vectorstore, \n",
" docstore=store, \n",
" child_splitter=child_splitter,\n",
" parent_splitter=parent_splitter,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "fe16e620",
"metadata": {},
"outputs": [],
"source": [
"retriever.add_documents(docs)"
]
},
{
"cell_type": "markdown",
"id": "64ad3c8c",
"metadata": {},
"source": [
"We can see that there are much more than two documents now - these are the larger chunks"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "24d81886",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"66"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(list(store.yield_keys()))"
]
},
{
"cell_type": "markdown",
"id": "baaef673",
"metadata": {},
"source": [
"Let's make sure the underlying vectorstore still retrieves the small chunks."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "b1c859de",
"metadata": {},
"outputs": [],
"source": [
"sub_docs = vectorstore.similarity_search(\"justice breyer\")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "6fffa2eb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.\n"
]
}
],
"source": [
"print(sub_docs[0].page_content)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "3a3202df",
"metadata": {},
"outputs": [],
"source": [
"retrieved_docs = retriever.get_relevant_documents(\"justice breyer\")"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "684fdb2c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1849"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(retrieved_docs[0].page_content)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "9f17f662",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \n",
"\n",
"We cannot let this happen. \n",
"\n",
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence. \n",
"\n",
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
"\n",
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n",
"\n",
"We can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \n",
"\n",
"Weve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \n",
"\n",
"Were putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n",
"\n",
"Were securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.\n"
]
}
],
"source": [
"print(retrieved_docs[0].page_content)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "facfdacb",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,423 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a15e6a18",
"metadata": {},
"source": [
"# Interacting with APIs\n",
"\n",
"[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/apis.ipynb)\n",
"\n",
"## Use case \n",
"\n",
"Suppose you want an LLM to interact with external APIs.\n",
"\n",
"This can be very useful for retrieving context for the LLM to utilize.\n",
"\n",
"And, more generally, it allows us to interact with APIs using natural langugage! \n",
" \n",
"\n",
"## Overview\n",
"\n",
"There are two primary ways to interface LLMs with external APIs:\n",
" \n",
"* `Functions`: For example, [OpenAI functions](https://platform.openai.com/docs/guides/gpt/function-calling) is one popular means of doing this.\n",
"* `LLM-generated interface`: Use an LLM with access to API documentation to create an interface.\n",
"\n",
"![Image description](/img/api_use_case.png)"
]
},
{
"cell_type": "markdown",
"id": "abbd82f0",
"metadata": {},
"source": [
"## Quickstart \n",
"\n",
"Many APIs already are compatible with OpenAI function calling.\n",
"\n",
"For example, [Klarna](https://www.klarna.com/international/press/klarna-brings-smoooth-shopping-to-chatgpt/) has a YAML file that describes its API and allows OpenAI to interact with it:\n",
"\n",
"```\n",
"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\n",
"```\n",
"\n",
"Other options include:\n",
"\n",
"* [Speak](https://api.speak.com/openapi.yaml) for translation\n",
"* [XKCD](https://gist.githubusercontent.com/roaldnefs/053e505b2b7a807290908fe9aa3e1f00/raw/0a212622ebfef501163f91e23803552411ed00e4/openapi.yaml) for comics\n",
"\n",
"We can supply the specification to `get_openapi_chain` directly in order to query the API with OpenAI functions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5a218fcc",
"metadata": {},
"outputs": [],
"source": [
"pip install langchain openai \n",
"\n",
"# Set env var OPENAI_API_KEY or load from a .env file:\n",
"# import dotenv\n",
"# dotenv.load_env()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "30b780e3",
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n"
]
},
{
"data": {
"text/plain": [
"{'query': \"What are some options for a men's large blue button down shirt\",\n",
" 'response': {'products': [{'name': 'Cubavera Four Pocket Guayabera Shirt',\n",
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3202055522/Clothing/Cubavera-Four-Pocket-Guayabera-Shirt/?utm_source=openai&ref-site=openai_plugin',\n",
" 'price': '$13.50',\n",
" 'attributes': ['Material:Polyester,Cotton',\n",
" 'Target Group:Man',\n",
" 'Color:Red,White,Blue,Black',\n",
" 'Properties:Pockets',\n",
" 'Pattern:Solid Color',\n",
" 'Size (Small-Large):S,XL,L,M,XXL']},\n",
" {'name': 'Polo Ralph Lauren Plaid Short Sleeve Button-down Oxford Shirt',\n",
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3207163438/Clothing/Polo-Ralph-Lauren-Plaid-Short-Sleeve-Button-down-Oxford-Shirt/?utm_source=openai&ref-site=openai_plugin',\n",
" 'price': '$52.20',\n",
" 'attributes': ['Material:Cotton',\n",
" 'Target Group:Man',\n",
" 'Color:Red,Blue,Multicolor',\n",
" 'Size (Small-Large):S,XL,L,M,XXL']},\n",
" {'name': 'Brixton Bowery Flannel Shirt',\n",
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3202331096/Clothing/Brixton-Bowery-Flannel-Shirt/?utm_source=openai&ref-site=openai_plugin',\n",
" 'price': '$27.48',\n",
" 'attributes': ['Material:Cotton',\n",
" 'Target Group:Man',\n",
" 'Color:Gray,Blue,Black,Orange',\n",
" 'Properties:Pockets',\n",
" 'Pattern:Checkered',\n",
" 'Size (Small-Large):XL,3XL,4XL,5XL,L,M,XXL']},\n",
" {'name': 'Vineyard Vines Gingham On-The-Go brrr Classic Fit Shirt Crystal',\n",
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3201938510/Clothing/Vineyard-Vines-Gingham-On-The-Go-brrr-Classic-Fit-Shirt-Crystal/?utm_source=openai&ref-site=openai_plugin',\n",
" 'price': '$80.64',\n",
" 'attributes': ['Material:Cotton',\n",
" 'Target Group:Man',\n",
" 'Color:Blue',\n",
" 'Size (Small-Large):XL,XS,L,M']},\n",
" {'name': \"Carhartt Men's Loose Fit Midweight Short Sleeve Plaid Shirt\",\n",
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3201826024/Clothing/Carhartt-Men-s-Loose-Fit-Midweight-Short-Sleeve-Plaid-Shirt/?utm_source=openai&ref-site=openai_plugin',\n",
" 'price': '$17.99',\n",
" 'attributes': ['Material:Cotton',\n",
" 'Target Group:Man',\n",
" 'Color:Red,Brown,Blue,Green',\n",
" 'Properties:Pockets',\n",
" 'Pattern:Checkered',\n",
" 'Size (Small-Large):S,XL,L,M']}]}}"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains.openai_functions.openapi import get_openapi_chain\n",
"chain = get_openapi_chain(\"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\")\n",
"chain(\"What are some options for a men's large blue button down shirt\")"
]
},
{
"cell_type": "markdown",
"id": "9162c91c",
"metadata": {},
"source": [
"## Functions \n",
"\n",
"We can unpack what is hapening when we use the funtions to calls external APIs.\n",
"\n",
"Let's look at the [LangSmith trace](https://smith.langchain.com/public/76a58b85-193f-4eb7-ba40-747f0d5dd56e/r):\n",
"\n",
"* See [here](https://github.com/langchain-ai/langchain/blob/7fc07ba5df99b9fa8bef837b0fafa220bc5c932c/libs/langchain/langchain/chains/openai_functions/openapi.py#L279C9-L279C19) that we call the OpenAI LLM with the provided API spec:\n",
"\n",
"```\n",
"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\n",
"```\n",
"\n",
"* The prompt then tells the LLM to use the API spec wiith input question:\n",
"\n",
"```\n",
"Use the provided API's to respond to this user query:\n",
"What are some options for a men's large blue button down shirt\n",
"```\n",
"\n",
"* The LLM returns the parameters for the function call `productsUsingGET`, which is [specified in the provided API spec](https://www.klarna.com/us/shopping/public/openai/v0/api-docs/):\n",
"```\n",
"function_call:\n",
" name: productsUsingGET\n",
" arguments: |-\n",
" {\n",
" \"params\": {\n",
" \"countryCode\": \"US\",\n",
" \"q\": \"men's large blue button down shirt\",\n",
" \"size\": 5,\n",
" \"min_price\": 0,\n",
" \"max_price\": 100\n",
" }\n",
" }\n",
" ```\n",
" \n",
"![Image description](/img/api_function_call.png)\n",
" \n",
"* This `Dict` above split and the [API is called here](https://github.com/langchain-ai/langchain/blob/7fc07ba5df99b9fa8bef837b0fafa220bc5c932c/libs/langchain/langchain/chains/openai_functions/openapi.py#L215)."
]
},
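{
"cell_type": "markdown",
"id": "3a4b5c6d-7e8f-4a9b-8c0d-1e2f3a4b5c6d",
"metadata": {},
"source": [
"To make that last step concrete, here is a simplified sketch of what \"split the `Dict` and call the API\" amounts to, using the plain `requests` library rather than the chain's actual code (the endpoint path is taken from the Klarna spec above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4b5c6d7e-8f9a-4b0c-9d1e-2f3a4b5c6d7e",
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"# Arguments as returned by the LLM's function call (see the trace above),\n",
"# already parsed from JSON into a dict.\n",
"args = {\n",
"    \"countryCode\": \"US\",\n",
"    \"q\": \"men's large blue button down shirt\",\n",
"    \"size\": 5,\n",
"    \"min_price\": 0,\n",
"    \"max_price\": 100,\n",
"}\n",
"\n",
"# The chain splits the params out of the dict and issues the GET request\n",
"# against the products endpoint from the spec.\n",
"response = requests.get(\n",
"    \"https://www.klarna.com/us/shopping/public/openai/v0/products\",\n",
"    params=args,\n",
")\n",
"response.json()[\"products\"][0][\"name\"]"
]
},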
{
"cell_type": "markdown",
"id": "1fe49a0d",
"metadata": {},
"source": [
"## API Chain \n",
"\n",
"We can also build our own interface to external APIs using the `APIChain` and provided API documentation."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "4ef0c3d0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new APIChain chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mhttps://api.open-meteo.com/v1/forecast?latitude=48.1351&longitude=11.5820&hourly=temperature_2m&temperature_unit=fahrenheit&current_weather=true\u001b[0m\n",
"\u001b[33;1m\u001b[1;3m{\"latitude\":48.14,\"longitude\":11.58,\"generationtime_ms\":1.0769367218017578,\"utc_offset_seconds\":0,\"timezone\":\"GMT\",\"timezone_abbreviation\":\"GMT\",\"elevation\":521.0,\"current_weather\":{\"temperature\":52.9,\"windspeed\":12.6,\"winddirection\":239.0,\"weathercode\":3,\"is_day\":0,\"time\":\"2023-08-07T22:00\"},\"hourly_units\":{\"time\":\"iso8601\",\"temperature_2m\":\"°F\"},\"hourly\":{\"time\":[\"2023-08-07T00:00\",\"2023-08-07T01:00\",\"2023-08-07T02:00\",\"2023-08-07T03:00\",\"2023-08-07T04:00\",\"2023-08-07T05:00\",\"2023-08-07T06:00\",\"2023-08-07T07:00\",\"2023-08-07T08:00\",\"2023-08-07T09:00\",\"2023-08-07T10:00\",\"2023-08-07T11:00\",\"2023-08-07T12:00\",\"2023-08-07T13:00\",\"2023-08-07T14:00\",\"2023-08-07T15:00\",\"2023-08-07T16:00\",\"2023-08-07T17:00\",\"2023-08-07T18:00\",\"2023-08-07T19:00\",\"2023-08-07T20:00\",\"2023-08-07T21:00\",\"2023-08-07T22:00\",\"2023-08-07T23:00\",\"2023-08-08T00:00\",\"2023-08-08T01:00\",\"2023-08-08T02:00\",\"2023-08-08T03:00\",\"2023-08-08T04:00\",\"2023-08-08T05:00\",\"2023-08-08T06:00\",\"2023-08-08T07:00\",\"2023-08-08T08:00\",\"2023-08-08T09:00\",\"2023-08-08T10:00\",\"2023-08-08T11:00\",\"2023-08-08T12:00\",\"2023-08-08T13:00\",\"2023-08-08T14:00\",\"2023-08-08T15:00\",\"2023-08-08T16:00\",\"2023-08-08T17:00\",\"2023-08-08T18:00\",\"2023-08-08T19:00\",\"2023-08-08T20:00\",\"2023-08-08T21:00\",\"2023-08-08T22:00\",\"2023-08-08T23:00\",\"2023-08-09T00:00\",\"2023-08-09T01:00\",\"2023-08-09T02:00\",\"2023-08-09T03:00\",\"2023-08-09T04:00\",\"2023-08-09T05:00\",\"2023-08-09T06:00\",\"2023-08-09T07:00\",\"2023-08-09T08:00\",\"2023-08-09T09:00\",\"2023-08-09T10:00\",\"2023-08-09T11:00\",\"2023-08-09T12:00\",\"2023-08-09T13:00\",\"2023-08-09T14:00\",\"2023-08-09T15:00\",\"2023-08-09T16:00\",\"2023-08-09T17:00\",\"2023-08-09T18:00\",\"2023-08-09T19:00\",\"2023-08-09T20:00\",\"2023-08-09T21:00\",\"2023-08-09T22:00\",\"2023-08-09T23:00\",\"2023-08-10T00:00\",\"2023-08-10T01:00\",\"2023-08-10T02:00\",\"2023-08-10T03:00\",\"2023-08-10T04:00\",\"2023-08-10T05:00\",\"2023-08-10T06:00\",\"2023-08-10T07:00\",\"2023-08-10T08:00\",\"2023-08-10T09:00\",\"2023-08-10T10:00\",\"2023-08-10T11:00\",\"2023-08-10T12:00\",\"2023-08-10T13:00\",\"2023-08-10T14:00\",\"2023-08-10T15:00\",\"2023-08-10T16:00\",\"2023-08-10T17:00\",\"2023-08-10T18:00\",\"2023-08-10T19:00\",\"2023-08-10T20:00\",\"2023-08-10T21:00\",\"2023-08-10T22:00\",\"2023-08-10T23:00\",\"2023-08-11T00:00\",\"2023-08-11T01:00\",\"2023-08-11T02:00\",\"2023-08-11T03:00\",\"2023-08-11T04:00\",\"2023-08-11T05:00\",\"2023-08-11T06:00\",\"2023-08-11T07:00\",\"2023-08-11T08:00\",\"2023-08-11T09:00\",\"2023-08-11T10:00\",\"2023-08-11T11:00\",\"2023-08-11T12:00\",\"2023-08-11T13:00\",\"2023-08-11T14:00\",\"2023-08-11T15:00\",\"2023-08-11T16:00\",\"2023-08-11T17:00\",\"2023-08-11T18:00\",\"2023-08-11T19:00\",\"2023-08-11T20:00\",\"2023-08-11T21:00\",\"2023-08-11T22:00\",\"2023-08-11T23:00\",\"2023-08-12T00:00\",\"2023-08-12T01:00\",\"2023-08-12T02:00\",\"2023-08-12T03:00\",\"2023-08-12T04:00\",\"2023-08-12T05:00\",\"2023-08-12T06:00\",\"2023-08-12T07:00\",\"2023-08-12T08:00\",\"2023-08-12T09:00\",\"2023-08-12T10:00\",\"2023-08-12T11:00\",\"2023-08-12T12:00\",\"2023-08-12T13:00\",\"2023-08-12T14:00\",\"2023-08-12T15:00\",\"2023-08-12T16:00\",\"2023-08-12T17:00\",\"2023-08-12T18:00\",\"2023-08-12T19:00\",\"2023-08-12T20:00\",\"2023-08-12T21:00\",\"2023-08-12T22:00\",\"2023-08-12T23:00\",\"2023-08-13T00:00\",\"2023-08-13T01:00\",\"2023-08-13T02:00\",\"2023-08-13T03:00\",\"2023-
08-13T04:00\",\"2023-08-13T05:00\",\"2023-08-13T06:00\",\"2023-08-13T07:00\",\"2023-08-13T08:00\",\"2023-08-13T09:00\",\"2023-08-13T10:00\",\"2023-08-13T11:00\",\"2023-08-13T12:00\",\"2023-08-13T13:00\",\"2023-08-13T14:00\",\"2023-08-13T15:00\",\"2023-08-13T16:00\",\"2023-08-13T17:00\",\"2023-08-13T18:00\",\"2023-08-13T19:00\",\"2023-08-13T20:00\",\"2023-08-13T21:00\",\"2023-08-13T22:00\",\"2023-08-13T23:00\"],\"temperature_2m\":[53.0,51.2,50.9,50.4,50.7,51.3,51.7,52.9,54.3,56.1,57.4,59.3,59.1,60.7,59.7,58.8,58.8,57.8,56.6,55.3,53.9,52.7,52.9,53.2,52.0,51.8,51.3,50.7,50.8,51.5,53.9,57.7,61.2,63.2,64.7,66.6,67.5,67.0,68.7,68.7,67.9,66.2,64.4,61.4,59.8,58.9,57.9,56.3,55.7,55.3,55.5,55.4,55.7,56.5,57.6,58.8,59.7,59.1,58.9,60.6,59.9,59.8,59.9,61.7,63.2,63.6,62.3,58.9,57.3,57.1,57.0,56.5,56.2,56.0,55.3,54.7,54.4,55.2,57.8,60.7,63.0,65.3,66.9,68.2,70.1,72.1,72.6,71.4,69.7,68.6,66.2,63.6,61.8,60.6,59.6,58.9,58.0,57.1,56.3,56.2,56.7,57.9,59.9,63.7,68.4,72.4,75.0,76.8,78.0,78.7,78.9,78.4,76.9,74.8,72.5,70.1,67.6,65.6,64.4,63.9,63.4,62.7,62.2,62.1,62.5,63.4,65.1,68.0,71.7,74.8,76.8,78.2,79.1,79.6,79.7,79.2,77.6,75.3,73.7,68.6,66.8,65.3,64.2,63.4,62.6,61.7,60.9,60.6,60.9,61.6,63.2,65.9,69.3,72.2,74.4,76.2,77.6,78.8,79.6,79.6,78.4,76.4,74.3,72.3,70.4,68.7,67.6,66.8]}}\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' The current temperature in Munich, Germany is 52.9°F.'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.chains import APIChain\n",
"from langchain.chains.api import open_meteo_docs\n",
"llm = OpenAI(temperature=0)\n",
"chain = APIChain.from_llm_and_api_docs(llm, open_meteo_docs.OPEN_METEO_DOCS, verbose=True)\n",
"chain.run('What is the weather like right now in Munich, Germany in degrees Fahrenheit?')"
]
},
{
"cell_type": "markdown",
"id": "5b179318",
"metadata": {},
"source": [
"Note that we supply information about the API:"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "a9e03cc2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'BASE URL: https://api.open-meteo.com/\\n\\nAPI Documentation\\nThe API endpoint /v1/forecast accepts a geographical coordinate, a list of weather variables and responds with a JSON hourly weather forecast for 7 days. Time always starts at 0:00 today and contains 168 hours. All URL parameters are listed below:\\n\\nParameter\\tFormat\\tRequired\\tDefault\\tDescription\\nlatitude, longitude\\tFloating point\\tYes\\t\\tGeographical WGS84 coordinate of the location\\nhourly\\tString array\\tNo\\t\\tA list of weather variables which shou'"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"open_meteo_docs.OPEN_METEO_DOCS[0:500]"
]
},
{
"cell_type": "markdown",
"id": "3fab7930",
"metadata": {},
"source": [
"Under the hood, we do two things:\n",
" \n",
"* `api_request_chain`: Generate an API URL based on the input question and the api_docs\n",
"* `api_answer_chain`: generate a final answer based on the API response\n",
"\n",
"We can look at the [LangSmith trace](https://smith.langchain.com/public/1e0d18ca-0d76-444c-97df-a939a6a815a7/r) to inspect this:\n",
"\n",
"* The `api_request_chain` produces the API url from our question and the API documentation:\n",
"\n",
"![Image description](/img/api_chain.png)\n",
"\n",
"* [Here](https://github.com/langchain-ai/langchain/blob/bbd22b9b761389a5e40fc45b0570e1830aabb707/libs/langchain/langchain/chains/api/base.py#L82) we make the API request with the API url.\n",
"* The `api_answer_chain` takes the response from the API and provides us with a natural langugae response:\n",
"\n",
"![Image description](/img/api_chain_response.png)"
]
},
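{
"cell_type": "markdown",
"id": "5c6d7e8f-9a0b-4c1d-8e2f-3a4b5c6d7e8f",
"metadata": {},
"source": [
"Both sub-chains are exposed as attributes on the `APIChain` object (the names match the bullets above), so we can peek at their prompts or supply our own. A minimal sketch, reusing the `chain` built above; the commented-out prompt variables are hypothetical:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6d7e8f9a-0b1c-4d2e-9f3a-4b5c6d7e8f9a",
"metadata": {},
"outputs": [],
"source": [
"# Inspect the prompts driving the two sub-chains.\n",
"print(chain.api_request_chain.prompt.template[:200])\n",
"print(chain.api_answer_chain.prompt.template[:200])\n",
"\n",
"# Custom prompts can also be passed at construction time, e.g.:\n",
"# chain = APIChain.from_llm_and_api_docs(\n",
"#     llm,\n",
"#     open_meteo_docs.OPEN_METEO_DOCS,\n",
"#     api_url_prompt=my_url_prompt,  # hypothetical PromptTemplate\n",
"#     api_response_prompt=my_answer_prompt,  # hypothetical PromptTemplate\n",
"#     verbose=True,\n",
"# )"
]
},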
{
"cell_type": "markdown",
"id": "2511f446",
"metadata": {},
"source": [
"### Going deeper\n",
"\n",
"**Test with other APIs**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1e1cf418",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"os.environ['TMDB_BEARER_TOKEN'] = \"\"\n",
"from langchain.chains.api import tmdb_docs\n",
"headers = {\"Authorization\": f\"Bearer {os.environ['TMDB_BEARER_TOKEN']}\"}\n",
"chain = APIChain.from_llm_and_api_docs(llm, tmdb_docs.TMDB_DOCS, headers=headers, verbose=True)\n",
"chain.run(\"Search for 'Avatar'\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dd80a717",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from langchain.llms import OpenAI\n",
"from langchain.chains.api import podcast_docs\n",
"from langchain.chains import APIChain\n",
" \n",
"listen_api_key = 'xxx' # Get api key here: https://www.listennotes.com/api/pricing/\n",
"llm = OpenAI(temperature=0)\n",
"headers = {\"X-ListenAPI-Key\": listen_api_key}\n",
"chain = APIChain.from_llm_and_api_docs(llm, podcast_docs.PODCAST_DOCS, headers=headers, verbose=True)\n",
"chain.run(\"Search for 'silicon valley bank' podcast episodes, audio length is more than 30 minutes, return only 1 results\")"
]
},
{
"cell_type": "markdown",
"id": "a5939be5",
"metadata": {},
"source": [
"**Web requests**\n",
"\n",
"URL requets are such a common use-case that we have the `LLMRequestsChain`, which makes a HTTP GET request. "
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "0b158296",
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.chains import LLMRequestsChain, LLMChain"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "d49c33e4",
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"Between >>> and <<< are the raw search result text from google.\n",
"Extract the answer to the question '{query}' or say \"not found\" if the information is not contained.\n",
"Use the format\n",
"Extracted:<answer or \"not found\">\n",
">>> {requests_result} <<<\n",
"Extracted:\"\"\"\n",
"\n",
"PROMPT = PromptTemplate(\n",
" input_variables=[\"query\", \"requests_result\"],\n",
" template=template,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "d0fd4aab",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'query': 'What are the Three (3) biggest countries, and their respective sizes?',\n",
" 'url': 'https://www.google.com/search?q=What+are+the+Three+(3)+biggest+countries,+and+their+respective+sizes?',\n",
" 'output': ' Russia (17,098,242 km²), Canada (9,984,670 km²), China (9,706,961 km²)'}"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain = LLMRequestsChain(llm_chain=LLMChain(llm=OpenAI(temperature=0), prompt=PROMPT))\n",
"question = \"What are the Three (3) biggest countries, and their respective sizes?\"\n",
"inputs = {\n",
" \"query\": question,\n",
" \"url\": \"https://www.google.com/search?q=\" + question.replace(\" \", \"+\"),\n",
"}\n",
"chain(inputs)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -1,24 +0,0 @@
---
sidebar_position: 3
---
# Interacting with APIs
Lots of data and information is stored behind APIs.
This page covers all resources available in LangChain for working with APIs.
## Chains
If you are just getting started and you have relatively simple APIs, you should get started with chains.
Chains are a sequence of predetermined steps, so they are good to get started with as they give you more control and let you
understand what is happening better.
- [API Chain](/docs/use_cases/apis/api.html)
## Agents
Agents are more complex, and involve multiple queries to the LLM to understand what to do.
The downside of agents is that you have less control. The upside is that they are more powerful,
which allows you to use them on larger and more complex schemas.
- [OpenAPI Agent](/docs/integrations/toolkits/openapi.html)


@@ -1,123 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "dd7ec7af",
"metadata": {},
"source": [
"# HTTP request chain\n",
"\n",
"Using the request library to get HTML results from a URL and then an LLM to parse results"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "dd8eae75",
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.chains import LLMRequestsChain, LLMChain"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "65bf324e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate\n",
"\n",
"template = \"\"\"Between >>> and <<< are the raw search result text from google.\n",
"Extract the answer to the question '{query}' or say \"not found\" if the information is not contained.\n",
"Use the format\n",
"Extracted:<answer or \"not found\">\n",
">>> {requests_result} <<<\n",
"Extracted:\"\"\"\n",
"\n",
"PROMPT = PromptTemplate(\n",
" input_variables=[\"query\", \"requests_result\"],\n",
" template=template,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f36ae0d8",
"metadata": {},
"outputs": [],
"source": [
"chain = LLMRequestsChain(llm_chain=LLMChain(llm=OpenAI(temperature=0), prompt=PROMPT))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b5d22d9d",
"metadata": {},
"outputs": [],
"source": [
"question = \"What are the Three (3) biggest countries, and their respective sizes?\"\n",
"inputs = {\n",
" \"query\": question,\n",
" \"url\": \"https://www.google.com/search?q=\" + question.replace(\" \", \"+\"),\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "2ea81168",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'query': 'What are the Three (3) biggest countries, and their respective sizes?',\n",
" 'url': 'https://www.google.com/search?q=What+are+the+Three+(3)+biggest+countries,+and+their+respective+sizes?',\n",
" 'output': ' Russia (17,098,242 km²), Canada (9,984,670 km²), United States (9,826,675 km²)'}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain(inputs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db8f2b6d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

File diff suppressed because it is too large


@@ -1,583 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "9fcaa37f",
"metadata": {},
"source": [
"# OpenAPI chain\n",
"\n",
"This notebook shows an example of using an OpenAPI chain to call an endpoint in natural language, and get back a response in natural language."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "efa6909f",
"metadata": {},
"outputs": [],
"source": [
"from langchain.tools import OpenAPISpec, APIOperation\n",
"from langchain.chains import OpenAPIEndpointChain\n",
"from langchain.requests import Requests\n",
"from langchain.llms import OpenAI"
]
},
{
"cell_type": "markdown",
"id": "71e38c6c",
"metadata": {},
"source": [
"## Load the spec\n",
"\n",
"Load a wrapper of the spec (so we can work with it more easily). You can load from a url or from a local file."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "0831271b",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n"
]
}
],
"source": [
"spec = OpenAPISpec.from_url(\n",
" \"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "189dd506",
"metadata": {},
"outputs": [],
"source": [
"# Alternative loading from file\n",
"# spec = OpenAPISpec.from_file(\"openai_openapi.yaml\")"
]
},
{
"cell_type": "markdown",
"id": "f7093582",
"metadata": {},
"source": [
"## Select the Operation\n",
"\n",
"In order to provide a focused on modular chain, we create a chain specifically only for one of the endpoints. Here we get an API operation from a specified endpoint and method."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "157494b9",
"metadata": {},
"outputs": [],
"source": [
"operation = APIOperation.from_openapi_spec(spec, \"/public/openai/v0/products\", \"get\")"
]
},
{
"cell_type": "markdown",
"id": "e3ab1c5c",
"metadata": {},
"source": [
"## Construct the chain\n",
"\n",
"We can now construct a chain to interact with it. In order to construct such a chain, we will pass in:\n",
"\n",
"1. The operation endpoint\n",
"2. A requests wrapper (can be used to handle authentication, etc)\n",
"3. The LLM to use to interact with it"
]
},
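{
"cell_type": "markdown",
"id": "7e8f9a0b-1c2d-4e3f-8a4b-5c6d7e8f9a0b",
"metadata": {},
"source": [
"For example, if the endpoint required authentication, the requests wrapper is the place to attach headers. A minimal sketch; the token value is a placeholder:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8f9a0b1c-2d3e-4f4a-9b5c-6d7e8f9a0b1c",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical authenticated wrapper: Requests accepts default headers.\n",
"authed_requests = Requests(headers={\"Authorization\": \"Bearer YOUR_TOKEN\"})\n",
"# It would then be passed to the chain constructed below, e.g.:\n",
"# chain = OpenAPIEndpointChain.from_api_operation(\n",
"#     operation, llm, requests=authed_requests, verbose=True\n",
"# )"
]
},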
{
"cell_type": "code",
"execution_count": 5,
"id": "788a7cef",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI() # Load a Language Model"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "c5f27406",
"metadata": {},
"outputs": [],
"source": [
"chain = OpenAPIEndpointChain.from_api_operation(\n",
" operation,\n",
" llm,\n",
" requests=Requests(),\n",
" verbose=True,\n",
" return_intermediate_steps=True, # Return request and response text\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "23652053",
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new OpenAPIEndpointChain chain...\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new APIRequesterChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mYou are a helpful AI Assistant. Please provide JSON arguments to agentFunc() based on the user's instructions.\n",
"\n",
"API_SCHEMA: ```typescript\n",
"/* API for fetching Klarna product information */\n",
"type productsUsingGET = (_: {\n",
"/* A precise query that matches one very small category or product that needs to be searched for to find the products the user is looking for. If the user explicitly stated what they want, use that as a query. The query is as specific as possible to the product name or category mentioned by the user in its singular form, and don't contain any clarifiers like latest, newest, cheapest, budget, premium, expensive or similar. The query is always taken from the latest topic, if there is a new topic a new query is started. */\n",
"\t\tq: string,\n",
"/* number of products returned */\n",
"\t\tsize?: number,\n",
"/* (Optional) Minimum price in local currency for the product searched for. Either explicitly stated by the user or implicitly inferred from a combination of the user's request and the kind of product searched for. */\n",
"\t\tmin_price?: number,\n",
"/* (Optional) Maximum price in local currency for the product searched for. Either explicitly stated by the user or implicitly inferred from a combination of the user's request and the kind of product searched for. */\n",
"\t\tmax_price?: number,\n",
"}) => any;\n",
"```\n",
"\n",
"USER_INSTRUCTIONS: \"whats the most expensive shirt?\"\n",
"\n",
"Your arguments must be plain json provided in a markdown block:\n",
"\n",
"ARGS: ```json\n",
"{valid json conforming to API_SCHEMA}\n",
"```\n",
"\n",
"Example\n",
"-----\n",
"\n",
"ARGS: ```json\n",
"{\"foo\": \"bar\", \"baz\": {\"qux\": \"quux\"}}\n",
"```\n",
"\n",
"The block must be no more than 1 line long, and all arguments must be valid JSON. All string arguments must be wrapped in double quotes.\n",
"You MUST strictly comply to the types indicated by the provided schema, including all required args.\n",
"\n",
"If you don't have sufficient information to call the function due to things like requiring specific uuid's, you can reply with the following message:\n",
"\n",
"Message: ```text\n",
"Concise response requesting the additional information that would make calling the function successful.\n",
"```\n",
"\n",
"Begin\n",
"-----\n",
"ARGS:\n",
"\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m{\"q\": \"shirt\", \"size\": 1, \"max_price\": null}\u001b[0m\n",
"\u001b[36;1m\u001b[1;3m{\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]}]}\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new APIResponderChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mYou are a helpful AI assistant trained to answer user queries from API responses.\n",
"You attempted to call an API, which resulted in:\n",
"API_RESPONSE: {\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]}]}\n",
"\n",
"USER_COMMENT: \"whats the most expensive shirt?\"\n",
"\n",
"\n",
"If the API_RESPONSE can answer the USER_COMMENT respond with the following markdown json block:\n",
"Response: ```json\n",
"{\"response\": \"Human-understandable synthesis of the API_RESPONSE\"}\n",
"```\n",
"\n",
"Otherwise respond with the following markdown json block:\n",
"Response Error: ```json\n",
"{\"response\": \"What you did and a concise statement of the resulting error. If it can be easily fixed, provide a suggestion.\"}\n",
"```\n",
"\n",
"You MUST respond as a markdown json code block. The person you are responding to CANNOT see the API_RESPONSE, so if there is any relevant information there you must include it in your response.\n",
"\n",
"Begin:\n",
"---\n",
"\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[33;1m\u001b[1;3mThe most expensive shirt in the API response is the Burberry Check Poplin Shirt, which costs $360.00.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
}
],
"source": [
"output = chain(\"whats the most expensive shirt?\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c000295e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'request_args': '{\"q\": \"shirt\", \"size\": 1, \"max_price\": null}',\n",
" 'response_text': '{\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]}]}'}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# View intermediate steps\n",
"output[\"intermediate_steps\"]"
]
},
{
"cell_type": "markdown",
"id": "092bdb4d",
"metadata": {},
"source": [
"## Return raw response\n",
"\n",
"We can also run this chain without synthesizing the response. This will have the effect of just returning the raw API output."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "4dff3849",
"metadata": {},
"outputs": [],
"source": [
"chain = OpenAPIEndpointChain.from_api_operation(\n",
" operation,\n",
" llm,\n",
" requests=Requests(),\n",
" verbose=True,\n",
" return_intermediate_steps=True, # Return request and response text\n",
" raw_response=True, # Return raw response\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "762499a9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new OpenAPIEndpointChain chain...\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new APIRequesterChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mYou are a helpful AI Assistant. Please provide JSON arguments to agentFunc() based on the user's instructions.\n",
"\n",
"API_SCHEMA: ```typescript\n",
"/* API for fetching Klarna product information */\n",
"type productsUsingGET = (_: {\n",
"/* A precise query that matches one very small category or product that needs to be searched for to find the products the user is looking for. If the user explicitly stated what they want, use that as a query. The query is as specific as possible to the product name or category mentioned by the user in its singular form, and don't contain any clarifiers like latest, newest, cheapest, budget, premium, expensive or similar. The query is always taken from the latest topic, if there is a new topic a new query is started. */\n",
"\t\tq: string,\n",
"/* number of products returned */\n",
"\t\tsize?: number,\n",
"/* (Optional) Minimum price in local currency for the product searched for. Either explicitly stated by the user or implicitly inferred from a combination of the user's request and the kind of product searched for. */\n",
"\t\tmin_price?: number,\n",
"/* (Optional) Maximum price in local currency for the product searched for. Either explicitly stated by the user or implicitly inferred from a combination of the user's request and the kind of product searched for. */\n",
"\t\tmax_price?: number,\n",
"}) => any;\n",
"```\n",
"\n",
"USER_INSTRUCTIONS: \"whats the most expensive shirt?\"\n",
"\n",
"Your arguments must be plain json provided in a markdown block:\n",
"\n",
"ARGS: ```json\n",
"{valid json conforming to API_SCHEMA}\n",
"```\n",
"\n",
"Example\n",
"-----\n",
"\n",
"ARGS: ```json\n",
"{\"foo\": \"bar\", \"baz\": {\"qux\": \"quux\"}}\n",
"```\n",
"\n",
"The block must be no more than 1 line long, and all arguments must be valid JSON. All string arguments must be wrapped in double quotes.\n",
"You MUST strictly comply to the types indicated by the provided schema, including all required args.\n",
"\n",
"If you don't have sufficient information to call the function due to things like requiring specific uuid's, you can reply with the following message:\n",
"\n",
"Message: ```text\n",
"Concise response requesting the additional information that would make calling the function successful.\n",
"```\n",
"\n",
"Begin\n",
"-----\n",
"ARGS:\n",
"\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m{\"q\": \"shirt\", \"max_price\": null}\u001b[0m\n",
"\u001b[36;1m\u001b[1;3m{\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Cotton Shirt - Beige\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl359/3200280807/Children-s-Clothing/Burberry-Vintage-Check-Cotton-Shirt-Beige/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$229.02\",\"attributes\":[\"Material:Cotton,Elastane\",\"Color:Beige\",\"Model:Boy\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Stretch Cotton Twill Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3202342515/Clothing/Burberry-Vintage-Check-Stretch-Cotton-Twill-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$309.99\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Woman\",\"Color:Beige\",\"Properties:Stretch\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Somerton Check Shirt - Camel\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201112728/Clothing/Burberry-Somerton-Check-Shirt-Camel/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$450.00\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Man\",\"Color:Beige\"]},{\"name\":\"Magellan Outdoors Laguna Madre Solid Short Sleeve Fishing Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3203102142/Clothing/Magellan-Outdoors-Laguna-Madre-Solid-Short-Sleeve-Fishing-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$19.99\",\"attributes\":[\"Material:Polyester,Nylon\",\"Target Group:Man\",\"Color:Red,Pink,White,Blue,Purple,Beige,Black,Green\",\"Properties:Pockets\",\"Pattern:Solid Color\"]}]}\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
}
],
"source": [
"output = chain(\"whats the most expensive shirt?\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "4afc021a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'instructions': 'whats the most expensive shirt?',\n",
" 'output': '{\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Cotton Shirt - Beige\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl359/3200280807/Children-s-Clothing/Burberry-Vintage-Check-Cotton-Shirt-Beige/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$229.02\",\"attributes\":[\"Material:Cotton,Elastane\",\"Color:Beige\",\"Model:Boy\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Stretch Cotton Twill Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3202342515/Clothing/Burberry-Vintage-Check-Stretch-Cotton-Twill-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$309.99\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Woman\",\"Color:Beige\",\"Properties:Stretch\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Somerton Check Shirt - Camel\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201112728/Clothing/Burberry-Somerton-Check-Shirt-Camel/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$450.00\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Man\",\"Color:Beige\"]},{\"name\":\"Magellan Outdoors Laguna Madre Solid Short Sleeve Fishing Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3203102142/Clothing/Magellan-Outdoors-Laguna-Madre-Solid-Short-Sleeve-Fishing-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$19.99\",\"attributes\":[\"Material:Polyester,Nylon\",\"Target Group:Man\",\"Color:Red,Pink,White,Blue,Purple,Beige,Black,Green\",\"Properties:Pockets\",\"Pattern:Solid Color\"]}]}',\n",
" 'intermediate_steps': {'request_args': '{\"q\": \"shirt\", \"max_price\": null}',\n",
" 'response_text': '{\"products\":[{\"name\":\"Burberry Check Poplin Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201810981/Clothing/Burberry-Check-Poplin-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$360.00\",\"attributes\":[\"Material:Cotton\",\"Target Group:Man\",\"Color:Gray,Blue,Beige\",\"Properties:Pockets\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Cotton Shirt - Beige\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl359/3200280807/Children-s-Clothing/Burberry-Vintage-Check-Cotton-Shirt-Beige/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$229.02\",\"attributes\":[\"Material:Cotton,Elastane\",\"Color:Beige\",\"Model:Boy\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Vintage Check Stretch Cotton Twill Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3202342515/Clothing/Burberry-Vintage-Check-Stretch-Cotton-Twill-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$309.99\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Woman\",\"Color:Beige\",\"Properties:Stretch\",\"Pattern:Checkered\"]},{\"name\":\"Burberry Somerton Check Shirt - Camel\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3201112728/Clothing/Burberry-Somerton-Check-Shirt-Camel/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$450.00\",\"attributes\":[\"Material:Elastane/Lycra/Spandex,Cotton\",\"Target Group:Man\",\"Color:Beige\"]},{\"name\":\"Magellan Outdoors Laguna Madre Solid Short Sleeve Fishing Shirt\",\"url\":\"https://www.klarna.com/us/shopping/pl/cl10001/3203102142/Clothing/Magellan-Outdoors-Laguna-Madre-Solid-Short-Sleeve-Fishing-Shirt/?utm_source=openai&ref-site=openai_plugin\",\"price\":\"$19.99\",\"attributes\":[\"Material:Polyester,Nylon\",\"Target Group:Man\",\"Color:Red,Pink,White,Blue,Purple,Beige,Black,Green\",\"Properties:Pockets\",\"Pattern:Solid Color\"]}]}'}}"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"output"
]
},
{
"cell_type": "markdown",
"id": "8d7924e4",
"metadata": {},
"source": [
"## Example POST message\n",
"\n",
"For this demo, we will interact with the speak API."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "c56b1a04",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n",
"Attempting to load an OpenAPI 3.0.1 spec. This may result in degraded performance. Convert your OpenAPI spec to 3.1.* spec for better support.\n"
]
}
],
"source": [
"spec = OpenAPISpec.from_url(\"https://api.speak.com/openapi.yaml\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "177d8275",
"metadata": {},
"outputs": [],
"source": [
"operation = APIOperation.from_openapi_spec(\n",
" spec, \"/v1/public/openai/explain-task\", \"post\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "835c5ddc",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI()\n",
"chain = OpenAPIEndpointChain.from_api_operation(\n",
" operation, llm, requests=Requests(), verbose=True, return_intermediate_steps=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "59855d60",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new OpenAPIEndpointChain chain...\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new APIRequesterChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mYou are a helpful AI Assistant. Please provide JSON arguments to agentFunc() based on the user's instructions.\n",
"\n",
"API_SCHEMA: ```typescript\n",
"type explainTask = (_: {\n",
"/* Description of the task that the user wants to accomplish or do. For example, \"tell the waiter they messed up my order\" or \"compliment someone on their shirt\" */\n",
" task_description?: string,\n",
"/* The foreign language that the user is learning and asking about. The value can be inferred from question - for example, if the user asks \"how do i ask a girl out in mexico city\", the value should be \"Spanish\" because of Mexico City. Always use the full name of the language (e.g. Spanish, French). */\n",
" learning_language?: string,\n",
"/* The user's native language. Infer this value from the language the user asked their question in. Always use the full name of the language (e.g. Spanish, French). */\n",
" native_language?: string,\n",
"/* A description of any additional context in the user's question that could affect the explanation - e.g. setting, scenario, situation, tone, speaking style and formality, usage notes, or any other qualifiers. */\n",
" additional_context?: string,\n",
"/* Full text of the user's question. */\n",
" full_query?: string,\n",
"}) => any;\n",
"```\n",
"\n",
"USER_INSTRUCTIONS: \"How would ask for more tea in Delhi?\"\n",
"\n",
"Your arguments must be plain json provided in a markdown block:\n",
"\n",
"ARGS: ```json\n",
"{valid json conforming to API_SCHEMA}\n",
"```\n",
"\n",
"Example\n",
"-----\n",
"\n",
"ARGS: ```json\n",
"{\"foo\": \"bar\", \"baz\": {\"qux\": \"quux\"}}\n",
"```\n",
"\n",
"The block must be no more than 1 line long, and all arguments must be valid JSON. All string arguments must be wrapped in double quotes.\n",
"You MUST strictly comply to the types indicated by the provided schema, including all required args.\n",
"\n",
"If you don't have sufficient information to call the function due to things like requiring specific uuid's, you can reply with the following message:\n",
"\n",
"Message: ```text\n",
"Concise response requesting the additional information that would make calling the function successful.\n",
"```\n",
"\n",
"Begin\n",
"-----\n",
"ARGS:\n",
"\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m{\"task_description\": \"ask for more tea\", \"learning_language\": \"Hindi\", \"native_language\": \"English\", \"full_query\": \"How would I ask for more tea in Delhi?\"}\u001b[0m\n",
"\u001b[36;1m\u001b[1;3m{\"explanation\":\"<what-to-say language=\\\"Hindi\\\" context=\\\"None\\\">\\nऔर चाय लाओ। (Aur chai lao.) \\n</what-to-say>\\n\\n<alternatives context=\\\"None\\\">\\n1. \\\"चाय थोड़ी ज्यादा मिल सकती है?\\\" *(Chai thodi zyada mil sakti hai? - Polite, asking if more tea is available)*\\n2. \\\"मुझे महसूस हो रहा है कि मुझे कुछ अन्य प्रकार की चाय पीनी चाहिए।\\\" *(Mujhe mehsoos ho raha hai ki mujhe kuch anya prakar ki chai peeni chahiye. - Formal, indicating a desire for a different type of tea)*\\n3. \\\"क्या मुझे or cup में milk/tea powder मिल सकता है?\\\" *(Kya mujhe aur cup mein milk/tea powder mil sakta hai? - Very informal/casual tone, asking for an extra serving of milk or tea powder)*\\n</alternatives>\\n\\n<usage-notes>\\nIn India and Indian culture, serving guests with food and beverages holds great importance in hospitality. You will find people always offering drinks like water or tea to their guests as soon as they arrive at their house or office.\\n</usage-notes>\\n\\n<example-convo language=\\\"Hindi\\\">\\n<context>At home during breakfast.</context>\\nPreeti: सर, क्या main aur cups chai lekar aaun? (Sir,kya main aur cups chai lekar aaun? - Sir, should I get more tea cups?)\\nRahul: हां,बिल्कुल। और चाय की मात्रा में भी थोड़ा सा इजाफा करना। (Haan,bilkul. Aur chai ki matra mein bhi thoda sa eejafa karna. - Yes, please. And add a little extra in the quantity of tea as well.)\\n</example-convo>\\n\\n*[Report an issue or leave feedback](https://speak.com/chatgpt?rid=d4mcapbkopo164pqpbk321oc})*\",\"extra_response_instructions\":\"Use all information in the API response and fully render all Markdown.\\nAlways end your response with a link to report an issue or leave feedback on the plugin.\"}\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new APIResponderChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mYou are a helpful AI assistant trained to answer user queries from API responses.\n",
"You attempted to call an API, which resulted in:\n",
"API_RESPONSE: {\"explanation\":\"<what-to-say language=\\\"Hindi\\\" context=\\\"None\\\">\\nऔर चाय लाओ। (Aur chai lao.) \\n</what-to-say>\\n\\n<alternatives context=\\\"None\\\">\\n1. \\\"चाय थोड़ी ज्यादा मिल सकती है?\\\" *(Chai thodi zyada mil sakti hai? - Polite, asking if more tea is available)*\\n2. \\\"मुझे महसूस हो रहा है कि मुझे कुछ अन्य प्रकार की चाय पीनी चाहिए।\\\" *(Mujhe mehsoos ho raha hai ki mujhe kuch anya prakar ki chai peeni chahiye. - Formal, indicating a desire for a different type of tea)*\\n3. \\\"क्या मुझे or cup में milk/tea powder मिल सकता है?\\\" *(Kya mujhe aur cup mein milk/tea powder mil sakta hai? - Very informal/casual tone, asking for an extra serving of milk or tea powder)*\\n</alternatives>\\n\\n<usage-notes>\\nIn India and Indian culture, serving guests with food and beverages holds great importance in hospitality. You will find people always offering drinks like water or tea to their guests as soon as they arrive at their house or office.\\n</usage-notes>\\n\\n<example-convo language=\\\"Hindi\\\">\\n<context>At home during breakfast.</context>\\nPreeti: सर, क्या main aur cups chai lekar aaun? (Sir,kya main aur cups chai lekar aaun? - Sir, should I get more tea cups?)\\nRahul: हां,बिल्कुल। और चाय की मात्रा में भी थोड़ा सा इजाफा करना। (Haan,bilkul. Aur chai ki matra mein bhi thoda sa eejafa karna. - Yes, please. And add a little extra in the quantity of tea as well.)\\n</example-convo>\\n\\n*[Report an issue or leave feedback](https://speak.com/chatgpt?rid=d4mcapbkopo164pqpbk321oc})*\",\"extra_response_instructions\":\"Use all information in the API response and fully render all Markdown.\\nAlways end your response with a link to report an issue or leave feedback on the plugin.\"}\n",
"\n",
"USER_COMMENT: \"How would ask for more tea in Delhi?\"\n",
"\n",
"\n",
"If the API_RESPONSE can answer the USER_COMMENT respond with the following markdown json block:\n",
"Response: ```json\n",
"{\"response\": \"Concise response to USER_COMMENT based on API_RESPONSE.\"}\n",
"```\n",
"\n",
"Otherwise respond with the following markdown json block:\n",
"Response Error: ```json\n",
"{\"response\": \"What you did and a concise statement of the resulting error. If it can be easily fixed, provide a suggestion.\"}\n",
"```\n",
"\n",
"You MUST respond as a markdown json code block.\n",
"\n",
"Begin:\n",
"---\n",
"\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[33;1m\u001b[1;3mIn Delhi you can ask for more tea by saying 'Chai thodi zyada mil sakti hai?'\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
}
],
"source": [
"output = chain(\"How would ask for more tea in Delhi?\")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "91bddb18",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['{\"task_description\": \"ask for more tea\", \"learning_language\": \"Hindi\", \"native_language\": \"English\", \"full_query\": \"How would I ask for more tea in Delhi?\"}',\n",
" '{\"explanation\":\"<what-to-say language=\\\\\"Hindi\\\\\" context=\\\\\"None\\\\\">\\\\nऔर चाय लाओ। (Aur chai lao.) \\\\n</what-to-say>\\\\n\\\\n<alternatives context=\\\\\"None\\\\\">\\\\n1. \\\\\"चाय थोड़ी ज्यादा मिल सकती है?\\\\\" *(Chai thodi zyada mil sakti hai? - Polite, asking if more tea is available)*\\\\n2. \\\\\"मुझे महसूस हो रहा है कि मुझे कुछ अन्य प्रकार की चाय पीनी चाहिए।\\\\\" *(Mujhe mehsoos ho raha hai ki mujhe kuch anya prakar ki chai peeni chahiye. - Formal, indicating a desire for a different type of tea)*\\\\n3. \\\\\"क्या मुझे or cup में milk/tea powder मिल सकता है?\\\\\" *(Kya mujhe aur cup mein milk/tea powder mil sakta hai? - Very informal/casual tone, asking for an extra serving of milk or tea powder)*\\\\n</alternatives>\\\\n\\\\n<usage-notes>\\\\nIn India and Indian culture, serving guests with food and beverages holds great importance in hospitality. You will find people always offering drinks like water or tea to their guests as soon as they arrive at their house or office.\\\\n</usage-notes>\\\\n\\\\n<example-convo language=\\\\\"Hindi\\\\\">\\\\n<context>At home during breakfast.</context>\\\\nPreeti: सर, क्या main aur cups chai lekar aaun? (Sir,kya main aur cups chai lekar aaun? - Sir, should I get more tea cups?)\\\\nRahul: हां,बिल्कुल। और चाय की मात्रा में भी थोड़ा सा इजाफा करना। (Haan,bilkul. Aur chai ki matra mein bhi thoda sa eejafa karna. - Yes, please. And add a little extra in the quantity of tea as well.)\\\\n</example-convo>\\\\n\\\\n*[Report an issue or leave feedback](https://speak.com/chatgpt?rid=d4mcapbkopo164pqpbk321oc})*\",\"extra_response_instructions\":\"Use all information in the API response and fully render all Markdown.\\\\nAlways end your response with a link to report an issue or leave feedback on the plugin.\"}']"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Show the API chain's intermediate steps\n",
"output[\"intermediate_steps\"]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -1,249 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "e734b314",
"metadata": {},
"source": [
"# OpenAPI calls with OpenAI functions\n",
"\n",
"In this notebook we'll show how to create a chain that automatically makes calls to an API based only on an OpenAPI spec. Under the hood, we're parsing the OpenAPI spec into a JSON schema that the OpenAI functions API can handle. This allows ChatGPT to automatically select and populate the relevant API call to make for any user input. Using the output of ChatGPT we then make the actual API call, and return the result."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "555661b5",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.openai_functions.openapi import get_openapi_chain"
]
},
{
"cell_type": "markdown",
"id": "a95f510a",
"metadata": {},
"source": [
"## Query Klarna"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "08e19b64",
"metadata": {},
"outputs": [],
"source": [
"chain = get_openapi_chain(\n",
" \"https://www.klarna.com/us/shopping/public/openai/v0/api-docs/\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "3959f866",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'products': [{'name': \"Tommy Hilfiger Men's Short Sleeve Button-Down Shirt\",\n",
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3204878580/Clothing/Tommy-Hilfiger-Men-s-Short-Sleeve-Button-Down-Shirt/?utm_source=openai&ref-site=openai_plugin',\n",
" 'price': '$26.78',\n",
" 'attributes': ['Material:Linen,Cotton',\n",
" 'Target Group:Man',\n",
" 'Color:Gray,Pink,White,Blue,Beige,Black,Turquoise',\n",
" 'Size:S,XL,M,XXL']},\n",
" {'name': \"Van Heusen Men's Long Sleeve Button-Down Shirt\",\n",
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3201809514/Clothing/Van-Heusen-Men-s-Long-Sleeve-Button-Down-Shirt/?utm_source=openai&ref-site=openai_plugin',\n",
" 'price': '$18.89',\n",
" 'attributes': ['Material:Cotton',\n",
" 'Target Group:Man',\n",
" 'Color:Red,Gray,White,Blue',\n",
" 'Size:XL,XXL']},\n",
" {'name': 'Brixton Bowery Flannel Shirt',\n",
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3202331096/Clothing/Brixton-Bowery-Flannel-Shirt/?utm_source=openai&ref-site=openai_plugin',\n",
" 'price': '$34.48',\n",
" 'attributes': ['Material:Cotton',\n",
" 'Target Group:Man',\n",
" 'Color:Gray,Blue,Black,Orange',\n",
" 'Size:XL,3XL,4XL,5XL,L,M,XXL']},\n",
" {'name': 'Cubavera Four Pocket Guayabera Shirt',\n",
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3202055522/Clothing/Cubavera-Four-Pocket-Guayabera-Shirt/?utm_source=openai&ref-site=openai_plugin',\n",
" 'price': '$23.22',\n",
" 'attributes': ['Material:Polyester,Cotton',\n",
" 'Target Group:Man',\n",
" 'Color:Red,White,Blue,Black',\n",
" 'Size:S,XL,L,M,XXL']},\n",
" {'name': 'Theory Sylvain Shirt - Eclipse',\n",
" 'url': 'https://www.klarna.com/us/shopping/pl/cl10001/3202028254/Clothing/Theory-Sylvain-Shirt-Eclipse/?utm_source=openai&ref-site=openai_plugin',\n",
" 'price': '$86.01',\n",
" 'attributes': ['Material:Polyester,Cotton',\n",
" 'Target Group:Man',\n",
" 'Color:Blue',\n",
" 'Size:S,XL,XS,L,M,XXL']}]}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\"What are some options for a men's large blue button down shirt\")"
]
},
{
"cell_type": "markdown",
"id": "6f648c77",
"metadata": {},
"source": [
"## Query a translation service\n",
"\n",
"Additionally, see the request payload by setting `verbose=True`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bf6cd695",
"metadata": {},
"outputs": [],
"source": [
"chain = get_openapi_chain(\"https://api.speak.com/openapi.yaml\", verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "1ba51609",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mHuman: Use the provided API's to respond to this user query:\n",
"\n",
"How would you say no thanks in Russian\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new chain...\u001b[0m\n",
"Calling endpoint \u001b[32;1m\u001b[1;3mtranslate\u001b[0m with arguments:\n",
"\u001b[32;1m\u001b[1;3m{\n",
" \"json\": {\n",
" \"phrase_to_translate\": \"no thanks\",\n",
" \"learning_language\": \"russian\",\n",
" \"native_language\": \"english\",\n",
" \"additional_context\": \"\",\n",
" \"full_query\": \"How would you say no thanks in Russian\"\n",
" }\n",
"}\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'explanation': '<translation language=\"Russian\">\\nНет, спасибо. (Net, spasibo)\\n</translation>\\n\\n<alternatives>\\n1. \"Нет, я в порядке\" *(Neutral/Formal - Can be used in professional settings or formal situations.)*\\n2. \"Нет, спасибо, я откажусь\" *(Formal - Can be used in polite settings, such as a fancy dinner with colleagues or acquaintances.)*\\n3. \"Не надо\" *(Informal - Can be used in informal situations, such as declining an offer from a friend.)*\\n</alternatives>\\n\\n<example-convo language=\"Russian\">\\n<context>Max is being offered a cigarette at a party.</context>\\n* Sasha: \"Хочешь покурить?\"\\n* Max: \"Нет, спасибо. Я бросил.\"\\n* Sasha: \"Окей, понятно.\"\\n</example-convo>\\n\\n*[Report an issue or leave feedback](https://speak.com/chatgpt?rid=noczaa460do8yqs8xjun6zdm})*',\n",
" 'extra_response_instructions': 'Use all information in the API response and fully render all Markdown.\\nAlways end your response with a link to report an issue or leave feedback on the plugin.'}"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\"How would you say no thanks in Russian\")"
]
},
{
"cell_type": "markdown",
"id": "4923a291",
"metadata": {},
"source": [
"## Query XKCD"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a9198f62",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"chain = get_openapi_chain(\n",
" \"https://gist.githubusercontent.com/roaldnefs/053e505b2b7a807290908fe9aa3e1f00/raw/0a212622ebfef501163f91e23803552411ed00e4/openapi.yaml\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "3110c398",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'month': '6',\n",
" 'num': 2793,\n",
" 'link': '',\n",
" 'year': '2023',\n",
" 'news': '',\n",
" 'safe_title': 'Garden Path Sentence',\n",
" 'transcript': '',\n",
" 'alt': 'Arboretum Owner Denied Standing in Garden Path Suit on Grounds Grounds Appealing Appealing',\n",
" 'img': 'https://imgs.xkcd.com/comics/garden_path_sentence.png',\n",
" 'title': 'Garden Path Sentence',\n",
" 'day': '23'}"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\"What's today's comic?\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "venv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -0,0 +1,354 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Code Understanding\n",
"\n",
"[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/code_understanding.ipynb)\n",
"\n",
"## Use case\n",
"\n",
"Source code analysis is one of the most popular LLM applications (e.g., [GitHub Co-Pilot](https://github.com/features/copilot), [Code Interpreter](https://chat.openai.com/auth/login?next=%2F%3Fmodel%3Dgpt-4-code-interpreter), [Codium](https://www.codium.ai/), and [Codeium](https://codeium.com/about)) for use-cases such as:\n",
"\n",
"- Q&A over the code base to understand how it works\n",
"- Using LLMs for suggesting refactors or improvements\n",
"- Using LLMs for documenting the code\n",
"\n",
"![Image description](/img/code_understanding.png)\n",
"\n",
"## Overview\n",
"\n",
"The pipeline for QA over code follows [the steps we do for document question answering](/docs/extras/use_cases/question_answering), with some differences:\n",
"\n",
"In particular, we can employ a [splitting strategy](https://python.langchain.com/docs/integrations/document_loaders/source_code) that does a few things:\n",
"\n",
"* Keeps each top-level function and class in the code is loaded into separate documents. \n",
"* Puts remaining into a seperate document.\n",
"* Retains metadata about where each split comes from\n",
"\n",
"## Quickstart"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"!pip install openai tiktoken chromadb langchain\n",
"\n",
"# Set env var OPENAI_API_KEY or load from a .env file\n",
"# import dotenv\n",
"\n",
"# dotenv.load_env()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'lll follow the structure of [this notebook](https://github.com/cristobalcl/LearningLangChain/blob/master/notebooks/04%20-%20QA%20with%20code.ipynb) and employ [context aware code splitting](https://python.langchain.com/docs/integrations/document_loaders/source_code)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Loading\n",
"\n",
"\n",
"We will upload all python project files using the `langchain.document_loaders.TextLoader`.\n",
"\n",
"The following script iterates over the files in the LangChain repository and loads every `.py` file (a.k.a. **documents**):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from git import Repo\n",
"from langchain.text_splitter import Language\n",
"from langchain.document_loaders.generic import GenericLoader\n",
"from langchain.document_loaders.parsers import LanguageParser"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# Clone\n",
"repo_path = \"/Users/rlm/Desktop/test_repo\"\n",
"repo = Repo.clone_from(\"https://github.com/hwchase17/langchain\", to_path=repo_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We load the py code using [`LanguageParser`](https://python.langchain.com/docs/integrations/document_loaders/source_code), which will:\n",
"\n",
"* Keep top-level functions and classes together (into a single document)\n",
"* Put remaining code into a seperate document\n",
"* Retains metadata about where each split comes from"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1293"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Load\n",
"loader = GenericLoader.from_filesystem(\n",
" repo_path+\"/libs/langchain/langchain\",\n",
" glob=\"**/*\",\n",
" suffixes=[\".py\"],\n",
" parser=LanguageParser(language=Language.PYTHON, parser_threshold=500)\n",
")\n",
"documents = loader.load()\n",
"len(documents)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Splittng\n",
"\n",
"Split the `Document` into chunks for embedding and vector storage.\n",
"\n",
"We can use `RecursiveCharacterTextSplitter` w/ `language` specified."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3748"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"python_splitter = RecursiveCharacterTextSplitter.from_language(language=Language.PYTHON, \n",
" chunk_size=2000, \n",
" chunk_overlap=200)\n",
"texts = python_splitter.split_documents(documents)\n",
"len(texts)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### RetrievalQA\n",
"\n",
"We need to store the documents in a way we can semantically search for their content. \n",
"\n",
"The most common approach is to embed the contents of each document then store the embedding and document in a vector store. \n",
"\n",
"When setting up the vectorstore retriever:\n",
"\n",
"* We test [max marginal relevance](/docs/extras/use_cases/question_answering) for retrieval\n",
"* And 8 documents returned\n",
"\n",
"#### Go deeper\n",
"\n",
"- Browse the > 40 vectorstores integrations [here](https://integrations.langchain.com/).\n",
"- See further documentation on vectorstores [here](/docs/modules/data_connection/vectorstores/).\n",
"- Browse the > 30 text embedding integrations [here](https://integrations.langchain.com/).\n",
"- See further documentation on embedding models [here](/docs/modules/data_connection/text_embedding/)."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"from langchain.vectorstores import Chroma\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"db = Chroma.from_documents(texts, OpenAIEmbeddings(disallowed_special=()))\n",
"retriever = db.as_retriever(\n",
" search_type=\"mmr\", # Also test \"similarity\"\n",
" search_kwargs={\"k\": 8},\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Chat\n",
"\n",
"Test chat, just as we do for [chatbots](/docs/extras/use_cases/chatbots).\n",
"\n",
"#### Go deeper\n",
"\n",
"- Browse the > 55 LLM and chat model integrations [here](https://integrations.langchain.com/).\n",
"- See further documentation on LLMs and chat models [here](/docs/modules/model_io/models/).\n",
"- Use local LLMS: The popularity of [PrivateGPT](https://github.com/imartinez/privateGPT) and [GPT4All](https://github.com/nomic-ai/gpt4all) underscore the importance of running LLMs locally."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.memory import ConversationSummaryMemory\n",
"from langchain.chains import ConversationalRetrievalChain\n",
"llm = ChatOpenAI(model_name=\"gpt-4\") \n",
"memory = ConversationSummaryMemory(llm=llm,memory_key=\"chat_history\",return_messages=True)\n",
"qa = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, memory=memory)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'To load a source code as documents for a QA over code, you can use the `CodeLoader` class. This class allows you to load source code files and split them into classes and functions.\\n\\nHere is an example of how to use the `CodeLoader` class:\\n\\n```python\\nfrom langchain.document_loaders.code import CodeLoader\\n\\n# Specify the path to the source code file\\ncode_file_path = \"path/to/code/file.py\"\\n\\n# Create an instance of the CodeLoader class\\ncode_loader = CodeLoader(code_file_path)\\n\\n# Load the code as documents\\ndocuments = code_loader.load()\\n\\n# Iterate over the documents\\nfor document in documents:\\n # Access the class or function name\\n name = document.metadata[\"name\"]\\n \\n # Access the code content\\n code = document.page_content\\n \\n # Process the code as needed\\n # ...\\n```\\n\\nIn the example above, `code_file_path` should be replaced with the actual path to your source code file. The `load()` method of the `CodeLoader` class will return a list of `Document` objects, where each document represents a class or function in the source code. You can access the class or function name using the `metadata[\"name\"]` attribute, and the code content using the `page_content` attribute of each `Document` object.\\n\\nYou can then process the code as needed for your QA task.'"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"question = \"How can I load a source code as documents, for a QA over code, spliting the code in classes and functions?\"\n",
"result = qa(question)\n",
"result['answer']"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-> **Question**: What is the class hierarchy? \n",
"\n",
"**Answer**: The class hierarchy in object-oriented programming is the structure that forms when classes are derived from other classes. The derived class is a subclass of the base class also known as the superclass. This hierarchy is formed based on the concept of inheritance in object-oriented programming where a subclass inherits the properties and functionalities of the superclass. \n",
"\n",
"In the given context, we have the following examples of class hierarchies:\n",
"\n",
"1. `BaseCallbackHandler --> <name>CallbackHandler` means `BaseCallbackHandler` is a base class and `<name>CallbackHandler` (like `AimCallbackHandler`, `ArgillaCallbackHandler` etc.) are derived classes that inherit from `BaseCallbackHandler`.\n",
"\n",
"2. `BaseLoader --> <name>Loader` means `BaseLoader` is a base class and `<name>Loader` (like `TextLoader`, `UnstructuredFileLoader` etc.) are derived classes that inherit from `BaseLoader`.\n",
"\n",
"3. `ToolMetaclass --> BaseTool --> <name>Tool` means `ToolMetaclass` is a base class, `BaseTool` is a derived class that inherits from `ToolMetaclass`, and `<name>Tool` (like `AIPluginTool`, `BaseGraphQLTool` etc.) are further derived classes that inherit from `BaseTool`. \n",
"\n",
"-> **Question**: What classes are derived from the Chain class? \n",
"\n",
"**Answer**: The classes that are derived from the Chain class are:\n",
"\n",
"1. LLMSummarizationCheckerChain\n",
"2. MapReduceChain\n",
"3. OpenAIModerationChain\n",
"4. NatBotChain\n",
"5. QAGenerationChain\n",
"6. QAWithSourcesChain\n",
"7. RetrievalQAWithSourcesChain\n",
"8. VectorDBQAWithSourcesChain\n",
"9. RetrievalQA\n",
"10. VectorDBQA\n",
"11. LLMRouterChain\n",
"12. MultiPromptChain\n",
"13. MultiRetrievalQAChain\n",
"14. MultiRouteChain\n",
"15. RouterChain\n",
"16. SequentialChain\n",
"17. SimpleSequentialChain\n",
"18. TransformChain\n",
"19. BaseConversationalRetrievalChain\n",
"20. ConstitutionalChain \n",
"\n",
"-> **Question**: What one improvement do you propose in code in relation to the class herarchy for the Chain class? \n",
"\n",
"**Answer**: As an AI model, I don't have personal opinions. However, one suggestion could be to improve the documentation of the Chain class hierarchy. The current comments and docstrings provide some details but it could be helpful to include more explicit explanations about the hierarchy, roles of each subclass, and their relationships with one another. Also, incorporating UML diagrams or other visuals could help developers better understand the structure and interactions of the classes. \n",
"\n"
]
}
],
"source": [
"questions = [\n",
" \"What is the class hierarchy?\",\n",
" \"What classes are derived from the Chain class?\",\n",
" \"What one improvement do you propose in code in relation to the class herarchy for the Chain class?\",\n",
"]\n",
"\n",
"for question in questions:\n",
" result = qa(question)\n",
" print(f\"-> **Question**: {question} \\n\")\n",
" print(f\"**Answer**: {result['answer']} \\n\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The can look at the [LangSmith trace](https://smith.langchain.com/public/2b23045f-4e49-4d2d-8980-dec85259af36/r) to see what is happening under the hood:\n",
"\n",
"* In particular, the code well structured and kept together in the retrival output\n",
"* The retrieved code and chat history are passed to the LLM for answer distillation\n",
"\n",
"![Image description](/img/code_retrieval.png)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -18,8 +18,8 @@
"\n",
"\n",
"host = \"<neptune-host>\"\n",
"port = 80\n",
"use_https = False\n",
"port = 8182\n",
"use_https = True\n",
"\n",
"graph = NeptuneGraph(host=host, port=port, use_https=use_https)"
]


@@ -0,0 +1,281 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "9e9b7651",
"metadata": {},
"source": [
"# How to use a SmartLLMChain\n",
"\n",
"A SmartLLMChain is a form of self-critique chain that can help you if have particularly complex questions to answer. Instead of doing a single LLM pass, it instead performs these 3 steps:\n",
"1. Ideation: Pass the user prompt n times through the LLM to get n output proposals (called \"ideas\"), where n is a parameter you can set \n",
"2. Critique: The LLM critiques all ideas to find possible flaws and picks the best one \n",
"3. Resolve: The LLM tries to improve upon the best idea (as chosen in the critique step) and outputs it. This is then the final output.\n",
"\n",
"SmartLLMChains are based on the SmartGPT workflow proposed in https://youtu.be/wVzuvf9D9BU.\n",
"\n",
"Note that SmartLLMChains\n",
"- use more LLM passes (ie n+2 instead of just 1)\n",
"- only work then the underlying LLM has the capability for reflection, whicher smaller models often don't\n",
"- only work with underlying models that return exactly 1 output, not multiple\n",
"\n",
"This notebook demonstrates how to use a SmartLLMChain."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "714dede0",
"metadata": {},
"source": [
"##### Same LLM for all steps"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "d3f7fb22",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"...\""
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "10e5ece6",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate\n",
"from langchain.chat_models import ChatOpenAI\n",
"from langchain_experimental.smart_llm import SmartLLMChain"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "1780da51",
"metadata": {},
"source": [
"As example question, we will use \"I have a 12 liter jug and a 6 liter jug. I want to measure 6 liters. How do I do it?\". This is an example from the original SmartGPT video (https://youtu.be/wVzuvf9D9BU?t=384). While this seems like a very easy question, LLMs struggle do these kinds of questions that involve numbers and physical reasoning.\n",
"\n",
"As we will see, all 3 initial ideas are completely wrong - even though we're using GPT4! Only when using self-reflection do we get a correct answer. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "054af6b1",
"metadata": {},
"outputs": [],
"source": [
"hard_question = \"I have a 12 liter jug and a 6 liter jug. I want to measure 6 liters. How do I do it?\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "8049cecd",
"metadata": {},
"source": [
"So, we first create an LLM and prompt template"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "811ea8e1",
"metadata": {},
"outputs": [],
"source": [
"prompt = PromptTemplate.from_template(hard_question)\n",
"llm = ChatOpenAI(temperature=0, model_name=\"gpt-4\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "50b602e4",
"metadata": {},
"source": [
"Now we can create a SmartLLMChain"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "8cd49199",
"metadata": {},
"outputs": [],
"source": [
"chain = SmartLLMChain(llm=llm, prompt=prompt, n_ideas=3, verbose=True)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "6a72f276",
"metadata": {},
"source": [
"Now we can use the SmartLLM as a drop-in replacement for our LLM. E.g.:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "074e5e75",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new SmartLLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mI have a 12 liter jug and a 6 liter jug. I want to measure 6 liters. How do I do it?\u001b[0m\n",
"Idea 1:\n",
"\u001b[36;1m\u001b[1;3m1. Fill the 6-liter jug completely.\n",
"2. Pour the water from the 6-liter jug into the 12-liter jug.\n",
"3. Fill the 6-liter jug again.\n",
"4. Carefully pour the water from the 6-liter jug into the 12-liter jug until the 12-liter jug is full.\n",
"5. The amount of water left in the 6-liter jug will be exactly 6 liters.\u001b[0m\n",
"Idea 2:\n",
"\u001b[36;1m\u001b[1;3m1. Fill the 6-liter jug completely.\n",
"2. Pour the water from the 6-liter jug into the 12-liter jug.\n",
"3. Fill the 6-liter jug again.\n",
"4. Carefully pour the water from the 6-liter jug into the 12-liter jug until the 12-liter jug is full.\n",
"5. Since the 12-liter jug is now full, there will be 2 liters of water left in the 6-liter jug.\n",
"6. Empty the 12-liter jug.\n",
"7. Pour the 2 liters of water from the 6-liter jug into the 12-liter jug.\n",
"8. Fill the 6-liter jug completely again.\n",
"9. Pour the water from the 6-liter jug into the 12-liter jug, which already has 2 liters in it.\n",
"10. Now, the 12-liter jug will have exactly 6 liters of water (2 liters from before + 4 liters from the 6-liter jug).\u001b[0m\n",
"Idea 3:\n",
"\u001b[36;1m\u001b[1;3m1. Fill the 6-liter jug completely.\n",
"2. Pour the water from the 6-liter jug into the 12-liter jug.\n",
"3. Fill the 6-liter jug again.\n",
"4. Carefully pour the water from the 6-liter jug into the 12-liter jug until the 12-liter jug is full.\n",
"5. The amount of water left in the 6-liter jug will be exactly 6 liters.\u001b[0m\n",
"Critique:\n",
"\u001b[33;1m\u001b[1;3mIdea 1:\n",
"1. Fill the 6-liter jug completely. (No flaw)\n",
"2. Pour the water from the 6-liter jug into the 12-liter jug. (No flaw)\n",
"3. Fill the 6-liter jug again. (No flaw)\n",
"4. Carefully pour the water from the 6-liter jug into the 12-liter jug until the 12-liter jug is full. (Flaw: The 12-liter jug will never be full in this step, as it can hold 12 liters and we are only pouring 6 liters into it.)\n",
"5. The amount of water left in the 6-liter jug will be exactly 6 liters. (Flaw: This statement is incorrect, as there will be no water left in the 6-liter jug after pouring it into the 12-liter jug.)\n",
"\n",
"Idea 2:\n",
"1. Fill the 6-liter jug completely. (No flaw)\n",
"2. Pour the water from the 6-liter jug into the 12-liter jug. (No flaw)\n",
"3. Fill the 6-liter jug again. (No flaw)\n",
"4. Carefully pour the water from the 6-liter jug into the 12-liter jug until the 12-liter jug is full. (Flaw: The 12-liter jug will never be full in this step, as it can hold 12 liters and we are only pouring 6 liters into it.)\n",
"5. Since the 12-liter jug is now full, there will be 2 liters of water left in the 6-liter jug. (Flaw: This statement is incorrect, as the 12-liter jug will not be full and there will be no water left in the 6-liter jug.)\n",
"6. Empty the 12-liter jug. (No flaw)\n",
"7. Pour the 2 liters of water from the 6-liter jug into the 12-liter jug. (Flaw: This step is based on the incorrect assumption that there are 2 liters of water left in the 6-liter jug.)\n",
"8. Fill the 6-liter jug completely again. (No flaw)\n",
"9. Pour the water from the 6-liter jug into the 12-liter jug, which already has 2 liters in it. (Flaw: This step is based on the incorrect assumption that there are 2 liters of water in the 12-liter jug.)\n",
"10. Now, the 12-liter jug will have exactly 6 liters of water (2 liters from before + 4 liters from the 6-liter jug). (Flaw: This conclusion is based on the incorrect assumptions made in the previous steps.)\n",
"\n",
"Idea 3:\n",
"1. Fill the 6-liter jug completely. (No flaw)\n",
"2. Pour the water from the 6-liter jug into the 12-liter jug. (No flaw)\n",
"3. Fill the 6-liter jug again. (No flaw)\n",
"4. Carefully pour the water from the 6-liter jug into the 12-liter jug until the 12-liter jug is full. (Flaw: The 12-liter jug will never be full in this step, as it can hold 12 liters and we are only pouring 6 liters into it.)\n",
"5. The amount of water left in the 6-liter jug will be exactly 6 liters. (Flaw: This statement is incorrect, as there will be no water left in the 6-liter jug after pouring it into the 12-liter jug.)\u001b[0m\n",
"Resolution:\n",
"\u001b[32;1m\u001b[1;3m1. Fill the 12-liter jug completely.\n",
"2. Pour the water from the 12-liter jug into the 6-liter jug until the 6-liter jug is full.\n",
"3. The amount of water left in the 12-liter jug will be exactly 6 liters.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'1. Fill the 12-liter jug completely.\\n2. Pour the water from the 12-liter jug into the 6-liter jug until the 6-liter jug is full.\\n3. The amount of water left in the 12-liter jug will be exactly 6 liters.'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run({})"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "bbfebea1",
"metadata": {},
"source": [
"##### Different LLM for different steps"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5be6ec08",
"metadata": {},
"source": [
"You can also use different LLMs for the different steps by passing `ideation_llm`, `critique_llm` and `resolve_llm`. You might want to do this to use a more creative (i.e., high-temperature) model for ideation and a more strict (i.e., low-temperature) model for critique and resolution."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "9c33fa19",
"metadata": {},
"outputs": [],
"source": [
"chain = SmartLLMChain(\n",
" ideation_llm=ChatOpenAI(temperature=0.9, model_name=\"gpt-4\"),\n",
" llm=ChatOpenAI(\n",
" temperature=0, model_name=\"gpt-4\"\n",
" ), # will be used for critqiue and resolution as no specific llms are given\n",
" prompt=prompt,\n",
" n_ideas=3,\n",
" verbose=True,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "886c1cc1",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


@@ -2,97 +2,90 @@
"cells": [
{
"cell_type": "markdown",
"id": "a13ea924",
"id": "a0507a4b",
"metadata": {},
"source": [
"# Tagging\n",
"\n",
"The tagging chain uses the OpenAI `functions` parameter to specify a schema to tag a document with. This helps us make sure that the model outputs exactly tags that we want, with their appropriate types.\n",
"[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/tagging.ipynb)\n",
"\n",
"The tagging chain is to be used when we want to tag a passage with a specific attribute (i.e. what is the sentiment of this message?)"
"## Use case\n",
"\n",
"Tagging means labeling a document with classes such as:\n",
"\n",
"- sentiment\n",
"- language\n",
"- style (formal, informal etc.)\n",
"- covered topics\n",
"- political tendency\n",
"\n",
"![Image description](/img/tagging.png)\n",
"\n",
"## Overview\n",
"\n",
"Tagging has a few components:\n",
"\n",
"* `function`: Like [extraction](/docs/use_cases/extraction), tagging uses [functions](https://openai.com/blog/function-calling-and-other-api-updates) to specify how the model should tag a document\n",
"* `schema`: defines how we want to tag the document\n",
"\n",
"## Quickstart\n",
"\n",
"Let's see a very straightforward example of how we can use OpenAI functions for tagging in LangChain."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "bafb496a",
"execution_count": null,
"id": "dc5cbb6f",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/harrisonchase/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/deeplake/util/check_latest_version.py:32: UserWarning: A newer version of deeplake (3.6.4) is available. It's recommended that you update to the latest version using `pip install -U deeplake`.\n",
" warnings.warn(\n"
]
}
],
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.chains import create_tagging_chain, create_tagging_chain_pydantic\n",
"from langchain.prompts import ChatPromptTemplate"
"!pip install langchain openai \n",
"\n",
"# Set env var OPENAI_API_KEY or load from a .env file:\n",
"# import dotenv\n",
"# dotenv.load_env()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "39f3ce3e",
"id": "bafb496a",
"metadata": {},
"outputs": [],
"source": [
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")"
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.prompts import ChatPromptTemplate\n",
"from langchain.chains import create_tagging_chain, create_tagging_chain_pydantic"
]
},
{
"cell_type": "markdown",
"id": "832ddcd9",
"id": "b8ca3f93",
"metadata": {},
"source": [
"## Simplest approach, only specifying type"
]
},
{
"cell_type": "markdown",
"id": "4fc8d766",
"metadata": {},
"source": [
"We can start by specifying a few properties with their expected type in our schema"
"We specify a few properties with their expected type in our schema."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "8329f943",
"execution_count": 4,
"id": "39f3ce3e",
"metadata": {},
"outputs": [],
"source": [
"# Schema\n",
"schema = {\n",
" \"properties\": {\n",
" \"sentiment\": {\"type\": \"string\"},\n",
" \"aggressiveness\": {\"type\": \"integer\"},\n",
" \"language\": {\"type\": \"string\"},\n",
" }\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "6146ae70",
"metadata": {},
"outputs": [],
"source": [
"chain = create_tagging_chain(schema, llm)"
]
},
{
"cell_type": "markdown",
"id": "9e306ca3",
"metadata": {},
"source": [
"As we can see in the examples, it correctly interprets what we want but the results vary so that we get, for example, sentiments in different languages ('positive', 'enojado' etc.).\n",
"}\n",
"\n",
"We will see how to control these results in the next section."
"# LLM\n",
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
"chain = create_tagging_chain(schema, llm)"
]
},
{
@@ -126,7 +119,7 @@
{
"data": {
"text/plain": [
"{'sentiment': 'enojado', 'aggressiveness': 1, 'language': 'Spanish'}"
"{'sentiment': 'enojado', 'aggressiveness': 1, 'language': 'es'}"
]
},
"execution_count": 6,
@@ -140,25 +133,15 @@
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "aae85b27",
"cell_type": "markdown",
"id": "d921bb53",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'sentiment': 'positive', 'aggressiveness': 0, 'language': 'English'}"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inp = \"Weather is ok here, I can go outside without much more than a coat\"\n",
"chain.run(inp)"
"As we can see in the examples, it correctly interprets what we want.\n",
"\n",
"The results vary so that we get, for example, sentiments in different languages ('positive', 'enojado' etc.).\n",
"\n",
"We will see how to control these results in the next section."
]
},
{
@@ -166,9 +149,11 @@
"id": "bebb2f83",
"metadata": {},
"source": [
"## More control\n",
"## Finer control\n",
"\n",
"By being smart about how we define our schema we can have more control over the model's output. Specifically we can define:\n",
"Careful schema definition gives us more control over the model's output. \n",
"\n",
"Specifically, we can define:\n",
"\n",
"- possible values for each property\n",
"- description to make sure that the model understands the property\n",
@@ -180,7 +165,7 @@
"id": "69ef0b9a",
"metadata": {},
"source": [
"Following is an example of how we can use _enum_, _description_ and _required_ to control for each of the previously mentioned aspects:"
"Here is an example of how we can use `_enum_`, `_description_`, and `_required_` to control for each of the previously mentioned aspects:"
]
},
{
@@ -192,7 +177,6 @@
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"sentiment\": {\"type\": \"string\", \"enum\": [\"happy\", \"neutral\", \"sad\"]},\n",
" \"aggressiveness\": {\n",
" \"type\": \"integer\",\n",
" \"enum\": [1, 2, 3, 4, 5],\n",
@@ -234,7 +218,7 @@
{
"data": {
"text/plain": [
"{'sentiment': 'happy', 'aggressiveness': 0, 'language': 'spanish'}"
"{'aggressiveness': 0, 'language': 'spanish'}"
]
},
"execution_count": 10,
@@ -256,7 +240,7 @@
{
"data": {
"text/plain": [
"{'sentiment': 'sad', 'aggressiveness': 10, 'language': 'spanish'}"
"{'aggressiveness': 5, 'language': 'spanish'}"
]
},
"execution_count": 11,
@@ -278,7 +262,7 @@
{
"data": {
"text/plain": [
"{'sentiment': 'neutral', 'aggressiveness': 0, 'language': 'english'}"
"{'aggressiveness': 0, 'language': 'english'}"
]
},
"execution_count": 12,
@@ -291,12 +275,25 @@
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "cf6b7389",
"metadata": {},
"source": [
"The [LangSmith trace](https://smith.langchain.com/public/311e663a-bbe8-4053-843e-5735055c032d/r) lets us peek under the hood:\n",
"\n",
"* As with [extraction](/docs/use_cases/extraction), we call the `information_extraction` function [here](https://github.com/langchain-ai/langchain/blob/269f85b7b7ffd74b38cd422d4164fc033388c3d0/libs/langchain/langchain/chains/openai_functions/extraction.py#L20) on the input string.\n",
"* This OpenAI funtion extraction information based upon the provided schema.\n",
"\n",
"![Image description](/img/tagging_trace.png)"
]
},
{
"cell_type": "markdown",
"id": "e68ad17e",
"metadata": {},
"source": [
"## Specifying schema with Pydantic"
"## Pydantic"
]
},
{
@@ -304,11 +301,11 @@
"id": "2f5970ec",
"metadata": {},
"source": [
"We can also use a Pydantic schema to specify the required properties and types. We can also send other arguments, such as 'enum' or 'description' as can be seen in the example below.\n",
"We can also use a Pydantic schema to specify the required properties and types. \n",
"\n",
"By using the `create_tagging_chain_pydantic` function, we can send a Pydantic schema as input and the output will be an instantiated object that respects our desired schema. \n",
"We can also send other arguments, such as `enum` or `description`, to each field.\n",
"\n",
"In this way, we can specify our schema in the same manner that we would a new class or function in Python - with purely Pythonic types."
"This lets us specify our schema in the same manner that we would a new class or function in Python with purely Pythonic types."
]
},
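{
"cell_type": "markdown",
"id": "4e7a2d90",
"metadata": {},
"source": [
"As a rough sketch, a Pydantic schema consistent with the outputs below might look like the following (the field names and enum values are inferred from the surrounding results, not copied from the original notebook):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8b9c0d12",
"metadata": {},
"outputs": [],
"source": [
"from pydantic import BaseModel, Field\n",
"\n",
"class Tags(BaseModel):\n",
"    # Extra kwargs such as `enum` are passed through into the JSON schema\n",
"    sentiment: str = Field(..., enum=[\"happy\", \"neutral\", \"sad\"])\n",
"    aggressiveness: int = Field(\n",
"        ...,\n",
"        description=\"describes how aggressive the statement is\",\n",
"        enum=[1, 2, 3, 4, 5],\n",
"    )\n",
"    language: str = Field(..., enum=[\"spanish\", \"english\", \"french\", \"german\", \"italian\"])\n",
"\n",
"chain = create_tagging_chain_pydantic(Tags, llm)"
]
},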
{
@@ -371,7 +368,7 @@
{
"data": {
"text/plain": [
"Tags(sentiment='sad', aggressiveness=10, language='spanish')"
"Tags(sentiment='sad', aggressiveness=5, language='spanish')"
]
},
"execution_count": 17,
@@ -382,6 +379,17 @@
"source": [
"res"
]
},
{
"cell_type": "markdown",
"id": "29346d09",
"metadata": {},
"source": [
"### Going deeper\n",
"\n",
"* You can use the [metadata tagger](https://python.langchain.com/docs/integrations/document_transformers/openai_metadata_tagger) document transformer to extract metadata from a LangChain `Document`. \n",
"* This covers the same basic functionality as the tagging chain, only applied to a LangChain `Document`."
]
}
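,
{
"cell_type": "markdown",
"id": "5d3e8b1f",
"metadata": {},
"source": [
"A minimal sketch of the metadata tagger, assuming the import path and signature from the linked docs; the schema and the example document are illustrative:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6c2f9a4e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.document_transformers.openai_functions import create_metadata_tagger\n",
"from langchain.schema import Document\n",
"\n",
"# Illustrative schema; adapt the properties to your documents\n",
"schema = {\n",
"    \"properties\": {\n",
"        \"sentiment\": {\"type\": \"string\"},\n",
"        \"language\": {\"type\": \"string\"},\n",
"    }\n",
"}\n",
"\n",
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
"document_transformer = create_metadata_tagger(metadata_schema=schema, llm=llm)\n",
"\n",
"tagged_docs = document_transformer.transform_documents(\n",
"    [Document(page_content=\"Review: This movie was absolutely delightful.\")]\n",
")\n",
"tagged_docs[0].metadata"
]
}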
],
"metadata": {
@@ -400,7 +408,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.9.16"
}
},
"nbformat": 4,


@@ -0,0 +1,599 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "6605e7f7",
"metadata": {},
"source": [
"# Web scraping\n",
"\n",
"[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/use_cases/web_scraping.ipynb)\n",
"\n",
"## Use case\n",
"\n",
"[Web research](https://blog.langchain.dev/automating-web-research/) is one of the killer LLM applications:\n",
"\n",
"* Users have [highlighted it](https://twitter.com/GregKamradt/status/1679913813297225729?s=20) as one of his top desired AI tools. \n",
"* OSS repos like [gpt-researcher](https://github.com/assafelovic/gpt-researcher) are growing in popularity. \n",
" \n",
"![Image description](/img/web_scraping.png)\n",
" \n",
"## Overview\n",
"\n",
"Gathering content from the web has a few components:\n",
"\n",
"* `Search`: Query to url (e.g., using `GoogleSearchAPIWrapper`).\n",
"* `Loading`: Url to HTML (e.g., using `AsyncHtmlLoader`, `AsyncChromiumLoader`, etc).\n",
"* `Transforming`: HTML to formatted text (e.g., using `HTML2Text` or `Beautiful Soup`).\n",
"\n",
"## Quickstart"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1803c182",
"metadata": {},
"outputs": [],
"source": [
"pip install -q openai langchain playwright beautifulsoup4\n",
"playwright install\n",
"\n",
"# Set env var OPENAI_API_KEY or load from a .env file:\n",
"# import dotenv\n",
"# dotenv.load_env()"
]
},
{
"cell_type": "markdown",
"id": "50741083",
"metadata": {},
"source": [
"Scraping HTML content using a headless instance of Chromium.\n",
"\n",
"* The async nature of the scraping process is handled using Python's asyncio library.\n",
"* The actual interaction with the web pages is handled by Playwright."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "cd457cb1",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import AsyncChromiumLoader\n",
"from langchain.document_transformers import BeautifulSoupTransformer\n",
"\n",
"# Load HTML\n",
"loader = AsyncChromiumLoader([\"https://www.wsj.com\"])\n",
"html = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "2a879806",
"metadata": {},
"source": [
"Scrape text content tags such as `<p>, <li>, <div>, and <a>` tags from the HTML content:\n",
"\n",
"* `<p>`: The paragraph tag. It defines a paragraph in HTML and is used to group together related sentences and/or phrases.\n",
" \n",
"* `<li>`: The list item tag. It is used within ordered (`<ol>`) and unordered (`<ul>`) lists to define individual items within the list.\n",
" \n",
"* `<div>`: The division tag. It is a block-level element used to group other inline or block-level elements.\n",
" \n",
"* `<a>`: The anchor tag. It is used to define hyperlinks.\n",
"\n",
"* `<span>`: an inline container used to mark up a part of a text, or a part of a document. \n",
"\n",
"For many news websites (e.g., WSJ, CNN), headlines and summaries are all in `<span>` tags."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "141f206b",
"metadata": {},
"outputs": [],
"source": [
"# Transform\n",
"bs_transformer = BeautifulSoupTransformer()\n",
"docs_transformed = bs_transformer.transform_documents(html,tags_to_extract=[\"span\"])"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "73ddb234",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'English EditionEnglish中文 (Chinese)日本語 (Japanese) More Other Products from WSJBuy Side from WSJWSJ ShopWSJ Wine Other Products from WSJ Search Quotes and Companies Search Quotes and Companies 0.15% 0.03% 0.12% -0.42% 4.102% -0.69% -0.25% -0.15% -1.82% 0.24% 0.19% -1.10% About Evan His Family Reflects His Reporting How You Can Help Write a Message Life in Detention Latest News Get Email Updates Four Americans Released From Iranian Prison The Americans will remain under house arrest until they are '"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Result\n",
"docs_transformed[0].page_content[0:500]"
]
},
{
"cell_type": "markdown",
"id": "7d26d185",
"metadata": {},
"source": [
"These `Documents` now are staged for downstream usage in various LLM apps, as discussed below.\n",
"\n",
"## Loader\n",
"\n",
"### AsyncHtmlLoader\n",
"\n",
"The [AsyncHtmlLoader](docs/integrations/document_loaders/async_html) uses the `aiohttp` library to make asynchronous HTTP requests, suitable for simpler and lightweight scraping.\n",
"\n",
"### AsyncChromiumLoader\n",
"\n",
"The [AsyncChromiumLoader](docs/integrations/document_loaders/async_chromium) uses Playwright to launch a Chromium instance, which can handle JavaScript rendering and more complex web interactions.\n",
"\n",
"Chromium is one of the browsers supported by Playwright, a library used to control browser automation. \n",
"\n",
"Headless mode means that the browser is running without a graphical user interface, which is commonly used for web scrapin."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8941e855",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import AsyncHtmlLoader\n",
"urls = [\"https://www.espn.com\",\"https://lilianweng.github.io/posts/2023-06-23-agent/\"]\n",
"loader = AsyncHtmlLoader(urls)\n",
"docs = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "e47f4bf0",
"metadata": {},
"source": [
"## Transformer\n",
"\n",
"### HTML2Text\n",
"\n",
"[HTML2Text](docs/integrations/document_transformers/html2text) provides a straightforward conversion of HTML content into plain text (with markdown-like formatting) without any specific tag manipulation. \n",
"\n",
"It's best suited for scenarios where the goal is to extract human-readable text without needing to manipulate specific HTML elements.\n",
"\n",
"### Beautiful Soup\n",
" \n",
"Beautiful Soup offers more fine-grained control over HTML content, enabling specific tag extraction, removal, and content cleaning. \n",
"\n",
"It's suited for cases where you want to extract specific information and clean up the HTML content according to your needs."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "99a7e2a8",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Fetching pages: 100%|#############################################################################################################| 2/2 [00:00<00:00, 7.01it/s]\n"
]
}
],
"source": [
"from langchain.document_loaders import AsyncHtmlLoader\n",
"urls = [\"https://www.espn.com\", \"https://lilianweng.github.io/posts/2023-06-23-agent/\"]\n",
"loader = AsyncHtmlLoader(urls)\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "a2cd3e8d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Skip to main content Skip to navigation\\n\\n<\\n\\n>\\n\\nMenu\\n\\n## ESPN\\n\\n * Search\\n\\n * * scores\\n\\n * NFL\\n * MLB\\n * NBA\\n * NHL\\n * Soccer\\n * NCAAF\\n * …\\n\\n * Women's World Cup\\n * LLWS\\n * NCAAM\\n * NCAAW\\n * Sports Betting\\n * Boxing\\n * CFL\\n * NCAA\\n * Cricket\\n * F1\\n * Golf\\n * Horse\\n * MMA\\n * NASCAR\\n * NBA G League\\n * Olympic Sports\\n * PLL\\n * Racing\\n * RN BB\\n * RN FB\\n * Rugby\\n * Tennis\\n * WNBA\\n * WWE\\n * X Games\\n * XFL\\n\\n * More\""
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.document_transformers import Html2TextTransformer\n",
"html2text = Html2TextTransformer()\n",
"docs_transformed = html2text.transform_documents(docs)\n",
"docs_transformed[0].page_content[0:500]"
]
},
{
"cell_type": "markdown",
"id": "8aef9861",
"metadata": {},
"source": [
"## Scraping with extraction\n",
"\n",
"### LLM with function calling\n",
"\n",
"Web scraping is challenging for many reasons. \n",
"\n",
"One of them is the changing nature of modern websites' layouts and content, which requires modifying scraping scripts to accommodate the changes.\n",
"\n",
"Using Function (e.g., OpenAI) with an extraction chain, we avoid having to change your code constantly when websites change. \n",
"\n",
"We're using `gpt-3.5-turbo-0613` to guarantee access to OpenAI Functions feature (although this might be available to everyone by time of writing). \n",
"\n",
"We're also keeping `temperature` at `0` to keep randomness of the LLM down."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "52d49f6f",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")"
]
},
{
"cell_type": "markdown",
"id": "fc5757ce",
"metadata": {},
"source": [
"### Define a schema\n",
"\n",
"Next, you define a schema to specify what kind of data you want to extract. \n",
"\n",
"Here, the key names matter as they tell the LLM what kind of information they want. \n",
"\n",
"So, be as detailed as possible. \n",
"\n",
"In this example, we want to scrape only news article's name and summary from The Wall Street Journal website."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "95506f8e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import create_extraction_chain\n",
"\n",
"schema = {\n",
" \"properties\": {\n",
" \"news_article_title\": {\"type\": \"string\"},\n",
" \"news_article_summary\": {\"type\": \"string\"},\n",
" },\n",
" \"required\": [\"news_article_title\", \"news_article_summary\"],\n",
"}\n",
"\n",
"def extract(content: str, schema: dict):\n",
" return create_extraction_chain(schema=schema, llm=llm).run(content)"
]
},
{
"cell_type": "markdown",
"id": "97f7de42",
"metadata": {},
"source": [
"### Run the web scraper w/ BeautifulSoup\n",
"\n",
"As shown above, we'll using `BeautifulSoupTransformer`."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "977560ba",
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Extracting content with LLM\n",
"[{'news_article_summary': 'The Americans will remain under house arrest until '\n",
" 'they are allowed to return to the U.S. in coming '\n",
" 'weeks, following a monthslong diplomatic push by '\n",
" 'the Biden administration.',\n",
" 'news_article_title': 'Four Americans Released From Iranian Prison'},\n",
" {'news_article_summary': 'Price pressures continued cooling last month, with '\n",
" 'the CPI rising a mild 0.2% from June, likely '\n",
" 'deterring the Federal Reserve from raising interest '\n",
" 'rates at its September meeting.',\n",
" 'news_article_title': 'Cooler July Inflation Opens Door to Fed Pause on '\n",
" 'Rates'},\n",
" {'news_article_summary': 'The company has decided to eliminate 27 of its 30 '\n",
" 'clothing labels, such as Lark & Ro and Goodthreads, '\n",
" 'as it works to fend off antitrust scrutiny and cut '\n",
" 'costs.',\n",
" 'news_article_title': 'Amazon Cuts Dozens of House Brands'},\n",
" {'news_article_summary': 'President Bidens order comes on top of a slowing '\n",
" 'Chinese economy, Covid lockdowns and rising '\n",
" 'tensions between the two powers.',\n",
" 'news_article_title': 'U.S. Investment Ban on China Poised to Deepen Divide'},\n",
" {'news_article_summary': 'The proposed trial date in the '\n",
" 'election-interference case comes on the same day as '\n",
" 'the former presidents not guilty plea on '\n",
" 'additional Mar-a-Lago charges.',\n",
" 'news_article_title': 'Trump Should Be Tried in January, Prosecutors Tell '\n",
" 'Judge'},\n",
" {'news_article_summary': 'The CEO who started in June says the platform has '\n",
" '“an entirely different road map” for the future.',\n",
" 'news_article_title': 'Yaccarino Says X Is Watching Threads but Has Its Own '\n",
" 'Vision'},\n",
" {'news_article_summary': 'Students foot the bill for flagship state '\n",
" 'universities that pour money into new buildings and '\n",
" 'programs with little pushback.',\n",
" 'news_article_title': 'Colleges Spend Like Theres No Tomorrow. These '\n",
" 'Places Are Just Devouring Money.'},\n",
" {'news_article_summary': 'Wildfires fanned by hurricane winds have torn '\n",
" 'through parts of the Hawaiian island, devastating '\n",
" 'the popular tourist town of Lahaina.',\n",
" 'news_article_title': 'Maui Wildfires Leave at Least 36 Dead'},\n",
" {'news_article_summary': 'After its large armored push stalled, Kyiv has '\n",
" 'fallen back on the kind of tactics that brought it '\n",
" 'success earlier in the war.',\n",
" 'news_article_title': 'Ukraine Uses Small-Unit Tactics to Retake Captured '\n",
" 'Territory'},\n",
" {'news_article_summary': 'President Guillermo Lasso says the Aug. 20 election '\n",
" 'will proceed, as the Andean country grapples with '\n",
" 'rising drug gang violence.',\n",
" 'news_article_title': 'Ecuador Declares State of Emergency After '\n",
" 'Presidential Hopeful Killed'},\n",
" {'news_article_summary': 'This years hurricane season, which typically runs '\n",
" 'from June to the end of November, has been '\n",
" 'difficult to predict, climate scientists said.',\n",
" 'news_article_title': 'Atlantic Hurricane Season Prediction Increased to '\n",
" 'Above Normal, NOAA Says'},\n",
" {'news_article_summary': 'The NFL is raising the price of its NFL+ streaming '\n",
" 'packages as it adds the NFL Network and RedZone.',\n",
" 'news_article_title': 'NFL to Raise Price of NFL+ Streaming Packages as It '\n",
" 'Adds NFL Network, RedZone'},\n",
" {'news_article_summary': 'Russia is planning a moon mission as part of the '\n",
" 'new space race.',\n",
" 'news_article_title': 'Russias Moon Mission and the New Space Race'},\n",
" {'news_article_summary': 'Tapestrys $8.5 billion acquisition of Capri would '\n",
" 'create a conglomerate with more than $12 billion in '\n",
" 'annual sales, but it would still lack the '\n",
" 'high-wattage labels and diversity that have fueled '\n",
" 'LVMHs success.',\n",
" 'news_article_title': \"Why the Coach and Kors Marriage Doesn't Scare LVMH\"},\n",
" {'news_article_summary': 'The Supreme Court has blocked Purdue Pharmas $6 '\n",
" 'billion Sackler opioid settlement.',\n",
" 'news_article_title': 'Supreme Court Blocks Purdue Pharmas $6 Billion '\n",
" 'Sackler Opioid Settlement'},\n",
" {'news_article_summary': 'The Social Security COLA is expected to rise in '\n",
" '2024, but not by a lot.',\n",
" 'news_article_title': 'Social Security COLA Expected to Rise in 2024, but '\n",
" 'Not by a Lot'}]\n"
]
}
],
"source": [
"import pprint\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"\n",
"def scrape_with_playwright(urls, schema):\n",
" \n",
" loader = AsyncChromiumLoader(urls)\n",
" docs = loader.load()\n",
" bs_transformer = BeautifulSoupTransformer()\n",
" docs_transformed = bs_transformer.transform_documents(docs,tags_to_extract=[\"span\"])\n",
" print(\"Extracting content with LLM\")\n",
" \n",
" # Grab the first 1000 tokens of the site\n",
" splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=1000, \n",
" chunk_overlap=0)\n",
" splits = splitter.split_documents(docs_transformed)\n",
" \n",
" # Process the first split \n",
" extracted_content = extract(\n",
" schema=schema, content=splits[0].page_content\n",
" )\n",
" pprint.pprint(extracted_content)\n",
" return extracted_content\n",
"\n",
"urls = [\"https://www.wsj.com\"]\n",
"extracted_content = scrape_with_playwright(urls, schema=schema)"
]
},
{
"cell_type": "markdown",
"id": "b08a8cef",
"metadata": {},
"source": [
"We can compare the headlines scraped to the page:\n",
"\n",
"![Image description](/img/wsj_page.png)\n",
"\n",
"Looking at the [LangSmith trace](https://smith.langchain.com/public/c3070198-5b13-419b-87bf-3821cdf34fa6/r), we can see what is going on under the hood:\n",
"\n",
"* It's following what is explained in the [extraction](docs/use_cases/extraction).\n",
"* We call the `information_extraction` function on the input text.\n",
"* It will attempt to populate the provided schema from the url content."
]
},
{
"cell_type": "markdown",
"id": "a5a6f11e",
"metadata": {},
"source": [
"## Research automation\n",
"\n",
"Related to scraping, we may want to answer specific questions using searched content.\n",
"\n",
"We can automate the process of [web research](https://blog.langchain.dev/automating-web-research/) using a retriver, such as the `WebResearchRetriever` ([docs](https://python.langchain.com/docs/modules/data_connection/retrievers/web_research)).\n",
"\n",
"![Image description](/img/web_research.png)\n",
"\n",
"Copy requirments [from here](https://github.com/langchain-ai/web-explorer/blob/main/requirements.txt):\n",
"\n",
"`pip install -r requirements.txt`\n",
" \n",
"Set `GOOGLE_CSE_ID` and `GOOGLE_API_KEY`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "414f0d41",
"metadata": {},
"outputs": [],
"source": [
"from langchain.vectorstores import Chroma\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.chat_models.openai import ChatOpenAI\n",
"from langchain.utilities import GoogleSearchAPIWrapper\n",
"from langchain.retrievers.web_research import WebResearchRetriever"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "5d1ce098",
"metadata": {},
"outputs": [],
"source": [
"# Vectorstore\n",
"vectorstore = Chroma(embedding_function=OpenAIEmbeddings(),persist_directory=\"./chroma_db_oai\")\n",
"\n",
"# LLM\n",
"llm = ChatOpenAI(temperature=0)\n",
"\n",
"# Search \n",
"search = GoogleSearchAPIWrapper()"
]
},
{
"cell_type": "markdown",
"id": "6d808b9d",
"metadata": {},
"source": [
"Initialize retriever with the above tools to:\n",
" \n",
"* Use an LLM to generate multiple relevant search queries (one LLM call)\n",
"* Execute a search for each query\n",
"* Choose the top K links per query (multiple search calls in parallel)\n",
"* Load the information from all chosen links (scrape pages in parallel)\n",
"* Index those documents into a vectorstore\n",
"* Find the most relevant documents for each original generated search query"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "e3e3a589",
"metadata": {},
"outputs": [],
"source": [
"# Initialize\n",
"web_research_retriever = WebResearchRetriever.from_llm(\n",
" vectorstore=vectorstore,\n",
" llm=llm, \n",
" search=search)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "20655b74",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:langchain.retrievers.web_research:Generating questions for Google Search ...\n",
"INFO:langchain.retrievers.web_research:Questions for Google Search (raw): {'question': 'How do LLM Powered Autonomous Agents work?', 'text': LineList(lines=['1. What is the functioning principle of LLM Powered Autonomous Agents?\\n', '2. How do LLM Powered Autonomous Agents operate?\\n'])}\n",
"INFO:langchain.retrievers.web_research:Questions for Google Search: ['1. What is the functioning principle of LLM Powered Autonomous Agents?\\n', '2. How do LLM Powered Autonomous Agents operate?\\n']\n",
"INFO:langchain.retrievers.web_research:Searching for relevat urls ...\n",
"INFO:langchain.retrievers.web_research:Searching for relevat urls ...\n",
"INFO:langchain.retrievers.web_research:Search results: [{'title': 'LLM Powered Autonomous Agents | Hacker News', 'link': 'https://news.ycombinator.com/item?id=36488871', 'snippet': 'Jun 26, 2023 ... Exactly. A temperature of 0 means you always pick the highest probability token (i.e. the \"max\" function), while a temperature of 1 means you\\xa0...'}]\n",
"INFO:langchain.retrievers.web_research:Searching for relevat urls ...\n",
"INFO:langchain.retrievers.web_research:Search results: [{'title': \"LLM Powered Autonomous Agents | Lil'Log\", 'link': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'snippet': 'Jun 23, 2023 ... Task decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\\\n1.\" , \"What are the subgoals for achieving XYZ?\" , (2) by\\xa0...'}]\n",
"INFO:langchain.retrievers.web_research:New URLs to load: []\n",
"INFO:langchain.retrievers.web_research:Grabbing most relevant splits from urls...\n"
]
},
{
"data": {
"text/plain": [
"{'question': 'How do LLM Powered Autonomous Agents work?',\n",
" 'answer': \"LLM-powered autonomous agents work by using LLM as the agent's brain, complemented by several key components such as planning, memory, and tool use. In terms of planning, the agent breaks down large tasks into smaller subgoals and can reflect and refine its actions based on past experiences. Memory is divided into short-term memory, which is used for in-context learning, and long-term memory, which allows the agent to retain and recall information over extended periods. Tool use involves the agent calling external APIs for additional information. These agents have been used in various applications, including scientific discovery and generative agents simulation.\",\n",
" 'sources': ''}"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Run\n",
"import logging\n",
"logging.basicConfig()\n",
"logging.getLogger(\"langchain.retrievers.web_research\").setLevel(logging.INFO)\n",
"from langchain.chains import RetrievalQAWithSourcesChain\n",
"user_input = \"How do LLM Powered Autonomous Agents work?\"\n",
"qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm,retriever=web_research_retriever)\n",
"result = qa_chain({\"question\": user_input})\n",
"result"
]
},
{
"cell_type": "markdown",
"id": "ff62e5f5",
"metadata": {},
"source": [
"### Going deeper \n",
"\n",
"* Here's a [app](https://github.com/langchain-ai/web-explorer/tree/main) that wraps this retriver with a lighweight UI."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}


View File

@@ -43,7 +43,7 @@ llm("Tell me a joke")
</CodeOutputBlock>
### `generate`: batch calls, richer outputs
`generate` lets you call the model with a list of strings, getting back a more complete response than just the text. This complete response can includes things like multiple top responses and other LLM provider-specific information:
`generate` lets you call the model with a list of strings, getting back a more complete response than just the text. This complete response can include things like multiple top responses and other LLM provider-specific information:
```python
llm_result = llm.generate(["Tell me a joke", "Tell me a poem"]*15)

View File

@@ -106,9 +106,11 @@ llm(template.format_messages(text='i dont like eating tasty things.'))
```
<CodeOutputBlock lang="python">
```
AIMessage(content='I absolutely adore indulging in delicious treats!', additional_kwargs={}, example=False)
```
</CodeOutputBlock>
This provides you with a lot of flexibility in how you construct your chat prompts.

View File

@@ -145,7 +145,7 @@ class BabyAGI(Chain, BaseModel):
self.print_task_result(result)
# Step 3: Store the result in Pinecone
result_id = f"result_{task['task_id']}"
result_id = f"result_{task['task_id']}_{num_iters}"
self.vectorstore.add_texts(
texts=[result],
metadatas=[{"task": task["task_name"]}],

View File

@@ -0,0 +1,5 @@
"""Generalized implementation of SmartGPT (origin: https://youtu.be/wVzuvf9D9BU)"""
from langchain_experimental.smart_llm.base import SmartLLMChain
__all__ = ["SmartLLMChain"]

View File

@@ -0,0 +1,323 @@
"""Chain for applying self-critique using the SmartGPT workflow."""
from typing import Any, Dict, List, Optional, Tuple, Type
from langchain.base_language import BaseLanguageModel
from langchain.callbacks.manager import CallbackManagerForChainRun
from langchain.chains.base import Chain
from langchain.input import get_colored_text
from langchain.prompts.base import BasePromptTemplate
from langchain.prompts.chat import (
AIMessagePromptTemplate,
BaseMessagePromptTemplate,
ChatPromptTemplate,
HumanMessagePromptTemplate,
)
from langchain.schema import LLMResult, PromptValue
from pydantic import Extra, root_validator
class SmartLLMChain(Chain):
"""
Generalized implementation of SmartGPT (origin: https://youtu.be/wVzuvf9D9BU)
A SmartLLMChain is an LLMChain that, instead of simply passing the prompt to the LLM,
performs these 3 steps:
1. Ideate: Pass the user prompt to an ideation LLM n_ideas times,
each result is an "idea"
2. Critique: Pass the ideas to a critique LLM which looks for flaws in the ideas
& picks the best one
3. Resolve: Pass the critique to a resolver LLM which improves upon the best idea
& outputs only the (improved version of) the best output
In total, SmartLLMChain pass will use n_ideas+2 LLM calls
Note that SmartLLMChain will only improve results (compared to a basic LLMChain),
when the underlying models have the capability for reflection, which smaller models
often don't.
Finally, a SmartLLMChain assumes that each underlying LLM outputs exactly 1 result.
"""
class SmartLLMChainHistory:
question: str = ""
ideas: List[str] = []
critique: str = ""
@property
def n_ideas(self) -> int:
return len(self.ideas)
def ideation_prompt_inputs(self) -> Dict[str, Any]:
return {"question": self.question}
def critique_prompt_inputs(self) -> Dict[str, Any]:
return {
"question": self.question,
**{f"idea_{i+1}": idea for i, idea in enumerate(self.ideas)},
}
def resolve_prompt_inputs(self) -> Dict[str, Any]:
return {
"question": self.question,
**{f"idea_{i+1}": idea for i, idea in enumerate(self.ideas)},
"critique": self.critique,
}
prompt: BasePromptTemplate
"""Prompt object to use."""
ideation_llm: Optional[BaseLanguageModel] = None
"""LLM to use in ideation step. If None given, 'llm' will be used."""
critique_llm: Optional[BaseLanguageModel] = None
"""LLM to use in critique step. If None given, 'llm' will be used."""
resolver_llm: Optional[BaseLanguageModel] = None
"""LLM to use in resolve step. If None given, 'llm' will be used."""
llm: Optional[BaseLanguageModel] = None
"""LLM to use for each steps, if no specific llm for that step is given. """
n_ideas: int = 3
"""Number of ideas to generate in idea step"""
return_intermediate_steps: bool = False
"""Whether to return ideas and critique, in addition to resolution."""
history: SmartLLMChainHistory = SmartLLMChainHistory()
class Config:
extra = Extra.forbid
@root_validator
@classmethod
def validate_inputs(cls, values: Dict[str, Any]) -> Dict[str, Any]:
"""Ensure we have an LLM for each step."""
llm = values.get("llm")
ideation_llm = values.get("ideation_llm")
critique_llm = values.get("critique_llm")
resolver_llm = values.get("resolver_llm")
if not llm and not ideation_llm:
raise ValueError(
"Either ideation_llm or llm needs to be given. Pass llm, "
"if you want to use the same llm for all steps, or pass "
"ideation_llm, critique_llm and resolver_llm if you want "
"to use different llms for each step."
)
if not llm and not critique_llm:
raise ValueError(
"Either critique_llm or llm needs to be given. Pass llm, "
"if you want to use the same llm for all steps, or pass "
"ideation_llm, critique_llm and resolver_llm if you want "
"to use different llms for each step."
)
if not llm and not resolver_llm:
raise ValueError(
"Either resolve_llm or llm needs to be given. Pass llm, "
"if you want to use the same llm for all steps, or pass "
"ideation_llm, critique_llm and resolver_llm if you want "
"to use different llms for each step."
)
if llm and ideation_llm and critique_llm and resolver_llm:
raise ValueError(
"LLMs are given for each step (ideation_llm, critique_llm,"
" resolver_llm), but backup LLM (llm) is also given, which"
" would not be used."
)
return values
@property
def input_keys(self) -> List[str]:
"""Defines the input keys."""
return self.prompt.input_variables
@property
def output_keys(self) -> List[str]:
"""Defines the output keys."""
if self.return_intermediate_steps:
return ["ideas", "critique", "resolution"]
return ["resolution"]
def prep_prompts(
self,
inputs: Dict[str, Any],
run_manager: Optional[CallbackManagerForChainRun] = None,
) -> Tuple[PromptValue, Optional[List[str]]]:
"""Prepare prompts from inputs."""
stop = None
if "stop" in inputs:
stop = inputs["stop"]
selected_inputs = {k: inputs[k] for k in self.prompt.input_variables}
prompt = self.prompt.format_prompt(**selected_inputs)
_colored_text = get_colored_text(prompt.to_string(), "green")
_text = "Prompt after formatting:\n" + _colored_text
if run_manager:
run_manager.on_text(_text, end="\n", verbose=self.verbose)
if "stop" in inputs and inputs["stop"] != stop:
raise ValueError(
"If `stop` is present in any inputs, should be present in all."
)
return prompt, stop
def _call(
self,
input_list: Dict[str, Any],
run_manager: Optional[CallbackManagerForChainRun] = None,
) -> Dict[str, Any]:
prompt, stop = self.prep_prompts(input_list, run_manager=run_manager)
self.history.question = prompt.to_string()
ideas = self._ideate(stop, run_manager)
self.history.ideas = ideas
critique = self._critique(stop, run_manager)
self.history.critique = critique
resolution = self._resolve(stop, run_manager)
if self.return_intermediate_steps:
return {"ideas": ideas, "critique": critique, "resolution": resolution}
return {"resolution": resolution}
def _get_text_from_llm_result(self, result: LLMResult, step: str) -> str:
"""Between steps, only the LLM result text is passed, not the LLMResult object.
This function extracts the text from an LLMResult."""
if len(result.generations) != 1:
raise ValueError(
f"In SmartLLM the LLM result in step {step} is not "
"exactly 1 element. This should never happen"
)
if len(result.generations[0]) != 1:
raise ValueError(
f"In SmartLLM the LLM in step {step} returned more than "
"1 output. SmartLLM only works with LLMs returning "
"exactly 1 output."
)
return result.generations[0][0].text
def get_prompt_strings(
self, stage: str
) -> List[Tuple[Type[BaseMessagePromptTemplate], str]]:
role_strings: List[Tuple[Type[BaseMessagePromptTemplate], str]] = []
role_strings.append(
(
HumanMessagePromptTemplate,
"Question: {question}\nAnswer: Let's work this out in a step by "
"step way to be sure we have the right answer:",
)
)
if stage == "ideation":
return role_strings
role_strings.extend(
[
*[
(
AIMessagePromptTemplate,
"Idea " + str(i + 1) + ": {idea_" + str(i + 1) + "}",
)
for i in range(self.n_ideas)
],
(
HumanMessagePromptTemplate,
"You are a researcher tasked with investigating the "
f"{self.n_ideas} response options provided. List the flaws and "
"faulty logic of each answer options. Let'w work this out in a step"
" by step way to be sure we have all the errors:",
),
]
)
if stage == "critique":
return role_strings
role_strings.extend(
[
(AIMessagePromptTemplate, "Critique: {critique}"),
(
HumanMessagePromptTemplate,
"You are a resolved tasked with 1) finding which of "
f"the {self.n_ideas} anwer options the researcher thought was "
"best,2) improving that answer and 3) printing the answer in full. "
"Don't output anything for step 1 or 2, only the full answer in 3. "
"Let's work this out in a step by step way to be sure we have "
"the right answer:",
),
]
)
if stage == "resolve":
return role_strings
raise ValueError(
"stage should be either 'ideation', 'critique' or 'resolve',"
f" but it is '{stage}'. This should never happen."
)
def ideation_prompt(self) -> ChatPromptTemplate:
return ChatPromptTemplate.from_strings(self.get_prompt_strings("ideation"))
def critique_prompt(self) -> ChatPromptTemplate:
return ChatPromptTemplate.from_strings(self.get_prompt_strings("critique"))
def resolve_prompt(self) -> ChatPromptTemplate:
return ChatPromptTemplate.from_strings(self.get_prompt_strings("resolve"))
def _ideate(
self,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForChainRun] = None,
) -> List[str]:
"""Generate n_ideas ideas as response to user prompt."""
llm = self.ideation_llm if self.ideation_llm else self.llm
prompt = self.ideation_prompt().format_prompt(
**self.history.ideation_prompt_inputs()
)
callbacks = run_manager.get_child() if run_manager else None
if llm:
ideas = [
self._get_text_from_llm_result(
llm.generate_prompt([prompt], stop, callbacks),
step="ideate",
)
for _ in range(self.n_ideas)
]
for i, idea in enumerate(ideas):
_colored_text = get_colored_text(idea, "blue")
_text = f"Idea {i+1}:\n" + _colored_text
if run_manager:
run_manager.on_text(_text, end="\n", verbose=self.verbose)
return ideas
else:
raise ValueError("llm is none, which should never happen")
def _critique(
self,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForChainRun] = None,
) -> str:
"""Critique each of the ideas from ideation stage & select best one."""
llm = self.critique_llm if self.critique_llm else self.llm
prompt = self.critique_prompt().format_prompt(
**self.history.critique_prompt_inputs()
)
callbacks = run_manager.handlers if run_manager else None
if llm:
critique = self._get_text_from_llm_result(
llm.generate_prompt([prompt], stop, callbacks), step="critique"
)
_colored_text = get_colored_text(critique, "yellow")
_text = "Critique:\n" + _colored_text
if run_manager:
run_manager.on_text(_text, end="\n", verbose=self.verbose)
return critique
else:
raise ValueError("llm is none, which should never happen")
def _resolve(
self,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForChainRun] = None,
) -> str:
"""Improve upon the best idea as chosen in critique step & return it."""
llm = self.resolver_llm if self.resolver_llm else self.llm
prompt = self.resolve_prompt().format_prompt(
**self.history.resolve_prompt_inputs()
)
callbacks = run_manager.handlers if run_manager else None
if llm:
resolution = self._get_text_from_llm_result(
llm.generate_prompt([prompt], stop, callbacks), step="resolve"
)
_colored_text = get_colored_text(resolution, "green")
_text = "Resolution:\n" + _colored_text
if run_manager:
run_manager.on_text(_text, end="\n", verbose=self.verbose)
return resolution
else:
raise ValueError("llm is none, which should never happen")

View File

@@ -0,0 +1,120 @@
"""Test SmartLLM."""
from langchain.chat_models import FakeListChatModel
from langchain.llms import FakeListLLM
from langchain.prompts.prompt import PromptTemplate
from langchain_experimental.smart_llm import SmartLLMChain
def test_ideation() -> None:
# test that correct responses are returned
responses = ["Idea 1", "Idea 2", "Idea 3"]
llm = FakeListLLM(responses=responses)
prompt = PromptTemplate(
input_variables=["product"],
template="What is a good name for a company that makes {product}?",
)
chain = SmartLLMChain(llm=llm, prompt=prompt)
prompt_value, _ = chain.prep_prompts({"product": "socks"})
chain.history.question = prompt_value.to_string()
results = chain._ideate()
assert results == responses
# test that correct number of responses are returned
for i in range(1, 5):
responses = [f"Idea {j+1}" for j in range(i)]
llm = FakeListLLM(responses=responses)
chain = SmartLLMChain(llm=llm, prompt=prompt, n_ideas=i)
prompt_value, _ = chain.prep_prompts({"product": "socks"})
chain.history.question = prompt_value.to_string()
results = chain._ideate()
assert len(results) == i
def test_critique() -> None:
response = "Test Critique"
llm = FakeListLLM(responses=[response])
prompt = PromptTemplate(
input_variables=["product"],
template="What is a good name for a company that makes {product}?",
)
chain = SmartLLMChain(llm=llm, prompt=prompt, n_ideas=2)
prompt_value, _ = chain.prep_prompts({"product": "socks"})
chain.history.question = prompt_value.to_string()
chain.history.ideas = ["Test Idea 1", "Test Idea 2"]
result = chain._critique()
assert result == response
def test_resolver() -> None:
response = "Test resolution"
llm = FakeListLLM(responses=[response])
prompt = PromptTemplate(
input_variables=["product"],
template="What is a good name for a company that makes {product}?",
)
chain = SmartLLMChain(llm=llm, prompt=prompt, n_ideas=2)
prompt_value, _ = chain.prep_prompts({"product": "socks"})
chain.history.question = prompt_value.to_string()
chain.history.ideas = ["Test Idea 1", "Test Idea 2"]
chain.history.critique = "Test Critique"
result = chain._resolve()
assert result == response
def test_all_steps() -> None:
joke = "Why did the chicken cross the Mobius strip?"
response = "Resolution response"
ideation_llm = FakeListLLM(responses=["Ideation response" for _ in range(20)])
critique_llm = FakeListLLM(responses=["Critique response" for _ in range(20)])
resolver_llm = FakeListLLM(responses=[response for _ in range(20)])
prompt = PromptTemplate(
input_variables=["joke"],
template="Explain this joke to me: {joke}?",
)
chain = SmartLLMChain(
ideation_llm=ideation_llm,
critique_llm=critique_llm,
resolver_llm=resolver_llm,
prompt=prompt,
)
result = chain(joke)
assert result["joke"] == joke
assert result["resolution"] == response
def test_intermediate_output() -> None:
joke = "Why did the chicken cross the Mobius strip?"
llm = FakeListLLM(responses=[f"Response {i+1}" for i in range(5)])
prompt = PromptTemplate(
input_variables=["joke"],
template="Explain this joke to me: {joke}?",
)
chain = SmartLLMChain(llm=llm, prompt=prompt, return_intermediate_steps=True)
result = chain(joke)
assert result["joke"] == joke
assert result["ideas"] == [f"Response {i+1}" for i in range(3)]
assert result["critique"] == "Response 4"
assert result["resolution"] == "Response 5"
def test_all_steps_with_chat_model() -> None:
joke = "Why did the chicken cross the Mobius strip?"
response = "Resolution response"
ideation_llm = FakeListChatModel(responses=["Ideation response" for _ in range(20)])
critique_llm = FakeListChatModel(responses=["Critique response" for _ in range(20)])
resolver_llm = FakeListChatModel(responses=[response for _ in range(20)])
prompt = PromptTemplate(
input_variables=["joke"],
template="Explain this joke to me: {joke}?",
)
chain = SmartLLMChain(
ideation_llm=ideation_llm,
critique_llm=critique_llm,
resolver_llm=resolver_llm,
prompt=prompt,
)
result = chain(joke)
assert result["joke"] == joke
assert result["resolution"] == response

View File

@@ -0,0 +1,208 @@
from __future__ import annotations
import importlib
from typing import (
Any,
AsyncIterator,
Dict,
Iterable,
List,
Mapping,
Sequence,
Union,
overload,
)
from typing_extensions import Literal
from langchain.schema.messages import (
AIMessage,
AIMessageChunk,
BaseMessage,
BaseMessageChunk,
ChatMessage,
FunctionMessage,
HumanMessage,
SystemMessage,
)
async def aenumerate(
iterable: AsyncIterator[Any], start: int = 0
) -> AsyncIterator[tuple[int, Any]]:
"""Async version of enumerate."""
i = start
async for x in iterable:
yield i, x
i += 1
def convert_dict_to_message(_dict: Mapping[str, Any]) -> BaseMessage:
role = _dict["role"]
if role == "user":
return HumanMessage(content=_dict["content"])
elif role == "assistant":
# Fix for azure
# Also OpenAI returns None for tool invocations
content = _dict.get("content", "") or ""
if _dict.get("function_call"):
additional_kwargs = {"function_call": dict(_dict["function_call"])}
else:
additional_kwargs = {}
return AIMessage(content=content, additional_kwargs=additional_kwargs)
elif role == "system":
return SystemMessage(content=_dict["content"])
elif role == "function":
return FunctionMessage(content=_dict["content"], name=_dict["name"])
else:
return ChatMessage(content=_dict["content"], role=role)
def convert_message_to_dict(message: BaseMessage) -> dict:
message_dict: Dict[str, Any]
if isinstance(message, ChatMessage):
message_dict = {"role": message.role, "content": message.content}
elif isinstance(message, HumanMessage):
message_dict = {"role": "user", "content": message.content}
elif isinstance(message, AIMessage):
message_dict = {"role": "assistant", "content": message.content}
if "function_call" in message.additional_kwargs:
message_dict["function_call"] = message.additional_kwargs["function_call"]
# If function call only, content is None not empty string
if message_dict["content"] == "":
message_dict["content"] = None
elif isinstance(message, SystemMessage):
message_dict = {"role": "system", "content": message.content}
elif isinstance(message, FunctionMessage):
message_dict = {
"role": "function",
"content": message.content,
"name": message.name,
}
else:
raise TypeError(f"Got unknown type {message}")
if "name" in message.additional_kwargs:
message_dict["name"] = message.additional_kwargs["name"]
return message_dict
def convert_openai_messages(messages: Sequence[Dict[str, Any]]) -> List[BaseMessage]:
"""Convert dictionaries representing OpenAI messages to LangChain format.
Args:
messages: List of dictionaries representing OpenAI messages
Returns:
List of LangChain BaseMessage objects.
"""
return [convert_dict_to_message(m) for m in messages]
def _convert_message_chunk_to_delta(chunk: BaseMessageChunk, i: int) -> Dict[str, Any]:
_dict: Dict[str, Any] = {}
if isinstance(chunk, AIMessageChunk):
if i == 0:
# Only shows up in the first chunk
_dict["role"] = "assistant"
if "function_call" in chunk.additional_kwargs:
_dict["function_call"] = chunk.additional_kwargs["function_call"]
# If the first chunk is a function call, the content is not empty string,
# not missing, but None.
if i == 0:
_dict["content"] = None
else:
_dict["content"] = chunk.content
else:
raise ValueError(f"Got unexpected streaming chunk type: {type(chunk)}")
# This only happens at the end of streams, and OpenAI returns as empty dict
if _dict == {"content": ""}:
_dict = {}
return {"choices": [{"delta": _dict}]}
class ChatCompletion:
@overload
@staticmethod
def create(
messages: Sequence[Dict[str, Any]],
*,
provider: str = "ChatOpenAI",
stream: Literal[False] = False,
**kwargs: Any,
) -> dict:
...
@overload
@staticmethod
def create(
messages: Sequence[Dict[str, Any]],
*,
provider: str = "ChatOpenAI",
stream: Literal[True],
**kwargs: Any,
) -> Iterable:
...
@staticmethod
def create(
messages: Sequence[Dict[str, Any]],
*,
provider: str = "ChatOpenAI",
stream: bool = False,
**kwargs: Any,
) -> Union[dict, Iterable]:
models = importlib.import_module("langchain.chat_models")
model_cls = getattr(models, provider)
model_config = model_cls(**kwargs)
converted_messages = convert_openai_messages(messages)
if not stream:
result = model_config.invoke(converted_messages)
return {"choices": [{"message": convert_message_to_dict(result)}]}
else:
return (
_convert_message_chunk_to_delta(c, i)
for i, c in enumerate(model_config.stream(converted_messages))
)
@overload
@staticmethod
async def acreate(
messages: Sequence[Dict[str, Any]],
*,
provider: str = "ChatOpenAI",
stream: Literal[False] = False,
**kwargs: Any,
) -> dict:
...
@overload
@staticmethod
async def acreate(
messages: Sequence[Dict[str, Any]],
*,
provider: str = "ChatOpenAI",
stream: Literal[True],
**kwargs: Any,
) -> AsyncIterator:
...
@staticmethod
async def acreate(
messages: Sequence[Dict[str, Any]],
*,
provider: str = "ChatOpenAI",
stream: bool = False,
**kwargs: Any,
) -> Union[dict, AsyncIterator]:
models = importlib.import_module("langchain.chat_models")
model_cls = getattr(models, provider)
model_config = model_cls(**kwargs)
converted_messages = convert_openai_messages(messages)
if not stream:
result = await model_config.ainvoke(converted_messages)
return {"choices": [{"message": convert_message_to_dict(result)}]}
else:
return (
_convert_message_chunk_to_delta(c, i)
async for i, c in aenumerate(model_config.astream(converted_messages))
)

View File

@@ -475,7 +475,8 @@ class Agent(BaseSingleActionAgent):
"""
full_inputs = self.get_full_inputs(intermediate_steps, **kwargs)
full_output = await self.llm_chain.apredict(callbacks=callbacks, **full_inputs)
return self.output_parser.parse(full_output)
agent_output = await self.output_parser.aparse(full_output)
return agent_output
def get_full_inputs(
self, intermediate_steps: List[Tuple[AgentAction, str]], **kwargs: Any

View File

@@ -55,6 +55,16 @@ def dereference_refs(spec_obj: dict, full_spec: dict) -> Union[dict, list]:
@dataclass(frozen=True)
class ReducedOpenAPISpec:
"""A reduced OpenAPI spec.
This is a quick and dirty representation for OpenAPI specs.
Attributes:
servers: The servers in the spec.
description: The description of the spec.
endpoints: The endpoints in the spec.
"""
servers: List[dict]
description: str
endpoints: List[Tuple[str, str, dict]]

View File

@@ -22,7 +22,20 @@ def create_vectorstore_agent(
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
**kwargs: Dict[str, Any],
) -> AgentExecutor:
"""Construct a VectorStore agent from an LLM and tools."""
"""Construct a VectorStore agent from an LLM and tools.
Args:
llm (BaseLanguageModel): LLM that will be used by the agent
toolkit (VectorStoreToolkit): Set of tools for the agent
callback_manager (Optional[BaseCallbackManager], optional): Object to handle the callback [ Defaults to None. ]
prefix (str, optional): The prefix prompt for the agent. If not provided uses default PREFIX.
verbose (bool, optional): If you want to see the content of the scratchpad. [ Defaults to False ]
agent_executor_kwargs (Optional[Dict[str, Any]], optional): If there is any other parameter you want to send to the agent. [ Defaults to None ]
**kwargs: Additional named parameters to pass to the ZeroShotAgent.
Returns:
AgentExecutor: Returns a callable AgentExecutor object. Either you can call it or use run method with the query to get the response
""" # noqa: E501
tools = toolkit.get_tools()
prompt = ZeroShotAgent.create_prompt(tools, prefix=prefix)
llm_chain = LLMChain(
@@ -50,7 +63,20 @@ def create_vectorstore_router_agent(
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
**kwargs: Dict[str, Any],
) -> AgentExecutor:
"""Construct a VectorStore router agent from an LLM and tools."""
"""Construct a VectorStore router agent from an LLM and tools.
Args:
llm (BaseLanguageModel): LLM that will be used by the agent
toolkit (VectorStoreRouterToolkit): Set of tools for the agent which have routing capability with multiple vector stores
callback_manager (Optional[BaseCallbackManager], optional): Object to handle the callback [ Defaults to None. ]
prefix (str, optional): The prefix prompt for the router agent. If not provided uses default ROUTER_PREFIX.
verbose (bool, optional): If you want to see the content of the scratchpad. [ Defaults to False ]
agent_executor_kwargs (Optional[Dict[str, Any]], optional): If there is any other parameter you want to send to the agent. [ Defaults to None ]
**kwargs: Additional named parameters to pass to the ZeroShotAgent.
Returns:
AgentExecutor: Returns a callable AgentExecutor object. Either you can call it or use run method with the query to get the response.
""" # noqa: E501
tools = toolkit.get_tools()
prompt = ZeroShotAgent.create_prompt(tools, prefix=prefix)
llm_chain = LLMChain(

View File

@@ -19,12 +19,14 @@ logger = logging.getLogger(__name__)
class StructuredChatOutputParser(AgentOutputParser):
"""Output parser for the structured chat agent."""
pattern = re.compile(r"```(?:json)?\n(.*?)```", re.DOTALL)
def get_format_instructions(self) -> str:
return FORMAT_INSTRUCTIONS
def parse(self, text: str) -> Union[AgentAction, AgentFinish]:
try:
action_match = re.search(r"```(.*?)```?", text, re.DOTALL)
action_match = self.pattern.search(text)
if action_match is not None:
response = json.loads(action_match.group(1).strip(), strict=False)
if isinstance(response, list):

View File

@@ -10,6 +10,8 @@ from langchain.tools.base import BaseTool
class XMLAgentOutputParser(AgentOutputParser):
"""Output parser for XMLAgent."""
def parse(self, text: str) -> Union[AgentAction, AgentFinish]:
if "</tool>" in text:
tool, tool_input = text.split("</tool>")

View File

@@ -18,6 +18,7 @@ from langchain.callbacks.file import FileCallbackHandler
from langchain.callbacks.flyte_callback import FlyteCallbackHandler
from langchain.callbacks.human import HumanApprovalCallbackHandler
from langchain.callbacks.infino_callback import InfinoCallbackHandler
from langchain.callbacks.labelstudio_callback import LabelStudioCallbackHandler
from langchain.callbacks.manager import (
get_openai_callback,
tracing_enabled,
@@ -68,4 +69,5 @@ __all__ = [
"wandb_tracing_enabled",
"FlyteCallbackHandler",
"SageMakerCallbackHandler",
"LabelStudioCallbackHandler",
]

View File

@@ -2,6 +2,8 @@ import os
import warnings
from typing import Any, Dict, List, Optional, Union
from packaging.version import parse
from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import AgentAction, AgentFinish, LLMResult
@@ -19,11 +21,10 @@ class ArgillaCallbackHandler(BaseCallbackHandler):
default workspace will be used.
api_url: URL of the Argilla Server that we want to use, and where the
`FeedbackDataset` lives in. Defaults to `None`, which means that either
`ARGILLA_API_URL` environment variable or the default http://localhost:6900
will be used.
`ARGILLA_API_URL` environment variable or the default will be used.
api_key: API Key to connect to the Argilla Server. Defaults to `None`, which
means that either `ARGILLA_API_KEY` environment variable or the default
`argilla.apikey` will be used.
will be used.
Raises:
ImportError: if the `argilla` package is not installed.
@@ -51,6 +52,12 @@ class ArgillaCallbackHandler(BaseCallbackHandler):
"Argilla, no doubt about it."
"""
REPO_URL = "https://github.com/argilla-io/argilla"
ISSUES_URL = f"{REPO_URL}/issues"
BLOG_URL = "https://docs.argilla.io/en/latest/guides/llms/practical_guides/use_argilla_callback_in_langchain.html" # noqa: E501
DEFAULT_API_URL = "http://localhost:6900"
def __init__(
self,
dataset_name: str,
@@ -70,11 +77,10 @@ class ArgillaCallbackHandler(BaseCallbackHandler):
default workspace will be used.
api_url: URL of the Argilla Server that we want to use, and where the
`FeedbackDataset` lives in. Defaults to `None`, which means that either
`ARGILLA_API_URL` environment variable or the default
http://localhost:6900 will be used.
`ARGILLA_API_URL` environment variable or the default will be used.
api_key: API Key to connect to the Argilla Server. Defaults to `None`, which
means that either `ARGILLA_API_KEY` environment variable or the default
`argilla.apikey` will be used.
will be used.
Raises:
ImportError: if the `argilla` package is not installed.
@@ -87,41 +93,58 @@ class ArgillaCallbackHandler(BaseCallbackHandler):
# Import Argilla (not via `import_argilla` to keep hints in IDEs)
try:
import argilla as rg # noqa: F401
self.ARGILLA_VERSION = rg.__version__
except ImportError:
raise ImportError(
"To use the Argilla callback manager you need to have the `argilla` "
"Python package installed. Please install it with `pip install argilla`"
)
# Check whether the Argilla version is compatible
if parse(self.ARGILLA_VERSION) < parse("1.8.0"):
raise ImportError(
f"The installed `argilla` version is {self.ARGILLA_VERSION} but "
"`ArgillaCallbackHandler` requires at least version 1.8.0. Please "
"upgrade `argilla` with `pip install --upgrade argilla`."
)
# Show a warning message if Argilla will assume the default values will be used
if api_url is None and os.getenv("ARGILLA_API_URL") is None:
warnings.warn(
(
"Since `api_url` is None, and the env var `ARGILLA_API_URL` is not"
" set, it will default to `http://localhost:6900`."
f" set, it will default to `{self.DEFAULT_API_URL}`, which is the"
" default API URL in Argilla Quickstart."
),
)
api_url = self.DEFAULT_API_URL
if api_key is None and os.getenv("ARGILLA_API_KEY") is None:
self.DEFAULT_API_KEY = (
"admin.apikey"
if parse(self.ARGILLA_VERSION) < parse("1.11.0")
else "owner.apikey"
)
warnings.warn(
(
"Since `api_key` is None, and the env var `ARGILLA_API_KEY` is not"
" set, it will default to `argilla.apikey`."
f" set, it will default to `{self.DEFAULT_API_KEY}`, which is the"
" default API key in Argilla Quickstart."
),
)
api_key = self.DEFAULT_API_KEY
# Connect to Argilla with the provided credentials, if applicable
try:
rg.init(
api_key=api_key,
api_url=api_url,
)
rg.init(api_key=api_key, api_url=api_url)
except Exception as e:
raise ConnectionError(
f"Could not connect to Argilla with exception: '{e}'.\n"
"Please check your `api_key` and `api_url`, and make sure that "
"the Argilla server is up and running. If the problem persists "
"please report it to https://github.com/argilla-io/argilla/issues "
"with the label `langchain`."
f"please report it to {self.ISSUES_URL} as an `integration` issue."
) from e
# Set the Argilla variables
@@ -130,46 +153,47 @@ class ArgillaCallbackHandler(BaseCallbackHandler):
# Retrieve the `FeedbackDataset` from Argilla (without existing records)
try:
extra_args = {}
if parse(self.ARGILLA_VERSION) < parse("1.14.0"):
warnings.warn(
f"You have Argilla {self.ARGILLA_VERSION}, but Argilla 1.14.0 or"
" higher is recommended.",
UserWarning,
)
extra_args = {"with_records": False}
self.dataset = rg.FeedbackDataset.from_argilla(
name=self.dataset_name,
workspace=self.workspace_name,
with_records=False,
**extra_args,
)
except Exception as e:
raise FileNotFoundError(
"`FeedbackDataset` retrieval from Argilla failed with exception:"
f" '{e}'.\nPlease check that the dataset with"
f" name={self.dataset_name} in the"
f"`FeedbackDataset` retrieval from Argilla failed with exception `{e}`."
f"\nPlease check that the dataset with name={self.dataset_name} in the"
f" workspace={self.workspace_name} exists in advance. If you need help"
" on how to create a `langchain`-compatible `FeedbackDataset` in"
" Argilla, please visit"
" https://docs.argilla.io/en/latest/guides/llms/practical_guides/use_argilla_callback_in_langchain.html." # noqa: E501
" If the problem persists please report it to"
" https://github.com/argilla-io/argilla/issues with the label"
" `langchain`."
f" Argilla, please visit {self.BLOG_URL}. If the problem persists"
f" please report it to {self.ISSUES_URL} as an `integration` issue."
) from e
supported_fields = ["prompt", "response"]
if supported_fields != [field.name for field in self.dataset.fields]:
raise ValueError(
f"`FeedbackDataset` with name={self.dataset_name} in the"
f" workspace={self.workspace_name} "
"had fields that are not supported yet for the `langchain` integration."
" Supported fields are: "
f"{supported_fields}, and the current `FeedbackDataset` fields are"
f" {[field.name for field in self.dataset.fields]}. "
"For more information on how to create a `langchain`-compatible"
" `FeedbackDataset` in Argilla, please visit"
" https://docs.argilla.io/en/latest/guides/llms/practical_guides/use_argilla_callback_in_langchain.html." # noqa: E501
f"`FeedbackDataset` with name={self.dataset_name} in the workspace="
f"{self.workspace_name} had fields that are not supported yet for the"
f"`langchain` integration. Supported fields are: {supported_fields},"
f" and the current `FeedbackDataset` fields are {[field.name for field in self.dataset.fields]}." # noqa: E501
" For more information on how to create a `langchain`-compatible"
f" `FeedbackDataset` in Argilla, please visit {self.BLOG_URL}."
)
self.prompts: Dict[str, List[str]] = {}
warnings.warn(
(
"The `ArgillaCallbackHandler` is currently in beta and is subject to "
"change based on updates to `langchain`. Please report any issues to "
"https://github.com/argilla-io/argilla/issues with the tag `langchain`."
"The `ArgillaCallbackHandler` is currently in beta and is subject to"
" change based on updates to `langchain`. Please report any issues to"
f" {self.ISSUES_URL} as an `integration` issue."
),
)
@@ -205,12 +229,13 @@ class ArgillaCallbackHandler(BaseCallbackHandler):
]
)
# Push the records to Argilla
self.dataset.push_to_argilla()
# Pop current run from `self.runs`
self.prompts.pop(str(kwargs["run_id"]))
if parse(self.ARGILLA_VERSION) < parse("1.14.0"):
# Push the records to Argilla
self.dataset.push_to_argilla()
def on_llm_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
@@ -278,15 +303,16 @@ class ArgillaCallbackHandler(BaseCallbackHandler):
]
)
# Push the records to Argilla
self.dataset.push_to_argilla()
# Pop current run from `self.runs`
if str(kwargs["parent_run_id"]) in self.prompts:
self.prompts.pop(str(kwargs["parent_run_id"]))
if str(kwargs["run_id"]) in self.prompts:
self.prompts.pop(str(kwargs["run_id"]))
if parse(self.ARGILLA_VERSION) < parse("1.14.0"):
# Push the records to Argilla
self.dataset.push_to_argilla()
def on_chain_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:

View File

@@ -13,7 +13,7 @@ class FileCallbackHandler(BaseCallbackHandler):
self, filename: str, mode: str = "a", color: Optional[str] = None
) -> None:
"""Initialize callback handler."""
self.file = cast(TextIO, open(filename, mode))
self.file = cast(TextIO, open(filename, mode, encoding="utf-8"))
self.color = color
def __del__(self) -> None:

View File

@@ -0,0 +1,392 @@
import os
import warnings
from datetime import datetime
from enum import Enum
from typing import Any, Dict, List, Optional, Tuple, Union
from uuid import UUID
from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import (
AgentAction,
AgentFinish,
BaseMessage,
ChatMessage,
Generation,
LLMResult,
)
class LabelStudioMode(Enum):
PROMPT = "prompt"
CHAT = "chat"
def get_default_label_configs(
mode: Union[str, LabelStudioMode]
) -> Tuple[str, LabelStudioMode]:
_default_label_configs = {
LabelStudioMode.PROMPT.value: """
<View>
<Style>
.prompt-box {
background-color: white;
border-radius: 10px;
box-shadow: 0px 4px 6px rgba(0, 0, 0, 0.1);
padding: 20px;
}
</Style>
<View className="root">
<View className="prompt-box">
<Text name="prompt" value="$prompt"/>
</View>
<TextArea name="response" toName="prompt"
maxSubmissions="1" editable="true"
required="true"/>
</View>
<Header value="Rate the response:"/>
<Rating name="rating" toName="prompt"/>
</View>""",
LabelStudioMode.CHAT.value: """
<View>
<View className="root">
<Paragraphs name="dialogue"
value="$prompt"
layout="dialogue"
textKey="content"
nameKey="role"
granularity="sentence"/>
<Header value="Final response:"/>
<TextArea name="response" toName="dialogue"
maxSubmissions="1" editable="true"
required="true"/>
</View>
<Header value="Rate the response:"/>
<Rating name="rating" toName="dialogue"/>
</View>""",
}
if isinstance(mode, str):
mode = LabelStudioMode(mode)
return _default_label_configs[mode.value], mode
class LabelStudioCallbackHandler(BaseCallbackHandler):
"""Label Studio callback handler.
Provides the ability to send predictions to Label Studio
for human evaluation, feedback and annotation.
Parameters:
api_key: Label Studio API key
url: Label Studio URL
project_id: Label Studio project ID
project_name: Label Studio project name
project_config: Label Studio project config (XML)
mode: Label Studio mode ("prompt" or "chat")
Examples:
>>> from langchain.llms import OpenAI
>>> from langchain.callbacks import LabelStudioCallbackHandler
>>> handler = LabelStudioCallbackHandler(
... api_key='<your_key_here>',
... url='http://localhost:8080',
... project_name='LangChain-%Y-%m-%d',
... mode='prompt'
... )
>>> llm = OpenAI(callbacks=[handler])
>>> llm.predict('Tell me a story about a dog.')
"""
DEFAULT_PROJECT_NAME = "LangChain-%Y-%m-%d"
def __init__(
self,
api_key: Optional[str] = None,
url: Optional[str] = None,
project_id: Optional[int] = None,
project_name: str = DEFAULT_PROJECT_NAME,
project_config: Optional[str] = None,
mode: Union[str, LabelStudioMode] = LabelStudioMode.PROMPT,
):
super().__init__()
# Import LabelStudio SDK
try:
import label_studio_sdk as ls
except ImportError:
raise ImportError(
f"You're using {self.__class__.__name__} in your code,"
f" but you don't have the LabelStudio SDK "
f"Python package installed or upgraded to the latest version. "
f"Please run `pip install -U label-studio-sdk`"
f" before using this callback."
)
# Check if Label Studio API key is provided
if not api_key:
if os.getenv("LABEL_STUDIO_API_KEY"):
api_key = str(os.getenv("LABEL_STUDIO_API_KEY"))
else:
raise ValueError(
f"You're using {self.__class__.__name__} in your code,"
f" Label Studio API key is not provided. "
f"Please provide Label Studio API key: "
f"go to the Label Studio instance, navigate to "
f"Account & Settings -> Access Token and copy the key. "
f"Use the key as a parameter for the callback: "
f"{self.__class__.__name__}"
f"(label_studio_api_key='<your_key_here>', ...) or "
f"set the environment variable LABEL_STUDIO_API_KEY=<your_key_here>"
)
self.api_key = api_key
if not url:
if os.getenv("LABEL_STUDIO_URL"):
url = os.getenv("LABEL_STUDIO_URL")
else:
warnings.warn(
f"Label Studio URL is not provided, "
f"using default URL: {ls.LABEL_STUDIO_DEFAULT_URL}"
f"If you want to provide your own URL, use the parameter: "
f"{self.__class__.__name__}"
f"(label_studio_url='<your_url_here>', ...) "
f"or set the environment variable LABEL_STUDIO_URL=<your_url_here>"
)
url = ls.LABEL_STUDIO_DEFAULT_URL
self.url = url
# Maps run_id to prompts
self.payload: Dict[str, Dict] = {}
self.ls_client = ls.Client(url=self.url, api_key=self.api_key)
self.project_name = project_name
if project_config:
self.project_config = project_config
self.mode = None
else:
self.project_config, self.mode = get_default_label_configs(mode)
self.project_id = project_id or os.getenv("LABEL_STUDIO_PROJECT_ID")
if self.project_id is not None:
self.ls_project = self.ls_client.get_project(int(self.project_id))
else:
project_title = datetime.today().strftime(self.project_name)
existing_projects = self.ls_client.get_projects(title=project_title)
if existing_projects:
self.ls_project = existing_projects[0]
self.project_id = self.ls_project.id
else:
self.ls_project = self.ls_client.create_project(
title=project_title, label_config=self.project_config
)
self.project_id = self.ls_project.id
self.parsed_label_config = self.ls_project.parsed_label_config
# Find the first TextArea tag
# "from_name", "to_name", "value" will be used to create predictions
self.from_name, self.to_name, self.value, self.input_type = (
None,
None,
None,
None,
)
for tag_name, tag_info in self.parsed_label_config.items():
if tag_info["type"] == "TextArea":
self.from_name = tag_name
self.to_name = tag_info["to_name"][0]
self.value = tag_info["inputs"][0]["value"]
self.input_type = tag_info["inputs"][0]["type"]
break
if not self.from_name:
error_message = (
f'Label Studio project "{self.project_name}" '
f"does not have a TextArea tag. "
f"Please add a TextArea tag to the project."
)
if self.mode == LabelStudioMode.PROMPT:
error_message += (
"\nHINT: go to project Settings -> "
"Labeling Interface -> Browse Templates"
' and select "Generative AI -> '
'Supervised Language Model Fine-tuning" template.'
)
else:
error_message += (
"\nHINT: go to project Settings -> "
"Labeling Interface -> Browse Templates"
" and check available templates under "
'"Generative AI" section.'
)
raise ValueError(error_message)
def add_prompts_generations(
self, run_id: str, generations: List[List[Generation]]
) -> None:
# Create tasks in Label Studio
tasks = []
prompts = self.payload[run_id]["prompts"]
model_version = (
self.payload[run_id]["kwargs"]
.get("invocation_params", {})
.get("model_name")
)
for prompt, generation in zip(prompts, generations):
tasks.append(
{
"data": {
self.value: prompt,
"run_id": run_id,
},
"predictions": [
{
"result": [
{
"from_name": self.from_name,
"to_name": self.to_name,
"type": "textarea",
"value": {"text": [g.text for g in generation]},
}
],
"model_version": model_version,
}
],
}
)
self.ls_project.import_tasks(tasks)
def on_llm_start(
self,
serialized: Dict[str, Any],
prompts: List[str],
**kwargs: Any,
) -> None:
"""Save the prompts in memory when an LLM starts."""
if self.input_type != "Text":
raise ValueError(
f'\nLabel Studio project "{self.project_name}" '
f"has an input type <{self.input_type}>. "
f'To make it work with the mode="prompt", '
f"the input type should be <Text>.\n"
f"Read more here https://labelstud.io/tags/text"
)
run_id = str(kwargs["run_id"])
self.payload[run_id] = {"prompts": prompts, "kwargs": kwargs}
def _get_message_role(self, message: BaseMessage) -> str:
"""Get the role of the message."""
if isinstance(message, ChatMessage):
return message.role
else:
return message.__class__.__name__
def on_chat_model_start(
self,
serialized: Dict[str, Any],
messages: List[List[BaseMessage]],
*,
run_id: UUID,
parent_run_id: Optional[UUID] = None,
tags: Optional[List[str]] = None,
metadata: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> Any:
"""Save the prompts in memory when an LLM starts."""
if self.input_type != "Paragraphs":
raise ValueError(
f'\nLabel Studio project "{self.project_name}" '
f"has an input type <{self.input_type}>. "
f'To make it work with the mode="chat", '
f"the input type should be <Paragraphs>.\n"
f"Read more here https://labelstud.io/tags/paragraphs"
)
prompts = []
for message_list in messages:
dialog = []
for message in message_list:
dialog.append(
{
"role": self._get_message_role(message),
"content": message.content,
}
)
prompts.append(dialog)
self.payload[str(run_id)] = {
"prompts": prompts,
"tags": tags,
"metadata": metadata,
"run_id": run_id,
"parent_run_id": parent_run_id,
"kwargs": kwargs,
}
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
"""Do nothing when a new token is generated."""
pass
def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
"""Create a new Label Studio task for each prompt and generation."""
run_id = str(kwargs["run_id"])
# Submit results to Label Studio
self.add_prompts_generations(run_id, response.generations)
# Pop current run from `self.payload`
self.payload.pop(run_id)
def on_llm_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Do nothing when LLM outputs an error."""
pass
def on_chain_start(
self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
) -> None:
pass
def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:
pass
def on_chain_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Do nothing when LLM chain outputs an error."""
pass
def on_tool_start(
self,
serialized: Dict[str, Any],
input_str: str,
**kwargs: Any,
) -> None:
"""Do nothing when tool starts."""
pass
def on_agent_action(self, action: AgentAction, **kwargs: Any) -> Any:
"""Do nothing when agent takes a specific action."""
pass
def on_tool_end(
self,
output: str,
observation_prefix: Optional[str] = None,
llm_prefix: Optional[str] = None,
**kwargs: Any,
) -> None:
"""Do nothing when tool ends."""
pass
def on_tool_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Do nothing when tool outputs an error."""
pass
def on_text(self, text: str, **kwargs: Any) -> None:
"""Do nothing"""
pass
def on_agent_finish(self, finish: AgentFinish, **kwargs: Any) -> None:
"""Do nothing"""
pass
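For context, a minimal usage sketch of the handler above, assuming it is exported as `langchain.callbacks.LabelStudioCallbackHandler` (as this changeset does) and that a Label Studio instance is reachable; the URL, key, and project name are placeholders:

```python
import os

from langchain.callbacks import LabelStudioCallbackHandler
from langchain.llms import OpenAI

# Placeholders: point these at a real Label Studio instance.
os.environ["LABEL_STUDIO_URL"] = "http://localhost:8080"
os.environ["LABEL_STUDIO_API_KEY"] = "<your_key_here>"

# The project name may use strftime placeholders, e.g. one project per day.
handler = LabelStudioCallbackHandler(project_name="LangChain-%Y-%m-%d")
llm = OpenAI(callbacks=[handler])

# on_llm_start stores the prompt; on_llm_end imports a Label Studio task
# with the prompt as data and the generations as a TextArea prediction.
llm("Tell me a joke")
```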

View File

@@ -31,8 +31,19 @@ MODEL_COST_PER_1K_TOKENS = {
"gpt-3.5-turbo-0613-completion": 0.002,
"gpt-3.5-turbo-16k-completion": 0.004,
"gpt-3.5-turbo-16k-0613-completion": 0.004,
+    # Azure GPT-35 input
+    "gpt-35-turbo": 0.0015,  # Azure OpenAI version of ChatGPT
+    "gpt-35-turbo-0301": 0.0015,  # Azure OpenAI version of ChatGPT
+    "gpt-35-turbo-0613": 0.0015,
+    "gpt-35-turbo-16k": 0.003,
+    "gpt-35-turbo-16k-0613": 0.003,
+    # Azure GPT-35 output
+    "gpt-35-turbo-completion": 0.002,  # Azure OpenAI version of ChatGPT
+    "gpt-35-turbo-0301-completion": 0.002,  # Azure OpenAI version of ChatGPT
+    "gpt-35-turbo-0613-completion": 0.002,
+    "gpt-35-turbo-16k-completion": 0.004,
+    "gpt-35-turbo-16k-0613-completion": 0.004,
    # Others
-    "gpt-35-turbo": 0.002,  # Azure OpenAI version of ChatGPT
"text-ada-001": 0.0004,
"ada": 0.0004,
"text-babbage-001": 0.0005,
@@ -69,7 +80,9 @@ def standardize_model_name(
if "ft-" in model_name:
return model_name.split(":")[0] + "-finetuned"
elif is_completion and (
model_name.startswith("gpt-4") or model_name.startswith("gpt-3.5")
model_name.startswith("gpt-4")
or model_name.startswith("gpt-3.5")
or model_name.startswith("gpt-35")
):
return model_name + "-completion"
else:
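Taken together, these two hunks let Azure's `gpt-35-*` spellings resolve to separate prompt and completion rates. A simplified sketch of the resulting lookup (the table subset and helper are inlined here for illustration; the real versions live in `langchain/callbacks/openai_info.py` and do more):

```python
# Inlined subset of MODEL_COST_PER_1K_TOKENS, for illustration only.
COSTS = {
    "gpt-35-turbo": 0.0015,            # prompt tokens (Azure)
    "gpt-35-turbo-completion": 0.002,  # completion tokens (Azure)
}

def standardize_model_name(model_name: str, is_completion: bool = False) -> str:
    # Mirrors the new branch above: the "gpt-35" spelling now also
    # receives the "-completion" suffix for output-token pricing.
    if is_completion and model_name.startswith(("gpt-4", "gpt-3.5", "gpt-35")):
        return model_name + "-completion"
    return model_name

name = standardize_model_name("gpt-35-turbo", is_completion=True)
print(name)                      # gpt-35-turbo-completion
print(COSTS[name] * 500 / 1000)  # cost of 500 completion tokens -> 0.001
```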

View File

@@ -47,6 +47,9 @@ class BaseTracer(BaseCallbackHandler, ABC):
parent_run = self.run_map[str(run.parent_run_id)]
if parent_run:
self._add_child_run(parent_run, run)
parent_run.child_execution_order = max(
parent_run.child_execution_order, run.child_execution_order
)
else:
logger.debug(f"Parent run with UUID {run.parent_run_id} not found.")
self.run_map[str(run.id)] = run
@@ -131,7 +134,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
run_id_ = str(run_id)
llm_run = self.run_map.get(run_id_)
if llm_run is None or llm_run.run_type != "llm":
raise TracerException("No LLM Run found to be traced")
raise TracerException(f"No LLM Run found to be traced for {run_id}")
llm_run.events.append(
{
"name": "new_token",
@@ -183,7 +186,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
run_id_ = str(run_id)
llm_run = self.run_map.get(run_id_)
if llm_run is None or llm_run.run_type != "llm":
raise TracerException("No LLM Run found to be traced")
raise TracerException(f"No LLM Run found to be traced for {run_id}")
llm_run.outputs = response.dict()
for i, generations in enumerate(response.generations):
for j, generation in enumerate(generations):
@@ -211,7 +214,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
run_id_ = str(run_id)
llm_run = self.run_map.get(run_id_)
if llm_run is None or llm_run.run_type != "llm":
raise TracerException("No LLM Run found to be traced")
raise TracerException(f"No LLM Run found to be traced for {run_id}")
llm_run.error = repr(error)
llm_run.end_time = datetime.utcnow()
llm_run.events.append({"name": "error", "time": llm_run.end_time})
@@ -254,18 +257,25 @@ class BaseTracer(BaseCallbackHandler, ABC):
self._on_chain_start(chain_run)
def on_chain_end(
-        self, outputs: Dict[str, Any], *, run_id: UUID, **kwargs: Any
+        self,
+        outputs: Dict[str, Any],
+        *,
+        run_id: UUID,
+        inputs: Optional[Dict[str, Any]] = None,
+        **kwargs: Any,
) -> None:
"""End a trace for a chain run."""
if not run_id:
raise TracerException("No run_id provided for on_chain_end callback.")
chain_run = self.run_map.get(str(run_id))
if chain_run is None:
raise TracerException("No chain Run found to be traced")
raise TracerException(f"No chain Run found to be traced for {run_id}")
chain_run.outputs = outputs
chain_run.end_time = datetime.utcnow()
chain_run.events.append({"name": "end", "time": chain_run.end_time})
+        if inputs is not None:
+            chain_run.inputs = inputs
self._end_trace(chain_run)
self._on_chain_end(chain_run)
@@ -273,6 +283,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
self,
error: Union[Exception, KeyboardInterrupt],
*,
+        inputs: Optional[Dict[str, Any]] = None,
run_id: UUID,
**kwargs: Any,
) -> None:
@@ -281,11 +292,13 @@ class BaseTracer(BaseCallbackHandler, ABC):
raise TracerException("No run_id provided for on_chain_error callback.")
chain_run = self.run_map.get(str(run_id))
if chain_run is None:
raise TracerException("No chain Run found to be traced")
raise TracerException(f"No chain Run found to be traced for {run_id}")
chain_run.error = repr(error)
chain_run.end_time = datetime.utcnow()
chain_run.events.append({"name": "error", "time": chain_run.end_time})
+        if inputs is not None:
+            chain_run.inputs = inputs
self._end_trace(chain_run)
self._on_chain_error(chain_run)
@@ -329,7 +342,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
raise TracerException("No run_id provided for on_tool_end callback.")
tool_run = self.run_map.get(str(run_id))
if tool_run is None or tool_run.run_type != "tool":
raise TracerException("No tool Run found to be traced")
raise TracerException(f"No tool Run found to be traced for {run_id}")
tool_run.outputs = {"output": output}
tool_run.end_time = datetime.utcnow()
@@ -349,7 +362,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
raise TracerException("No run_id provided for on_tool_error callback.")
tool_run = self.run_map.get(str(run_id))
if tool_run is None or tool_run.run_type != "tool":
raise TracerException("No tool Run found to be traced")
raise TracerException(f"No tool Run found to be traced for {run_id}")
tool_run.error = repr(error)
tool_run.end_time = datetime.utcnow()
@@ -404,7 +417,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
raise TracerException("No run_id provided for on_retriever_error callback.")
retrieval_run = self.run_map.get(str(run_id))
if retrieval_run is None or retrieval_run.run_type != "retriever":
raise TracerException("No retriever Run found to be traced")
raise TracerException(f"No retriever Run found to be traced for {run_id}")
retrieval_run.error = repr(error)
retrieval_run.end_time = datetime.utcnow()
@@ -420,7 +433,7 @@ class BaseTracer(BaseCallbackHandler, ABC):
raise TracerException("No run_id provided for on_retriever_end callback.")
retrieval_run = self.run_map.get(str(run_id))
if retrieval_run is None or retrieval_run.run_type != "retriever":
raise TracerException("No retriever Run found to be traced")
raise TracerException(f"No retriever Run found to be traced for {run_id}")
retrieval_run.outputs = {"documents": documents}
retrieval_run.end_time = datetime.utcnow()
retrieval_run.events.append({"name": "end", "time": retrieval_run.end_time})
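Every hunk in this file makes the same fix: the offending `run_id` is now interpolated into the `TracerException` message. A standalone toy (not the real `BaseTracer`) showing why that matters when an end callback arrives without a matching start event:

```python
from typing import Dict
from uuid import UUID, uuid4

class TracerException(Exception):
    pass

# run_id -> run record, analogous to BaseTracer.run_map
run_map: Dict[str, dict] = {}

def on_llm_end(run_id: UUID) -> None:
    llm_run = run_map.get(str(run_id))
    if llm_run is None or llm_run.get("run_type") != "llm":
        # Including the id identifies *which* run was never started
        # (or was started as a different run type).
        raise TracerException(f"No LLM Run found to be traced for {run_id}")

try:
    on_llm_end(uuid4())
except TracerException as e:
    print(e)  # No LLM Run found to be traced for <some-uuid>
```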

View File

@@ -37,7 +37,7 @@ def elapsed(run: Any) -> str:
elapsed_time = run.end_time - run.start_time
milliseconds = elapsed_time.total_seconds() * 1000
if milliseconds < 1000:
return f"{milliseconds}ms"
return f"{milliseconds:.0f}ms"
return f"{(milliseconds / 1000):.2f}s"
@@ -78,28 +78,31 @@ class FunctionCallbackHandler(BaseTracer):
# logging methods
def _on_chain_start(self, run: Run) -> None:
crumbs = self.get_breadcrumbs(run)
+        run_type = run.run_type.capitalize()
self.function_callback(
f"{get_colored_text('[chain/start]', color='green')} "
+ get_bolded_text(f"[{crumbs}] Entering Chain run with input:\n")
+ get_bolded_text(f"[{crumbs}] Entering {run_type} run with input:\n")
+ f"{try_json_stringify(run.inputs, '[inputs]')}"
)
def _on_chain_end(self, run: Run) -> None:
crumbs = self.get_breadcrumbs(run)
+        run_type = run.run_type.capitalize()
self.function_callback(
f"{get_colored_text('[chain/end]', color='blue')} "
+ get_bolded_text(
f"[{crumbs}] [{elapsed(run)}] Exiting Chain run with output:\n"
f"[{crumbs}] [{elapsed(run)}] Exiting {run_type} run with output:\n"
)
+ f"{try_json_stringify(run.outputs, '[outputs]')}"
)
def _on_chain_error(self, run: Run) -> None:
crumbs = self.get_breadcrumbs(run)
+        run_type = run.run_type.capitalize()
self.function_callback(
f"{get_colored_text('[chain/error]', color='red')} "
+ get_bolded_text(
f"[{crumbs}] [{elapsed(run)}] Chain run errored with error:\n"
f"[{crumbs}] [{elapsed(run)}] {run_type} run errored with error:\n"
)
+ f"{try_json_stringify(run.error, '[error]')}"
)

View File

@@ -1,9 +1,11 @@
"""Base interface that all chains should implement."""
import asyncio
import inspect
import json
import logging
import warnings
from abc import ABC, abstractmethod
from functools import partial
from pathlib import Path
from typing import Any, Dict, List, Optional, Union
@@ -55,18 +57,40 @@ class Chain(Serializable, Runnable[Dict[str, Any], Dict[str, Any]], ABC):
"""
def invoke(
-        self, input: Dict[str, Any], config: Optional[RunnableConfig] = None
+        self,
+        input: Dict[str, Any],
+        config: Optional[RunnableConfig] = None,
+        **kwargs: Any,
) -> Dict[str, Any]:
-        return self(input, **(config or {}))
+        config = config or {}
+        return self(
+            input,
+            callbacks=config.get("callbacks"),
+            tags=config.get("tags"),
+            metadata=config.get("metadata"),
+            **kwargs,
+        )
async def ainvoke(
-        self, input: Dict[str, Any], config: Optional[RunnableConfig] = None
+        self,
+        input: Dict[str, Any],
+        config: Optional[RunnableConfig] = None,
+        **kwargs: Any,
) -> Dict[str, Any]:
if type(self)._acall == Chain._acall:
# If the chain does not implement async, fall back to default implementation
-            return await super().ainvoke(input, config)
+            return await asyncio.get_running_loop().run_in_executor(
+                None, partial(self.invoke, input, config, **kwargs)
+            )
-        return await self.acall(input, **(config or {}))
+        config = config or {}
+        return await self.acall(
+            input,
+            callbacks=config.get("callbacks"),
+            tags=config.get("tags"),
+            metadata=config.get("metadata"),
+            **kwargs,
+        )
memory: Optional[BaseMemory] = None
"""Optional memory object. Defaults to None.
@@ -553,7 +577,7 @@ class Chain(Serializable, Runnable[Dict[str, Any], Dict[str, Any]], ABC):
A dictionary representation of the chain.
Example:
-            ..code-block:: python
+            .. code-block:: python
chain.dict(exclude_unset=True)
# -> {"_type": "foo", "verbose": False, ...}

View File

@@ -100,6 +100,13 @@ class StuffDocumentsChain(BaseCombineDocumentsChain):
)
return values
@property
def input_keys(self) -> List[str]:
extra_keys = [
k for k in self.llm_chain.input_keys if k != self.document_variable_name
]
return super().input_keys + extra_keys
def _get_inputs(self, docs: List[Document], **kwargs: Any) -> dict:
"""Construct inputs from kwargs and docs.

View File

@@ -127,6 +127,8 @@ class LLMChain(Chain):
) -> Tuple[List[PromptValue], Optional[List[str]]]:
"""Prepare prompts from inputs."""
stop = None
if len(input_list) == 0:
return [], stop
if "stop" in input_list[0]:
stop = input_list[0]["stop"]
prompts = []
@@ -151,6 +153,8 @@ class LLMChain(Chain):
) -> Tuple[List[PromptValue], Optional[List[str]]]:
"""Prepare prompts from inputs."""
stop = None
if len(input_list) == 0:
return [], stop
if "stop" in input_list[0]:
stop = input_list[0]["stop"]
prompts = []
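With this guard, batching over an empty input list should short-circuit instead of failing on `input_list[0]`. A sketch of the expected effect (the chain below is a placeholder):

```python
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

chain = LLMChain(
    llm=OpenAI(),
    prompt=PromptTemplate.from_template("Summarize: {text}"),
)

# Previously prep_prompts indexed input_list[0] and raised IndexError on
# an empty batch; with the guard an empty batch should yield no outputs.
print(chain.apply([]))  # expected: []
```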

View File

@@ -49,6 +49,8 @@ class ElementInViewPort(TypedDict):
class Crawler:
"""A crawler for web pages."""
def __init__(self) -> None:
try:
from playwright.sync_api import sync_playwright

View File

@@ -9,6 +9,7 @@ try:
except ImportError:
def v_args(*args: Any, **kwargs: Any) -> Any: # type: ignore
"""Dummy decorator for when lark is not installed."""
return lambda _: None
Transformer = object # type: ignore

View File

@@ -3,6 +3,8 @@ import functools
import logging
from typing import Any, Awaitable, Callable, Dict, List, Optional
from pydantic import Field
from langchain.callbacks.manager import (
AsyncCallbackManagerForChainRun,
CallbackManagerForChainRun,
@@ -27,9 +29,11 @@ class TransformChain(Chain):
"""The keys expected by the transform's input dictionary."""
output_variables: List[str]
"""The keys returned by the transform's output dictionary."""
-    transform: Callable[[Dict[str, str]], Dict[str, str]]
+    transform_cb: Callable[[Dict[str, str]], Dict[str, str]] = Field(alias="transform")
"""The transform function."""
-    atransform: Optional[Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]] = None
+    atransform_cb: Optional[
+        Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]
+    ] = Field(None, alias="atransform")
"""The async coroutine transform function."""
@staticmethod
@@ -62,18 +66,18 @@ class TransformChain(Chain):
inputs: Dict[str, str],
run_manager: Optional[CallbackManagerForChainRun] = None,
) -> Dict[str, str]:
-        return self.transform(inputs)
+        return self.transform_cb(inputs)
async def _acall(
self,
inputs: Dict[str, Any],
run_manager: Optional[AsyncCallbackManagerForChainRun] = None,
) -> Dict[str, Any]:
-        if self.atransform is not None:
-            return await self.atransform(inputs)
+        if self.atransform_cb is not None:
+            return await self.atransform_cb(inputs)
else:
self._log_once(
"TransformChain's atransform is not provided, falling"
" back to synchronous transform"
)
-            return self.transform(inputs)
+            return self.transform_cb(inputs)
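Because the new fields carry `Field(alias=...)`, existing call sites that pass `transform=` keep working, while the attribute name no longer shadows a `transform` method on the chain (apparently the motivation for the rename). A construction sketch:

```python
from langchain.chains import TransformChain

def redact(inputs: dict) -> dict:
    return {"clean": inputs["text"].replace("secret", "[REDACTED]")}

# The alias keeps the old constructor argument working; internally the
# callable is stored as `transform_cb`.
chain = TransformChain(
    input_variables=["text"],
    output_variables=["clean"],
    transform=redact,
)
print(chain({"text": "my secret plan"})["clean"])  # my [REDACTED] plan
```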

View File

@@ -9,9 +9,9 @@ from typing import TYPE_CHECKING, Optional, Set
import requests
from pydantic import Field, root_validator
+from langchain.adapters.openai import convert_message_to_dict
from langchain.chat_models.openai import (
    ChatOpenAI,
-    _convert_message_to_dict,
    _import_tiktoken,
)
from langchain.schema.messages import BaseMessage
@@ -178,7 +178,7 @@ class ChatAnyscale(ChatOpenAI):
tokens_per_message = 3
tokens_per_name = 1
num_tokens = 0
-        messages_dict = [_convert_message_to_dict(m) for m in messages]
+        messages_dict = [convert_message_to_dict(m) for m in messages]
for message in messages_dict:
num_tokens += tokens_per_message
for key, value in message.items():

View File

@@ -40,11 +40,20 @@ class AzureChatOpenAI(ChatOpenAI):
Be aware the API version may change.
You can also specify the version of the model using ``model_version`` constructor
parameter, as Azure OpenAI doesn't return model version with the response.
Default is empty. When you specify the version, it will be appended to the
model name in the response. Setting the correct version will help you
calculate the cost properly. The model version is not validated, so make sure
you set it correctly to get the correct cost.
Any parameters that are valid to be passed to the openai.create call can be passed
in, even if not explicitly saved on this class.
"""
deployment_name: str = ""
model_version: str = ""
openai_api_type: str = ""
openai_api_base: str = ""
openai_api_version: str = ""
@@ -137,7 +146,19 @@ class AzureChatOpenAI(ChatOpenAI):
for res in response["choices"]:
if res.get("finish_reason", None) == "content_filter":
raise ValueError(
"Azure has not provided the response due to a content"
" filter being triggered"
"Azure has not provided the response due to a content filter "
"being triggered"
)
-        return super()._create_chat_result(response)
+        chat_result = super()._create_chat_result(response)
+        if "model" in response:
+            model = response["model"]
+            if self.model_version:
+                model = f"{model}-{self.model_version}"
+            if chat_result.llm_output is not None and isinstance(
+                chat_result.llm_output, dict
+            ):
+                chat_result.llm_output["model_name"] = model
+        return chat_result
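A usage sketch of the new `model_version` parameter (endpoint values are placeholders). Azure reports the bare model name in responses, so appending the version lets the cost table earlier in this diff resolve version-specific rates:

```python
from langchain.chat_models import AzureChatOpenAI

llm = AzureChatOpenAI(
    deployment_name="my-gpt35-deployment",                    # placeholder
    openai_api_base="https://my-resource.openai.azure.com/",  # placeholder
    openai_api_version="2023-05-15",
    openai_api_key="<your_key_here>",                         # placeholder
    model_version="0613",
)
# llm_output["model_name"] now comes back as e.g. "gpt-35-turbo-0613",
# matching the Azure entries in MODEL_COST_PER_1K_TOKENS.
```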

View File

@@ -51,6 +51,8 @@ def _get_verbosity() -> bool:
class BaseChatModel(BaseLanguageModel[BaseMessageChunk], ABC):
"""Base class for chat models."""
cache: Optional[bool] = None
"""Whether to cache the response."""
verbose: bool = Field(default_factory=_get_verbosity)
@@ -103,12 +105,18 @@ class BaseChatModel(BaseLanguageModel[BaseMessageChunk], ABC):
stop: Optional[List[str]] = None,
**kwargs: Any,
) -> BaseMessageChunk:
+        config = config or {}
return cast(
BaseMessageChunk,
cast(
ChatGeneration,
self.generate_prompt(
-                    [self._convert_input(input)], stop=stop, **(config or {}), **kwargs
+                    [self._convert_input(input)],
+                    stop=stop,
+                    callbacks=config.get("callbacks"),
+                    tags=config.get("tags"),
+                    metadata=config.get("metadata"),
+                    **kwargs,
).generations[0][0],
).message,
)
@@ -127,8 +135,14 @@ class BaseChatModel(BaseLanguageModel[BaseMessageChunk], ABC):
None, partial(self.invoke, input, config, stop=stop, **kwargs)
)
+        config = config or {}
llm_result = await self.agenerate_prompt(
-            [self._convert_input(input)], stop=stop, **(config or {}), **kwargs
+            [self._convert_input(input)],
+            stop=stop,
+            callbacks=config.get("callbacks"),
+            tags=config.get("tags"),
+            metadata=config.get("metadata"),
+            **kwargs,
)
return cast(
BaseMessageChunk, cast(ChatGeneration, llm_result.generations[0][0]).message

Some files were not shown because too many files have changed in this diff.