Compare commits

..

211 Commits

Author SHA1 Message Date
Harrison Chase
bac676c8e7 bump version (#1057) 2023-02-15 07:09:10 -08:00
Ankush Gola
d8ac274fc2 add to async chain notebook (#1056) 2023-02-14 18:20:38 -08:00
Ankush Gola
caa8e4742e Enable streaming for OpenAI LLM (#986)
* Support a callback `on_llm_new_token` that users can implement when
`OpenAI.streaming` is set to `True`
2023-02-14 15:06:14 -08:00
Harrison Chase
f05f025e41 bump version to 0086 (#1050) 2023-02-14 07:14:40 -08:00
Sasmitha Manathunga
c67c5383fd docs: fix typo in notebook (#1046) 2023-02-14 07:06:08 -08:00
Harrison Chase
88bebb4caa Harrison/llm integrations (#1039)
Co-authored-by: jped <jonathanped@gmail.com>
Co-authored-by: Justin Torre <justintorre75@gmail.com>
Co-authored-by: Ivan Vendrov <ivan@anthropic.com>
2023-02-13 22:06:25 -08:00
Harrison Chase
ec727bf166 Align table info (#999) (#1034)
Currently the chain is getting the column names and types on the one
side and the example rows on the other. It is easier for the llm to read
the table information if the column name and examples are shown together
so that it can easily understand to which columns do the examples refer
to. For an instantiation of this, please refer to the changes in the
`sqlite.ipynb` notebook.

Also changed `eval` for `ast.literal_eval` when interpreting the results
from the sample row query since it is a better practice.

---------

Co-authored-by: Francisco Ingham <>

---------

Co-authored-by: Francisco Ingham <fpingham@gmail.com>
2023-02-13 21:48:41 -08:00
Harrison Chase
8c45f06d58 Harrison/standarize prompt loading (#1036)
Co-authored-by: Ibis Prevedello <ibiscp@gmail.com>
2023-02-13 21:48:09 -08:00
Enrico Shippole
f30dcc6359 Add GooseAI, CerebriumAI, Petals, ForefrontAI (#981)
Add GooseAI, CerebriumAI, Petals, ForefrontAI
2023-02-13 21:20:19 -08:00
Anton Troynikov
d43d430d86 Chroma persistence (#1028)
This PR adds persistence to the Chroma vector store.

Users can supply a `persist_directory` with any of the `Chroma` creation
methods. If supplied, the store will be automatically persisted at that
directory.

If a user creates a new `Chroma` instance with the same persistence
directory, it will get loaded up automatically. If they use `from_texts`
or `from_documents` in this way, the documents will be loaded into the
existing store.

There is the chance of some funky behavior if the user passes a
different embedding function from the one used to create the collection
- we will make this easier in future updates. For now, we log a warning.
2023-02-13 21:09:06 -08:00
Harrison Chase
012a6dfb16 Harrison/makefile (#1033)
Co-authored-by: blob42 <contact@blob42.xyz>
Co-authored-by: blob42 <spike@w530>
2023-02-13 21:08:47 -08:00
Harrison Chase
6a31a59400 add links (#1027) 2023-02-13 16:33:30 -08:00
Oliver Klingefjord
20889205e8 Added retry for openai.error.ServiceUnavailableError (#1022)
Imho retries should be performed for ServiceUnavailableError (which
tends to happen to me quite often).
2023-02-13 13:30:06 -08:00
Harrison Chase
fc2502cd81 bump version to 0085 (#1017) 2023-02-13 07:32:36 -08:00
Harrison Chase
0f0e69adce agent refactors (#997) 2023-02-12 23:02:13 -08:00
Harrison Chase
7fb33fca47 chroma docs (#1012) 2023-02-12 23:02:01 -08:00
Harrison Chase
0c553d2064 Harrion/kg (#1016)
Co-authored-by: William FH <13333726+hinthornw@users.noreply.github.com>
2023-02-12 23:01:26 -08:00
Anton Troynikov
78abd277ff Chroma in LangChain (#1010)
Chroma is a simple to use, open-source, zero-config, zero setup
vectorstore.

Simply `pip install chromadb`, and you're good to go. 

Out-of-the-box Chroma is suitable for most LangChain workloads, but is
highly flexible. I tested to 1M embs on my M1 mac, with out issues and
reasonably fast query times.

Look out for future releases as we integrate more Chroma features with
LangChain!
2023-02-12 17:43:48 -08:00
cragwolfe
05d8969c79 Unstructured example notebook: add a pdf, related deps (#1011)
Updates the Unstructured example notebook with a PDF example. Includes
additional dependencies for PDF processing (and images, etc).
2023-02-12 14:56:48 -08:00
Dhruv Anand
03e5794978 typo fix on chat vector db docs (#1007)
simple typo fix: because --> between
2023-02-12 12:09:21 -08:00
Harrison Chase
6d44a2285c bump version to 0084 (#1005) 2023-02-12 07:47:10 -08:00
Harrison Chase
0998577dfe Harrison/unstructured structured (#1004) 2023-02-12 07:36:11 -08:00
Harrison Chase
bbb06ca4cf pdfminer (#1003) 2023-02-12 07:29:26 -08:00
Francisco Ingham
0b6aa6a024 Added initial capital letter to bullet points that had it missing (#1000)
Co-authored-by: Francisco Ingham <>
2023-02-11 20:31:34 -08:00
Harrison Chase
10e7297306 Harrison/fake llm (#990)
Co-authored-by: Stefan Keselj <skeselj@princeton.edu>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
2023-02-11 15:12:35 -08:00
Harrison Chase
e51fad1488 Harrison/0083 (#996)
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
2023-02-11 08:29:28 -08:00
Shahriar Tajbakhsh
b7747017d7 Import of declarative_base when SQLAlchemy <1.4 (#883)
In
[pyproject.toml](https://github.com/hwchase17/langchain/blob/master/pyproject.toml),
the expectation is `SQLAlchemy = "^1"`. But, the way `declarative_base`
is imported in
[cache.py](https://github.com/hwchase17/langchain/blob/master/langchain/cache.py)
will only work with SQLAlchemy >=1.4. This PR makes sure Langchain can
be run in environments with SQLAlchemy <1.4
2023-02-10 18:33:47 -08:00
Harrison Chase
2e96704d59 Harrison/airbyte (#989)
Co-authored-by: zanderchase <zanderchase@gmail.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MacBook-Pro.local>
2023-02-10 18:08:00 -08:00
Charles Frye
e9799d6821 improves huggingface_hub example (#988)
The provided example uses the default `max_length` of `20` tokens, which
leads to the example generation getting cut off. 20 tokens is way too
short to show CoT reasoning, so I boosted it to `64`.

Without knowing HF's API well, it can be hard to figure out just where
those `model_kwargs` come from, and `max_length` is a super critical
one.
2023-02-10 17:56:15 -08:00
zanderchase
c2d1d903fa Zander/online pdf loader (#984) 2023-02-10 15:42:30 -08:00
Harrison Chase
055a53c27f add texts example (#985)
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MacBook-Pro.local>
2023-02-10 12:32:44 -08:00
Harrison Chase
231da14771 bump version to 0082 (#980)
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MacBook-Pro.local>
2023-02-10 11:38:24 -08:00
jeff
6ab432d62e docs: update spelling typos (#982)
Wonder why "with" is spelled "wiht" so many times by human
2023-02-10 11:37:59 -08:00
Matt Robinson
07a407d89a feat: adds UnstructuredURLLoader for loading data from urls (#979)
### Summary

Adds a `UnstructuredURLLoader` that supports loading data from a list of
URLs.


### Testing

```python
from langchain.document_loaders import UnstructuredURLLoader

urls = [
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-8-2023",
    "https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-9-2023"
]
loader = UnstructuredURLLoader(urls=urls)
raw_documents = loader.load()
```
2023-02-10 10:18:38 -08:00
Harrison Chase
c64f98e2bb Harrison/format agent instructions (#973)
Co-authored-by: Andrew White <white.d.andrew@gmail.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
Co-authored-by: Peng Qu <82029664+pengqu123@users.noreply.github.com>
2023-02-10 10:07:26 -08:00
Harrison Chase
5469d898a9 Harrison/everynote (#974)
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
2023-02-10 08:02:35 -08:00
Harrison Chase
3d639d1539 update lint (#975)
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
2023-02-10 08:01:13 -08:00
Harrison Chase
91c6cea227 Harrison/batch embeds (#972)
Co-authored-by: John Dagdelen <jdagdelen@users.noreply.github.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
2023-02-10 06:59:50 -08:00
Harrison Chase
ba54d36787 Harrison/tiktoken spec (#964)
Co-authored-by: James Briggs <35938317+jamescalam@users.noreply.github.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
2023-02-09 23:30:18 -08:00
Harrison Chase
5f8082bdd7 Harrison/deps (#963)
Co-authored-by: Jon Luo <20971593+jzluo@users.noreply.github.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
2023-02-09 23:19:19 -08:00
Kevin Huo
512c523368 remove sample_row_in_table_info and simplify set operations in SQLDB (#932)
-Address TODO: deprecate for sample_row_in_table_info
-Simplify set operations by casting to sets to not need multiple set
casts + .difference() calls
2023-02-09 23:15:41 -08:00
Harrison Chase
e323d0cfb1 bump version 0081 (#956)
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
2023-02-09 08:29:11 -08:00
Harrison Chase
01fa2d8117 Harrison/youtube fixes (#955)
Co-authored-by: Ji <jizhang.work@gmail.com>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
2023-02-09 08:12:22 -08:00
zanderchase
8e126bc9bd adding webpage loading logic (#942) 2023-02-09 07:52:50 -08:00
Harrison Chase
c71027e725 add docs for steamship deployment (#949)
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
2023-02-08 16:01:19 -08:00
Usama Navid
e85c53ce68 Update readthedocs.py (#943)
Sometimes, the docs may be empty. For example for the text =
soup.find_all("main", {"id": "main-content"}) was an empty list. To
cater to these edge cases, the clean function needs to be checked if it
is empty or not.
2023-02-08 16:01:07 -08:00
Harrison Chase
3e1901e1aa gutenberg books (#946)
Co-authored-by: zanderchase <zander@unfold.ag>
Co-authored-by: Harrison Chase <harrisonchase@Harrisons-MBP.attlocal.net>
2023-02-08 12:00:47 -08:00
jeff
6a4f602156 docs: fix spelling typo (#934) 2023-02-08 11:13:35 -08:00
Ikko Eltociear Ashimine
6023d5be09 Update huggingface_hub.ipynb (#944)
HuggingFace -> Hugging Face
2023-02-08 11:05:28 -08:00
Harrison Chase
a306baacd1 bump version to 0080 (#941) 2023-02-08 07:41:25 -08:00
Harrison Chase
44ecec3896 Harrison/add roam loader (#939) 2023-02-08 00:35:33 -08:00
Ankush Gola
bc7e56e8df Add asyncio support for LLM (OpenAI), Chain (LLMChain, LLMMathChain), and Agent (#841)
Supporting asyncio in langchain primitives allows for users to run them
concurrently and creates more seamless integration with
asyncio-supported frameworks (FastAPI, etc.)

Summary of changes:

**LLM**
* Add `agenerate` and `_agenerate`
* Implement in OpenAI by leveraging `client.Completions.acreate`

**Chain**
* Add `arun`, `acall`, `_acall`
* Implement them in `LLMChain` and `LLMMathChain` for now

**Agent**
* Refactor and leverage async chain and llm methods
* Add ability for `Tools` to contain async coroutine
* Implement async SerpaPI `arun`

Create demo notebook.

Open questions:
* Should all the async stuff go in separate classes? I've seen both
patterns (keeping the same class and having async and sync methods vs.
having class separation)
2023-02-07 21:21:57 -08:00
Vincent Elster
afc7f1b892 Fix typos (#929)
accomplisehd -> accomplished
2023-02-07 14:39:45 -08:00
Harrison Chase
d43250bfa5 Harrison/ver0079 (#927) 2023-02-07 07:59:35 -08:00
Harrison Chase
bc53c928fc Harrison/athropic (#921)
Co-authored-by: Mike Lambert <mlambert@gmail.com>
Co-authored-by: mrbean <sam@you.com>
Co-authored-by: mrbean <43734688+sam-h-bean@users.noreply.github.com>
Co-authored-by: Ivan Vendrov <ivendrov@gmail.com>
2023-02-06 22:29:25 -08:00
Harrison Chase
637c0d6508 Harrison/obsidian (#920) 2023-02-06 22:21:16 -08:00
Harrison Chase
1e56879d38 Harrison/save faiss (#916)
Co-authored-by: Shrey Joshi <shreyjoshi2004@gmail.com>
2023-02-06 21:44:50 -08:00
Ankush Gola
6bd1529cb7 add GoogleDriveLoader (#914)
only deal with docs files for now
2023-02-06 21:44:35 -08:00
Harrison Chase
2584663e44 remove unused buffer (#919) 2023-02-06 20:31:30 -08:00
Harrison Chase
cc20b9425e add reqs (#918) 2023-02-06 20:30:03 -08:00
Harrison Chase
cea380174f fix docs custom prompt template (#917) 2023-02-06 20:29:48 -08:00
Harrison Chase
87fad8fc00 analyze document (#731)
add analyze document chain, which does text splitting and then analysis
2023-02-06 20:02:19 -08:00
Harrison Chase
e2b834e427 Harrison/prompt template prefix (#888)
Co-authored-by: Gabriel Simmons <simmons.gabe@gmail.com>
2023-02-06 19:09:28 -08:00
Harrison Chase
f95cedc443 Harrison/sql rows (#915)
Co-authored-by: Jon Luo <20971593+jzluo@users.noreply.github.com>
2023-02-06 18:56:18 -08:00
Harrison Chase
ba5a2f06b9 Harrison/inference endpoint (#861)
Co-authored-by: Eno Reyes <enoreyes@gmail.com>
2023-02-06 18:14:25 -08:00
Harrison Chase
2ec25ddd4c add unstructured examples (#913) 2023-02-06 18:13:46 -08:00
Kevin Huo
31b054f69d Add pinecone integration test (#911)
Basic integration test for pinecone
2023-02-06 18:13:35 -08:00
Harrison Chase
93a091cfb8 Optionally return shell output on incorrect command (#894) (#899)
This allows the LLM to correct its previous command by looking at the
error message output to the shell.

Additionally, this uses subprocess.run because that is now recommended
over subprocess.check_output:

https://docs.python.org/3/library/subprocess.html#using-the-subprocess-module

Co-authored-by: Amos Ng <me@amos.ng>
2023-02-06 12:46:16 -08:00
James Briggs
3aa53b44dd added i_end in batch extraction (#907)
Fix for issue #906 

Switches `[i : i + batch_size]` to `[i : i_end]` in Pinecone
`from_texts` method
2023-02-06 12:45:56 -08:00
Harrison Chase
82c080c6e6 bump version to 0078 (#908) 2023-02-06 00:32:44 -08:00
Harrison Chase
71e662e88d update docs (#905) 2023-02-06 00:26:20 -08:00
Harrison Chase
53d56d7650 Harrison/unstructured support (#903) 2023-02-05 23:02:07 -08:00
Harrison Chase
2a68be3e8d chat vector db chain (#902) 2023-02-05 21:38:47 -08:00
James Briggs
8217a2f26c Update pinecone init details in docs (#898)
PR to fix outdated environment details in the docs, see issue #897 

I added code comments as pointers to where users go to get API keys, and
where they can find the relevant environment variable.
2023-02-05 15:21:56 -08:00
Bagatur
7658263bfb Check type of LLM.generate prompts arg (#886)
Was passing prompt in directly as string and getting nonsense outputs.
Had to inspect source code to realize that first arg should be a list.
Could be nice if there was an explicit error or warning, seems like this
could be a common mistake.
2023-02-04 22:49:17 -08:00
Samantha Whitmore
32b11101d3 Get elements of ActionInput on newlines (#889)
The re.DOTALL flag in Python's re (regular expression) module makes the
. (dot) metacharacter match newline characters as well as any other
character.

Without re.DOTALL, the . metacharacter only matches any character except
for a newline character. With re.DOTALL, the . metacharacter matches any
character, including newline characters.
2023-02-04 20:42:25 -08:00
Harrison Chase
1614c5f5fd fix flaky tests (#892) 2023-02-04 20:41:33 -08:00
Harrison Chase
a2b699dcd2 prompt template from string (#884) 2023-02-04 17:04:58 -08:00
Alex
7cc44b3bdb Add to gallery (#882) 2023-02-04 09:45:20 -08:00
Harrison Chase
0b9f086d36 Harrison/docs splitter (#879) 2023-02-03 15:09:13 -08:00
Harrison Chase
bcfbc7a818 version 0077 (#878) 2023-02-03 14:49:52 -08:00
Ryan Walker
1dd0733515 Fix small typo in getting started docs (#876)
Just noticed this little typo while reading the docs, thought I'd open a
PR!
2023-02-03 14:22:12 -08:00
Zach Schillaci
4c79100b15 Correct prompt typo + update example for SQLDatabaseChain (#868)
See https://github.com/hwchase17/langchain/issues/821
2023-02-03 08:34:41 -08:00
Harrison Chase
777aaff841 fix routing to tiktoken encoder (#866) 2023-02-02 22:08:14 -08:00
Harrison Chase
e9ef08862d validate template (#865) 2023-02-02 22:08:01 -08:00
Harrison Chase
364b771743 sql return direct (#864) 2023-02-02 22:07:41 -08:00
Harrison Chase
483441d305 pass kwargs through to loading (#863) 2023-02-02 22:07:26 -08:00
Harrison Chase
8df6b68093 fix length based example selector (#862) 2023-02-02 22:06:56 -08:00
Harrison Chase
3f48eed5bd Harrison/milvus (#856)
Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
Signed-off-by: Frank Liu <frank.liu@zilliz.com>
Co-authored-by: Filip Haltmayer <81822489+filip-halt@users.noreply.github.com>
Co-authored-by: Frank Liu <frank@frankzliu.com>
2023-02-02 22:05:47 -08:00
Ankush Gola
933441cc52 Add retry to OpenAI llm (#849)
add ability to retry when certain exceptions are raised by
`openai.Completions.create`

Test plan: ran all OpenAI integration tests.
2023-02-02 19:56:26 -08:00
kahkeng
4a8f5cdf4b Add alternative token-based text splitter (#816)
This does not involve a separator, and will naively chunk input text at
the appropriate boundaries in token space.

This is helpful if we have strict token length limits that we need to
strictly follow the specified chunk size, and we can't use aggressive
separators like spaces to guarantee the absence of long strings.

CharacterTextSplitter will let these strings through without splitting
them, which could cause overflow errors downstream.

Splitting at arbitrary token boundaries is not ideal but is hopefully
mitigated by having a decent overlap quantity. Also this results in
chunks which has exact number of tokens desired, instead of sometimes
overcounting if we concatenate shorter strings.

Potentially also helps with #528.
2023-02-02 19:55:13 -08:00
Harrison Chase
523ad2e6bd vercel deployments (#850) 2023-02-02 19:54:09 -08:00
Harrison Chase
fc0cfd7d1f docs (#848) 2023-02-02 11:35:36 -08:00
Harrison Chase
4d32441b86 bump version to 0076 (#847) 2023-02-02 10:05:39 -08:00
Harrison Chase
23d5f64bda Harrison/ngram example (#846)
Co-authored-by: Sean Spriggens <ssprigge@syr.edu>
2023-02-02 09:44:42 -08:00
Harrison Chase
0de55048b7 return code for pal (#844) 2023-02-02 08:47:20 -08:00
Harrison Chase
d564308e0f rfc: instruct embeddings (#811)
Co-authored-by: seanaedmiston <seane999@gmail.com>
2023-02-02 08:44:02 -08:00
Nick Furlotte
576609e665 Update PAL to allow passing local and global context to PythonREPL (#774)
Passing additional variables to the python environment can be useful for
example if you want to generate code to analyze a dataset.

I also added a tracker for the executed code - `code_history`.
2023-02-02 08:34:23 -08:00
Harrison Chase
3f952eb597 add from string method (#820) 2023-02-02 08:23:54 -08:00
Ikko Eltociear Ashimine
ba26a879e0 Fix typo in crawler.py (#842)
seperator -> separator
2023-02-02 08:23:38 -08:00
Eli Mernit
bfabd1d5c0 Added new deployment template (#835)
This PR introduces a new template for deploying LangChain apps as web
endpoints. It includes template code, and links to a detailed
code-walkthrough.
2023-02-01 23:38:36 -08:00
Jonas Ehrenstein
f3508228df Minor fix for google search util: it's uncertain if "snippet" in results exists (#830)
The results from Google search may not always contain a "snippet". 

Example:
`{'kind': 'customsearch#result', 'title': 'FEMA Flood Map', 'htmlTitle':
'FEMA Flood Map', 'link': 'https://msc.fema.gov/portal/home',
'displayLink': 'msc.fema.gov', 'formattedUrl':
'https://msc.fema.gov/portal/home', 'htmlFormattedUrl':
'https://<b>msc</b>.fema.gov/portal/home'}`

This will cause a KeyError at line 99
`snippets.append(result["snippet"])`.
2023-02-01 23:37:52 -08:00
Zach Schillaci
b4eb043b81 Minor fix to SQLDatabaseChain doc (#826) 2023-02-01 23:37:38 -08:00
Istora Mandiri
06438794e1 Fix typo in textsplitter docs (#825) 2023-02-01 23:32:35 -08:00
Raza Habib
9f8e05ffd4 Update __init__.py (#827)
Remove duplicate APIChain
2023-02-01 23:31:38 -08:00
Harrison Chase
b0d560be56 add to gallery (#824) 2023-02-01 07:10:15 -08:00
Johanna Appel
ebea40ce86 Add 'truncate' parameter for CohereEmbeddings (#798)
Currently, the 'truncate' parameter of the cohere API is not supported.

This means that by default, if trying to generate and embedding that is
too big, the call will just fail with an error (which is frustrating if
using this embedding source e.g. with GPT-Index, because it's hard to
handle it properly when generating a lot of embeddings).
With the parameter, one can decide to either truncate the START or END
of the text to fit the max token length and still generate an embedding
without throwing the error.

In this PR, I added this parameter to the class.

_Arguably, there should be a better way to handle this error, e.g. by
optionally calling a function or so that gets triggered when the token
limit is reached and can split the document or some such. Especially in
the use case with GPT-Index, its often hard to estimate the token counts
for each document and I'd rather sort out the troublemakers or simply
split them than interrupting the whole execution.
Thoughts?_

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-02-01 07:09:03 -08:00
Harrison Chase
b9045f7e0d bump version to 0075 (#819) 2023-01-31 00:18:32 -08:00
Harrison Chase
7b4882a2f4 Harrison/tf embeddings (#817)
Co-authored-by: Ryohei Kuroki <10434946+yakigac@users.noreply.github.com>
2023-01-31 00:00:08 -08:00
Harrison Chase
5d4b6e4d4e conversational agent fix (#818) 2023-01-30 23:59:55 -08:00
Harrison Chase
94ae126747 return sql intermediate steps (#792) 2023-01-30 15:10:48 -08:00
bair82
ae5695ad32 Update cohere.py (#795)
When stop tokens are set in Cohere LLM constructor, they are currently
not stripped from the response, and they should be stripped
2023-01-30 14:55:44 -08:00
Johanna Appel
cacf4091c0 Fix documentation for 'model' parameter in CohereEmbeddings (#797)
Currently, the class parameter 'model_name' of the CohereEmbeddings
class is not supported, but 'model' is. The class documentation is
inconsistent with this, though, so I propose to either fix the
documentation (this PR right now) or fix the parameter.

It will create the following error:
```
ValidationError: 1 validation error for CohereEmbeddings
model_name
  extra fields not permitted (type=value_error.extra)
```
2023-01-30 14:55:08 -08:00
Jason Liu
54f9e4287f Pass kwargs from initialize_agent into agent classmethod (#799)
# Problem
I noticed that in order to change the prefix of the prompt in the
`zero-shot-react-description` agent
we had to dig around to subset strings deep into the agent's attributes.
It requires the user to inspect a long chain of attributes and classes.

`initialize_agent -> AgentExecutor -> Agent -> LLMChain -> Prompt from
Agent.create_prompt`

``` python
agent = initialize_agent(
    tools=tools,
    llm=fake_llm,
    agent="zero-shot-react-description"
)
prompt_str = agent.agent.llm_chain.prompt.template
new_prompt_str = change_prefix(prompt_str)
agent.agent.llm_chain.prompt.template = new_prompt_str
```

# Implemented Solution

`initialize_agent` accepts `**kwargs` but passes it to `AgentExecutor`
but not `ZeroShotAgent`, by simply giving the kwargs to the agent class
methods we can support changing the prefix and suffix for one agent
while allowing future agents to take advantage of `initialize_agent`.


```
agent = initialize_agent(
    tools=tools,
    llm=fake_llm,
    agent="zero-shot-react-description",
    agent_kwargs={"prefix": prefix, "suffix": suffix}
)
```

To be fair, this was before finding docs around custom agents here:
https://langchain.readthedocs.io/en/latest/modules/agents/examples/custom_agent.html?highlight=custom%20#custom-llmchain
but i find that my use case just needed to change the prefix a little.


# Changes

* Pass kwargs to Agent class method
* Added a test to check suffix and prefix

---------

Co-authored-by: Jason Liu <jason@jxnl.coA>
2023-01-30 14:54:09 -08:00
Roger Zurawicki
c331009440 docs: Update langchain link to PyPI (#800)
Simple one-line fix

CONTRIBUTING used a link that pointed to the `ruff` project.
2023-01-30 14:53:16 -08:00
Roy Williams
6086292252 Centralize logic for loading from LangChainHub, add ability to pin dependencies (#805)
It's generally considered to be a good practice to pin dependencies to
prevent surprise breakages when a new version of a dependency is
released. This commit adds the ability to pin dependencies when loading
from LangChainHub.

Centralizing this logic and using urllib fixes an issue identified by
some windows users highlighted in this video -
https://youtu.be/aJ6IQUh8MLQ?t=537
2023-01-30 14:52:17 -08:00
Harrison Chase
b3916f74a7 enable mmr search (#807) 2023-01-30 14:48:24 -08:00
Harrison Chase
f46f1d28af expose memory key name (#808) 2023-01-30 14:48:12 -08:00
Harrison Chase
7728a848d0 Harrison/tracing docs (#806)
Co-authored-by: Ankush Gola <9536492+agola11@users.noreply.github.com>
2023-01-29 20:49:35 -08:00
Harrison Chase
f3da4dc6ba Harrison/tracing docs (#804)
Co-authored-by: Ankush Gola <9536492+agola11@users.noreply.github.com>
2023-01-29 20:24:22 -08:00
Harrison Chase
ae1b589f60 Harrison/add link for support (#794) 2023-01-28 22:53:04 -08:00
Harrison Chase
6a20f07f0d add link for support (#793) 2023-01-28 22:44:23 -08:00
Harrison Chase
fb2d7afe71 bump version to 0074 (#791) 2023-01-28 18:50:22 -08:00
Harrison Chase
1ad7973cc6 Harrison/tool decorator (#790)
Co-authored-by: Jason Liu <jxnl@users.noreply.github.com>
Co-authored-by: Jason Liu <jason@jxnl.coA>
2023-01-28 18:26:24 -08:00
Harrison Chase
5f73d06502 Harrison/fix caching bug (#788)
Co-authored-by: thepok <richterthepok@yahoo.de>
2023-01-28 14:24:30 -08:00
Harrison Chase
248c297f1b Sample row in table info for SQLDatabase (#769) (#782)
The agents usually benefit from understanding what the data looks like
to be able to filter effectively. Sending just one row in the table info
allows the agent to understand the data before querying and get better
results.

---------

Co-authored-by: Francisco Ingham <>

---------

Co-authored-by: Francisco Ingham <fpingham@gmail.com>
2023-01-28 13:37:07 -08:00
Francisco Ingham
213c2e33e5 Sql prompt improvement (#787)
Co-authored-by: Francisco Ingham <>
2023-01-28 13:34:15 -08:00
Harrison Chase
2e0219cac0 fixing bash util (#779) 2023-01-28 08:26:29 -08:00
Harrison Chase
966611bbfa add model kwargs to handle stop token from cohere (#773) 2023-01-28 08:24:55 -08:00
Harrison Chase
7198a1cb22 Harrison/refactor agent (#781)
Co-authored-by: Amos Ng <me@amos.ng>
2023-01-28 08:24:13 -08:00
Harrison Chase
5bb2952860 Harrison/hf pipeline (#780)
Co-authored-by: Parth Chadha <parth29@gmail.com>
2023-01-28 08:23:59 -08:00
Harrison Chase
c658f0aed3 Harrison/add to search (#778)
Co-authored-by: Enrico Shippole <enricoship@gmail.com>
2023-01-28 08:06:00 -08:00
Bill Kish
309d86e339 increase text-davinci-003 contextsize to 4097 (#748)
text-davinci-003 supports a context size of 4097 tokens so return 4097
instead of 4000 in modelname_to_contextsize() for text-davinci-003

Co-authored-by: Bill Kish <bill@cogniac.co>
2023-01-28 08:05:35 -08:00
Amos Ng
6ad360bdef Suggestions for better debugging (#765)
Please feel free to disregard any changes you disagree with
2023-01-28 08:05:20 -08:00
Albert Ziegler
5198d6f541 Add missing verb (#768)
Mini drive-by PR:

I came across this sentence in a stack trace for an error I had, and it
confused me because the verb I missing. So I added the verb.
2023-01-28 07:26:27 -08:00
Harrison Chase
a5d003f0c9 update notebook and make backwards compatible (#772) 2023-01-28 07:23:04 -08:00
Harrison Chase
924b7ecf89 pass kwargs and bump (#770) 2023-01-27 08:56:36 -08:00
Harrison Chase
fc19d14a65 bump version to 0072 (#767) 2023-01-27 08:03:41 -08:00
Harrison Chase
b9ad214801 add docs for loading from hub (#763) 2023-01-27 07:10:26 -08:00
Samantha Whitmore
be7de427ca Serialize all the chains! (#761)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-01-27 00:45:17 -08:00
Harrison Chase
e2a7fed890 Harrison/serialize from llm and tools (#760) 2023-01-26 23:30:39 -08:00
Harrison Chase
12dc7f26cc load agents from hub (#759) 2023-01-26 22:49:26 -08:00
Harrison Chase
7129f23511 output parser serialization (#758) 2023-01-26 21:51:13 -08:00
Harrison Chase
f273c50d62 add loading chains from hub (#757) 2023-01-26 21:11:31 -08:00
Harrison Chase
1b89a438cf (wip) Harrison/serialize agents (#725) 2023-01-26 19:48:47 -08:00
Harrison Chase
cc70565886 add prompt type (#730) 2023-01-26 19:48:00 -08:00
Francisco Ingham
374e510f94 Upper bound on number of iterations (#754)
Some custom agents might continue to iterate until they find the correct
answer, getting stuck on loops that generate request after request and
are really expensive for the end user. Putting an upper bound for the
number of iterations
by default controls this and can be explicitly tweaked by the user if
necessary.

Co-authored-by: Francisco Ingham <>
2023-01-26 19:47:01 -08:00
Smit Shah
28efbb05bf Add params to reduce K dynamically to reduce it below token limit (#739)
Referring to #687, I implemented the functionality to reduce K if it
exceeds the token limit.

Edit: I should have ran make lint locally. Also, this only applies to
`StuffDocumentChain`
2023-01-26 19:43:01 -08:00
Roy Williams
d2f882158f Add type information for crawler.py (#738)
Added type information to `crawler.py` to make it safer to use and
understand.
2023-01-26 19:37:31 -08:00
Harrison Chase
a80897478e bump version to 0071 (#755) 2023-01-26 18:55:25 -08:00
Ankush Gola
57609845df add tracing support to langchain (#741)
* add implementations of `BaseCallbackHandler` to support tracing:
`SharedTracer` which is thread-safe and `Tracer` which is not and is
meant to be used locally.
* Tracers persist runs to locally running `langchain-server`

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-01-26 17:38:13 -08:00
Harrison Chase
7f76a1189c bump version to 0.0.70 (#744) 2023-01-25 17:58:37 -08:00
Harrison Chase
2ba1128095 Harrison/backwards compat (#740) 2023-01-25 17:47:29 -08:00
Francisco Ingham
f9ddcb5705 Hotfix: distance_func and collection_name must not be in kwargs (#735)
If `distance_func` and `collection_name` are in `kwargs` they are sent
to the `QdrantClient` which results in an error being raised.

Co-authored-by: Francisco Ingham <>
2023-01-25 09:39:50 -08:00
Amos Ng
fa6826e417 Fix sqlalchemy warnings when running tests (#733)
This has been bugging me when running my own tests that call langchain
methods :P
2023-01-25 07:14:07 -08:00
Harrison Chase
bd0bf4e0a9 Harrison/generate blog post (#732)
Co-authored-by: Ren <yirenlu92@users.noreply.github.com>
2023-01-24 22:54:12 -08:00
Harrison Chase
9194a8be89 add stop to stream (#729) 2023-01-24 22:49:24 -08:00
scadEfUr
e3df8ab6dc move hyde into chains (#728)
Co-authored-by: scadEfUr <>
2023-01-24 22:23:32 -08:00
Harrison Chase
0ffeabd14f Harrison/serialize llm chain (#671) 2023-01-24 21:36:19 -08:00
Sam Hogan
499e54edda fix typos in readme and text splitter docs (#720)
Fix typos in readme and TextSplitter documentation.
2023-01-24 10:59:23 -08:00
I-E-E-E
f62dbb018b fix a url (#719) 2023-01-24 10:56:15 -08:00
Николай Шангин
18b1466893 Fix not imported 'validator' (#715)
otherwise `@validator("input_variables")` do not work
2023-01-24 07:06:50 -08:00
Feynman Liang
2824f36401 Add namespace to Pinecone.from_index (#716)
Resolves https://github.com/hwchase17/langchain/issues/718
2023-01-24 07:02:57 -08:00
Kacper Łukawski
d4f719c34b Convert numpy arrays to lists in HuggingFaceEmbeddings (#714)
`SentenceTransformer` returns a NumPy array, not a `List[List[float]]`
or `List[float]` as specified in the interface of `Embeddings`. That PR
makes it consistent with the interface.
2023-01-24 07:01:40 -08:00
Kacper Łukawski
97c3544a1e Hotfix: Qdrant.from_text embeddings (#713)
I'm providing a hotfix for Qdrant integration. Calculating a single
embedding to obtain the vector size was great idea. However, that change
introduced a bug trying to put only that single embedding into the
database. It's fixed. Right now all the embeddings will be pushed to
Qdrant.
2023-01-24 07:01:07 -08:00
Harrison Chase
b69b551c8b clarify use cases (#711) 2023-01-24 00:37:26 -08:00
Harrison Chase
1e4927a1d2 bump version to 0069 (#710) 2023-01-24 00:24:54 -08:00
Feynman Liang
3a38604f07 Fix typo (#705) 2023-01-23 23:08:38 -08:00
Nicolas
66fd57878a docs: Update vector_db_qa_with_sources.ipynb (#706) 2023-01-23 23:06:54 -08:00
Harrison Chase
fc4ad2db0f langchain hub docs (#704)
Co-authored-by: scadEfUr <123224380+scadEfUr@users.noreply.github.com>
2023-01-23 23:06:23 -08:00
Scott Leibrand
34932dd211 remove legacy embedding model name (#703)
Now that OpenAI has deprecated all embeddings models except
text-embedding-ada-002, we should stop specifying a legacy embedding
model in the example. This will also avoid confusion from people (like
me) trying to specify model="text-embedding-ada-002" and having that
erroneously expanded to text-search-text-embedding-ada-002-query-001
2023-01-23 14:31:31 -08:00
Harrison Chase
75edd85fed version 0068 (#701) 2023-01-23 07:24:09 -08:00
scadEfUr
4aba0abeaa added common prompt load method (#699)
Co-authored-by: scadEfUr
2023-01-22 23:46:11 -08:00
xloem
36b6b3cdf6 HuggingFacePipeline: Forward model_kwargs. (#696)
Since the tokenizer and model are constructed manually, model_kwargs
needs to
be passed to their constructors. Additionally, the pipeline has a
specific
named parameter to pass these with, which can provide forward
compatibility if
they are used for something other than tokenizer or model construction.
2023-01-22 23:38:47 -08:00
Harrison Chase
3a30e6daa8 Harrison/openai callback (#684) 2023-01-22 23:37:01 -08:00
Harrison Chase
aef82f5d59 fix whitespace for conversational agent (#690) 2023-01-22 22:39:53 -08:00
Amos Ng
8baf6fb920 Update examples to fix execution problems (#685)
On the [Getting Started
page](https://langchain.readthedocs.io/en/latest/modules/prompts/getting_started.html)
for prompt templates, I believe the very last example

```python
print(dynamic_prompt.format(adjective=long_string))
```

should actually be

```python
print(dynamic_prompt.format(input=long_string))
```

The existing example produces `KeyError: 'input'` as expected

***

On the [Create a custom prompt
template](https://langchain.readthedocs.io/en/latest/modules/prompts/examples/custom_prompt_template.html#id1)
page, I believe the line

```python
Function Name: {kwargs["function_name"]}
```

should actually be

```python
Function Name: {kwargs["function_name"].__name__}
```

The existing example produces the prompt:

```
        Given the function name and source code, generate an English language explanation of the function.
        Function Name: <function get_source_code at 0x7f907bc0e0e0>
        Source Code:
        def get_source_code(function_name):
    # Get the source code of the function
    return inspect.getsource(function_name)

        Explanation:
```

***

On the [Example
Selectors](https://langchain.readthedocs.io/en/latest/modules/prompts/examples/example_selectors.html)
page, the first example does not define `example_prompt`, which is also
subtly different from previous example prompts used. For user
convenience, I suggest including

```python
example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)
```

in the code to be copy-pasted
2023-01-22 14:49:25 -08:00
Harrison Chase
86dbdb118b Harrison/serpapi extra tools (#691)
Co-authored-by: Bruno Bornsztein <bruno.bornsztein@gmail.com>
2023-01-22 14:48:54 -08:00
Saurav Maheshkar
b4fcdeb56c chore: move coverage config to pyproject (#694)
This PR aims to move the contents of `.coveragerc` to `pyproject.toml`
to make the overall file structure more minimal.
2023-01-22 14:48:20 -08:00
Nicolas
4ddfa82bb7 docs: small typo on serpapi.md (#693) 2023-01-22 13:10:24 -08:00
Nicolas
34cb8850e9 docs: small typo google_search.md (#692) 2023-01-22 13:09:15 -08:00
Harrison Chase
cbc146720b verbose flag (#683) 2023-01-22 12:44:14 -08:00
Harrison Chase
27cef0870d bump version to 0.0.67 (#689) 2023-01-22 10:24:03 -08:00
Samantha Whitmore
77e3d58922 ConversationEntityMemory: Chain which uses an entity extraction & sum… (#678)
…marization prompt to maintain a key-value store of memory information

cc @devennavani

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-01-22 10:10:02 -08:00
Ikko Eltociear Ashimine
64580259d0 Fix typo in hyde.ipynb (#688)
therefor -> therefore
2023-01-22 08:21:31 -08:00
dham
e04b063ff4 add faiss local saving/loading (#676)
- This uses the faiss built-in `write_index` and `load_index` to save
and load faiss indexes locally
- Also fixes #674
- The save/load functions also use the faiss library, so I refactored
the dependency into a function
2023-01-21 16:08:14 -08:00
Harrison Chase
e45f7e40e8 Harrison/few shot yaml (#682)
Co-authored-by: vintro <77507980+vintrocode@users.noreply.github.com>
2023-01-21 16:08:03 -08:00
Harrison Chase
a2eeaf3d43 strip whitespace (#680) 2023-01-21 16:03:48 -08:00
Will Olson
2f57d18b25 Update hyperlink in Custom Prompt Template page (#677)
The current link points to a non-existent page. I've updated the link to
match what is on the "Create a custom example selector" page.

<img width="584" alt="Screen Shot 2023-01-21 at 10 33 05 AM"
src="https://user-images.githubusercontent.com/6773706/213879535-d8f2953d-ac37-448d-9b32-fdeb7b73cc32.png">
2023-01-21 16:03:21 -08:00
Harrison Chase
3d41af0aba Harrison/load tools kwargs (#681)
Co-authored-by: Bruno Bornsztein <bruno.bornsztein@gmail.com>
2023-01-21 16:03:02 -08:00
trigaten
90e4b6b040 Create CITATION.cff (#672)
You may want to add doi/orcid

Followed this:
https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files
2023-01-21 15:55:58 -08:00
Harrison Chase
236ae93610 bump version to 0066 (#667) 2023-01-20 14:22:31 -08:00
Harrison Chase
0b204d8c21 Harrison/quadrant (#665)
Co-authored-by: Kacper Łukawski <kacperlukawski@users.noreply.github.com>
2023-01-20 09:45:01 -08:00
Harrison Chase
983b73f47c add search kwargs (#664) 2023-01-20 07:42:08 -08:00
vertinski
65f3a341b0 Prompt fix for empty intermediate steps in summarization (#660)
Adding quotation marks around {text} avoids generating empty or
completely random responses from OpenAI davinci-003. Empty or completely
unrelated intermediate responses in summarization messes up the final
result or makes it very inaccurate.
The error from OpenAI would be: "The model predicted a completion that
begins with a stop sequence, resulting in no output. Consider adjusting
your prompt or stop sequences."
This fix corrects the prompting for summarization chain. This works on
API too, the images are for demonstrative purposes.
This approach can be applied to other similar prompts too. 

Examples:

1) Without quotation marks
![Screenshot from 2023-01-20
07-18-19](https://user-images.githubusercontent.com/22897470/213624365-9dfc18f9-5f3f-45d2-abe1-56de67397e22.png)

2) With quotation marks
![Screenshot from 2023-01-20
07-18-35](https://user-images.githubusercontent.com/22897470/213624478-c958e742-a4a7-46fe-a163-eca6326d9dae.png)
2023-01-20 07:37:01 -08:00
iocuydi
69998b5fad Add ids parameter for pinecone from_texts / add_texts (#659)
Allow optionally specifying a list of ids for pinecone rather than
having them randomly generated.
This also permits editing the embedding/metadata of existing pinecone
entries, by id.
2023-01-20 06:50:03 -08:00
Harrison Chase
54d7f1c933 fix caching (#658) 2023-01-19 15:33:45 -08:00
Harrison Chase
d0fdc6da11 Harrison/bing wrapper (#656)
Co-authored-by: Enrico Shippole <henryshippole@gmail.com>
2023-01-19 14:48:30 -08:00
iocuydi
207e319a70 Add search_kwargs option for VectorDBQAWithSourcesChain (#657)
Allows for passing additional vectorstore params like namespace, etc. to
VectorDBQAWithSourcesChain

Example:
`chain = VectorDBQAWithSourcesChain.from_llm(OpenAI(temperature=0),
vectorstore=store, search_kwargs={"namespace": namespace})`
2023-01-19 14:48:13 -08:00
Charles Frye
bfb23f4608 typo bugfixes in getting started with prompts (#651)
tl;dr: input -> word, output -> antonym, rename to dynamic_prompt
consistently

The provided code in this example doesn't run, because the keys are
`word` and `antonym`, rather than `input` and `output`.

Also, the `ExampleSelector`-based prompt is named `few_shot_prompt` when
defined and `dynamic_prompt` in the follow-up example. The former name
is less descriptive and collides with an earlier example, so I opted for
the latter.

Thanks for making a really cool library!
2023-01-19 07:05:20 -08:00
John
3adc5227cd typo (#650) 2023-01-19 07:03:11 -08:00
Harrison Chase
052c361031 pinecone docstring (#654) 2023-01-19 07:02:52 -08:00
Harrison Chase
d54fd20ba4 bump version to 0065 (#646) 2023-01-18 07:53:39 -08:00
Harrison Chase
30abfc41c2 add instructions for saving loading (#642) 2023-01-18 00:19:05 -08:00
Harrison Chase
95720adff5 Add documentation for custom prompts for Agents (#631) (#640)
- Added a comment interpreting regex for `ZeroShotAgent`
- Added a note to the `Custom Agent` notebook

Co-authored-by: Sam Ching <samuel@duolingo.com>
2023-01-17 22:47:15 -08:00
Harrison Chase
6be5f4e4c4 Harrison/sql db chain (#641)
Co-authored-by: Bruno Bornsztein <bruno.bornsztein@gmail.com>
2023-01-17 22:32:28 -08:00
Chetanya Rastogi
b550f57912 Fix the env variable for OpenAI Base Url (#639)
For using Azure OpenAI API, we need to set multiple env vars. But as can
be seen in openai package
[here](48b69293a3/openai/__init__.py (L35)),
the env var for setting base url is named `OPENAI_API_BASE` and not
`OPENAI_API_BASE_URL`. This PR fixes that part in the documentation.
2023-01-17 22:30:29 -08:00
Harrison Chase
4d4cff0530 Harrison/cohere experimental (#638)
Co-authored-by: inyourhead <44607279+xettrisomeman@users.noreply.github.com>
2023-01-17 22:28:55 -08:00
Sasmitha Manathunga
5c97f70bf1 Fix CohereError: embed is not an available endpoint on this model (#637)
Running the Cohere embeddings example from the docs:

```python
from langchain.embeddings import CohereEmbeddings
embeddings = CohereEmbeddings(cohere_api_key= cohere_api_key)

text = "This is a test document."
query_result = embeddings.embed_query(text)
doc_result = embeddings.embed_documents([text])
```

I get the error:

```bash
CohereError(message=res['message'], http_status=response.status_code, headers=response.headers)      
cohere.error.CohereError: embed is not an available endpoint on this model
```

This is because the `model` string is set to `medium` which is not
currently available.

From the Cohere docs:

> Currently available models are small and large (default)
2023-01-17 22:26:07 -08:00
Francis
b374d481c8 fix typo (#636)
there is a small typo in one of the docs.
2023-01-17 22:17:50 -08:00
Francisco Ingham
b929fd9f59 Exclude reference to 'example' in api prompt (#629)
Co-authored-by: lesscomfortable <pancho_ingham@hotmail.com>
2023-01-16 22:45:14 -08:00
332 changed files with 23202 additions and 1808 deletions

View File

@@ -1,2 +0,0 @@
[run]
omit = tests/*

View File

@@ -31,4 +31,4 @@ jobs:
run: poetry install
- name: Run unit tests
run: |
make tests
make test

8
CITATION.cff Normal file
View File

@@ -0,0 +1,8 @@
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Chase"
given-names: "Harrison"
title: "LangChain"
date-released: 2022-10-17
url: "https://github.com/hwchase17/langchain"

View File

@@ -47,7 +47,7 @@ good code into the codebase.
### 🏭Release process
As of now, LangChain has an ad hoc release process: releases are cut with high frequency via by
a developer and published to [PyPI](https://pypi.org/project/ruff/).
a developer and published to [PyPI](https://pypi.org/project/langchain/).
LangChain follows the [semver](https://semver.org/) versioning standard. However, as pre-1.0 software,
even patch releases may contain [non-backwards-compatible changes](https://semver.org/#spec-item-4).
@@ -77,6 +77,8 @@ Now, you should be able to run the common tasks in the following section.
## ✅Common Tasks
Type `make` for a list of common tasks.
### Code Formatting
Formatting for this project is done via a combination of [Black](https://black.readthedocs.io/en/stable/) and [isort](https://pycqa.github.io/isort/).
@@ -116,7 +118,7 @@ Unit tests cover modular logic that does not require calls to outside APIs.
To run unit tests:
```bash
make tests
make test
```
If you add new logic, please add a unit test.

View File

@@ -1,11 +1,15 @@
.PHONY: format lint tests tests_watch integration_tests
.PHONY: all clean format lint test test_watch integration_tests help
all: help
coverage:
poetry run pytest --cov \
--cov-config=.coveragerc \
--cov-report xml \
--cov-report term-missing:skip-covered
clean: docs_clean
docs_build:
cd docs && poetry run make html
@@ -25,11 +29,26 @@ lint:
poetry run isort . --check
poetry run flake8 .
test:
poetry run pytest tests/unit_tests
tests:
poetry run pytest tests/unit_tests
tests_watch:
test_watch:
poetry run ptw --now . -- tests/unit_tests
integration_tests:
poetry run pytest tests/integration_tests
help:
@echo '----'
@echo 'coverage - run unit tests and generate coverage report'
@echo 'docs_build - build the documentation'
@echo 'docs_clean - clean the documentation build artifacts'
@echo 'docs_linkcheck - run linkchecker on the documentation'
@echo 'format - run code formatters'
@echo 'lint - run linters'
@echo 'test - run unit tests'
@echo 'test_watch - run unit tests in watch mode'
@echo 'integration_tests - run integration tests'

View File

@@ -4,6 +4,9 @@
[![lint](https://github.com/hwchase17/langchain/actions/workflows/lint.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/lint.yml) [![test](https://github.com/hwchase17/langchain/actions/workflows/test.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/test.yml) [![linkcheck](https://github.com/hwchase17/langchain/actions/workflows/linkcheck.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/linkcheck.yml) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Twitter](https://img.shields.io/twitter/url/https/twitter.com/langchainai.svg?style=social&label=Follow%20%40LangChainAI)](https://twitter.com/langchainai) [![](https://dcbadge.vercel.app/api/server/6adMQxSpJS?compact=true&style=flat)](https://discord.gg/6adMQxSpJS)
**Production Support:** As you move your LangChains into production, we'd love to offer more comprehensive support.
Please fill out [this form](https://forms.gle/57d8AmXBYp8PP8tZA) and we'll set up a dedicated support Slack channel.
## Quick Install
`pip install langchain`
@@ -15,7 +18,22 @@ developers to build applications that they previously could not.
But using these LLMs in isolation is often not enough to
create a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge.
This library is aimed at assisting in the development of those types of applications.
This library is aimed at assisting in the development of those types of applications. Common examples of these types of applications include:
**❓ Question Answering over specific documents**
- [Documentation](https://langchain.readthedocs.io/en/latest/use_cases/question_answering.html)
- End-to-end Example: [Question Answering over Notion Database](https://github.com/hwchase17/notion-qa)
**💬 Chatbots**
- [Documentation](https://langchain.readthedocs.io/en/latest/use_cases/chatbots.html)
- End-to-end Example: [Chat-LangChain](https://github.com/hwchase17/chat-langchain)
**🤖 Agents**
- [Documentation](https://langchain.readthedocs.io/en/latest/use_cases/agents.html)
- End-to-end Example: [GPT+WolframAlpha](https://huggingface.co/spaces/JavaFXpert/Chat-GPT-LangChain)
## 📖 Documentation

BIN
docs/_static/HeliconeDashboard.png vendored Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 235 KiB

BIN
docs/_static/HeliconeKeys.png vendored Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 148 KiB

View File

@@ -22,3 +22,18 @@ This repo serves as a template for how deploy a LangChain with Gradio.
It implements a chatbot interface, with a "Bring-Your-Own-Token" approach (nice for not wracking up big bills).
It also contains instructions for how to deploy this app on the Hugging Face platform.
This is heavily influenced by James Weaver's [excellent examples](https://huggingface.co/JavaFXpert).
## [Beam](https://github.com/slai-labs/get-beam/tree/main/examples/langchain-question-answering)
This repo serves as a template for how deploy a LangChain with [Beam](https://beam.cloud).
It implements a Question Answering app and contains instructions for deploying the app as a serverless REST API.
## [Vercel](https://github.com/homanp/vercel-langchain)
A minimal example on how to run LangChain on Vercel using Flask.
## [SteamShip](https://github.com/steamship-core/steamship-langchain/)
This repository contains LangChain adapters for Steamship, enabling LangChain developers to rapidly deploy their apps on Steamship.
This includes: production ready endpoints, horizontal scaling across dependencies, persistant storage of app state, multi-tenancy support, etc.

View File

@@ -0,0 +1,17 @@
# CerebriumAI
This page covers how to use the CerebriumAI ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific CerebriumAI wrappers.
## Installation and Setup
- Install with `pip install cerebrium`
- Get an CerebriumAI api key and set it as an environment variable (`CEREBRIUMAI_API_KEY`)
## Wrappers
### LLM
There exists an CerebriumAI LLM wrapper, which you can access with
```python
from langchain.llms import CerebriumAI
```

20
docs/ecosystem/chroma.md Normal file
View File

@@ -0,0 +1,20 @@
# Chroma
This page covers how to use the Chroma ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Chroma wrappers.
## Installation and Setup
- Install the Python package with `pip install chromadb`
## Wrappers
### VectorStore
There exists a wrapper around Chroma vector databases, allowing you to use it as a vectorstore,
whether for semantic search or example selection.
To import this vectorstore:
```python
from langchain.vectorstores import Chroma
```
For a more detailed walkthrough of the Chroma wrapper, see [this notebook](../modules/utils/combine_docs_examples/vectorstores.ipynb)

View File

@@ -0,0 +1,16 @@
# ForefrontAI
This page covers how to use the ForefrontAI ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific ForefrontAI wrappers.
## Installation and Setup
- Get an ForefrontAI api key and set it as an environment variable (`FOREFRONTAI_API_KEY`)
## Wrappers
### LLM
There exists an ForefrontAI LLM wrapper, which you can access with
```python
from langchain.llms import ForefrontAI
```

View File

@@ -1,7 +1,7 @@
# Google Search Wrapper
This page covers how to use the Google Search API within LangChain.
It is broken into two parts: installation and setup, and then references to specific Pinecone wrappers.
It is broken into two parts: installation and setup, and then references to the specific Google Search wrapper.
## Installation and Setup
- Install requirements with `pip install google-api-python-client`

23
docs/ecosystem/gooseai.md Normal file
View File

@@ -0,0 +1,23 @@
# GooseAI
This page covers how to use the GooseAI ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific GooseAI wrappers.
## Installation and Setup
- Install the Python SDK with `pip install openai`
- Get your GooseAI api key from this link [here](https://goose.ai/).
- Set the environment variable (`GOOSEAI_API_KEY`).
```python
import os
os.environ["GOOSEAI_API_KEY"] = "YOUR_API_KEY"
```
## Wrappers
### LLM
There exists an GooseAI LLM wrapper, which you can access with:
```python
from langchain.llms import GooseAI
```

View File

@@ -0,0 +1,21 @@
# Helicone
This page covers how to use the [Helicone](https://helicone.ai) within LangChain.
## What is Helicone?
Helicone is an [open source](https://github.com/Helicone/helicone) observability platform that proxies your OpenAI traffic and provides you key insights into your spend, latency and usage.
![Helicone](../_static/HeliconeDashboard.png)
## Quick start
With your LangChain environment you can just add the following parameter.
```bash
export OPENAI_API_BASE="https://oai.hconeai.com/v1"
```
Now head over to [helicone.ai](https://helicone.ai/onboarding?step=2) to create your account, and add your OpenAI API key within our dashboard to view your logs.
![Helicone](../_static/HeliconeKeys.png)

17
docs/ecosystem/petals.md Normal file
View File

@@ -0,0 +1,17 @@
# Petals
This page covers how to use the Petals ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Petals wrappers.
## Installation and Setup
- Install with `pip install petals`
- Get an Huggingface api key and set it as an environment variable (`HUGGINGFACE_API_KEY`)
## Wrappers
### LLM
There exists an Petals LLM wrapper, which you can access with
```python
from langchain.llms import Petals
```

View File

@@ -0,0 +1,31 @@
# PromptLayer
This page covers how to use [PromptLayer](https://www.promptlayer.com) within LangChain.
It is broken into two parts: installation and setup, and then references to specific PromptLayer wrappers.
## Installation and Setup
If you want to work with PromptLayer:
- Install the promptlayer python library `pip install promptlayer`
- Create a PromptLayer account
- Create an api token and set it as an environment variable (`PROMPTLAYER_API_KEY`)
## Wrappers
### LLM
There exists an PromptLayer OpenAI LLM wrapper, which you can access with
```python
from langchain.llms import PromptLayerOpenAI
```
To tag your requests, use the argument `pl_tags` when instanializing the LLM
```python
from langchain.llms import PromptLayerOpenAI
llm = PromptLayerOpenAI(pl_tags=["langchain-requests", "chatbot"])
```
This LLM is identical to the [OpenAI LLM](./openai), except that
- all your requests will be logged to your PromptLayer account
- you can add `pl_tags` when instantializing to tag your requests on PromptLayer

View File

@@ -1,7 +1,7 @@
# SerpAPI
This page covers how to use the SerpAPI search APIs within LangChain.
It is broken into two parts: installation and setup, and then references to specific Pinecone wrappers.
It is broken into two parts: installation and setup, and then references to the specific SerpAPI wrapper.
## Installation and Setup
- Install requirements with `pip install google-search-results`

View File

@@ -77,6 +77,17 @@ Open Source
+++
A jupyter notebook demonstrating how you could create a semantic search engine on documents in one of your Google Folders
---
.. link-button:: https://github.com/venuv/langchain_semantic_search
:type: url
:text: Google Folder Semantic Search
:classes: stretched-link btn-lg
+++
Build a GitHub support bot with GPT3, LangChain, and Python.
---
@@ -188,6 +199,17 @@ Open Source
+++
This repo is a simple demonstration of using LangChain to do fact-checking with prompt chaining.
---
.. link-button:: https://github.com/arc53/docsgpt
:type: url
:text: DocsGPT
:classes: stretched-link btn-lg
+++
Answer questions about the documentation of any project
Misc. Colab Notebooks
~~~~~~~~~~~~~~~

View File

@@ -162,7 +162,7 @@ This is one of the simpler types of chains, but understanding how it works will
`````{dropdown} Agents: Dynamically call chains based on user input
So for the chains we've looked at run in a predetermined order.
So far the chains we've looked at run in a predetermined order.
Agents no longer do: they use an LLM to determine which actions to take and in what order. An action can either be using a tool and observing its output, or returning to the user.
@@ -179,6 +179,20 @@ In order to load agents, you should understand the following concepts:
**Tools**: For a list of predefined tools and their specifications, see [here](../modules/agents/tools.md).
For this example, you will also need to install the SerpAPI Python package.
```bash
pip install google-search-results
```
And set the appropriate environment variables.
```python
import os
os.environ["SERPAPI_API_KEY"] = "..."
```
Now we can get started!
```python
from langchain.agents import load_tools

View File

@@ -7,7 +7,22 @@ But using these LLMs in isolation is often not enough to
create a truly powerful app - the real power comes when you are able to
combine them with other sources of computation or knowledge.
This library is aimed at assisting in the development of those types of applications.
This library is aimed at assisting in the development of those types of applications. Common examples of these types of applications include:
**❓ Question Answering over specific documents**
- `Documentation <./use_cases/question_answering.html>`_
- End-to-end Example: `Question Answering over Notion Database <https://github.com/hwchase17/notion-qa>`_
**💬 Chatbots**
- `Documentation <./use_cases/chatbots.html>`_
- End-to-end Example: `Chat-LangChain <https://github.com/hwchase17/chat-langchain>`_
**🤖 Agents**
- `Documentation <./use_cases/agents.html>`_
- End-to-end Example: `GPT+WolframAlpha <https://huggingface.co/spaces/JavaFXpert/Chat-GPT-LangChain>`_
Getting Started
----------------
@@ -36,6 +51,8 @@ These modules are, in increasing order of complexity:
- `LLMs <./modules/llms.html>`_: This includes a generic interface for all LLMs, and common utilities for working with LLMs.
- `Document Loaders <./modules/document_loaders.html>`_: This includes a standard interface for loading documents, as well as specific integrations to all types of text data sources.
- `Utils <./modules/utils.html>`_: Language models are often more powerful when interacting with other sources of knowledge or computation. This can include Python REPLs, embeddings, search engines, and more. LangChain provides a large collection of common utils to use in your application.
- `Chains <./modules/chains.html>`_: Chains go beyond just a single LLM call, and are sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.
@@ -53,6 +70,7 @@ These modules are, in increasing order of complexity:
./modules/prompts.md
./modules/llms.md
./modules/document_loaders.md
./modules/utils.md
./modules/chains.md
./modules/agents.md
@@ -137,6 +155,8 @@ Additional Resources
Additional collection of resources we think may be useful as you develop your application!
- `LangChainHub <https://github.com/hwchase17/langchain-hub>`_: The LangChainHub is a place to share and explore other prompts, chains, and agents.
- `Glossary <./glossary.html>`_: A glossary of all related terms, papers, methods, etc. Whether implemented in LangChain or not!
- `Gallery <./gallery.html>`_: A collection of our favorite projects that use LangChain. Useful for finding inspiration or seeing how things were done in other applications.
@@ -145,6 +165,10 @@ Additional collection of resources we think may be useful as you develop your ap
- `Discord <https://discord.gg/6adMQxSpJS>`_: Join us on our Discord to discuss all things LangChain!
- `Tracing <./tracing.html>`_: A guide on using tracing in LangChain to visualize the execution of chains and agents.
- `Production Support <https://forms.gle/57d8AmXBYp8PP8tZA>`_: As you move your LangChains into production, we'd love to offer more comprehensive support. Please fill out this form and we'll set up a dedicated support Slack channel.
.. toctree::
:maxdepth: 1
@@ -152,6 +176,10 @@ Additional collection of resources we think may be useful as you develop your ap
:name: resources
:hidden:
LangChainHub <https://github.com/hwchase17/langchain-hub>
./glossary.md
./gallery.rst
./deployments.md
./tracing.md
Discord <https://discord.gg/6adMQxSpJS>
Production Support <https://forms.gle/57d8AmXBYp8PP8tZA>

View File

@@ -0,0 +1,423 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "6fb92deb-d89e-439b-855d-c7f2607d794b",
"metadata": {},
"source": [
"# Async API for Agent\n",
"\n",
"LangChain provides async support for Agents by leveraging the [asyncio](https://docs.python.org/3/library/asyncio.html) library.\n",
"\n",
"Async methods are currently supported for the following `Tools`: [`SerpAPIWrapper`](https://github.com/hwchase17/langchain/blob/master/langchain/serpapi.py) and [`LLMMathChain`](https://github.com/hwchase17/langchain/blob/master/langchain/chains/llm_math/base.py). Async support for other agent tools are on the roadmap.\n",
"\n",
"For `Tool`s that have a `coroutine` implemented (the two mentioned above), the `AgentExecutor` will `await` them directly. Otherwise, the `AgentExecutor` will call the `Tool`'s `func` via `asyncio.get_event_loop().run_in_executor` to avoid blocking the main runloop.\n",
"\n",
"You can use `arun` to call an `AgentExecutor` asynchronously."
]
},
{
"cell_type": "markdown",
"id": "97800378-cc34-4283-9bd0-43f336bc914c",
"metadata": {},
"source": [
"## Serial vs. Concurrent Execution\n",
"\n",
"In this example, we kick off agents to answer some questions serially vs. concurrently. You can see that concurrent execution significantly speeds this up."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "da5df06c-af6f-4572-b9f5-0ab971c16487",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import asyncio\n",
"import time\n",
"\n",
"from langchain.agents import initialize_agent, load_tools\n",
"from langchain.llms import OpenAI\n",
"from langchain.callbacks.stdout import StdOutCallbackHandler\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.callbacks.tracers import LangChainTracer\n",
"from aiohttp import ClientSession\n",
"\n",
"questions = [\n",
" \"Who won the US Open men's final in 2019? What is his age raised to the 0.334 power?\",\n",
" \"Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?\",\n",
" \"Who won the most recent formula 1 grand prix? What is their age raised to the 0.23 power?\",\n",
" \"Who won the US Open women's final in 2019? What is her age raised to the 0.34 power?\",\n",
" \"Who is Beyonce's husband? What is his age raised to the 0.19 power?\"\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "fd4c294e-b1d6-44b8-b32e-2765c017e503",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out who won the US Open men's final in 2019 and then calculate his age raised to the 0.334 power.\n",
"Action: Search\n",
"Action Input: \"US Open men's final 2019 winner\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mRafael Nadal\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Rafael Nadal's age\n",
"Action: Search\n",
"Action Input: \"Rafael Nadal age\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m36 years\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to calculate 36 raised to the 0.334 power\n",
"Action: Calculator\n",
"Action Input: 36^0.334\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 3.3098250249682484\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Rafael Nadal, aged 36, won the US Open men's final in 2019 and his age raised to the 0.334 power is 3.3098250249682484.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.\n",
"Action: Search\n",
"Action Input: \"Olivia Wilde boyfriend\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mJason Sudeikis\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Jason Sudeikis' age\n",
"Action: Search\n",
"Action Input: \"Jason Sudeikis age\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mDaniel Jason Sudeikis is an American actor, comedian, writer, and producer. In the 1990s, he began his career in improv comedy and performed with ComedySportz, iO Chicago, and The Second City.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Jason Sudeikis' exact age\n",
"Action: Search\n",
"Action Input: \"Jason Sudeikis age exact\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mDaniel Jason Sudeikis. (1975-09-18) September 18, 1975 (age 47). Fairfax, Virginia, U.S. · Fort Scott Community College · Actor; comedian; producer; writer · 1997 ...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now have the information I need to calculate the age raised to the 0.23 power\n",
"Action: Calculator\n",
"Action Input: 47^0.23\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 2.4242784855673896\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Jason Sudeikis, Olivia Wilde's boyfriend, is 47 years old and his age raised to the 0.23 power is 2.4242784855673896.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out who won the grand prix and then calculate their age raised to the 0.23 power.\n",
"Action: Search\n",
"Action Input: \"Formula 1 Grand Prix Winner\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mMax Emilian Verstappen is a Belgian-Dutch racing driver and the 2021 and 2022 Formula One World Champion. He competes under the Dutch flag in Formula One with Red Bull Racing. Verstappen is the son of racing drivers Jos Verstappen, who also competed in Formula One, and Sophie Kumpen.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Max Emilian Verstappen's age.\n",
"Action: Search\n",
"Action Input: \"Max Emilian Verstappen age\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m25 years\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now need to calculate 25 raised to the 0.23 power.\n",
"Action: Calculator\n",
"Action Input: 25^0.23\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 2.096651272316035\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: Max Emilian Verstappen, who is 25 years old, won the most recent Formula 1 Grand Prix and his age raised to the 0.23 power is 2.096651272316035.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out who won the US Open women's final in 2019 and then calculate her age raised to the 0.34 power.\n",
"Action: Search\n",
"Action Input: \"US Open women's final 2019 winner\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mBianca Andreescu defeated Serena Williams in the final, 63, 75 to win the women's singles tennis title at the 2019 US Open. It was her first major title, and she became the first Canadian, as well as the first player born in the 2000s, to win a major singles title.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Bianca Andreescu's age.\n",
"Action: Search\n",
"Action Input: \"Bianca Andreescu age\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mBianca Vanessa Andreescu is a Canadian-Romanian professional tennis player. She has a career-high ranking of No. 4 in the world, and is the highest-ranked Canadian in the history of the Women's Tennis Association.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the age of Bianca Andreescu.\n",
"Action: Calculator\n",
"Action Input: 19^0.34\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 2.7212987634680084\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: Bianca Andreescu, aged 19, won the US Open women's final in 2019. Her age raised to the 0.34 power is 2.7212987634680084.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out who Beyonce's husband is and then calculate his age raised to the 0.19 power.\n",
"Action: Search\n",
"Action Input: \"Who is Beyonce's husband?\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mJay-Z\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Jay-Z's age\n",
"Action: Search\n",
"Action Input: \"How old is Jay-Z?\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m53 years\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to calculate 53 raised to the 0.19 power\n",
"Action: Calculator\n",
"Action Input: 53^0.19\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 2.12624064206896\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Jay-Z is Beyonce's husband and his age raised to the 0.19 power is 2.12624064206896.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"Serial executed in 94.83 seconds.\n"
]
}
],
"source": [
"def generate_serially():\n",
" for q in questions:\n",
" llm = OpenAI(temperature=0)\n",
" tools = load_tools([\"llm-math\", \"serpapi\"], llm=llm)\n",
" agent = initialize_agent(\n",
" tools, llm, agent=\"zero-shot-react-description\", verbose=True\n",
" )\n",
" agent.run(q)\n",
"\n",
"s = time.perf_counter()\n",
"generate_serially()\n",
"elapsed = time.perf_counter() - s\n",
"print(f\"Serial executed in {elapsed:0.2f} seconds.\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "076d7b85-45ec-465d-8b31-c2ad119c3438",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[33;1m\u001b[1;3m I need to find out who Beyonce's husband is and then calculate his age raised to the 0.19 power.\n",
"Action: Search\n",
"Action Input: \"Who is Beyonce's husband?\"\u001b[0m\u001b[31;1m\u001b[1;3m I need to find out who won the grand prix and then calculate their age raised to the 0.23 power.\n",
"Action: Search\n",
"Action Input: \"Formula 1 Grand Prix Winner\"\u001b[0m\u001b[32;1m\u001b[1;3m I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.\n",
"Action: Search\n",
"Action Input: \"Olivia Wilde boyfriend\"\u001b[0m\u001b[38;5;200m\u001b[1;3m I need to find out who won the US Open women's final in 2019 and then calculate her age raised to the 0.34 power.\n",
"Action: Search\n",
"Action Input: \"US Open women's final 2019 winner\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mJay-Z\u001b[0m\n",
"Thought:\n",
"Observation: \u001b[33;1m\u001b[1;3mMax Emilian Verstappen is a Belgian-Dutch racing driver and the 2021 and 2022 Formula One World Champion. He competes under the Dutch flag in Formula One with Red Bull Racing. Verstappen is the son of racing drivers Jos Verstappen, who also competed in Formula One, and Sophie Kumpen.\u001b[0m\n",
"Thought:\n",
"Observation: \u001b[33;1m\u001b[1;3mJason Sudeikis\u001b[0m\n",
"Thought:\n",
"Observation: \u001b[33;1m\u001b[1;3mBianca Andreescu defeated Serena Williams in the final, 63, 75 to win the women's singles tennis title at the 2019 US Open. It was her first major title, and she became the first Canadian, as well as the first player born in the 2000s, to win a major singles title.\u001b[0m\n",
"Thought:\u001b[31;1m\u001b[1;3m I need to find out Max Emilian Verstappen's age.\n",
"Action: Search\n",
"Action Input: \"Max Emilian Verstappen age\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m25 years\u001b[0m\n",
"Thought:\u001b[38;5;200m\u001b[1;3m I need to find out Bianca Andreescu's age.\n",
"Action: Search\n",
"Action Input: \"Bianca Andreescu age\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mBianca Vanessa Andreescu is a Canadian-Romanian professional tennis player. She has a career-high ranking of No. 4 in the world, and is the highest-ranked Canadian in the history of the Women's Tennis Association.\u001b[0m\n",
"Thought:\u001b[36;1m\u001b[1;3m I need to find out who won the US Open men's final in 2019 and then calculate his age raised to the 0.334 power.\n",
"Action: Search\n",
"Action Input: \"US Open men's final 2019 winner\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mRafael Nadal\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Jason Sudeikis' age\n",
"Action: Search\n",
"Action Input: \"Jason Sudeikis age\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mDaniel Jason Sudeikis is an American actor, comedian, writer, and producer. In the 1990s, he began his career in improv comedy and performed with ComedySportz, iO Chicago, and The Second City.\u001b[0m\n",
"Thought:\u001b[33;1m\u001b[1;3m I need to find out Jay-Z's age\n",
"Action: Search\n",
"Action Input: \"How old is Jay-Z?\"\u001b[0m\u001b[36;1m\u001b[1;3m I need to find out Rafael Nadal's age\n",
"Action: Search\n",
"Action Input: \"Rafael Nadal age\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m36 years\u001b[0m\n",
"Thought:\n",
"Observation: \u001b[33;1m\u001b[1;3m53 years\u001b[0m\n",
"Thought:\u001b[38;5;200m\u001b[1;3m I now know the age of Bianca Andreescu.\n",
"Action: Calculator\n",
"Action Input: 19^0.34\u001b[0m\u001b[31;1m\u001b[1;3m I now need to calculate 25 raised to the 0.23 power.\n",
"Action: Calculator\n",
"Action Input: 25^0.23\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 2.7212987634680084\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Jason Sudeikis' exact age\n",
"Action: Search\n",
"Action Input: \"Jason Sudeikis age exact\"\u001b[0m\u001b[33;1m\u001b[1;3m I need to calculate 53 raised to the 0.19 power\n",
"Action: Calculator\n",
"Action Input: 53^0.19\u001b[0m\u001b[36;1m\u001b[1;3m I need to calculate 36 raised to the 0.334 power\n",
"Action: Calculator\n",
"Action Input: 36^0.334\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mDaniel Jason Sudeikis. (1975-09-18) September 18, 1975 (age 47). Fairfax, Virginia, U.S. · Fort Scott Community College · Actor; comedian; producer; writer · 1997 ...\u001b[0m\n",
"Thought:\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 2.096651272316035\n",
"\u001b[0m\n",
"Thought:\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 2.12624064206896\n",
"\u001b[0m\n",
"Thought:\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 3.3098250249682484\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now have the information I need to calculate the age raised to the 0.23 power\n",
"Action: Calculator\n",
"Action Input: 47^0.23\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 2.4242784855673896\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: Bianca Andreescu, aged 19, won the US Open women's final in 2019. Her age raised to the 0.34 power is 2.7212987634680084.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Jay-Z is Beyonce's husband and his age raised to the 0.19 power is 2.12624064206896.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Rafael Nadal, aged 36, won the US Open men's final in 2019 and his age raised to the 0.334 power is 3.3098250249682484.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Jason Sudeikis, Olivia Wilde's boyfriend, is 47 years old and his age raised to the 0.23 power is 2.4242784855673896.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I now know the final answer.\n",
"Final Answer: Max Emilian Verstappen, who is 25 years old, won the most recent Formula 1 Grand Prix and his age raised to the 0.23 power is 2.096651272316035.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"Concurrent executed in 25.06 seconds.\n"
]
}
],
"source": [
"async def generate_concurrently():\n",
" agents = []\n",
" # To make async requests in Tools more efficient, you can pass in your own aiohttp.ClientSession, \n",
" # but you must manually close the client session at the end of your program/event loop\n",
" aiosession = ClientSession()\n",
" colors = [\"blue\", \"green\", \"red\", \"pink\", \"yellow\"]\n",
" for color in colors:\n",
" # Use a custom CallbackManager to print in different colors.\n",
" manager = CallbackManager([StdOutCallbackHandler(color=color)])\n",
" llm = OpenAI(temperature=0, callback_manager=manager)\n",
" async_tools = load_tools([\"llm-math\", \"serpapi\"], llm=llm, aiosession=aiosession)\n",
" agents.append(\n",
" initialize_agent(async_tools, llm, agent=\"zero-shot-react-description\", verbose=True, callback_manager=manager)\n",
" )\n",
" tasks = [async_agent.arun(q) for async_agent, q in zip(agents, questions)]\n",
" await asyncio.gather(*tasks)\n",
" await aiosession.close()\n",
"\n",
"s = time.perf_counter()\n",
"# If running this outside of Jupyter, use asyncio.run(generate_concurrently())\n",
"await generate_concurrently()\n",
"elapsed = time.perf_counter() - s\n",
"print(f\"Concurrent executed in {elapsed:0.2f} seconds.\")"
]
},
{
"cell_type": "markdown",
"id": "97ef285c-4a43-4a4e-9698-cd52a1bc56c9",
"metadata": {},
"source": [
"## Using Tracing with Asynchronous Agents\n",
"\n",
"To use tracing with async agents, you must pass in a custom `CallbackManager` with `LangChainTracer` to each agent running asynchronously. This way, you avoid collisions while the trace is being collected."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "44bda05a-d33e-4e91-9a71-a0f3f96aae95",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out who won the US Open men's final in 2019 and then calculate his age raised to the 0.334 power.\n",
"Action: Search\n",
"Action Input: \"US Open men's final 2019 winner\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mRafael Nadal\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Rafael Nadal's age\n",
"Action: Search\n",
"Action Input: \"Rafael Nadal age\"\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3m36 years\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to calculate 36 raised to the 0.334 power\n",
"Action: Calculator\n",
"Action Input: 36^0.334\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mAnswer: 3.3098250249682484\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Rafael Nadal, aged 36, won the US Open men's final in 2019 and his age raised to the 0.334 power is 3.3098250249682484.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
}
],
"source": [
"# To make async requests in Tools more efficient, you can pass in your own aiohttp.ClientSession, \n",
"# but you must manually close the client session at the end of your program/event loop\n",
"aiosession = ClientSession()\n",
"tracer = LangChainTracer()\n",
"tracer.load_default_session()\n",
"manager = CallbackManager([StdOutCallbackHandler(), tracer])\n",
"\n",
"# Pass the manager into the llm if you want llm calls traced.\n",
"llm = OpenAI(temperature=0, callback_manager=manager)\n",
"\n",
"async_tools = load_tools([\"llm-math\", \"serpapi\"], llm=llm, aiosession=aiosession)\n",
"async_agent = initialize_agent(async_tools, llm, agent=\"zero-shot-react-description\", verbose=True, callback_manager=manager)\n",
"await async_agent.arun(questions[0])\n",
"await aiosession.close()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -53,7 +53,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 2,
"id": "becda2a1",
"metadata": {},
"outputs": [],
@@ -70,7 +70,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 3,
"id": "339b1bb8",
"metadata": {},
"outputs": [],
@@ -99,7 +99,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 4,
"id": "e21d2098",
"metadata": {},
"outputs": [
@@ -133,9 +133,19 @@
"print(prompt.template)"
]
},
{
"cell_type": "markdown",
"id": "5e028e6d",
"metadata": {},
"source": [
"Note that we are able to feed agents a self-defined prompt template, i.e. not restricted to the prompt generated by the `create_prompt` function, assuming it meets the agent's requirements. \n",
"\n",
"For example, for `ZeroShotAgent`, we will need to ensure that it meets the following requirements. There should a string starting with \"Action:\" and a following string starting with \"Action Input:\", and both should be separated by a newline.\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 5,
"id": "9b1cc2a2",
"metadata": {},
"outputs": [],
@@ -145,17 +155,18 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 7,
"id": "e4f5092f",
"metadata": {},
"outputs": [],
"source": [
"agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools)"
"tool_names = [tool.name for tool in tools]\n",
"agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 8,
"id": "490604e9",
"metadata": {},
"outputs": [],
@@ -165,7 +176,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 9,
"id": "653b1617",
"metadata": {},
"outputs": [
@@ -180,22 +191,23 @@
"Action: Search\n",
"Action Input: Population of Canada\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mCanada is a country in North America. Its ten provinces and three territories extend from the Atlantic Ocean to the Pacific Ocean and northward into the Arctic Ocean, covering over 9.98 million square kilometres, making it the world's second-largest country by total area.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out the exact population of Canada\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out the population of Canada\n",
"Action: Search\n",
"Action Input: Population of Canada 2020\u001b[0m\n",
"Action Input: Population of Canada\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mCanada is a country in North America. Its ten provinces and three territories extend from the Atlantic Ocean to the Pacific Ocean and northward into the Arctic Ocean, covering over 9.98 million square kilometres, making it the world's second-largest country by total area.\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the population of Canada\n",
"Final Answer: Arrr, Canada be home to 37.59 million people!\u001b[0m\n",
"\u001b[1m> Finished AgentExecutor chain.\u001b[0m\n"
"Final Answer: Arrr, Canada be home to over 37 million people!\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Arrr, Canada be home to 37.59 million people!'"
"'Arrr, Canada be home to over 37 million people!'"
]
},
"execution_count": 19,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
@@ -350,7 +362,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.0 64-bit ('llm-env')",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -364,11 +376,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.0"
"version": "3.10.9"
},
"vscode": {
"interpreter": {
"hash": "b1677b440931f40d89ef8be7bf03acb108ce003de0ac9b18e8d43753ea2e7103"
"hash": "18784188d7ecd866c0586ac068b02361a6896dc3a29b64f5cc957f09c590acef"
}
}
},

View File

@@ -10,15 +10,17 @@
"When constructing your own agent, you will need to provide it with a list of Tools that it can use. A Tool is defined as below.\n",
"\n",
"```python\n",
"class Tool(NamedTuple):\n",
"@dataclass \n",
"class Tool:\n",
" \"\"\"Interface for tools.\"\"\"\n",
"\n",
" name: str\n",
" func: Callable[[str], str]\n",
" description: Optional[str] = None\n",
" return_direct: bool = True\n",
"```\n",
"\n",
"The two required components of a Tool are the name and then the tool itself. A tool description is optional, as it is needed for some agents but not all."
"The two required components of a Tool are the name and then the tool itself. A tool description is optional, as it is needed for some agents but not all. You can create these tools directly, but we also provide a decorator to easily convert any function into a tool."
]
},
{
@@ -151,6 +153,94 @@
"agent.run(\"Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?\")"
]
},
{
"cell_type": "markdown",
"id": "824eaf74",
"metadata": {},
"source": [
"## Using the `tool` decorator\n",
"\n",
"To make it easier to define custom tools, a `@tool` decorator is provided. This decorator can be used to quickly create a `Tool` from a simple function. The decorator uses the function name as the tool name by default, but this can be overridden by passing a string as the first argument. Additionally, the decorator will use the function's docstring as the tool's description."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8f15307d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import tool\n",
"\n",
"@tool\n",
"def search_api(query: str) -> str:\n",
" \"\"\"Searches the API for the query.\"\"\"\n",
" return \"Results\""
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "0a23b91b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Tool(name='search_api', func=<function search_api at 0x10dad7d90>, description='search_api(query: str) -> str - Searches the API for the query.', return_direct=False)"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"search_api"
]
},
{
"cell_type": "markdown",
"id": "cc6ee8c1",
"metadata": {},
"source": [
"You can also provide arguments like the tool name and whether to return directly."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "28cdf04d",
"metadata": {},
"outputs": [],
"source": [
"@tool(\"search\", return_direct=True)\n",
"def search_api(query: str) -> str:\n",
" \"\"\"Searches the API for the query.\"\"\"\n",
" return \"Results\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "1085a4bd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Tool(name='search', func=<function search_api at 0x112301bd0>, description='search(query: str) -> str - Searches the API for the query.', return_direct=True)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"search_api"
]
},
{
"cell_type": "markdown",
"id": "1d0430d6",
@@ -432,7 +522,7 @@
},
"vscode": {
"interpreter": {
"hash": "cb23c3a7a387ab03496baa08507270f8e0861b23170e79d5edc545893cdca840"
"hash": "e90c8aa204a57276aa905271aff2d11799d0acb3547adabc5892e639a5e45e34"
}
}
},

View File

@@ -0,0 +1,108 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "991b1cc1",
"metadata": {},
"source": [
"# Loading from LangChainHub\n",
"\n",
"This notebook covers how to load agents from [LangChainHub](https://github.com/hwchase17/langchain-hub)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "bd4450a2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m Yes.\n",
"Follow up: Who is the reigning men's U.S. Open champion?\u001b[0m\n",
"Intermediate answer: \u001b[36;1m\u001b[1;3m2016 · SUI · Stan Wawrinka ; 2017 · ESP · Rafael Nadal ; 2018 · SRB · Novak Djokovic ; 2019 · ESP · Rafael Nadal.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mSo the reigning men's U.S. Open champion is Rafael Nadal.\n",
"Follow up: What is Rafael Nadal's hometown?\u001b[0m\n",
"Intermediate answer: \u001b[36;1m\u001b[1;3mIn 2016, he once again showed his deep ties to Mallorca and opened the Rafa Nadal Academy in his hometown of Manacor.\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mSo the final answer is: Manacor, Mallorca, Spain.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Manacor, Mallorca, Spain.'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain import OpenAI, SerpAPIWrapper\n",
"from langchain.agents import initialize_agent, Tool\n",
"\n",
"llm = OpenAI(temperature=0)\n",
"search = SerpAPIWrapper()\n",
"tools = [\n",
" Tool(\n",
" name=\"Intermediate Answer\",\n",
" func=search.run\n",
" )\n",
"]\n",
"\n",
"self_ask_with_search = initialize_agent(tools, llm, agent_path=\"lc://agents/self-ask-with-search/agent.json\", verbose=True)\n",
"self_ask_with_search.run(\"What is the hometown of the reigning men's U.S. Open champion?\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "3aede965",
"metadata": {},
"source": [
"# Pinning Dependencies\n",
"\n",
"Specific versions of LangChainHub agents can be pinned with the `lc@<ref>://` syntax."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e679f7b6",
"metadata": {},
"outputs": [],
"source": [
"self_ask_with_search = initialize_agent(tools, llm, agent_path=\"lc@2826ef9e8acdf88465e1e5fc8a7bf59e0f9d0a85://agents/self-ask-with-search/agent.json\", verbose=True)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,148 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "bfe18e28",
"metadata": {},
"source": [
"# Serialization\n",
"\n",
"This notebook goes over how to serialize agents. For this notebook, it is important to understand the distinction we draw between `agents` and `tools`. An agent is the LLM powered decision maker that decides which actions to take and in which order. Tools are various instruments (functions) an agent has access to, through which an agent can interact with the outside world. When people generally use agents, they primarily talk about using an agent WITH tools. However, when we talk about serialization of agents, we are talking about the agent by itself. We plan to add support for serializing an agent WITH tools sometime in the future.\n",
"\n",
"Let's start by creating an agent with tools as we normally do:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "eb729f16",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import load_tools\n",
"from langchain.agents import initialize_agent\n",
"from langchain.llms import OpenAI\n",
"\n",
"llm = OpenAI(temperature=0)\n",
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm)\n",
"agent = initialize_agent(tools, llm, agent=\"zero-shot-react-description\", verbose=True)"
]
},
{
"cell_type": "markdown",
"id": "0578f566",
"metadata": {},
"source": [
"Let's now serialize the agent. To be explicit that we are serializing ONLY the agent, we will call the `save_agent` method."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "dc544de6",
"metadata": {},
"outputs": [],
"source": [
"agent.save_agent('agent.json')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "62dd45bf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\r\n",
" \"llm_chain\": {\r\n",
" \"memory\": null,\r\n",
" \"verbose\": false,\r\n",
" \"prompt\": {\r\n",
" \"input_variables\": [\r\n",
" \"input\",\r\n",
" \"agent_scratchpad\"\r\n",
" ],\r\n",
" \"output_parser\": null,\r\n",
" \"template\": \"Answer the following questions as best you can. You have access to the following tools:\\n\\nSearch: A search engine. Useful for when you need to answer questions about current events. Input should be a search query.\\nCalculator: Useful for when you need to answer questions about math.\\n\\nUse the following format:\\n\\nQuestion: the input question you must answer\\nThought: you should always think about what to do\\nAction: the action to take, should be one of [Search, Calculator]\\nAction Input: the input to the action\\nObservation: the result of the action\\n... (this Thought/Action/Action Input/Observation can repeat N times)\\nThought: I now know the final answer\\nFinal Answer: the final answer to the original input question\\n\\nBegin!\\n\\nQuestion: {input}\\nThought:{agent_scratchpad}\",\r\n",
" \"template_format\": \"f-string\"\r\n",
" },\r\n",
" \"llm\": {\r\n",
" \"model_name\": \"text-davinci-003\",\r\n",
" \"temperature\": 0.0,\r\n",
" \"max_tokens\": 256,\r\n",
" \"top_p\": 1,\r\n",
" \"frequency_penalty\": 0,\r\n",
" \"presence_penalty\": 0,\r\n",
" \"n\": 1,\r\n",
" \"best_of\": 1,\r\n",
" \"request_timeout\": null,\r\n",
" \"logit_bias\": {},\r\n",
" \"_type\": \"openai\"\r\n",
" },\r\n",
" \"output_key\": \"text\",\r\n",
" \"_type\": \"llm_chain\"\r\n",
" },\r\n",
" \"return_values\": [\r\n",
" \"output\"\r\n",
" ],\r\n",
" \"_type\": \"zero-shot-react-description\"\r\n",
"}"
]
}
],
"source": [
"!cat agent.json"
]
},
{
"cell_type": "markdown",
"id": "0eb72510",
"metadata": {},
"source": [
"We can now load the agent back in"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "eb660b76",
"metadata": {},
"outputs": [],
"source": [
"agent = initialize_agent(tools, llm, agent_path=\"agent.json\", verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aa624ea5",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -152,7 +152,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.0 64-bit ('llm-env')",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -166,7 +166,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.9.1"
}
},
"nbformat": 4,

View File

@@ -3,6 +3,8 @@ How-To Guides
The first category of how-to guides here cover specific parts of working with agents.
`Load From Hub <./examples/load_from_hub.html>`_: This notebook covers how to load agents from `LangChainHub <https://github.com/hwchase17/langchain-hub>`_.
`Custom Tools <./examples/custom_tools.html>`_: How to create custom tools that an agent can use.
`Intermediate Steps <./examples/intermediate_steps.html>`_: How to access and use intermediate steps to get more visibility into the internals of an agent.
@@ -15,6 +17,7 @@ The first category of how-to guides here cover specific parts of working with ag
`Max Iterations <./examples/max_iterations.html>`_: How to restrict an agent to a certain number of iterations.
`Asynchronous <./examples/async_agent.html>`_: Covering asynchronous functionality.
The next set of examples are all end-to-end agents for specific applications.
In all examples there is an Agent with a particular set of tools.

View File

@@ -2,7 +2,7 @@
import time
from langchain.chains.natbot.base import NatBotChain
from langchain.chains.natbot.crawler import Crawler # type: ignore
from langchain.chains.natbot.crawler import Crawler
def run_cmd(cmd: str, _crawler: Crawler) -> None:
@@ -33,7 +33,6 @@ def run_cmd(cmd: str, _crawler: Crawler) -> None:
if __name__ == "__main__":
objective = "Make a reservation for 2 at 7pm at bistro vida in menlo park"
print("\nWelcome to natbot! What is your objective?")
i = input()

View File

@@ -22,6 +22,7 @@ tools = load_tools(tool_names, llm=llm)
```
Below is a list of all supported tools and relevant information:
- Tool Name: The name the LLM refers to the tool by.
- Tool Description: The description of the tool that is passed to the LLM.
- Notes: Notes about the tool that are NOT passed to the LLM.
@@ -31,61 +32,71 @@ Below is a list of all supported tools and relevant information:
## List of Tools
**python_repl**
- Tool Name: Python REPL
- Tool Description: A Python shell. Use this to execute python commands. Input should be a valid python command. If you expect output it should be printed out.
- Notes: Maintains state.
- Requires LLM: No
**serpapi**
- Tool Name: Search
- Tool Description: A search engine. Useful for when you need to answer questions about current events. Input should be a search query.
- Notes: Calls the Serp API and then parses results.
- Requires LLM: No
**wolfram-alpha**
- Tool Name: Wolfram Alpha
- Tool Description: A wolfram alpha search engine. Useful for when you need to answer questions about Math, Science, Technology, Culture, Society and Everyday Life. Input should be a search query.
- Notes: Calls the Wolfram Alpha API and then parses results.
- Requires LLM: No
- Extra Parameters: `wolfram_alpha_appid`: The Wolfram Alpha app id.
**requests**
- Tool Name: Requests
- Tool Description: A portal to the internet. Use this when you need to get specific content from a site. Input should be a specific url, and the output will be all the text on that page.
- Notes: Uses the Python requests module.
- Requires LLM: No
**terminal**
- Tool Name: Terminal
- Tool Description: Executes commands in a terminal. Input should be valid commands, and the output will be any output from running that command.
- Notes: Executes commands with subprocess.
- Requires LLM: No
**pal-math**
- Tool Name: PAL-MATH
- Tool Description: A language model that is excellent at solving complex word math problems. Input should be a fully worded hard word math problem.
- Notes: Based on [this paper](https://arxiv.org/pdf/2211.10435.pdf).
- Requires LLM: Yes
**pal-colored-objects**
- Tool Name: PAL-COLOR-OBJ
- Tool Description: A language model that is wonderful at reasoning about position and the color attributes of objects. Input should be a fully worded hard reasoning problem. Make sure to include all information about the objects AND the final question you want to answer.
- Notes: Based on [this paper](https://arxiv.org/pdf/2211.10435.pdf).
- Requires LLM: Yes
**llm-math**
- Tool Name: Calculator
- Tool Description: Useful for when you need to answer questions about math.
- Notes: An instance of the `LLMMath` chain.
- Requires LLM: Yes
**open-meteo-api**
- Tool Name: Open Meteo API
- Tool Description: Useful for when you want to get weather information from the OpenMeteo API. The input should be a question in natural language that this API can answer.
- Notes: A natural language connection to the Open Meteo API (`https://api.open-meteo.com/`), specifically the `/v1/forecast` endpoint.
- Requires LLM: Yes
**news-api**
- Tool Name: News API
- Tool Description: Use this when you want to get information about the top headlines of current news stories. The input should be a question in natural language that this API can answer.
- Notes: A natural language connection to the News API (`https://newsapi.org`), specifically the `/v2/top-headlines` endpoint.
@@ -93,8 +104,18 @@ Below is a list of all supported tools and relevant information:
- Extra Parameters: `news_api_key` (your API key to access this endpoint)
**tmdb-api**
- Tool Name: TMDB API
- Tool Description: Useful for when you want to get information from The Movie Database. The input should be a question in natural language that this API can answer.
- Notes: A natural language connection to the TMDB API (`https://api.themoviedb.org/3`), specifically the `/search/movie` endpoint.
- Requires LLM: Yes
- Extra Parameters: `tmdb_bearer_token` (your Bearer Token to access this endpoint - note that this is different from the API key)
**google-search**
- Tool Name: Search
- Tool Description: A wrapper around Google Search. Useful for when you need to answer questions about current events. Input should be a search query.
- Notes: Uses the Google Custom Search API
- Requires LLM: No
- Extra Parameters: `google_api_key`, `google_cse_id`
- For more information on this, see [this page](../../ecosystem/google_search.md)

View File

@@ -0,0 +1,132 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "593f7553-7038-498e-96d4-8255e5ce34f0",
"metadata": {},
"source": [
"# Async API for Chain\n",
"\n",
"LangChain provides async support for Chains by leveraging the [asyncio](https://docs.python.org/3/library/asyncio.html) library.\n",
"\n",
"Async methods are currently supported in `LLMChain` (through `arun`, `apredict`, `acall`) and `LLMMathChain` (through `arun` and `acall`), `ChatVectorDBChain`, and [QA chains](https://langchain.readthedocs.io/en/latest/modules/chains/combine_docs_examples/question_answering.html). Async support for other chains is on the roadmap."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c19c736e-ca74-4726-bb77-0a849bcc2960",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"BrightSmile Toothpaste Company\n",
"\n",
"\n",
"BrightSmile Toothpaste Co.\n",
"\n",
"\n",
"BrightSmile Toothpaste\n",
"\n",
"\n",
"Gleaming Smile Inc.\n",
"\n",
"\n",
"SparkleSmile Toothpaste\n",
"\u001b[1mConcurrent executed in 1.54 seconds.\u001b[0m\n",
"\n",
"\n",
"BrightSmile Toothpaste Co.\n",
"\n",
"\n",
"MintyFresh Toothpaste Co.\n",
"\n",
"\n",
"SparkleSmile Toothpaste.\n",
"\n",
"\n",
"Pearly Whites Toothpaste Co.\n",
"\n",
"\n",
"BrightSmile Toothpaste.\n",
"\u001b[1mSerial executed in 6.38 seconds.\u001b[0m\n"
]
}
],
"source": [
"import asyncio\n",
"import time\n",
"\n",
"from langchain.llms import OpenAI\n",
"from langchain.prompts import PromptTemplate\n",
"from langchain.chains import LLMChain\n",
"\n",
"\n",
"def generate_serially():\n",
" llm = OpenAI(temperature=0.9)\n",
" prompt = PromptTemplate(\n",
" input_variables=[\"product\"],\n",
" template=\"What is a good name for a company that makes {product}?\",\n",
" )\n",
" chain = LLMChain(llm=llm, prompt=prompt)\n",
" for _ in range(5):\n",
" resp = chain.run(product=\"toothpaste\")\n",
" print(resp)\n",
"\n",
"\n",
"async def async_generate(chain):\n",
" resp = await chain.arun(product=\"toothpaste\")\n",
" print(resp)\n",
"\n",
"\n",
"async def generate_concurrently():\n",
" llm = OpenAI(temperature=0.9)\n",
" prompt = PromptTemplate(\n",
" input_variables=[\"product\"],\n",
" template=\"What is a good name for a company that makes {product}?\",\n",
" )\n",
" chain = LLMChain(llm=llm, prompt=prompt)\n",
" tasks = [async_generate(chain) for _ in range(5)]\n",
" await asyncio.gather(*tasks)\n",
"\n",
"s = time.perf_counter()\n",
"# If running this outside of Jupyter, use asyncio.run(generate_concurrently())\n",
"await generate_concurrently()\n",
"elapsed = time.perf_counter() - s\n",
"print('\\033[1m' + f\"Concurrent executed in {elapsed:0.2f} seconds.\" + '\\033[0m')\n",
"\n",
"s = time.perf_counter()\n",
"generate_serially()\n",
"elapsed = time.perf_counter() - s\n",
"print('\\033[1m' + f\"Serial executed in {elapsed:0.2f} seconds.\" + '\\033[0m')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,178 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ad719b65",
"metadata": {},
"source": [
"# Analyze Document\n",
"\n",
"The AnalyzeDocumentChain is more of an end to chain. This chain takes in a single document, splits it up, and then runs it through a CombineDocumentsChain. This can be used as more of an end-to-end chain."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "15e1a8a2",
"metadata": {},
"outputs": [],
"source": [
"with open('../../state_of_the_union.txt') as f:\n",
" state_of_the_union = f.read()"
]
},
{
"cell_type": "markdown",
"id": "14da4012",
"metadata": {},
"source": [
"## Summarize\n",
"Let's take a look at it in action below, using it summarize a long document."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "765d6326",
"metadata": {},
"outputs": [],
"source": [
"from langchain import OpenAI\n",
"from langchain.chains.summarize import load_summarize_chain\n",
"\n",
"llm = OpenAI(temperature=0)\n",
"summary_chain = load_summarize_chain(llm, chain_type=\"map_reduce\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "3a3d3ebc",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import AnalyzeDocumentChain"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "97178aad",
"metadata": {},
"outputs": [],
"source": [
"summarize_document_chain = AnalyzeDocumentChain(combine_docs_chain=summary_chain)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "2e5a7bf7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\" In this speech, President Biden addresses the American people and the world, discussing the recent aggression of Russia's Vladimir Putin in Ukraine and the US response. He outlines economic sanctions and other measures taken to hold Putin accountable, and announces the US Department of Justice's task force to go after the crimes of Russian oligarchs. He also announces plans to fight inflation and lower costs for families, invest in American manufacturing, and provide military, economic, and humanitarian assistance to Ukraine. He calls for immigration reform, protecting the rights of women, and advancing the rights of LGBTQ+ Americans, and pays tribute to military families. He concludes with optimism for the future of America.\""
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"summarize_document_chain.run(state_of_the_union)"
]
},
{
"cell_type": "markdown",
"id": "35739404",
"metadata": {},
"source": [
"## Question Answering\n",
"Let's take a look at this using a question answering chain."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "8b9b7705",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.question_answering import load_qa_chain"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "60c309a8",
"metadata": {},
"outputs": [],
"source": [
"qa_chain = load_qa_chain(llm, chain_type=\"map_reduce\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "ba1fc940",
"metadata": {},
"outputs": [],
"source": [
"qa_document_chain = AnalyzeDocumentChain(combine_docs_chain=qa_chain)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "9aa1fbde",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"' The president thanked Justice Breyer for his service.'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"qa_document_chain.run(input_document=state_of_the_union, question=\"what did the president say about justice breyer?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7eb02f1e",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,327 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "134a0785",
"metadata": {},
"source": [
"# Chat Vector DB\n",
"\n",
"This notebook goes over how to set up a chain to chat with a vector database. The only difference between this chain and the [VectorDBQAChain](./vector_db_qa.ipynb) is that this allows for passing in of a chat history which can be used to allow for follow up questions."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "70c4e529",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.llms import OpenAI\n",
"from langchain.chains import ChatVectorDBChain"
]
},
{
"cell_type": "markdown",
"id": "cdff94be",
"metadata": {},
"source": [
"Load in documents. You can replace this with a loader for whatever type of data you want"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "01c46e92",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../state_of_the_union.txt')\n",
"documents = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "e9be4779",
"metadata": {},
"source": [
"If you had multiple loaders that you wanted to combine, you do something like:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "433363a5",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# loaders = [....]\n",
"# docs = []\n",
"# for loader in loaders:\n",
"# docs.extend(loader.load())"
]
},
{
"cell_type": "markdown",
"id": "239475d2",
"metadata": {},
"source": [
"We now split the documents, create embeddings for them, and put them in a vectorstore. This allows us to do semantic search over them."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a8930cf7",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"documents = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"vectorstore = Chroma.from_documents(documents, embeddings)"
]
},
{
"cell_type": "markdown",
"id": "3c96b118",
"metadata": {},
"source": [
"We now initialize the ChatVectorDBChain"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "7b4110f3",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"qa = ChatVectorDBChain.from_llm(OpenAI(temperature=0), vectorstore)"
]
},
{
"cell_type": "markdown",
"id": "3872432d",
"metadata": {},
"source": [
"Here's an example of asking a question with no chat history"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "7fe3e730",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chat_history = []\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history})"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "bfff9cc8",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result[\"answer\"]"
]
},
{
"cell_type": "markdown",
"id": "9e46edf7",
"metadata": {},
"source": [
"Here's an example of asking a question with some chat history"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "00b4cf00",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chat_history = [(query, result[\"answer\"])]\n",
"query = \"Did he mention who she suceeded\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history})"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "f01828d1",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"' Justice Stephen Breyer'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result['answer']"
]
},
{
"cell_type": "markdown",
"id": "2324cdc6-98bf-4708-b8cd-02a98b1e5b67",
"metadata": {},
"source": [
"## Chat Vector DB with streaming to `stdout`\n",
"\n",
"Output from the chain will be streamed to `stdout` token by token in this example."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "2efacec3-2690-4b05-8de3-a32fd2ac3911",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.chains.llm import LLMChain\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
"from langchain.chains.chat_vector_db.prompts import CONDENSE_QUESTION_PROMPT, QA_PROMPT\n",
"from langchain.chains.question_answering import load_qa_chain\n",
"\n",
"# Construct a ChatVectorDBChain with a streaming llm for combine docs\n",
"# and a separate, non-streaming llm for question generation\n",
"llm = OpenAI(temperature=0)\n",
"streaming_llm = OpenAI(streaming=True, callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]), verbose=True, temperature=0)\n",
"\n",
"question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n",
"doc_chain = load_qa_chain(streaming_llm, chain_type=\"stuff\", prompt=QA_PROMPT)\n",
"\n",
"qa = ChatVectorDBChain(vectorstore=vectorstore, combine_docs_chain=doc_chain, question_generator=question_generator)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "fd6d43f4-7428-44a4-81bc-26fe88a98762",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."
]
}
],
"source": [
"chat_history = []\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history})"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "5ab38978-f3e8-4fa7-808c-c79dec48379a",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Justice Stephen Breyer"
]
}
],
"source": [
"chat_history = [(query, result[\"answer\"])]\n",
"query = \"Did he mention who she suceeded\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a7ea93ff-1899-4171-9c24-85df20ae1a3d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,238 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a6850189",
"metadata": {},
"source": [
"# Graph QA\n",
"\n",
"This notebook goes over how to do question answering over a graph data structure."
]
},
{
"cell_type": "markdown",
"id": "9e516e3e",
"metadata": {},
"source": [
"## Create the graph\n",
"\n",
"In this section, we construct an example graph. At the moment, this works best for small pieces of text."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "3849873d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.indexes import GraphIndexCreator\n",
"from langchain.llms import OpenAI\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "05d65c87",
"metadata": {},
"outputs": [],
"source": [
"index_creator = GraphIndexCreator(llm=OpenAI(temperature=0))"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "0a45a5b9",
"metadata": {},
"outputs": [],
"source": [
"with open(\"../../state_of_the_union.txt\") as f:\n",
" all_text = f.read()"
]
},
{
"cell_type": "markdown",
"id": "3fca3e1b",
"metadata": {},
"source": [
"We will use just a small snippet, because extracting the knowledge triplets is a bit intensive at the moment."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "80522bd6",
"metadata": {},
"outputs": [],
"source": [
"text = \"\\n\".join(all_text.split(\"\\n\\n\")[105:108])"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "da5aad5a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'It wont look like much, but if you stop and look closely, youll see a “Field of dreams,” the ground on which Americas future will be built. \\nThis is where Intel, the American company that helped build Silicon Valley, is going to build its $20 billion semiconductor “mega site”. \\nUp to eight state-of-the-art factories in one place. 10,000 new good-paying jobs. '"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"text"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "8dad7b59",
"metadata": {},
"outputs": [],
"source": [
"graph = index_creator.from_text(text)"
]
},
{
"cell_type": "markdown",
"id": "2118f363",
"metadata": {},
"source": [
"We can inspect the created graph."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "32878c13",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('Intel', '$20 billion semiconductor \"mega site\"', 'is going to build'),\n",
" ('Intel', 'state-of-the-art factories', 'is building'),\n",
" ('Intel', '10,000 new good-paying jobs', 'is creating'),\n",
" ('Intel', 'Silicon Valley', 'is helping build'),\n",
" ('Field of dreams',\n",
" \"America's future will be built\",\n",
" 'is the ground on which')]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"graph.get_triples()"
]
},
{
"cell_type": "markdown",
"id": "e9737be1",
"metadata": {},
"source": [
"## Querying the graph\n",
"We can now use the graph QA chain to ask question of the graph"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "76edc854",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import GraphQAChain"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "8e7719b4",
"metadata": {},
"outputs": [],
"source": [
"chain = GraphQAChain.from_llm(OpenAI(temperature=0), graph=graph, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "f6511169",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphQAChain chain...\u001b[0m\n",
"Entities Extracted:\n",
"\u001b[32;1m\u001b[1;3m Intel\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3mIntel is going to build $20 billion semiconductor \"mega site\"\n",
"Intel is building state-of-the-art factories\n",
"Intel is creating 10,000 new good-paying jobs\n",
"Intel is helping build Silicon Valley\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' Intel is going to build a $20 billion semiconductor \"mega site\" with state-of-the-art factories, creating 10,000 new good-paying jobs and helping to build Silicon Valley.'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\"what is Intel going to build?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f70b9ada",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -21,7 +21,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 4,
"id": "78f28130",
"metadata": {},
"outputs": [],
@@ -30,14 +30,14 @@
"from langchain.embeddings.cohere import CohereEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores.elastic_vector_search import ElasticVectorSearch\n",
"from langchain.vectorstores.faiss import FAISS\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.docstore.document import Document\n",
"from langchain.prompts import PromptTemplate"
]
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 5,
"id": "4da195a3",
"metadata": {},
"outputs": [],
@@ -52,17 +52,26 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 6,
"id": "5ec2b55b",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"docsearch = FAISS.from_texts(texts, embeddings, metadatas=[{\"source\": i} for i in range(len(texts))])"
"docsearch = Chroma.from_texts(texts, embeddings, metadatas=[{\"source\": str(i)} for i in range(len(texts))])"
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 7,
"id": "5286f58f",
"metadata": {},
"outputs": [],
@@ -73,7 +82,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 8,
"id": "005a47e9",
"metadata": {},
"outputs": [],
@@ -93,7 +102,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 9,
"id": "3722373b",
"metadata": {},
"outputs": [
@@ -103,7 +112,7 @@
"{'output_text': ' The president thanked Justice Breyer for his service.\\nSOURCES: 30-pl'}"
]
},
"execution_count": 13,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
@@ -699,7 +708,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.9.1"
},
"vscode": {
"interpreter": {

View File

@@ -28,7 +28,7 @@
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores.faiss import FAISS\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.docstore.document import Document\n",
"from langchain.prompts import PromptTemplate"
]
@@ -40,27 +40,37 @@
"metadata": {},
"outputs": [],
"source": [
"with open('../../state_of_the_union.txt') as f:\n",
" state_of_the_union = f.read()\n",
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_text(state_of_the_union)\n",
"texts = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"id": "fd9666a9",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"docsearch = FAISS.from_texts(texts, embeddings)"
"docsearch = Chroma.from_documents(texts, embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 5,
"id": "d1eaf6e6",
"metadata": {},
"outputs": [],
@@ -673,7 +683,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.9.1"
},
"vscode": {
"interpreter": {

View File

@@ -18,7 +18,7 @@
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores.faiss import FAISS\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain import OpenAI, VectorDBQA"
]
@@ -28,15 +28,25 @@
"execution_count": 2,
"id": "5c7049db",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"with open('../../state_of_the_union.txt') as f:\n",
" state_of_the_union = f.read()\n",
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_text(state_of_the_union)\n",
"texts = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"docsearch = FAISS.from_texts(texts, embeddings)"
"docsearch = Chroma.from_documents(texts, embeddings)"
]
},
{
@@ -58,7 +68,7 @@
{
"data": {
"text/plain": [
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, from a family of public school educators and police officers, a consensus builder, and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, and from a family of public school educators and police officers. He also said that she is a consensus builder, and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\""
]
},
"execution_count": 4,
@@ -256,7 +266,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.9.1"
},
"vscode": {
"interpreter": {

View File

@@ -5,7 +5,7 @@
"id": "efc5be67",
"metadata": {},
"source": [
"# VectorDB Question Ansering with Sources\n",
"# VectorDB Question Answering with Sources\n",
"\n",
"This notebook goes over how to do question-answering with sources over a vector database. It does this by using the `VectorDBQAWithSourcesChain`, which does the lookup of the documents from a vector database. "
]
@@ -21,7 +21,7 @@
"from langchain.embeddings.cohere import CohereEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores.elastic_vector_search import ElasticVectorSearch\n",
"from langchain.vectorstores.faiss import FAISS"
"from langchain.vectorstores import Chromaoma"
]
},
{
@@ -41,29 +41,27 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 5,
"id": "0e745d99",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n",
"Exiting: Cleaning up .chroma directory\n"
]
}
],
"source": [
"docsearch = FAISS.from_texts(texts, embeddings)"
"docsearch = Chroma.from_texts(texts, embeddings, metadatas=[{\"source\": f\"{i}-pl\"} for i in range(len(texts))])"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "f42d79dc",
"metadata": {},
"outputs": [],
"source": [
"# Add in a fake source information\n",
"for i, d in enumerate(docsearch.docstore._dict.values()):\n",
" d.metadata = {'source': f\"{i}-pl\"}"
]
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 6,
"id": "8aa571ae",
"metadata": {},
"outputs": [],
@@ -73,7 +71,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 7,
"id": "aa859d4c",
"metadata": {},
"outputs": [],
@@ -85,18 +83,18 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 8,
"id": "8ba36fa7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'answer': ' The president thanked Justice Breyer for his service.\\n',\n",
"{'answer': ' The president thanked Justice Breyer for his service and mentioned his legacy of excellence.\\n',\n",
" 'sources': '30-pl'}"
]
},
"execution_count": 7,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
@@ -165,7 +163,7 @@
"source": [
"from langchain.chains.qa_with_sources import load_qa_with_sources_chain\n",
"qa_chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type=\"stuff\")\n",
"qa = VectorDBQAWithSourcesChain(combine_document_chain=qa_chain, vectorstore=docsearch)"
"qa = VectorDBQAWithSourcesChain(combine_documents_chain=qa_chain, vectorstore=docsearch)"
]
},
{
@@ -187,7 +185,7 @@
}
],
"source": [
"chain({\"question\": \"What did the president say about Justice Breyer\"}, return_only_outputs=True)"
"qa({\"question\": \"What did the president say about Justice Breyer\"}, return_only_outputs=True)"
]
}
],
@@ -207,7 +205,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.9.1"
},
"vscode": {
"interpreter": {

View File

@@ -0,0 +1,199 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Vector DB Text Generation\n",
"\n",
"This notebook walks through how to use LangChain for text generation over a vector index. This is useful if we want to generate text that is able to draw from a large body of custom text, for example, generating blog posts that have an understanding of previous blog posts written, or product tutorials that can refer to product documentation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prepare Data\n",
"\n",
"First, we prepare the data. For this example, we fetch a documentation site that consists of markdown files hosted on Github and split them into small enough Documents."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.docstore.document import Document\n",
"import requests\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import Chromama\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.prompts import PromptTemplate\n",
"import pathlib\n",
"import subprocess\n",
"import tempfile"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Cloning into '.'...\n"
]
}
],
"source": [
"def get_github_docs(repo_owner, repo_name):\n",
" with tempfile.TemporaryDirectory() as d:\n",
" subprocess.check_call(\n",
" f\"git clone --depth 1 https://github.com/{repo_owner}/{repo_name}.git .\",\n",
" cwd=d,\n",
" shell=True,\n",
" )\n",
" git_sha = (\n",
" subprocess.check_output(\"git rev-parse HEAD\", shell=True, cwd=d)\n",
" .decode(\"utf-8\")\n",
" .strip()\n",
" )\n",
" repo_path = pathlib.Path(d)\n",
" markdown_files = list(repo_path.glob(\"*/*.md\")) + list(\n",
" repo_path.glob(\"*/*.mdx\")\n",
" )\n",
" for markdown_file in markdown_files:\n",
" with open(markdown_file, \"r\") as f:\n",
" relative_path = markdown_file.relative_to(repo_path)\n",
" github_url = f\"https://github.com/{repo_owner}/{repo_name}/blob/{git_sha}/{relative_path}\"\n",
" yield Document(page_content=f.read(), metadata={\"source\": github_url})\n",
"\n",
"sources = get_github_docs(\"yirenlu92\", \"deno-manual-forked\")\n",
"\n",
"source_chunks = []\n",
"splitter = CharacterTextSplitter(separator=\" \", chunk_size=1024, chunk_overlap=0)\n",
"for source in sources:\n",
" for chunk in splitter.split_text(source.page_content):\n",
" source_chunks.append(Document(page_content=chunk, metadata=source.metadata))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set Up Vector DB\n",
"\n",
"Now that we have the documentation content in chunks, let's put all this information in a vector index for easy retrieval."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"search_index = Chroma.from_documents(source_chunks, OpenAIEmbeddings())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set Up LLM Chain with Custom Prompt\n",
"\n",
"Next, let's set up a simple LLM chain but give it a custom prompt for blog post generation. Note that the custom prompt is parameterized and takes two inputs: `context`, which will be the documents fetched from the vector search, and `topic`, which is given by the user."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import LLMChain\n",
"prompt_template = \"\"\"Use the context below to write a 400 word blog post about the topic below:\n",
" Context: {context}\n",
" Topic: {topic}\n",
" Blog post:\"\"\"\n",
"\n",
"PROMPT = PromptTemplate(\n",
" template=prompt_template, input_variables=[\"context\", \"topic\"]\n",
")\n",
"\n",
"llm = OpenAI(temperature=0)\n",
"\n",
"chain = LLMChain(llm=llm, prompt=PROMPT)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generate Text\n",
"\n",
"Finally, we write a function to apply our inputs to the chain. The function takes an input parameter `topic`. We find the documents in the vector index that correspond to that `topic`, and use them as additional context in our simple LLM chain."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def generate_blog_post(topic):\n",
" docs = search_index.similarity_search(topic, k=4)\n",
" inputs = [{\"context\": doc.page_content, \"topic\": topic} for doc in docs]\n",
" print(chain.apply(inputs))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[{'text': '\\n\\nEnvironment variables are a great way to store and access sensitive information in your Deno applications. Deno offers built-in support for environment variables with `Deno.env`, and you can also use a `.env` file to store and access environment variables.\\n\\nUsing `Deno.env` is simple. It has getter and setter methods, so you can easily set and retrieve environment variables. For example, you can set the `FIREBASE_API_KEY` and `FIREBASE_AUTH_DOMAIN` environment variables like this:\\n\\n```ts\\nDeno.env.set(\"FIREBASE_API_KEY\", \"examplekey123\");\\nDeno.env.set(\"FIREBASE_AUTH_DOMAIN\", \"firebasedomain.com\");\\n\\nconsole.log(Deno.env.get(\"FIREBASE_API_KEY\")); // examplekey123\\nconsole.log(Deno.env.get(\"FIREBASE_AUTH_DOMAIN\")); // firebasedomain.com\\n```\\n\\nYou can also store environment variables in a `.env` file. This is a great'}, {'text': '\\n\\nEnvironment variables are a powerful tool for managing configuration settings in a program. They allow us to set values that can be used by the program, without having to hard-code them into the code. This makes it easier to change settings without having to modify the code.\\n\\nIn Deno, environment variables can be set in a few different ways. The most common way is to use the `VAR=value` syntax. This will set the environment variable `VAR` to the value `value`. This can be used to set any number of environment variables before running a command. For example, if we wanted to set the environment variable `VAR` to `hello` before running a Deno command, we could do so like this:\\n\\n```\\nVAR=hello deno run main.ts\\n```\\n\\nThis will set the environment variable `VAR` to `hello` before running the command. We can then access this variable in our code using the `Deno.env.get()` function. For example, if we ran the following command:\\n\\n```\\nVAR=hello && deno eval \"console.log(\\'Deno: \\' + Deno.env.get(\\'VAR'}, {'text': '\\n\\nEnvironment variables are a powerful tool for developers, allowing them to store and access data without having to hard-code it into their applications. In Deno, you can access environment variables using the `Deno.env.get()` function.\\n\\nFor example, if you wanted to access the `HOME` environment variable, you could do so like this:\\n\\n```js\\n// env.js\\nDeno.env.get(\"HOME\");\\n```\\n\\nWhen running this code, you\\'ll need to grant the Deno process access to environment variables. This can be done by passing the `--allow-env` flag to the `deno run` command. You can also specify which environment variables you want to grant access to, like this:\\n\\n```shell\\n# Allow access to only the HOME env var\\ndeno run --allow-env=HOME env.js\\n```\\n\\nIt\\'s important to note that environment variables are case insensitive on Windows, so Deno also matches them case insensitively (on Windows only).\\n\\nAnother thing to be aware of when using environment variables is subprocess permissions. Subprocesses are powerful and can access system resources regardless of the permissions you granted to the Den'}, {'text': '\\n\\nEnvironment variables are an important part of any programming language, and Deno is no exception. Deno is a secure JavaScript and TypeScript runtime built on the V8 JavaScript engine, and it recently added support for environment variables. This feature was added in Deno version 1.6.0, and it is now available for use in Deno applications.\\n\\nEnvironment variables are used to store information that can be used by programs. They are typically used to store configuration information, such as the location of a database or the name of a user. In Deno, environment variables are stored in the `Deno.env` object. This object is similar to the `process.env` object in Node.js, and it allows you to access and set environment variables.\\n\\nThe `Deno.env` object is a read-only object, meaning that you cannot directly modify the environment variables. Instead, you must use the `Deno.env.set()` function to set environment variables. This function takes two arguments: the name of the environment variable and the value to set it to. For example, if you wanted to set the `FOO` environment variable to `bar`, you would use the following code:\\n\\n```'}]\n"
]
}
],
"source": [
"generate_blog_post(\"environment variables\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -11,10 +11,18 @@ The examples here are all end-to-end chains for working with documents.
`Summarization <./combine_docs_examples/summarize.html>`_: A walkthrough of how to use LangChain for summarization over specific documents.
`Vector DB Text Generation <./combine_docs_examples/vector_db_text_generation.html>`_: A walkthrough of how to use LangChain for text generation over a vector database.
`Vector DB Question Answering <./combine_docs_examples/vector_db_qa.html>`_: A walkthrough of how to use LangChain for question answering over a vector database.
`Vector DB Question Answering with Sources <./combine_docs_examples/vector_db_qa_with_sources.html>`_: A walkthrough of how to use LangChain for question answering (with sources) over a vector database.
`Graph Question Answering <./combine_docs_examples/graph_qa.html>`_: A walkthrough of how to use LangChain for question answering (with sources) over a graph database.
`Chat Vector DB <./combine_docs_examples/chat_vector_db.html>`_: A walkthrough of how to use LangChain as a chatbot over a vector database.
`Analyze Document <./combine_docs_examples/analyze_document.html>`_: A walkthrough of how to use LangChain to analyze long documents.
.. toctree::
:maxdepth: 1

View File

@@ -21,6 +21,24 @@
"from langchain import OpenAI"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a58e15e",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(model_name='code-davinci-002', temperature=0, max_tokens=512)"
]
},
{
"cell_type": "markdown",
"id": "095adc76",
"metadata": {},
"source": [
"## Math Prompt"
]
},
{
"cell_type": "code",
"execution_count": 2,
@@ -28,7 +46,6 @@
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(model_name='code-davinci-002', temperature=0, max_tokens=512)\n",
"pal_chain = PALChain.from_math_prompt(llm, verbose=True)"
]
},
@@ -64,7 +81,7 @@
" result = total_pets\n",
" return result\u001b[0m\n",
"\n",
"\u001b[1m> Finished PALChain chain.\u001b[0m\n"
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
@@ -82,6 +99,14 @@
"pal_chain.run(question)"
]
},
{
"cell_type": "markdown",
"id": "0269d20a",
"metadata": {},
"source": [
"## Colored Objects"
]
},
{
"cell_type": "code",
"execution_count": 5,
@@ -89,7 +114,6 @@
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(model_name='code-davinci-002', temperature=0, max_tokens=512)\n",
"pal_chain = PALChain.from_colored_object_prompt(llm, verbose=True)"
]
},
@@ -147,10 +171,94 @@
"pal_chain.run(question)"
]
},
{
"cell_type": "markdown",
"id": "fc3d7f10",
"metadata": {},
"source": [
"## Intermediate Steps\n",
"You can also use the intermediate steps flag to return the code executed that generates the answer."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "9d2d9c61",
"metadata": {},
"outputs": [],
"source": [
"pal_chain = PALChain.from_colored_object_prompt(llm, verbose=True, return_intermediate_steps=True)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b29b971b",
"metadata": {},
"outputs": [],
"source": [
"question = \"On the desk, you see two blue booklets, two purple booklets, and two yellow pairs of sunglasses. If I remove all the pairs of sunglasses from the desk, how many purple items remain on it?\""
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "a2c40c28",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new PALChain chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m# Put objects into a list to record ordering\n",
"objects = []\n",
"objects += [('booklet', 'blue')] * 2\n",
"objects += [('booklet', 'purple')] * 2\n",
"objects += [('sunglasses', 'yellow')] * 2\n",
"\n",
"# Remove all pairs of sunglasses\n",
"objects = [object for object in objects if object[0] != 'sunglasses']\n",
"\n",
"# Count number of purple objects\n",
"num_purple = len([object for object in objects if object[1] == 'purple'])\n",
"answer = num_purple\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
}
],
"source": [
"result = pal_chain({\"question\": question})"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "efddd033",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"# Put objects into a list to record ordering\\nobjects = []\\nobjects += [('booklet', 'blue')] * 2\\nobjects += [('booklet', 'purple')] * 2\\nobjects += [('sunglasses', 'yellow')] * 2\\n\\n# Remove all pairs of sunglasses\\nobjects = [object for object in objects if object[0] != 'sunglasses']\\n\\n# Count number of purple objects\\nnum_purple = len([object for object in objects if object[1] == 'purple'])\\nanswer = num_purple\""
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result['intermediate_steps']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4ab20fec",
"id": "dfd88594",
"metadata": {},
"outputs": [],
"source": []

View File

@@ -56,9 +56,17 @@
"llm = OpenAI(temperature=0)"
]
},
{
"cell_type": "markdown",
"id": "3d1e692e",
"metadata": {},
"source": [
"**NOTE:** For data-sensitive projects, you can specify `return_direct=True` in the `SQLDatabaseChain` initialization to directly return the output of the SQL query without any additional formatting. This prevents the LLM from seeing any contents within the database. Note, however, the LLM still has access to the database scheme (i.e. dialect, table and key names) by default."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"id": "a8fc8f23",
"metadata": {},
"outputs": [],
@@ -68,7 +76,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"id": "15ff81df",
"metadata": {
"pycharm": {
@@ -85,18 +93,18 @@
"\u001b[1m> Entering new SQLDatabaseChain chain...\u001b[0m\n",
"How many employees are there? \n",
"SQLQuery:\u001b[32;1m\u001b[1;3m SELECT COUNT(*) FROM Employee;\u001b[0m\n",
"SQLResult: \u001b[33;1m\u001b[1;3m[(9,)]\u001b[0m\n",
"Answer:\u001b[32;1m\u001b[1;3m There are 9 employees.\u001b[0m\n",
"SQLResult: \u001b[33;1m\u001b[1;3m[(8,)]\u001b[0m\n",
"Answer:\u001b[32;1m\u001b[1;3m There are 8 employees.\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' There are 9 employees.'"
"' There are 8 employees.'"
]
},
"execution_count": 3,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@@ -168,15 +176,15 @@
"\u001b[1m> Entering new SQLDatabaseChain chain...\u001b[0m\n",
"How many employees are there in the foobar table? \n",
"SQLQuery:\u001b[32;1m\u001b[1;3m SELECT COUNT(*) FROM Employee;\u001b[0m\n",
"SQLResult: \u001b[33;1m\u001b[1;3m[(9,)]\u001b[0m\n",
"Answer:\u001b[32;1m\u001b[1;3m There are 9 employees in the foobar table.\u001b[0m\n",
"SQLResult: \u001b[33;1m\u001b[1;3m[(8,)]\u001b[0m\n",
"Answer:\u001b[32;1m\u001b[1;3m There are 8 employees in the foobar table.\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' There are 9 employees in the foobar table.'"
"' There are 8 employees in the foobar table.'"
]
},
"execution_count": 7,
@@ -188,6 +196,62 @@
"db_chain.run(\"How many employees are there in the foobar table?\")"
]
},
{
"cell_type": "markdown",
"id": "88d8b969",
"metadata": {},
"source": [
"## Return Intermediate Steps\n",
"\n",
"You can also return the intermediate steps of the SQLDatabaseChain. This allows you to access the SQL statement that was generated, as well as the result of running that against the SQL Database."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "38559487",
"metadata": {},
"outputs": [],
"source": [
"db_chain = SQLDatabaseChain(llm=llm, database=db, prompt=PROMPT, verbose=True, return_intermediate_steps=True)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "78b6af4d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new SQLDatabaseChain chain...\u001b[0m\n",
"How many employees are there in the foobar table? \n",
"SQLQuery:\u001b[32;1m\u001b[1;3m SELECT COUNT(*) FROM Employee;\u001b[0m\n",
"SQLResult: \u001b[33;1m\u001b[1;3m[(8,)]\u001b[0m\n",
"Answer:\u001b[32;1m\u001b[1;3m There are 8 employees in the foobar table.\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"[' SELECT COUNT(*) FROM Employee;', '[(8,)]']"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result = db_chain(\"How many employees are there in the foobar table?\")\n",
"result[\"intermediate_steps\"]"
]
},
{
"cell_type": "markdown",
"id": "b408f800",
@@ -199,7 +263,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 10,
"id": "6adaa799",
"metadata": {},
"outputs": [],
@@ -209,7 +273,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 11,
"id": "edfc8a8e",
"metadata": {},
"outputs": [
@@ -221,19 +285,19 @@
"\n",
"\u001b[1m> Entering new SQLDatabaseChain chain...\u001b[0m\n",
"What are some example tracks by composer Johann Sebastian Bach? \n",
"SQLQuery:\u001b[32;1m\u001b[1;3m SELECT Name FROM Track WHERE Composer = 'Johann Sebastian Bach' LIMIT 3;\u001b[0m\n",
"SQLResult: \u001b[33;1m\u001b[1;3m[('Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace',), ('Aria Mit 30 Veränderungen, BWV 988 \"Goldberg Variations\": Aria',), ('Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude',)]\u001b[0m\n",
"Answer:\u001b[32;1m\u001b[1;3m Examples of tracks by Johann Sebastian Bach include 'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace', 'Aria Mit 30 Veränderungen, BWV 988 \"Goldberg Variations\": Aria', and 'Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude'.\u001b[0m\n",
"SQLQuery:\u001b[32;1m\u001b[1;3m SELECT Name, Composer FROM Track WHERE Composer = 'Johann Sebastian Bach' LIMIT 3;\u001b[0m\n",
"SQLResult: \u001b[33;1m\u001b[1;3m[('Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace', 'Johann Sebastian Bach'), ('Aria Mit 30 Veränderungen, BWV 988 \"Goldberg Variations\": Aria', 'Johann Sebastian Bach'), ('Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude', 'Johann Sebastian Bach')]\u001b[0m\n",
"Answer:\u001b[32;1m\u001b[1;3m Examples of tracks by composer Johann Sebastian Bach are 'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace', 'Aria Mit 30 Veränderungen, BWV 988 \"Goldberg Variations\": Aria', and 'Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude'.\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' Examples of tracks by Johann Sebastian Bach include \\'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace\\', \\'Aria Mit 30 Veränderungen, BWV 988 \"Goldberg Variations\": Aria\\', and \\'Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude\\'.'"
"' Examples of tracks by composer Johann Sebastian Bach are \\'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace\\', \\'Aria Mit 30 Veränderungen, BWV 988 \"Goldberg Variations\": Aria\\', and \\'Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude\\'.'"
]
},
"execution_count": 8,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
@@ -242,6 +306,107 @@
"db_chain.run(\"What are some example tracks by composer Johann Sebastian Bach?\")"
]
},
{
"cell_type": "markdown",
"id": "bcc5e936",
"metadata": {},
"source": [
"## Adding example rows from each table\n",
"Sometimes, the format of the data is not obvious and it is optimal to include a sample of rows from the tables in the prompt to allow the LLM to understand the data before providing a final query. Here we will use this feature to let the LLM know that artists are saved with their full names by providing two rows from the `Track` table."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "9a22ee47",
"metadata": {},
"outputs": [],
"source": [
"db = SQLDatabase.from_uri(\n",
" \"sqlite:///../../../../notebooks/Chinook.db\",\n",
" include_tables=['Track'], # we include only one table to save tokens in the prompt :)\n",
" sample_rows_in_table_info=2)"
]
},
{
"cell_type": "markdown",
"id": "952c0b4d",
"metadata": {},
"source": [
"The sample rows are added to the prompt after each corresponding table's column information:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "9de86267",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" Table data will be described in the following format:\n",
"\n",
" Table 'table name' has columns: {column1 name: (column1 type, [list of example values for column1]),\n",
" column2 name: (column2 type, [list of example values for column2], ...)\n",
"\n",
" These are the tables you can use, together with their column information:\n",
"\n",
" Table 'Track' has columns: {'TrackId': ['INTEGER', ['1', '2']], 'Name': ['NVARCHAR(200)', ['For Those About To Rock (We Salute You)', 'Balls to the Wall']], 'AlbumId': ['INTEGER', ['1', '2']], 'MediaTypeId': ['INTEGER', ['1', '2']], 'GenreId': ['INTEGER', ['1', '1']], 'Composer': ['NVARCHAR(220)', ['Angus Young, Malcolm Young, Brian Johnson', 'None']], 'Milliseconds': ['INTEGER', ['343719', '342562']], 'Bytes': ['INTEGER', ['11170334', '5510424']], 'UnitPrice': ['NUMERIC(10, 2)', ['0.99', '0.99']]}\n"
]
}
],
"source": [
"print(db.table_info)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "bcb7a489",
"metadata": {},
"outputs": [],
"source": [
"db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "81e05d82",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new SQLDatabaseChain chain...\u001b[0m\n",
"What are some example tracks by Bach? \n",
"SQLQuery:\u001b[32;1m\u001b[1;3m SELECT Name FROM Track WHERE Composer LIKE '%Bach%' LIMIT 5;\u001b[0m\n",
"SQLResult: \u001b[33;1m\u001b[1;3m[('American Woman',), ('Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace',), ('Aria Mit 30 Veränderungen, BWV 988 \"Goldberg Variations\": Aria',), ('Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude',), ('Toccata and Fugue in D Minor, BWV 565: I. Toccata',)]\u001b[0m\n",
"Answer:\u001b[32;1m\u001b[1;3m Some example tracks by Bach are 'American Woman', 'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace', 'Aria Mit 30 Veränderungen, BWV 988 \"Goldberg Variations\": Aria', 'Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude', and 'Toccata and Fugue in D Minor, BWV 565: I. Toccata'.\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' Some example tracks by Bach are \\'American Woman\\', \\'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace\\', \\'Aria Mit 30 Veränderungen, BWV 988 \"Goldberg Variations\": Aria\\', \\'Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude\\', and \\'Toccata and Fugue in D Minor, BWV 565: I. Toccata\\'.'"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db_chain.run(\"What are some example tracks by Bach?\")"
]
},
{
"cell_type": "markdown",
"id": "c12ae15a",
@@ -266,7 +431,7 @@
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains.sql_database.base import SQLDatabaseSequentialChain"
"from langchain.chains import SQLDatabaseSequentialChain"
]
},
{
@@ -293,14 +458,22 @@
"\n",
"\u001b[1m> Entering new SQLDatabaseSequentialChain chain...\u001b[0m\n",
"Table names to use:\n",
"\u001b[33;1m\u001b[1;3m['Employee', 'Customer']\u001b[0m\n",
"\u001b[33;1m\u001b[1;3m['Customer', 'Employee']\u001b[0m\n",
"\n",
"\u001b[1m> Entering new SQLDatabaseChain chain...\u001b[0m\n",
"How many employees are also customers? \n",
"SQLQuery:\u001b[32;1m\u001b[1;3m SELECT COUNT(*) FROM Customer c INNER JOIN Employee e ON c.SupportRepId = e.EmployeeId;\u001b[0m\n",
"SQLResult: \u001b[33;1m\u001b[1;3m[(59,)]\u001b[0m\n",
"Answer:\u001b[32;1m\u001b[1;3m There are 59 employees who are also customers.\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' 0 employees are also customers.'"
"' There are 59 employees who are also customers.'"
]
},
"execution_count": 5,
@@ -311,17 +484,13 @@
"source": [
"chain.run(\"How many employees are also customers?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2998b03",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"@webio": {
"lastCommId": null,
"lastKernelId": null
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
@@ -337,7 +506,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.9.1"
}
},
"nbformat": 4,

View File

@@ -0,0 +1,167 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "25c90e9e",
"metadata": {},
"source": [
"# Loading from LangChainHub\n",
"\n",
"This notebook covers how to load chains from [LangChainHub](https://github.com/hwchase17/langchain-hub)."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "8b54479e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import load_chain\n",
"\n",
"chain = load_chain(\"lc://chains/llm-math/chain.json\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "4828f31f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMMathChain chain...\u001b[0m\n",
"whats 2 raised to .12\u001b[32;1m\u001b[1;3m\n",
"Answer: 1.0791812460476249\u001b[0m\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Answer: 1.0791812460476249'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\"whats 2 raised to .12\")"
]
},
{
"cell_type": "markdown",
"id": "8db72cda",
"metadata": {},
"source": [
"Sometimes chains will require extra arguments that were not serialized with the chain. For example, a chain that does question answering over a vector database will require a vector database."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "aab39528",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain import OpenAI, VectorDBQA"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "16a85d5e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running Chroma using direct local API.\n",
"Using DuckDB in-memory for database. Data will be transient.\n"
]
}
],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"texts = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"vectorstore = Chroma.from_documents(texts, embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "6a82e91e",
"metadata": {},
"outputs": [],
"source": [
"chain = load_chain(\"lc://chains/vector-db-qa/stuff/chain.json\", vectorstore=vectorstore)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "efe9b25b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\" The president said that Ketanji Brown Jackson is a Circuit Court of Appeals Judge, one of the nation's top legal minds, a former top litigator in private practice, a former federal public defender, has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans, and will continue Justice Breyer's legacy of excellence.\""
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"chain.run(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f910a32f",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,13 @@
{
"model_name": "text-davinci-003",
"temperature": 0.0,
"max_tokens": 256,
"top_p": 1,
"frequency_penalty": 0,
"presence_penalty": 0,
"n": 1,
"best_of": 1,
"request_timeout": null,
"logit_bias": {},
"_type": "openai"
}

View File

@@ -121,10 +121,51 @@
"llm_chain.predict(adjective=\"sad\", subject=\"ducks\")"
]
},
{
"cell_type": "markdown",
"id": "672f59d4",
"metadata": {},
"source": [
"## From string\n",
"You can also construct an LLMChain from a string template directly."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f8bc262e",
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"Write a {adjective} poem about {subject}.\"\"\"\n",
"llm_chain = LLMChain.from_string(llm=OpenAI(temperature=0), template=template)\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "cb164a76",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"\\n\\nThe ducks swim in the pond,\\nTheir feathers so soft and warm,\\nBut they can't help but feel so forlorn.\\n\\nTheir quacks echo in the air,\\nBut no one is there to hear,\\nFor they have no one to share.\\n\\nThe ducks paddle around in circles,\\nTheir heads hung low in despair,\\nFor they have no one to care.\\n\\nThe ducks look up to the sky,\\nBut no one is there to see,\\nFor they have no one to be.\\n\\nThe ducks drift away in the night,\\nTheir hearts filled with sorrow and pain,\\nFor they have no one to gain.\""
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm_chain.predict(adjective=\"sad\", subject=\"ducks\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8310cdaa",
"id": "9f0adbc7",
"metadata": {},
"outputs": [],
"source": []

View File

@@ -0,0 +1,27 @@
{
"memory": null,
"verbose": true,
"prompt": {
"input_variables": [
"question"
],
"output_parser": null,
"template": "Question: {question}\n\nAnswer: Let's think step by step.",
"template_format": "f-string"
},
"llm": {
"model_name": "text-davinci-003",
"temperature": 0.0,
"max_tokens": 256,
"top_p": 1,
"frequency_penalty": 0,
"presence_penalty": 0,
"n": 1,
"best_of": 1,
"request_timeout": null,
"logit_bias": {},
"_type": "openai"
},
"output_key": "text",
"_type": "llm_chain"
}

View File

@@ -0,0 +1,8 @@
{
"memory": null,
"verbose": true,
"prompt_path": "prompt.json",
"llm_path": "llm.json",
"output_key": "text",
"_type": "llm_chain"
}

View File

@@ -0,0 +1,8 @@
{
"input_variables": [
"question"
],
"output_parser": null,
"template": "Question: {question}\n\nAnswer: Let's think step by step.",
"template_format": "f-string"
}

View File

@@ -0,0 +1,376 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "cbe47c3a",
"metadata": {},
"source": [
"# Serialization\n",
"This notebook covers how to serialize chains to and from disk. The serialization format we use is json or yaml. Currently, only some chains support this type of serialization. We will grow the number of supported chains over time.\n"
]
},
{
"cell_type": "markdown",
"id": "e4a8a447",
"metadata": {},
"source": [
"## Saving a chain to disk\n",
"First, let's go over how to save a chain to disk. This can be done with the `.save` method, and specifying a file path with a json or yaml extension."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "26e28451",
"metadata": {},
"outputs": [],
"source": [
"from langchain import PromptTemplate, OpenAI, LLMChain\n",
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])\n",
"llm_chain = LLMChain(prompt=prompt, llm=OpenAI(temperature=0), verbose=True)\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "bfa18e1f",
"metadata": {},
"outputs": [],
"source": [
"llm_chain.save(\"llm_chain.json\")"
]
},
{
"cell_type": "markdown",
"id": "ea82665d",
"metadata": {},
"source": [
"Let's now take a look at what's inside this saved file"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "0fd33328",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\r\n",
" \"memory\": null,\r\n",
" \"verbose\": true,\r\n",
" \"prompt\": {\r\n",
" \"input_variables\": [\r\n",
" \"question\"\r\n",
" ],\r\n",
" \"output_parser\": null,\r\n",
" \"template\": \"Question: {question}\\n\\nAnswer: Let's think step by step.\",\r\n",
" \"template_format\": \"f-string\"\r\n",
" },\r\n",
" \"llm\": {\r\n",
" \"model_name\": \"text-davinci-003\",\r\n",
" \"temperature\": 0.0,\r\n",
" \"max_tokens\": 256,\r\n",
" \"top_p\": 1,\r\n",
" \"frequency_penalty\": 0,\r\n",
" \"presence_penalty\": 0,\r\n",
" \"n\": 1,\r\n",
" \"best_of\": 1,\r\n",
" \"request_timeout\": null,\r\n",
" \"logit_bias\": {},\r\n",
" \"_type\": \"openai\"\r\n",
" },\r\n",
" \"output_key\": \"text\",\r\n",
" \"_type\": \"llm_chain\"\r\n",
"}"
]
}
],
"source": [
"!cat llm_chain.json"
]
},
{
"cell_type": "markdown",
"id": "2012c724",
"metadata": {},
"source": [
"## Loading a chain from disk\n",
"We can load a chain from disk by using the `load_chain` method."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "342a1974",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import load_chain"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "394b7da8",
"metadata": {},
"outputs": [],
"source": [
"chain = load_chain(\"llm_chain.json\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "20d99787",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mQuestion: whats 2 + 2\n",
"\n",
"Answer: Let's think step by step.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' 2 + 2 = 4'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\"whats 2 + 2\")"
]
},
{
"cell_type": "markdown",
"id": "14449679",
"metadata": {},
"source": [
"## Saving components separately\n",
"In the above example, we can see that the prompt and llm configuration information is saved in the same json as the overall chain. Alternatively, we can split them up and save them separately. This is often useful to make the saved components more modular. In order to do this, we just need to specify `llm_path` instead of the `llm` component, and `prompt_path` instead of the `prompt` component."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "50ec35ab",
"metadata": {},
"outputs": [],
"source": [
"llm_chain.prompt.save(\"prompt.json\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c48b39aa",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\r\n",
" \"input_variables\": [\r\n",
" \"question\"\r\n",
" ],\r\n",
" \"output_parser\": null,\r\n",
" \"template\": \"Question: {question}\\n\\nAnswer: Let's think step by step.\",\r\n",
" \"template_format\": \"f-string\"\r\n",
"}"
]
}
],
"source": [
"!cat prompt.json"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "13c92944",
"metadata": {},
"outputs": [],
"source": [
"llm_chain.llm.save(\"llm.json\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "1b815f89",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\r\n",
" \"model_name\": \"text-davinci-003\",\r\n",
" \"temperature\": 0.0,\r\n",
" \"max_tokens\": 256,\r\n",
" \"top_p\": 1,\r\n",
" \"frequency_penalty\": 0,\r\n",
" \"presence_penalty\": 0,\r\n",
" \"n\": 1,\r\n",
" \"best_of\": 1,\r\n",
" \"request_timeout\": null,\r\n",
" \"logit_bias\": {},\r\n",
" \"_type\": \"openai\"\r\n",
"}"
]
}
],
"source": [
"!cat llm.json"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "7e6aa9ab",
"metadata": {},
"outputs": [],
"source": [
"config = {\n",
" \"memory\": None,\n",
" \"verbose\": True,\n",
" \"prompt_path\": \"prompt.json\",\n",
" \"llm_path\": \"llm.json\",\n",
" \"output_key\": \"text\",\n",
" \"_type\": \"llm_chain\"\n",
"}\n",
"import json\n",
"with open(\"llm_chain_separate.json\", \"w\") as f:\n",
" json.dump(config, f, indent=2)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "8e959ca6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\r\n",
" \"memory\": null,\r\n",
" \"verbose\": true,\r\n",
" \"prompt_path\": \"prompt.json\",\r\n",
" \"llm_path\": \"llm.json\",\r\n",
" \"output_key\": \"text\",\r\n",
" \"_type\": \"llm_chain\"\r\n",
"}"
]
}
],
"source": [
"!cat llm_chain_separate.json"
]
},
{
"cell_type": "markdown",
"id": "662731c0",
"metadata": {},
"source": [
"We can then load it in the same way"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "d69ceb93",
"metadata": {},
"outputs": [],
"source": [
"chain = load_chain(\"llm_chain_separate.json\")"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "a99d61b9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mQuestion: whats 2 + 2\n",
"\n",
"Answer: Let's think step by step.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"' 2 + 2 = 4'"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\"whats 2 + 2\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "822b7c12",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -9,6 +9,7 @@ They are broken up into three categories:
1. `Generic Chains <./generic_how_to.html>`_: Generic chains, that are meant to help build other chains rather than serve a particular purpose.
2. `CombineDocuments Chains <./combine_docs_how_to.html>`_: Chains aimed at making it easy to work with documents (question answering, summarization, etc).
3. `Utility Chains <./utility_how_to.html>`_: Chains consisting of an LLMChain interacting with a specific util.
4. `Asynchronous <./async_chain.html>`_: Covering asynchronous functionality.
.. toctree::
:maxdepth: 1
@@ -18,3 +19,7 @@ They are broken up into three categories:
./generic_how_to.rst
./combine_docs_how_to.rst
./utility_how_to.rst
In addition to different types of chains, we also have the following how-to guides for working with chains in general:
`Load From Hub <./generic/from_hub.html>`_: This notebook covers how to load chains from `LangChainHub <https://github.com/hwchase17/langchain-hub>`_.

View File

@@ -0,0 +1,29 @@
Document Loaders
==========================
Combining language models with your own text data is a powerful way to differentiate them.
The first step in doing this is to load the data into "documents" - a fancy way of say some pieces of text.
This module is aimed at making this easy.
A primary driver of a lot of this is the `Unstructured <https://github.com/Unstructured-IO/unstructured>`_ python package.
This package is a great way to transform all types of files - text, powerpoint, images, html, pdf, etc - into text data.
For detailed instructions on how to get set up with Unstructured, see installation guidelines `here <https://github.com/Unstructured-IO/unstructured#coffee-getting-started>`_.
The following sections of documentation are provided:
- `Key Concepts <./document_loaders/key_concepts.html>`_: A conceptual guide going over the various concepts related to loading documents.
- `How-To Guides <./document_loaders/how_to_guides.html>`_: A collection of how-to guides. These highlight different types of loaders.
.. toctree::
:maxdepth: 1
:caption: Document Loaders
:name: Document Loaders
:hidden:
./document_loaders/key_concepts.md
./document_loaders/how_to_guides.rst

View File

@@ -0,0 +1,171 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1f3a5ebf",
"metadata": {},
"source": [
"# Airbyte JSON\n",
"This covers how to load any source from Airbyte into a local JSON file that can be read in as a document\n",
"\n",
"Prereqs:\n",
"Have docker desktop installed\n",
"\n",
"Steps:\n",
"\n",
"1) Clone Airbyte from GitHub - `git clone https://github.com/airbytehq/airbyte.git`\n",
"\n",
"2) Switch into Airbyte directory - `cd airbyte`\n",
"\n",
"3) Start Airbyte - `docker compose up`\n",
"\n",
"4) In your browser, just visit http://localhost:8000. You will be asked for a username and password. By default, that's username `airbyte` and password `password`.\n",
"\n",
"5) Setup any source you wish.\n",
"\n",
"6) Set destination as Local JSON, with specified destination path - lets say `/json_data`. Set up manual sync.\n",
"\n",
"7) Run the connection!\n",
"\n",
"7) To see what files are create, you can navigate to: `file:///tmp/airbyte_local`\n",
"\n",
"8) Find your data and copy path. That path should be saved in the file variable below. It should start with `/tmp/airbyte_local`\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "180c8b74",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import AirbyteJSONLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "4af10665",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"_airbyte_raw_pokemon.jsonl\r\n"
]
}
],
"source": [
"!ls /tmp/airbyte_local/json_data/"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "721d9316",
"metadata": {},
"outputs": [],
"source": [
"loader = AirbyteJSONLoader('/tmp/airbyte_local/json_data/_airbyte_raw_pokemon.jsonl')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "9858b946",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "fca024cb",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"abilities: \n",
"ability: \n",
"name: blaze\n",
"url: https://pokeapi.co/api/v2/ability/66/\n",
"\n",
"is_hidden: False\n",
"slot: 1\n",
"\n",
"\n",
"ability: \n",
"name: solar-power\n",
"url: https://pokeapi.co/api/v2/ability/94/\n",
"\n",
"is_hidden: True\n",
"slot: 3\n",
"\n",
"base_experience: 267\n",
"forms: \n",
"name: charizard\n",
"url: https://pokeapi.co/api/v2/pokemon-form/6/\n",
"\n",
"game_indices: \n",
"game_index: 180\n",
"version: \n",
"name: red\n",
"url: https://pokeapi.co/api/v2/version/1/\n",
"\n",
"\n",
"\n",
"game_index: 180\n",
"version: \n",
"name: blue\n",
"url: https://pokeapi.co/api/v2/version/2/\n",
"\n",
"\n",
"\n",
"game_index: 180\n",
"version: \n",
"n\n"
]
}
],
"source": [
"print(data[0].page_content[:500])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9fa002a5",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,93 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "9c31caff",
"metadata": {},
"source": [
"# AZLyrics\n",
"This covers how to load AZLyrics webpages into a document format that we can use downstream."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "7e6f5726",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import AZLyricsLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a0df4c24",
"metadata": {},
"outputs": [],
"source": [
"loader = AZLyricsLoader(\"https://www.azlyrics.com/lyrics/mileycyrus/flowers.html\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "8cd61b6e",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "162fd286",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content=\"Miley Cyrus - Flowers Lyrics | AZLyrics.com\\n\\r\\nWe were good, we were gold\\nKinda dream that can't be sold\\nWe were right till we weren't\\nBuilt a home and watched it burn\\n\\nI didn't wanna leave you\\nI didn't wanna lie\\nStarted to cry but then remembered I\\n\\nI can buy myself flowers\\nWrite my name in the sand\\nTalk to myself for hours\\nSay things you don't understand\\nI can take myself dancing\\nAnd I can hold my own hand\\nYeah, I can love me better than you can\\n\\nCan love me better\\nI can love me better, baby\\nCan love me better\\nI can love me better, baby\\n\\nPaint my nails, cherry red\\nMatch the roses that you left\\nNo remorse, no regret\\nI forgive every word you said\\n\\nI didn't wanna leave you, baby\\nI didn't wanna fight\\nStarted to cry but then remembered I\\n\\nI can buy myself flowers\\nWrite my name in the sand\\nTalk to myself for hours, yeah\\nSay things you don't understand\\nI can take myself dancing\\nAnd I can hold my own hand\\nYeah, I can love me better than you can\\n\\nCan love me better\\nI can love me better, baby\\nCan love me better\\nI can love me better, baby\\nCan love me better\\nI can love me better, baby\\nCan love me better\\nI\\n\\nI didn't wanna wanna leave you\\nI didn't wanna fight\\nStarted to cry but then remembered I\\n\\nI can buy myself flowers\\nWrite my name in the sand\\nTalk to myself for hours (Yeah)\\nSay things you don't understand\\nI can take myself dancing\\nAnd I can hold my own hand\\nYeah, I can love me better than\\nYeah, I can love me better than you can, uh\\n\\nCan love me better\\nI can love me better, baby\\nCan love me better\\nI can love me better, baby (Than you can)\\nCan love me better\\nI can love me better, baby\\nCan love me better\\nI\\n\", lookup_str='', metadata={'source': 'https://www.azlyrics.com/lyrics/mileycyrus/flowers.html'}, lookup_index=0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6358000c",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,101 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "79f24a6b",
"metadata": {},
"source": [
"# Directory Loader\n",
"This covers how to use the DirectoryLoader to load all documents in a directory. Under the hood, this uses the [UnstructuredLoader](./unstructured_file.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "019d8520",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import DirectoryLoader"
]
},
{
"cell_type": "markdown",
"id": "0c76cdc5",
"metadata": {},
"source": [
"We can use the `glob` parameter to control which files to load. Note that here it doesn't load the `.rst` file or the `.ipynb` files."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "891fe56f",
"metadata": {},
"outputs": [],
"source": [
"loader = DirectoryLoader('../', glob=\"**/*.md\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "addfe9cf",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "b042086d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(docs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cbc8256b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,145 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "9fdbd55d",
"metadata": {},
"source": [
"# Email\n",
"\n",
"This notebook shows how to load email (`.eml`) files."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "40cd9806",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import UnstructuredEmailLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2d20b852",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredEmailLoader('example_data/fake-email.eml')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "579fa702",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "90c1d899",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='This is a test email to use for unit tests.\\n\\nImportant points:\\n\\nRoses are red\\n\\nViolets are blue', lookup_str='', metadata={'source': 'example_data/fake-email.eml'}, lookup_index=0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data"
]
},
{
"cell_type": "markdown",
"id": "8bf50cba",
"metadata": {},
"source": [
"## Retain Elements\n",
"\n",
"Under the hood, Unstructured creates different \"elements\" for different chunks of text. By default we combine those together, but you can easily keep that separation by specifying `mode=\"elements\"`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "b9592eaf",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredEmailLoader('example_data/fake-email.eml', mode=\"elements\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "0b16d03f",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "d7bdc5e5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='This is a test email to use for unit tests.', lookup_str='', metadata={'source': 'example_data/fake-email.eml'}, lookup_index=0)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6a074515",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,80 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "56ac1584",
"metadata": {},
"source": [
"# EveryNote\n",
"\n",
"How to load EveryNote file from disk."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "1a53ece0",
"metadata": {},
"outputs": [],
"source": [
"# !pip install pypandoc\n",
"# import pypandoc\n",
"\n",
"# pypandoc.download_pandoc()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "88df766f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='testing this\\n\\nwhat happens?\\n\\nto the world?\\n', lookup_str='', metadata={'source': 'example_data/testing.enex'}, lookup_index=0)]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.document_loaders import EveryNoteLoader\n",
"\n",
"loader = EveryNoteLoader(\"example_data/testing.enex\")\n",
"loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1329905",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,9 @@
<!DOCTYPE html>
<html>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
</body>
</html>

View File

@@ -0,0 +1,20 @@
MIME-Version: 1.0
Date: Fri, 16 Dec 2022 17:04:16 -0500
Message-ID: <CADc-_xaLB2FeVQ7mNsoX+NJb_7hAJhBKa_zet-rtgPGenj0uVw@mail.gmail.com>
Subject: Test Email
From: Matthew Robinson <mrobinson@unstructured.io>
To: Matthew Robinson <mrobinson@unstructured.io>
Content-Type: multipart/alternative; boundary="00000000000095c9b205eff92630"
--00000000000095c9b205eff92630
Content-Type: text/plain; charset="UTF-8"
This is a test email to use for unit tests.
Important points:
- Roses are red
- Violets are blue
--00000000000095c9b205eff92630
Content-Type: text/html; charset="UTF-8"
<div dir="ltr"><div>This is a test email to use for unit tests.</div><div><br></div><div>Important points:</div><div><ul><li>Roses are red</li><li>Violets are blue</li></ul></div></div>
--00000000000095c9b205eff92630--

View File

@@ -0,0 +1,16 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE en-export SYSTEM "http://xml.evernote.com/pub/evernote-export4.dtd">
<en-export export-date="20230309T035336Z" application="Evernote" version="10.53.2">
<note>
<title>testing</title>
<created>20230209T034746Z</created>
<updated>20230209T035328Z</updated>
<note-attributes>
<author>Harrison Chase</author>
</note-attributes>
<content>
<![CDATA[<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd"><en-note><div>testing this</div><div>what happens?</div><div>to the world?</div></en-note> ]]>
</content>
</note>
</en-export>

View File

@@ -0,0 +1,156 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "0ef41fd4",
"metadata": {},
"source": [
"# GCS Directory\n",
"\n",
"This covers how to load document objects from an Google Cloud Storage (GCS) directory."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "5cfb25c9",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import GCSDirectoryLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "93a4d0f1",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# !pip install google-cloud-storage"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "633dc839",
"metadata": {},
"outputs": [],
"source": [
"loader = GCSDirectoryLoader(project_name=\"aist\", bucket=\"testing-hwc\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a863467d",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/harrisonchase/workplace/langchain/.venv/lib/python3.10/site-packages/google/auth/_default.py:83: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a \"quota exceeded\" or \"API not enabled\" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/\n",
" warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)\n",
"/Users/harrisonchase/workplace/langchain/.venv/lib/python3.10/site-packages/google/auth/_default.py:83: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a \"quota exceeded\" or \"API not enabled\" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/\n",
" warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)\n"
]
},
{
"data": {
"text/plain": [
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmpz37njh7u/fake.docx'}, lookup_index=0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader.load()"
]
},
{
"cell_type": "markdown",
"id": "17c0dcbb",
"metadata": {},
"source": [
"## Specifying a prefix\n",
"You can also specify a prefix for more finegrained control over what files to load."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b3143c89",
"metadata": {},
"outputs": [],
"source": [
"loader = GCSDirectoryLoader(project_name=\"aist\", bucket=\"testing-hwc\", prefix=\"fake\")"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "226ac6f5",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/harrisonchase/workplace/langchain/.venv/lib/python3.10/site-packages/google/auth/_default.py:83: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a \"quota exceeded\" or \"API not enabled\" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/\n",
" warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)\n",
"/Users/harrisonchase/workplace/langchain/.venv/lib/python3.10/site-packages/google/auth/_default.py:83: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a \"quota exceeded\" or \"API not enabled\" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/\n",
" warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)\n"
]
},
{
"data": {
"text/plain": [
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmpylg6291i/fake.docx'}, lookup_index=0)]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f9c0734f",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,104 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "0ef41fd4",
"metadata": {},
"source": [
"# GCS File Storage\n",
"\n",
"This covers how to load document objects from an Google Cloud Storage (GCS) file object."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "5cfb25c9",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import GCSFileLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "93a4d0f1",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# !pip install google-cloud-storage"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "633dc839",
"metadata": {},
"outputs": [],
"source": [
"loader = GCSFileLoader(project_name=\"aist\", bucket=\"testing-hwc\", blob=\"fake.docx\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a863467d",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/harrisonchase/workplace/langchain/.venv/lib/python3.10/site-packages/google/auth/_default.py:83: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a \"quota exceeded\" or \"API not enabled\" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/\n",
" warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)\n"
]
},
{
"data": {
"text/plain": [
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmp3srlf8n8/fake.docx'}, lookup_index=0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eba3002d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,84 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "b0ed136e-6983-4893-ae1b-b75753af05f8",
"metadata": {},
"source": [
"# Google Drive\n",
"This notebook covers how to load documents from Google Drive. Currently, only Google Docs are supported.\n",
"\n",
"## Prerequisites\n",
"\n",
"1. Create a Google Cloud project or use an existing project\n",
"1. Enable the [Google Drive API](https://console.cloud.google.com/flows/enableapi?apiid=drive.googleapis.com)\n",
"1. [Authorize credentials for desktop app](https://developers.google.com/drive/api/quickstart/python#authorize_credentials_for_a_desktop_application)\n",
"1. `pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib`\n",
"\n",
"## 🧑 Instructions for ingesting your Google Docs data\n",
"By default, the `GoogleDriveLoader` expects the `credentials.json` file to be `~/.credentials/credentials.json`, but this is configurable using the `credentials_file` keyword argument. Same thing with `token.json`. Note that `token.json` will be created automatically the first time you use the loader.\n",
"\n",
"`GoogleDriveLoader` can load from a list of Google Docs document ids or a folder id. You can obtain your folder and document id from the URL:\n",
"* Folder: https://drive.google.com/drive/u/0/folders/1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5 -> folder id is `\"1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5\"`\n",
"* Document: https://docs.google.com/document/d/1bfaMQ18_i56204VaQDVeAFpqEijJTgvurupdEDiaUQw/edit -> document id is `\"1bfaMQ18_i56204VaQDVeAFpqEijJTgvurupdEDiaUQw\"`"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "878928a6-a5ae-4f74-b351-64e3b01733fe",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.document_loaders import GoogleDriveLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2216c83f-68e4-4d2f-8ea2-5878fb18bbe7",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"loader = GoogleDriveLoader(folder_id=\"1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "8f3b6aa0-b45d-4e37-8c50-5bebe70fdb9d",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"docs = loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,83 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "bda1f3f5",
"metadata": {},
"source": [
"# Gutenberg\n",
"\n",
"This covers how to load links to Gutenberg e-books into a document format that we can use downstream."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9bfd5e46",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import GutenbergLoader"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "700e4ef2",
"metadata": {},
"outputs": [],
"source": [
"loader = GutenbergLoader('https://www.gutenberg.org/cache/epub/69972/pg69972.txt')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "b6f28930",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7d436441",
"metadata": {},
"outputs": [],
"source": [
"data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3b74d755",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,94 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "2dfc4698",
"metadata": {},
"source": [
"# HTML\n",
"\n",
"This covers how to load HTML documents into a document format that we can use downstream."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "24b434b5",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import UnstructuredHTMLLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "00f46fda",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredHTMLLoader(\"example_data/fake-content.html\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "b68a26b3",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "34de48fa",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='My First Heading\\n\\nMy first paragraph.', lookup_str='', metadata={'source': 'example_data/fake-content.html'}, lookup_index=0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "79b1bce4",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,145 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "34c90eed",
"metadata": {},
"source": [
"# Microsoft Word\n",
"\n",
"This notebook shows how to load text from Microsoft word documents."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "28ded768",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import UnstructuredDocxLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "f1f26035",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredDocxLoader('example_data/fake.docx')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "2c87dde9",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0e4a884c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': 'example_data/fake.docx'}, lookup_index=0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data"
]
},
{
"cell_type": "markdown",
"id": "5d1472e9",
"metadata": {},
"source": [
"## Retain Elements\n",
"\n",
"Under the hood, Unstructured creates different \"elements\" for different chunks of text. By default we combine those together, but you can easily keep that separation by specifying `mode=\"elements\"`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "93abf60b",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredDocxLoader('example_data/fake.docx', mode=\"elements\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c35cdbcc",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "fae2d730",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': 'example_data/fake.docx'}, lookup_index=0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "961a7b1d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,82 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1dc7df1d",
"metadata": {},
"source": [
"# Notion\n",
"This notebook covers how to load documents from a Notion database dump.\n",
"\n",
"In order to get this notion dump, follow these instructions:\n",
"\n",
"## 🧑 Instructions for ingesting your own dataset\n",
"\n",
"Export your dataset from Notion. You can do this by clicking on the three dots in the upper right hand corner and then clicking `Export`.\n",
"\n",
"When exporting, make sure to select the `Markdown & CSV` format option.\n",
"\n",
"This will produce a `.zip` file in your Downloads folder. Move the `.zip` file into this repository.\n",
"\n",
"Run the following command to unzip the zip file (replace the `Export...` with your own file name as needed).\n",
"\n",
"```shell\n",
"unzip Export-d3adfe0f-3131-4bf3-8987-a52017fc1bae.zip -d Notion_DB\n",
"```\n",
"\n",
"Run the following command to ingest the data."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "007c5cbf",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import NotionDirectoryLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1caec59",
"metadata": {},
"outputs": [],
"source": [
"loader = NotionDirectoryLoader(\"Notion_DB\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1c30ff7",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,66 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1dc7df1d",
"metadata": {},
"source": [
"# Obsidian\n",
"This notebook covers how to load documents from an Obsidian database.\n",
"\n",
"Since Obsidian is just stored on disk as a folder of Markdown files, the loader just takes a path to this directory."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "007c5cbf",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import ObsidianLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1caec59",
"metadata": {},
"outputs": [],
"source": [
"loader = ObsidianLoader(\"<path-to-obsidian>\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1c30ff7",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,299 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f70e6118",
"metadata": {},
"source": [
"# PDF\n",
"\n",
"This covers how to load pdfs into a document format that we can use downstream."
]
},
{
"cell_type": "markdown",
"id": "743f9413",
"metadata": {},
"source": [
"## Using PyPDF\n",
"\n",
"Allows for tracking of page numbers as well."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c428b0c5",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import PagedPDFSplitter\n",
"\n",
"loader = PagedPDFSplitter(\"example_data/layout-parser-paper.pdf\")\n",
"pages = loader.load_and_split()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "d333cabb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='LayoutParser : A Uni\\x0ced Toolkit for Deep\\nLearning Based Document Image Analysis\\nZejiang Shen1( \\x00), Ruochen Zhang2, Melissa Dell3, Benjamin Charles Germain\\nLee4, Jacob Carlson3, and Weining Li5\\n1Allen Institute for AI\\nshannons@allenai.org\\n2Brown University\\nruochen zhang@brown.edu\\n3Harvard University\\nfmelissadell,jacob carlson g@fas.harvard.edu\\n4University of Washington\\nbcgl@cs.washington.edu\\n5University of Waterloo\\nw422li@uwaterloo.ca\\nAbstract. Recent advances in document image analysis (DIA) have been\\nprimarily driven by the application of neural networks. Ideally, research\\noutcomes could be easily deployed in production and extended for further\\ninvestigation. However, various factors like loosely organized codebases\\nand sophisticated model con\\x0cgurations complicate the easy reuse of im-\\nportant innovations by a wide audience. Though there have been on-going\\ne\\x0borts to improve reusability and simplify deep learning (DL) model\\ndevelopment in disciplines like natural language processing and computer\\nvision, none of them are optimized for challenges in the domain of DIA.\\nThis represents a major gap in the existing toolkit, as DIA is central to\\nacademic research across a wide range of disciplines in the social sciences\\nand humanities. This paper introduces LayoutParser , an open-source\\nlibrary for streamlining the usage of DL in DIA research and applica-\\ntions. The core LayoutParser library comes with a set of simple and\\nintuitive interfaces for applying and customizing DL models for layout de-\\ntection, character recognition, and many other document processing tasks.\\nTo promote extensibility, LayoutParser also incorporates a community\\nplatform for sharing both pre-trained models and full document digiti-\\nzation pipelines. We demonstrate that LayoutParser is helpful for both\\nlightweight and large-scale digitization pipelines in real-word use cases.\\nThe library is publicly available at https://layout-parser.github.io .\\nKeywords: Document Image Analysis ·Deep Learning ·Layout Analysis\\n·Character Recognition ·Open Source library ·Toolkit.\\n1 Introduction\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndocument image analysis (DIA) tasks including document image classi\\x0ccation [ 11,arXiv:2103.15348v2 [cs.CV] 21 Jun 2021', lookup_str='', metadata={'source': 'example_data/layout-parser-paper.pdf', 'page': '0'}, lookup_index=0)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pages[0]"
]
},
{
"cell_type": "markdown",
"id": "ebd895e4",
"metadata": {},
"source": [
"An advantage of this approach is that documents can be retrieved with page numbers."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "87fa7b3a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"9: 10 Z. Shen et al.\n",
"Fig. 4: Illustration of (a) the original historical Japanese document with layout\n",
"detection results and (b) a recreated version of the document image that achieves\n",
"much better character recognition recall. The reorganization algorithm rearranges\n",
"the tokens based on the their detected bounding boxes given a maximum allowed\n",
"height.\n",
"4LayoutParser Community Platform\n",
"Another focus of LayoutParser is promoting the reusability of layout detection\n",
"models and full digitization pipelines. Similar to many existing deep learning\n",
"libraries, LayoutParser comes with a community model hub for distributing\n",
"layout models. End-users can upload their self-trained models to the model hub,\n",
"and these models can be loaded into a similar interface as the currently available\n",
"LayoutParser pre-trained models. For example, the model trained on the News\n",
"Navigator dataset [17] has been incorporated in the model hub.\n",
"Beyond DL models, LayoutParser also promotes the sharing of entire doc-\n",
"ument digitization pipelines. For example, sometimes the pipeline requires the\n",
"combination of multiple DL models to achieve better accuracy. Currently, pipelines\n",
"are mainly described in academic papers and implementations are often not pub-\n",
"licly available. To this end, the LayoutParser community platform also enables\n",
"the sharing of layout pipelines to promote the discussion and reuse of techniques.\n",
"For each shared pipeline, it has a dedicated project page, with links to the source\n",
"code, documentation, and an outline of the approaches. A discussion panel is\n",
"provided for exchanging ideas. Combined with the core LayoutParser library,\n",
"users can easily build reusable components based on the shared pipelines and\n",
"apply them to solve their unique problems.\n",
"5 Use Cases\n",
"The core objective of LayoutParser is to make it easier to create both large-scale\n",
"and light-weight document digitization pipelines. Large-scale document processing\n",
"3: 4 Z. Shen et al.\n",
"Efficient Data AnnotationC u s t o m i z e d M o d e l T r a i n i n gModel Cust omizationDI A Model HubDI A Pipeline SharingCommunity PlatformLa y out Detection ModelsDocument Images \n",
"T h e C o r e L a y o u t P a r s e r L i b r a r yOCR ModuleSt or age & VisualizationLa y out Data Structur e\n",
"Fig. 1: The overall architecture of LayoutParser . For an input document image,\n",
"the core LayoutParser library provides a set of o\u000b",
"-the-shelf tools for layout\n",
"detection, OCR, visualization, and storage, backed by a carefully designed layout\n",
"data structure. LayoutParser also supports high level customization via e\u000ecient\n",
"layout annotation and model training functions. These improve model accuracy\n",
"on the target samples. The community platform enables the easy sharing of DIA\n",
"models and whole digitization pipelines to promote reusability and reproducibility.\n",
"A collection of detailed documentation, tutorials and exemplar projects make\n",
"LayoutParser easy to learn and use.\n",
"AllenNLP [ 8] and transformers [ 34] have provided the community with complete\n",
"DL-based support for developing and deploying models for general computer\n",
"vision and natural language processing problems. LayoutParser , on the other\n",
"hand, specializes speci\f",
"cally in DIA tasks. LayoutParser is also equipped with a\n",
"community platform inspired by established model hubs such as Torch Hub [23]\n",
"andTensorFlow Hub [1]. It enables the sharing of pretrained models as well as\n",
"full document processing pipelines that are unique to DIA tasks.\n",
"There have been a variety of document data collections to facilitate the\n",
"development of DL models. Some examples include PRImA [ 3](magazine layouts),\n",
"PubLayNet [ 38](academic paper layouts), Table Bank [ 18](tables in academic\n",
"papers), Newspaper Navigator Dataset [ 16,17](newspaper \f",
"gure layouts) and\n",
"HJDataset [31](historical Japanese document layouts). A spectrum of models\n",
"trained on these datasets are currently available in the LayoutParser model zoo\n",
"to support di\u000b",
"erent use cases.\n",
"3 The Core LayoutParser Library\n",
"At the core of LayoutParser is an o\u000b",
"-the-shelf toolkit that streamlines DL-\n",
"based document image analysis. Five components support a simple interface\n",
"with comprehensive functionalities: 1) The layout detection models enable using\n",
"pre-trained or self-trained DL models for layout detection with just four lines\n",
"of code. 2) The detected layout information is stored in carefully engineered\n"
]
}
],
"source": [
"from langchain.vectorstores import FAISS\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"\n",
"faiss_index = FAISS.from_documents(pages, OpenAIEmbeddings())\n",
"docs = faiss_index.similarity_search(\"How will the community be engaged?\", k=2)\n",
"for doc in docs:\n",
" print(str(doc.metadata[\"page\"]) + \":\", doc.page_content)"
]
},
{
"cell_type": "markdown",
"id": "09d64998",
"metadata": {},
"source": [
"## Using Unstructured"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "0cc0cd42",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import UnstructuredPDFLoader"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "082d557c",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredPDFLoader(\"example_data/layout-parser-paper.pdf\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "df11c953",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "09957371",
"metadata": {},
"source": [
"### Retain Elements\n",
"\n",
"Under the hood, Unstructured creates different \"elements\" for different chunks of text. By default we combine those together, but you can easily keep that separation by specifying `mode=\"elements\"`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0fab833b",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredPDFLoader(\"example_data/layout-parser-paper.pdf\", mode=\"elements\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c3e8ff1b",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "43c23d2d",
"metadata": {},
"outputs": [],
"source": [
"data[0]"
]
},
{
"cell_type": "markdown",
"id": "21998d18",
"metadata": {},
"source": [
"## Using PDFMiner"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "2f0cc9ff",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import PDFMinerLoader"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "42b531e8",
"metadata": {},
"outputs": [],
"source": [
"loader = PDFMinerLoader(\"example_data/layout-parser-paper.pdf\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "010d5cdd",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7301c473",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,145 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "39af9ecd",
"metadata": {},
"source": [
"# PowerPoint\n",
"\n",
"This covers how to load PowerPoint documents into a document format that we can use downstream."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "721c48aa",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import UnstructuredPowerPointLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "9d3d0e35",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredPowerPointLoader(\"example_data/fake-power-point.pptx\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "06073f91",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "c9adc5cb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Adding a Bullet Slide\\n\\nFind the bullet slide layout\\n\\nUse _TextFrame.text for first bullet\\n\\nUse _TextFrame.add_paragraph() for subsequent bullets\\n\\nHere is a lot of text!\\n\\nHere is some text in a text box!', lookup_str='', metadata={'source': 'example_data/fake-power-point.pptx'}, lookup_index=0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data"
]
},
{
"cell_type": "markdown",
"id": "525d6b67",
"metadata": {},
"source": [
"## Retain Elements\n",
"\n",
"Under the hood, Unstructured creates different \"elements\" for different chunks of text. By default we combine those together, but you can easily keep that separation by specifying `mode=\"elements\"`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "064f9162",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredPowerPointLoader(\"example_data/fake-power-point.pptx\", mode=\"elements\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "abefbbdb",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "a547c534",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='Adding a Bullet Slide', lookup_str='', metadata={'source': 'example_data/fake-power-point.pptx'}, lookup_index=0)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "381d4139",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,78 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "17812129",
"metadata": {},
"source": [
"# ReadTheDocs Documentation\n",
"This notebook covers how to load content from html that was generated as part of a Read-The-Docs build.\n",
"\n",
"For an example of this in the wild, see [here](https://github.com/hwchase17/chat-langchain).\n",
"\n",
"This assumes that the html has already been scraped into a folder. This can be done by uncommenting and running the following command"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "84696e27",
"metadata": {},
"outputs": [],
"source": [
"#!wget -r -A.html -P rtdocs https://langchain.readthedocs.io/en/latest/"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "92dd950b",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import ReadTheDocsLoader"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "494567c3",
"metadata": {},
"outputs": [],
"source": [
"loader = ReadTheDocsLoader(\"rtdocs\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e2e6d6f0",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,78 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1dc7df1d",
"metadata": {},
"source": [
"# Roam\n",
"This notebook covers how to load documents from a Roam database. This takes a lot of inspiration from the example repo [here](https://github.com/JimmyLv/roam-qa).\n",
"\n",
"## 🧑 Instructions for ingesting your own dataset\n",
"\n",
"Export your dataset from Roam Research. You can do this by clicking on the three dots in the upper right hand corner and then clicking `Export`.\n",
"\n",
"When exporting, make sure to select the `Markdown & CSV` format option.\n",
"\n",
"This will produce a `.zip` file in your Downloads folder. Move the `.zip` file into this repository.\n",
"\n",
"Run the following command to unzip the zip file (replace the `Export...` with your own file name as needed).\n",
"\n",
"```shell\n",
"unzip Roam-Export-1675782732639.zip -d Roam_DB\n",
"```\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "007c5cbf",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import RoamLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1caec59",
"metadata": {},
"outputs": [],
"source": [
"loader = ObsidianLoader(\"Roam_DB\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1c30ff7",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,134 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a634365e",
"metadata": {},
"source": [
"# s3 Directory\n",
"\n",
"This covers how to load document objects from an s3 directory object."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "2f0cd6a5",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import S3DirectoryLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "49815096",
"metadata": {},
"outputs": [],
"source": [
"#!pip install boto3"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "321cc7f1",
"metadata": {},
"outputs": [],
"source": [
"loader = S3DirectoryLoader(\"testing-hwc\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "2b11d155",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmpaa9xl6ch/fake.docx'}, lookup_index=0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader.load()"
]
},
{
"cell_type": "markdown",
"id": "0690c40a",
"metadata": {},
"source": [
"## Specifying a prefix\n",
"You can also specify a prefix for more finegrained control over what files to load."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "72d44781",
"metadata": {},
"outputs": [],
"source": [
"loader = S3DirectoryLoader(\"testing-hwc\", prefix=\"fake\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "2d3c32db",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmpujbkzf_l/fake.docx'}, lookup_index=0)]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "885dc280",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,94 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "66a7777e",
"metadata": {},
"source": [
"# s3 File\n",
"\n",
"This covers how to load document objects from an s3 file object."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9ec8a3b3",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import S3FileLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "43128d8d",
"metadata": {},
"outputs": [],
"source": [
"#!pip install boto3"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "35d6809a",
"metadata": {},
"outputs": [],
"source": [
"loader = S3FileLoader(\"testing-hwc\", \"fake.docx\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "efd6be84",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmpxvave6wl/fake.docx'}, lookup_index=0)]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "93689594",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,255 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "20deed05",
"metadata": {},
"source": [
"# Unstructured File Loader\n",
"This notebook covers how to use Unstructured to load files of many types. Unstructured currently supports loading of text files, powerpoints, html, pdfs, images, and more."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "2886982e",
"metadata": {},
"outputs": [],
"source": [
"# # Install package\n",
"!pip install unstructured[local-inference]\n",
"!pip install \"detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.6#egg=detectron2\"\n",
"!pip install layoutparser[layoutmodels,tesseract]"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "54d62efd",
"metadata": {},
"outputs": [],
"source": [
"# # Install other dependencies\n",
"# # https://github.com/Unstructured-IO/unstructured/blob/main/docs/source/installing.rst\n",
"# !brew install libmagic\n",
"# !brew install poppler\n",
"# !brew install tesseract\n",
"# # If parsing xml / html documents:\n",
"# !brew install libxml2\n",
"# !brew install libxslt"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "af6a64f5",
"metadata": {},
"outputs": [],
"source": [
"# import nltk\n",
"# nltk.download('punkt')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "79d3e549",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import UnstructuredFileLoader"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "2593d1dc",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredFileLoader(\"../../state_of_the_union.txt\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "fe34e941",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ee449788",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.\\n\\nLast year COVID-19 kept us apart. This year we are finally together again.\\n\\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans.\\n\\nWith a duty to one another to the American people to the Constit'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0].page_content[:400]"
]
},
{
"cell_type": "markdown",
"id": "7874d01d",
"metadata": {},
"source": [
"## Retain Elements\n",
"\n",
"Under the hood, Unstructured creates different \"elements\" for different chunks of text. By default we combine those together, but you can easily keep that separation by specifying `mode=\"elements\"`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "ff5b616d",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredFileLoader(\"../../state_of_the_union.txt\", mode=\"elements\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "feca3b6c",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "fec5bbac",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0),\n",
" Document(page_content='Last year COVID-19 kept us apart. This year we are finally together again.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0),\n",
" Document(page_content='Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0),\n",
" Document(page_content='With a duty to one another to the American people to the Constitution.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0),\n",
" Document(page_content='And with an unwavering resolve that freedom will always triumph over tyranny.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0)]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[:5]"
]
},
{
"cell_type": "markdown",
"id": "7874d01d",
"metadata": {},
"source": [
"## PDF Example\n",
"\n",
"Processing PDF documents works exactly the same way. Unstructured detects the file type and extracts the same types of `elements`. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8ca8a648",
"metadata": {},
"outputs": [],
"source": [
"!wget https://raw.githubusercontent.com/Unstructured-IO/unstructured/main/example-docs/layout-parser-paper.pdf -P \"../../\""
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "686e5eb4",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredFileLoader(\"../../layout-parser-paper.pdf\", mode=\"elements\")"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "f8348ca0",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "6ec859d8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='LayoutParser : A Unified Toolkit for Deep Learning Based Document Image Analysis', lookup_str='', metadata={'source': '../../layout-parser-paper.pdf'}, lookup_index=0),\n",
" Document(page_content='Zejiang Shen 1 ( (ea)\\n ), Ruochen Zhang 2 , Melissa Dell 3 , Benjamin Charles Germain Lee 4 , Jacob Carlson 3 , and Weining Li 5', lookup_str='', metadata={'source': '../../layout-parser-paper.pdf'}, lookup_index=0),\n",
" Document(page_content='Allen Institute for AI shannons@allenai.org', lookup_str='', metadata={'source': '../../layout-parser-paper.pdf'}, lookup_index=0),\n",
" Document(page_content='Brown University ruochen zhang@brown.edu', lookup_str='', metadata={'source': '../../layout-parser-paper.pdf'}, lookup_index=0),\n",
" Document(page_content='Harvard University { melissadell,jacob carlson } @fas.harvard.edu', lookup_str='', metadata={'source': '../../layout-parser-paper.pdf'}, lookup_index=0)]"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[:5]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8ca8a648",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,78 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "2dfc4698",
"metadata": {},
"source": [
"# URL\n",
"\n",
"This covers how to load HTML documents from a list of URLs into a document format that we can use downstream."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "16c3699e",
"metadata": {},
"outputs": [],
"source": [
" from langchain.document_loaders import UnstructuredURLLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "836fbac1",
"metadata": {},
"outputs": [],
"source": [
"urls = [\n",
" \"https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-8-2023\",\n",
" \"https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-9-2023\"\n",
"]\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "00f46fda",
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredURLLoader(urls=urls)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b68a26b3",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,137 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "df770c72",
"metadata": {},
"source": [
"# YouTube\n",
"\n",
"How to load documents from YouTube transcripts."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "da4a867f",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import YoutubeLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "34a25b57",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# !pip install youtube-transcript-api"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "bc8b308a",
"metadata": {},
"outputs": [],
"source": [
"loader = YoutubeLoader.from_youtube_url(\"https://www.youtube.com/watch?v=QsYGlZkevEg\", add_video_info=True)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "d073dd36",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='LADIES AND GENTLEMEN, PEDRO PASCAL! [ CHEERS AND APPLAUSE ] >> THANK YOU, THANK YOU. THANK YOU VERY MUCH. I\\'M SO EXCITED TO BE HERE. THANK YOU. I SPENT THE LAST YEAR SHOOTING A SHOW CALLED \"THE LAST OF US\" ON HBO. FOR SOME HBO SHOES, YOU GET TO SHOOT IN A FIVE STAR ITALIAN RESORT SURROUNDED BY BEAUTIFUL PEOPLE, BUT I SAID, NO, THAT\\'S TOO EASY. I WANT TO SHOOT IN A FREEZING CANADIAN FOREST WHILE BEING CHASED AROUND BY A GUY WHOSE HEAD LOOKS LIKE A GENITAL WART. IT IS AN HONOR BEING A PART OF THESE HUGE FRANCHISEs LIKE \"GAME OF THRONES\" AND \"STAR WARS,\" BUT I\\'M STILL GETTING USED TO PEOPLE RECOGNIZING ME. THE OTHER DAY, A GUY STOPPED ME ON THE STREET AND SAYS, MY SON LOVES \"THE MANDALORIAN\" AND THE NEXT THING I KNOW, I\\'M FACE TIMING WITH A 6-YEAR-OLD WHO HAS NO IDEA WHO I AM BECAUSE MY CHARACTER WEARS A MASK THE ENTIRE SHOW. THE GUY IS LIKE, DO THE MANDO VOICE, BUT IT\\'S LIKE A BEDROOM VOICE. WITHOUT THE MASK, IT JUST SOUNDS PORNY. PEOPLE WALKING BY ON THE STREET SEE ME WHISPERING TO A 6-YEAR-OLD KID. I CAN BRING YOU IN WARM, OR I CAN BRING YOU IN COLD. EVEN THOUGH I CAME TO THE U.S. WHEN I WAS LITTLE, I WAS BORN IN CHILE, AND I HAVE 34 FIRST COUSINS WHO ARE STILL THERE. THEY\\'RE VERY PROUD OF ME. I KNOW THEY\\'RE PROUD BECAUSE THEY GIVE MY PHONE NUMBER TO EVERY PERSON THEY MEET, WHICH MEANS EVERY DAY, SOMEONE IN SANTIAGO WILL TEXT ME STUFF LIKE, CAN YOU COME TO MY WEDDING, OR CAN YOU SING MY PRIEST HAPPY BIRTHDAY, OR IS BABY YODA MEAN IN REAL LIFE. SO I HAVE TO BE LIKE NO, NO, AND HIS NAME IS GROGU. BUT MY COUSINS WEREN\\'T ALWAYS SO PROUD. EARLY IN MY CAREER, I PLAYED SMALL PARTS IN EVERY CRIME SHOW. I EVEN PLAYED TWO DIFFERENT CHARACTERS ON \"LAW AND ORDER.\" TITO CABASSA WHO LOOKED LIKE THIS. AND ONE YEAR LATER, I PLAYED REGGIE LUCKMAN WHO LOOKS LIKE THIS. AND THAT, MY FRIENDS, IS CALLED RANGE. BUT IT IS AMAZING TO BE HERE, LIKE I SAID. I WAS BORN IN CHILE, AND NINE MONTHS LATER, MY PARENTS FLED AND BROUGHT ME AND MY SISTER TO THE U.S. THEY WERE SO BRAVE, AND WITHOUT THEM, I WOULDN\\'T BE HERE IN THIS WONDERFUL COUNTRY, AND I CERTAINLY WOULDN\\'T BE STANDING HERE WITH YOU ALL TONIGHT. SO TO ALL MY FAMILY WATCHING IN CHILE, I WANT TO SAY [ SPEAKING NON-ENGLISH ] WHICH MEANS, I LOVE YOU, I MISS YOU, AND STOP GIVING OUT MY PHONE NUMBER. WE\\'VE GOT AN AMAZING SHOW FOR YOU TONIGHT. COLDPLAY IS HERE, SO STICK', lookup_str='', metadata={'source': 'QsYGlZkevEg', 'title': 'Pedro Pascal Monologue - SNL', 'description': 'First-time host Pedro Pascal talks about filming The Last of Us and being recognized by fans.\\n\\nSaturday Night Live. Stream now on Peacock: https://pck.tv/3uQxh4q\\n\\nSubscribe to SNL: https://goo.gl/tUsXwM\\nStream Current Full Episodes: http://www.nbc.com/saturday-night-live\\n\\nWATCH PAST SNL SEASONS\\nGoogle Play - http://bit.ly/SNLGooglePlay\\niTunes - http://bit.ly/SNLiTunes\\n\\nSNL ON SOCIAL\\nSNL Instagram: http://instagram.com/nbcsnl\\nSNL Facebook: https://www.facebook.com/snl\\nSNL Twitter: https://twitter.com/nbcsnl\\nSNL TikTok: https://www.tiktok.com/@nbcsnl\\n\\nGET MORE NBC\\nLike NBC: http://Facebook.com/NBC\\nFollow NBC: http://Twitter.com/NBC\\nNBC Tumblr: http://NBCtv.tumblr.com/\\nYouTube: http://www.youtube.com/nbc\\nNBC Instagram: http://instagram.com/nbc\\n\\n#SNL #PedroPascal #SNL48 #Coldplay', 'view_count': 1175057, 'thumbnail_url': 'https://i.ytimg.com/vi/QsYGlZkevEg/sddefault.jpg', 'publish_date': datetime.datetime(2023, 2, 4, 0, 0), 'length': 224, 'author': 'Saturday Night Live'}, lookup_index=0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader.load()"
]
},
{
"cell_type": "markdown",
"id": "6b278a1b",
"metadata": {},
"source": [
"## Add video info"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ba28af69",
"metadata": {},
"outputs": [],
"source": [
"# ! pip install pytube"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "9b8ea390",
"metadata": {},
"outputs": [],
"source": [
"loader = YoutubeLoader.from_youtube_url(\"https://www.youtube.com/watch?v=QsYGlZkevEg\", add_video_info=True)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "97b98e92",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='LADIES AND GENTLEMEN, PEDRO PASCAL! [ CHEERS AND APPLAUSE ] >> THANK YOU, THANK YOU. THANK YOU VERY MUCH. I\\'M SO EXCITED TO BE HERE. THANK YOU. I SPENT THE LAST YEAR SHOOTING A SHOW CALLED \"THE LAST OF US\" ON HBO. FOR SOME HBO SHOES, YOU GET TO SHOOT IN A FIVE STAR ITALIAN RESORT SURROUNDED BY BEAUTIFUL PEOPLE, BUT I SAID, NO, THAT\\'S TOO EASY. I WANT TO SHOOT IN A FREEZING CANADIAN FOREST WHILE BEING CHASED AROUND BY A GUY WHOSE HEAD LOOKS LIKE A GENITAL WART. IT IS AN HONOR BEING A PART OF THESE HUGE FRANCHISEs LIKE \"GAME OF THRONES\" AND \"STAR WARS,\" BUT I\\'M STILL GETTING USED TO PEOPLE RECOGNIZING ME. THE OTHER DAY, A GUY STOPPED ME ON THE STREET AND SAYS, MY SON LOVES \"THE MANDALORIAN\" AND THE NEXT THING I KNOW, I\\'M FACE TIMING WITH A 6-YEAR-OLD WHO HAS NO IDEA WHO I AM BECAUSE MY CHARACTER WEARS A MASK THE ENTIRE SHOW. THE GUY IS LIKE, DO THE MANDO VOICE, BUT IT\\'S LIKE A BEDROOM VOICE. WITHOUT THE MASK, IT JUST SOUNDS PORNY. PEOPLE WALKING BY ON THE STREET SEE ME WHISPERING TO A 6-YEAR-OLD KID. I CAN BRING YOU IN WARM, OR I CAN BRING YOU IN COLD. EVEN THOUGH I CAME TO THE U.S. WHEN I WAS LITTLE, I WAS BORN IN CHILE, AND I HAVE 34 FIRST COUSINS WHO ARE STILL THERE. THEY\\'RE VERY PROUD OF ME. I KNOW THEY\\'RE PROUD BECAUSE THEY GIVE MY PHONE NUMBER TO EVERY PERSON THEY MEET, WHICH MEANS EVERY DAY, SOMEONE IN SANTIAGO WILL TEXT ME STUFF LIKE, CAN YOU COME TO MY WEDDING, OR CAN YOU SING MY PRIEST HAPPY BIRTHDAY, OR IS BABY YODA MEAN IN REAL LIFE. SO I HAVE TO BE LIKE NO, NO, AND HIS NAME IS GROGU. BUT MY COUSINS WEREN\\'T ALWAYS SO PROUD. EARLY IN MY CAREER, I PLAYED SMALL PARTS IN EVERY CRIME SHOW. I EVEN PLAYED TWO DIFFERENT CHARACTERS ON \"LAW AND ORDER.\" TITO CABASSA WHO LOOKED LIKE THIS. AND ONE YEAR LATER, I PLAYED REGGIE LUCKMAN WHO LOOKS LIKE THIS. AND THAT, MY FRIENDS, IS CALLED RANGE. BUT IT IS AMAZING TO BE HERE, LIKE I SAID. I WAS BORN IN CHILE, AND NINE MONTHS LATER, MY PARENTS FLED AND BROUGHT ME AND MY SISTER TO THE U.S. THEY WERE SO BRAVE, AND WITHOUT THEM, I WOULDN\\'T BE HERE IN THIS WONDERFUL COUNTRY, AND I CERTAINLY WOULDN\\'T BE STANDING HERE WITH YOU ALL TONIGHT. SO TO ALL MY FAMILY WATCHING IN CHILE, I WANT TO SAY [ SPEAKING NON-ENGLISH ] WHICH MEANS, I LOVE YOU, I MISS YOU, AND STOP GIVING OUT MY PHONE NUMBER. WE\\'VE GOT AN AMAZING SHOW FOR YOU TONIGHT. COLDPLAY IS HERE, SO STICK', lookup_str='', metadata={'source': 'QsYGlZkevEg', 'title': 'Pedro Pascal Monologue - SNL', 'description': 'First-time host Pedro Pascal talks about filming The Last of Us and being recognized by fans.\\n\\nSaturday Night Live. Stream now on Peacock: https://pck.tv/3uQxh4q\\n\\nSubscribe to SNL: https://goo.gl/tUsXwM\\nStream Current Full Episodes: http://www.nbc.com/saturday-night-live\\n\\nWATCH PAST SNL SEASONS\\nGoogle Play - http://bit.ly/SNLGooglePlay\\niTunes - http://bit.ly/SNLiTunes\\n\\nSNL ON SOCIAL\\nSNL Instagram: http://instagram.com/nbcsnl\\nSNL Facebook: https://www.facebook.com/snl\\nSNL Twitter: https://twitter.com/nbcsnl\\nSNL TikTok: https://www.tiktok.com/@nbcsnl\\n\\nGET MORE NBC\\nLike NBC: http://Facebook.com/NBC\\nFollow NBC: http://Twitter.com/NBC\\nNBC Tumblr: http://NBCtv.tumblr.com/\\nYouTube: http://www.youtube.com/nbc\\nNBC Instagram: http://instagram.com/nbc\\n\\n#SNL #PedroPascal #SNL48 #Coldplay', 'view_count': 1175057, 'thumbnail_url': 'https://i.ytimg.com/vi/QsYGlZkevEg/sddefault.jpg', 'publish_date': datetime.datetime(2023, 2, 4, 0, 0), 'length': 224, 'author': 'Saturday Night Live'}, lookup_index=0)]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,61 @@
How To Guides
====================================
There are a lot of different document loaders that LangChain supports. Below are how-to guides for working with them
`File Loader <./examples/unstructured_file.html>`_: A walkthrough of how to use Unstructured to load files of arbitrary types (pdfs, txt, html, etc).
`Directory Loader <./examples/directory_loader.html>`_: A walkthrough of how to use Unstructured load files from a given directory.
`Notion <./examples/notion.html>`_: A walkthrough of how to load data for an arbitrary Notion DB.
`ReadTheDocs <./examples/readthedocs_documentation.html>`_: A walkthrough of how to load data for documentation generated by ReadTheDocs.
`HTML <./examples/html.html>`_: A walkthrough of how to load data from an html file.
`PDF <./examples/pdf.html>`_: A walkthrough of how to load data from a PDF file.
`PowerPoint <./examples/powerpoint.html>`_: A walkthrough of how to load data from a powerpoint file.
`Email <./examples/email.html>`_: A walkthrough of how to load data from an email (`.eml`) file.
`GoogleDrive <./examples/googledrive.html>`_: A walkthrough of how to load data from Google drive.
`Microsoft Word <./examples/microsoft_word.html>`_: A walkthrough of how to load data from Microsoft Word files.
`Obsidian <./examples/obsidian.html>`_: A walkthrough of how to load data from an Obsidian file dump.
`Roam <./examples/roam.html>`_: A walkthrough of how to load data from a Roam file export.
`EveryNote <./examples/everynote.html>`_: A walkthrough of how to load data from a EveryNote (`.enex`) file.
`YouTube <./examples/youtube.html>`_: A walkthrough of how to load the transcript from a YouTube video.
`s3 File <./examples/s3_file.html>`_: A walkthrough of how to load a file from s3.
`s3 Directory <./examples/s3_directory.html>`_: A walkthrough of how to load all files in a directory from s3.
`GCS File <./examples/gcs_file.html>`_: A walkthrough of how to load a file from Google Cloud Storage (GCS).
`GCS Directory <./examples/gcs_directory.html>`_: A walkthrough of how to load all files in a directory from Google Cloud Storage (GCS).
`Web Base <./examples/web_base.html>`_: A walkthrough of how to load all text data from webpages.
`IMSDb <./examples/imsdb.html>`_: A walkthrough of how to load all text data from IMSDb webpage.
`AZLyrics <./examples/azlyrics.html>`_: A walkthrough of how to load all text data from AZLyrics webpage.
`College Confidential <./examples/college_confidential.html>`_: A walkthrough of how to load all text data from College Confidential webpage.
`Gutenberg <./examples/gutenberg.html>`_: A walkthrough of how to load data from a Gutenberg ebook text.
`Airbyte Json <./examples/airbyte_json.html>`_: A walkthrough of how to load data from a local Airbyte JSON file.
`Online PDF <./examples/online_pdf.html>`_: A walkthrough of how to load data from an online PDF.
.. toctree::
:maxdepth: 1
:glob:
:hidden:
examples/*

View File

@@ -0,0 +1,12 @@
# Key Concepts
## Document
This class is a container for document information. This contains two parts:
- `page_content`: The content of the actual page itself.
- `metadata`: The metadata associated with the document. This can be things like the file path, the url, etc.
## Loader
This base class is a way to load documents. It exposes a `load` method that returns `Document` objects.
## [Unstructured](https://github.com/Unstructured-IO/unstructured)
Unstructured is a python package specifically focused on transformations from raw documents to text.

View File

@@ -0,0 +1,150 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f6574496-b360-4ffa-9523-7fd34a590164",
"metadata": {},
"source": [
"# Async API for LLM\n",
"\n",
"LangChain provides async support for LLMs by leveraging the [asyncio](https://docs.python.org/3/library/asyncio.html) library.\n",
"\n",
"Async support is particularly useful for calling multiple LLMs concurrently, as these calls are network-bound. Currently, only `OpenAI` is supported, but async support for other LLMs is on the roadmap.\n",
"\n",
"You can use the `agenerate` method to call an OpenAI LLM asynchronously."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "5e49e96c-0f88-466d-b3d3-ea0966bdf19e",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"I'm doing well. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"I am doing quite well. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing great, thank you! How about you?\n",
"\n",
"\n",
"I'm doing well, thanks for asking. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\u001b[1mConcurrent executed in 1.93 seconds.\u001b[0m\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing well, thank you. How about you?\n",
"\n",
"\n",
"I'm doing great, thank you. How about you?\n",
"\u001b[1mSerial executed in 10.54 seconds.\u001b[0m\n"
]
}
],
"source": [
"import time\n",
"import asyncio\n",
"\n",
"from langchain.llms import OpenAI\n",
"\n",
"def generate_serially():\n",
" llm = OpenAI(temperature=0.9)\n",
" for _ in range(10):\n",
" resp = llm.generate([\"Hello, how are you?\"])\n",
" print(resp.generations[0][0].text)\n",
"\n",
"\n",
"async def async_generate(llm):\n",
" resp = await llm.agenerate([\"Hello, how are you?\"])\n",
" print(resp.generations[0][0].text)\n",
"\n",
"\n",
"async def generate_concurrently():\n",
" llm = OpenAI(temperature=0.9)\n",
" tasks = [async_generate(llm) for _ in range(10)]\n",
" await asyncio.gather(*tasks)\n",
"\n",
"\n",
"s = time.perf_counter()\n",
"# If running this outside of Jupyter, use asyncio.run(generate_concurrently())\n",
"await generate_concurrently() \n",
"elapsed = time.perf_counter() - s\n",
"print('\\033[1m' + f\"Concurrent executed in {elapsed:0.2f} seconds.\" + '\\033[0m')\n",
"\n",
"s = time.perf_counter()\n",
"generate_serially()\n",
"elapsed = time.perf_counter() - s\n",
"print('\\033[1m' + f\"Serial executed in {elapsed:0.2f} seconds.\" + '\\033[0m')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,138 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "052dfe58",
"metadata": {},
"source": [
"# Fake LLM\n",
"We expose a fake LLM class that can be used for testing. This allows you to mock out calls to the LLM and simulate what would happen if the LLM responded in a certain way.\n",
"\n",
"In this notebook we go over how to use this.\n",
"\n",
"We start this with using the FakeLLM in an agent."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "ef97ac4d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms.fake import FakeListLLM"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "9a0a160f",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import load_tools\n",
"from langchain.agents import initialize_agent"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "b272258c",
"metadata": {},
"outputs": [],
"source": [
"tools = load_tools([\"python_repl\"])"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "94096c4c",
"metadata": {},
"outputs": [],
"source": [
"responses=[\n",
" \"Action: Python REPL\\nAction Input: print(2 + 2)\",\n",
" \"Final Answer: 4\"\n",
"]\n",
"llm = FakeListLLM(responses=responses)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "da226d02",
"metadata": {},
"outputs": [],
"source": [
"agent = initialize_agent(tools, llm, agent=\"zero-shot-react-description\", verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "44c13426",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mAction: Python REPL\n",
"Action Input: print(2 + 2)\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m4\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3mFinal Answer: 4\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'4'"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"whats 2 + 2\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "814c2858",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,179 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "e5715368",
"metadata": {},
"source": [
"# Token Usage Tracking\n",
"\n",
"This notebook goes over how to track your token usage for specific calls. It is currently only implemented for the OpenAI API.\n",
"\n",
"Let's first look at an extremely simple example of tracking token usage for a single LLM call."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "9455db35",
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.callbacks import get_openai_callback"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d1c55cc9",
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(model_name=\"text-davinci-002\", n=2, best_of=2)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "31667d54",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"42\n"
]
}
],
"source": [
"with get_openai_callback() as cb:\n",
" result = llm(\"Tell me a joke\")\n",
" print(cb.total_tokens)"
]
},
{
"cell_type": "markdown",
"id": "c0ab6d27",
"metadata": {},
"source": [
"Anything inside the context manager will get tracked. Here's an example of using it to track multiple calls in sequence."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "e09420f4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"83\n"
]
}
],
"source": [
"with get_openai_callback() as cb:\n",
" result = llm(\"Tell me a joke\")\n",
" result2 = llm(\"Tell me a joke\")\n",
" print(cb.total_tokens)"
]
},
{
"cell_type": "markdown",
"id": "d8186e7b",
"metadata": {},
"source": [
"If a chain or agent with multiple steps in it is used, it will track all those steps."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "5d1125c6",
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents import load_tools\n",
"from langchain.agents import initialize_agent\n",
"from langchain.llms import OpenAI\n",
"\n",
"llm = OpenAI(temperature=0)\n",
"tools = load_tools([\"serpapi\", \"llm-math\"], llm=llm)\n",
"agent = initialize_agent(tools, llm, agent=\"zero-shot-react-description\", verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "2f98c536",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.\n",
"Action: Search\n",
"Action Input: \"Olivia Wilde boyfriend\"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mJason Sudeikis\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to find out Jason Sudeikis' age\n",
"Action: Search\n",
"Action Input: \"Jason Sudeikis age\"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m47 years\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I need to calculate 47 raised to the 0.23 power\n",
"Action: Calculator\n",
"Action Input: 47^0.23\u001b[0m\n",
"Observation: \u001b[33;1m\u001b[1;3mAnswer: 2.4242784855673896\n",
"\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: Jason Sudeikis, Olivia Wilde's boyfriend, is 47 years old and his age raised to the 0.23 power is 2.4242784855673896.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"1465\n"
]
}
],
"source": [
"with get_openai_callback() as cb:\n",
" response = agent.run(\"Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?\")\n",
" print(cb.total_tokens)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "80ca77a3",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -9,6 +9,10 @@ The examples here all address certain "how-to" guides for working with LLMs.
`Custom LLM <./examples/custom_llm.html>`_: How to create and use a custom LLM class, in case you have an LLM not from one of the standard providers (including one that you host yourself).
`Token Usage Tracking <./examples/token_usage_tracking.html>`_: How to track the token usage of various chains/agents/LLM calls.
`Fake LLM <./examples/fake_llm.html>`_: How to create and use a fake LLM for testing and debugging purposes.
.. toctree::
:maxdepth: 1

View File

@@ -18,7 +18,9 @@
"cell_type": "code",
"execution_count": 1,
"id": "df924055",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.llms import OpenAI"
@@ -207,14 +209,6 @@
"source": [
"llm.get_num_tokens(\"what a joke\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b004ffdd",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {

View File

@@ -7,6 +7,8 @@ They are split into two categories:
1. `Generic Functionality <./generic_how_to.html>`_: Covering generic functionality all LLMs should have.
2. `Integrations <./integrations.html>`_: Covering integrations with various LLM providers.
3. `Asynchronous <./async_llm.html>`_: Covering asynchronous functionality.
4. `Streaming <./streaming_llm.html>`_: Covering streaming functionality.
.. toctree::
:maxdepth: 1

View File

@@ -9,6 +9,16 @@ The examples here are all "how-to" guides for how to integrate with various LLM
`Manifest <./integrations/manifest.html>`_: Covers how to utilize the Manifest wrapper.
`Goose AI <./integrations/gooseai_example.html>`_: Covers how to utilize the Goose AI wrapper.
`Cerebrium <./integrations/cerebriumai_example.html>`_: Covers how to utilize the Cerebrium AI wrapper.
`Petals <./integrations/petals_example.html>`_: Covers how to utilize the Petals wrapper.
`Forefront AI <./integrations/forefrontai_example.html>`_: Covers how to utilize the Forefront AI wrapper.
`PromptLayer OpenAI <./integrations/promptlayer_openai.html>`_: Covers how to use `PromptLayer <https://promptlayer.com>`_ with Langchain.
.. toctree::
:maxdepth: 1

View File

@@ -20,7 +20,7 @@
"# The API version you want to use: set this to `2022-12-01` for the released version.\n",
"export OPENAI_API_VERSION=2022-12-01\n",
"# The base URL for your Azure OpenAI resource. You can find this in the Azure portal under your Azure OpenAI resource.\n",
"export OPENAI_API_BASE_URL=https://your-resource-name.openai.azure.com\n",
"export OPENAI_API_BASE=https://your-resource-name.openai.azure.com\n",
"# The API key for your Azure OpenAI resource. You can find this in the Azure portal under your Azure OpenAI resource.\n",
"export OPENAI_API_KEY=<your Azure OpenAI API key>\n",
"```\n",

View File

@@ -0,0 +1,156 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CerebriumAI LLM Example\n",
"This notebook goes over how to use Langchain with [CerebriumAI](https://docs.cerebrium.ai/introduction)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install cerebrium\n",
"The `cerebrium` package is required to use the CerebriumAI API. Install `cerebrium` using `pip3 install cerebrium`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"$ pip3 install cerebrium"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from langchain.llms import CerebriumAI\n",
"from langchain import PromptTemplate, LLMChain"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set the Environment API Key\n",
"Make sure to get your API key from CerebriumAI. You are given a 1 hour free of serverless GPU compute to test different models."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"CEREBRIUMAI_API_KEY\"] = \"YOUR_KEY_HERE\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create the CerebriumAI instance\n",
"You can specify different parameters such as the model endpoint url, max length, temperature, etc. You must provide an endpoint url."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = CerebriumAI(endpoint_url=\"YOUR ENDPOINT URL HERE\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a Prompt Template\n",
"We will create a prompt template for Question and Answer."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initiate the LLMChain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run the LLMChain\n",
"Provide a question and run the LLMChain."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"question = \"What NFL team won the Super Bowl in the year Justin Beiber was born?\"\n",
"\n",
"llm_chain.run(question)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.12 ('palm')",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.9.12"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "a0a0263b650d907a3bfe41c0f8d6a63a071b884df3cfdc1579f00cdc1aed6b03"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,139 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ForefrontAI LLM Example\n",
"This notebook goes over how to use Langchain with [ForefrontAI](https://www.forefront.ai/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from langchain.llms import ForefrontAI\n",
"from langchain import PromptTemplate, LLMChain"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set the Environment API Key\n",
"Make sure to get your API key from ForefrontAI. You are given a 5 day free trial to test different models."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"FOREFRONTAI_API_KEY\"] = \"YOUR_KEY_HERE\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create the ForefrontAI instance\n",
"You can specify different parameters such as the model endpoint url, length, temperature, etc. You must provide an endpoint url."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = ForefrontAI(endpoint_url=\"YOUR ENDPOINT URL HERE\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a Prompt Template\n",
"We will create a prompt template for Question and Answer."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initiate the LLMChain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run the LLMChain\n",
"Provide a question and run the LLMChain."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"question = \"What NFL team won the Super Bowl in the year Justin Beiber was born?\"\n",
"\n",
"llm_chain.run(question)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.12 ('palm')",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.9.12"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "a0a0263b650d907a3bfe41c0f8d6a63a071b884df3cfdc1579f00cdc1aed6b03"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,156 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# GooseAI LLM Example\n",
"This notebook goes over how to use Langchain with [GooseAI](https://goose.ai/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install openai\n",
"The `openai` package is required to use the GooseAI API. Install `openai` using `pip3 install openai`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"$ pip3 install openai"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from langchain.llms import GooseAI\n",
"from langchain import PromptTemplate, LLMChain"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set the Environment API Key\n",
"Make sure to get your API key from GooseAI. You are given $10 in free credits to test different models."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"GOOSEAI_API_KEY\"] = \"YOUR_KEY_HERE\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create the GooseAI instance\n",
"You can specify different parameters such as the model name, max tokens generated, temperature, etc."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = GooseAI()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a Prompt Template\n",
"We will create a prompt template for Question and Answer."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initiate the LLMChain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run the LLMChain\n",
"Provide a question and run the LLMChain."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"question = \"What NFL team won the Super Bowl in the year Justin Beiber was born?\"\n",
"\n",
"llm_chain.run(question)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.12 ('palm')",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.9.12"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "a0a0263b650d907a3bfe41c0f8d6a63a071b884df3cfdc1579f00cdc1aed6b03"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -5,9 +5,9 @@
"id": "959300d4",
"metadata": {},
"source": [
"# HuggingFace Hub\n",
"# Hugging Face Hub\n",
"\n",
"This example showcases how to connect to the HuggingFace Hub."
"This example showcases how to connect to the Hugging Face Hub."
]
},
{
@@ -20,7 +20,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"The Seattle Seahawks won the Super Bowl in 2010. Justin Beiber was born in 2010. The\n"
"The Seattle Seahawks won the Super Bowl in 2010. Justin Beiber was born in 2010. The final answer: Seattle Seahawks.\n"
]
}
],
@@ -31,7 +31,7 @@
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])\n",
"llm_chain = LLMChain(prompt=prompt, llm=HuggingFaceHub(repo_id=\"google/flan-t5-xl\", model_kwargs={\"temperature\":1e-10}))\n",
"llm_chain = LLMChain(prompt=prompt, llm=HuggingFaceHub(repo_id=\"google/flan-t5-xl\", model_kwargs={\"temperature\":0, \"max_length\":64}))\n",
"\n",
"question = \"What NFL team won the Super Bowl in the year Justin Beiber was born?\"\n",
"\n",

View File

@@ -0,0 +1,156 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Petals LLM Example\n",
"This notebook goes over how to use Langchain with [Petals](https://github.com/bigscience-workshop/petals)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install petals\n",
"The `petals` package is required to use the Petals API. Install `petals` using `pip3 install petals`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"$ pip3 install petals"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from langchain.llms import Petals\n",
"from langchain import PromptTemplate, LLMChain"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set the Environment API Key\n",
"Make sure to get your API key from Huggingface."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"HUGGINGFACE_API_KEY\"] = \"YOUR_KEY_HERE\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create the Petals instance\n",
"You can specify different parameters such as the model name, max new tokens, temperature, etc."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = Petals(model_name=\"bigscience/bloom-petals\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a Prompt Template\n",
"We will create a prompt template for Question and Answer."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initiate the LLMChain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run the LLMChain\n",
"Provide a question and run the LLMChain."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"question = \"What NFL team won the Super Bowl in the year Justin Beiber was born?\"\n",
"\n",
"llm_chain.run(question)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.12 ('palm')",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.9.12"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "a0a0263b650d907a3bfe41c0f8d6a63a071b884df3cfdc1579f00cdc1aed6b03"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,81 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "959300d4",
"metadata": {},
"source": [
"# PromptLayer OpenAI\n",
"\n",
"This example showcases how to connect to [PromptLayer](https://www.promptlayer.com) to start recording your OpenAI requests."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "3acf0069",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"' to go outside\\n\\nUnfortunately, cats cannot go outside without being supervised by a human. Going outside can be dangerous for cats, as they may come into contact with cars, other animals, or other dangers. If you want to go outside, ask your human to take you on a supervised walk or to a safe, enclosed outdoor space.'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.llms import PromptLayerOpenAI\n",
"import promptlayer\n",
"import os\n",
"\n",
"# Set up API keys, you can get a promptlayer api key here: https://promptlayer.com/\n",
"os.environ[\"OPENAI_API_KEY\"] = \"YOUR_OPENAI_API_KEY\"\n",
"promptlayer.api_key = \"YOUR_PROMPTLAYER_API_KEY\"\n",
"\n",
"# Optionally pass in pl_tags to track your requests\n",
"llm = PromptLayerOpenAI(pl_tags=[\"langchain\"])\n",
"\n",
"llm(\"I am a cat and I want\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ae4559c7",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "base",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
},
"vscode": {
"interpreter": {
"hash": "c4fe2cd85a8d9e8baaec5340ce66faff1c77581a9f43e6c45e85e09b6fced008"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,140 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "6eaf7e66-f49c-42da-8d11-22ea13bef718",
"metadata": {},
"source": [
"# Streaming with LLMs\n",
"\n",
"LangChain provides streaming support for LLMs. Currently, we only support streaming for the `OpenAI` LLM implementation, but streaming support for other LLM implementations is on the roadmap. To utilize streaming, use a [`CallbackHandler`](https://github.com/hwchase17/langchain/blob/master/langchain/callbacks/base.py) that implements `on_llm_new_token`. In this example, we are using [`StreamingStdOutCallbackHandler`]()."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "4ac0ff54-540a-4f2b-8d9a-b590fec7fe07",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"Verse 1\n",
"I'm sippin' on sparkling water,\n",
"It's so refreshing and light,\n",
"It's the perfect way to quench my thirst,\n",
"On a hot summer night.\n",
"\n",
"Chorus\n",
"Sparkling water, sparkling water,\n",
"It's the best way to stay hydrated,\n",
"It's so refreshing and light,\n",
"It's the perfect way to stay alive.\n",
"\n",
"Verse 2\n",
"I'm sippin' on sparkling water,\n",
"It's so bubbly and bright,\n",
"It's the perfect way to cool me down,\n",
"On a hot summer night.\n",
"\n",
"Chorus\n",
"Sparkling water, sparkling water,\n",
"It's the best way to stay hydrated,\n",
"It's so refreshing and light,\n",
"It's the perfect way to stay alive.\n",
"\n",
"Verse 3\n",
"I'm sippin' on sparkling water,\n",
"It's so crisp and clean,\n",
"It's the perfect way to keep me going,\n",
"On a hot summer day.\n",
"\n",
"Chorus\n",
"Sparkling water, sparkling water,\n",
"It's the best way to stay hydrated,\n",
"It's so refreshing and light,\n",
"It's the perfect way to stay alive."
]
}
],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.callbacks.base import CallbackManager\n",
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
"\n",
"\n",
"llm = OpenAI(streaming=True, callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]), verbose=True, temperature=0)\n",
"resp = llm(\"Write me a song about sparkling water.\")"
]
},
{
"cell_type": "markdown",
"id": "61fb6de7-c6c8-48d0-a48e-1204c027a23c",
"metadata": {
"tags": []
},
"source": [
"We still have access to the end `LLMResult` if using `generate`. However, `token_usage` is not currently supported for streaming."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "a35373f1-9ee6-4753-a343-5aee749b8527",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"Q: What did the fish say when it hit the wall?\n",
"A: Dam!"
]
},
{
"data": {
"text/plain": [
"LLMResult(generations=[[Generation(text='\\n\\nQ: What did the fish say when it hit the wall?\\nA: Dam!', generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {}})"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"llm.generate([\"Tell me a joke.\"])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

Some files were not shown because too many files have changed in this diff Show More