Compare commits

21 Commits

Author SHA1 Message Date
Harrison Chase
b64c86a25f bump version to 148 (#3458) 2023-04-24 11:08:32 -07:00
mbchang
82845e3821 add meta-prompt to autonomous agents use cases (#3254)
An implementation of
[meta-prompt](https://noahgoodman.substack.com/p/meta-prompt-a-simple-self-improving),
where the agent modifies its own instructions across episodes with a
user.

![figure](https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F468217b9-96d9-47c0-a08b-dbf6b21b9f49_492x384.png)
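The core idea, as a minimal sketch (the prompts and the `run_episode` helper below are illustrative stand-ins, not the notebook's code):

```python
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
instructions = "You are a helpful assistant."

def run_episode(llm, instructions):
    # Hypothetical stand-in: in the notebook this is an interactive
    # conversation between the agent and the user.
    return llm(f"{instructions}\nUser: Help me plan a week of meals.\nAssistant:")

for episode in range(3):
    transcript = run_episode(llm, instructions)
    # A "meta" LLM call critiques the episode...
    critique = llm(
        "Critique the Assistant in this conversation and suggest improved "
        f"instructions:\n{transcript}"
    )
    # ...and rewrites the agent's instructions for the next episode.
    instructions = llm(
        f"Previous instructions: {instructions}\nCritique: {critique}\n"
        "Write the revised instructions:"
    )
```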
2023-04-24 10:48:38 -07:00
yunfeilu92
77235bbe43 propagate kwargs to cls in OpenSearchVectorSearch (#3416)
kwargs should be passed into cls so that the OpenSearch client can be
properly initialized in __init__(). Otherwise, logic like the example below
will not work, as auth kwargs will never reach __init__.

```python
docsearch = OpenSearchVectorSearch.from_documents(docs, embeddings, opensearch_url="http://localhost:9200")

query = "What did the president say about Ketanji Brown Jackson"
docs = docsearch.similarity_search(query)
```
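With the fix, connection kwargs such as auth are forwarded too; for example (the `http_auth`/`verify_certs` values below are illustrative opensearch-py kwargs):

```python
docsearch = OpenSearchVectorSearch.from_documents(
    docs,
    embeddings,
    opensearch_url="https://localhost:9200",
    http_auth=("admin", "admin"),  # now reaches the opensearch-py client via __init__
    verify_certs=False,
)
```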

Co-authored-by: EC2 Default User <ec2-user@ip-172-31-28-97.ec2.internal>
2023-04-24 10:43:41 -07:00
Eduard van Valkenburg
46c9636012 small constructor change and updated notebook (#3426)
small change in the pydantic definitions, same api.

updated notebook with the right constructor and added a few-shot example
2023-04-24 10:42:38 -07:00
Zander Chase
49122a96e7 Structured Tool Bugfixes (#3324)
- Proactively raise an error if a tool subclasses BaseTool and defines its
own schema but fails to add the type hints
- Fix the auto-inferred schema of the decorator to strip the
unneeded virtual kwargs from the schema dict

Helps avoid silent instances of #3297
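For reference, a sketch of a subclass that defines its own schema with matching type hints (all names below are illustrative):

```python
from typing import Type

from pydantic import BaseModel, Field

from langchain.tools.base import BaseTool


class SearchInput(BaseModel):
    query: str = Field(description="the search query")


class CustomSearchTool(BaseTool):
    name = "custom_search"
    description = "Look up a query."
    args_schema: Type[BaseModel] = SearchInput  # custom schema

    def _run(self, query: str) -> str:  # type hints must match the schema
        return f"results for {query}"

    async def _arun(self, query: str) -> str:
        raise NotImplementedError("custom_search does not support async")
```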
2023-04-24 09:58:29 -07:00
Bilal Mahmoud
f22b9d0e57 Do not await sync callback managers (#3440)
This fixes a bug in the math chain, where even the sync callback manager
was awaited, creating a nasty `RuntimeError`
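The pattern behind the fix, roughly (a simplified sketch, not the chain's exact code): only await the manager when it is actually async.

```python
from langchain.callbacks.base import AsyncCallbackManager

async def notify(callback_manager, text: str) -> None:
    if isinstance(callback_manager, AsyncCallbackManager):
        await callback_manager.on_text(text)  # async manager: must be awaited
    else:
        callback_manager.on_text(text)  # sync manager: must NOT be awaited
```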
2023-04-24 09:52:04 -07:00
Dianliang233
0cf934ce7d Fix NoneType has no len() in DDG tool (#3334)
Per
46ac914daa/duckduckgo_search/ddg.py (L109),
the ddg function actually returns None when there is no result.
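The fix boils down to guarding against a `None` return, e.g.:

```python
from duckduckgo_search import ddg

results = ddg("langchain", max_results=5)
if results is None:  # ddg returns None when there are no results
    results = []
snippets = [r["body"] for r in results]
```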
2023-04-23 21:29:49 -07:00
Davit Buniatyan
2c0023393b Deep Lake mini upgrades (#3375)
Improvements
* set default num_workers for ingestion to 0
* upgraded notebooks to avoid dataset creation ambiguity
* added `force_delete_dataset_by_path`
* bumped deeplake to 3.3.0
* pass a `creds` arg to the Deep Lake object to allow custom S3 credentials (see the sketch after this list)
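A sketch of the `creds` pass-through for an S3-backed dataset (the bucket path and key values are illustrative; `embeddings` is assumed to be defined as in the notebooks below):

```python
from langchain.vectorstores import DeepLake

db = DeepLake(
    dataset_path="s3://my-bucket/langchain-dataset",  # illustrative path
    embedding_function=embeddings,
    creds={
        "aws_access_key_id": "...",
        "aws_secret_access_key": "...",
        "aws_session_token": "...",  # optional
    },
)
```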

Notes
* please double-check that poetry is not messed up (thanks!)

Asks
* Would be great to create a shared slack channel for quick questions

---------

Co-authored-by: Davit Buniatyan <d@activeloop.ai>
2023-04-23 21:23:54 -07:00
Haste171
93d53e417a Update unstructured_file.ipynb (#3377)
Fix typo in docs
2023-04-23 21:22:38 -07:00
张城铭
487a57ffe6 Optimize code (#3412)
Co-authored-by: assert <zhangchengming@kkguan.com>
2023-04-23 21:04:59 -07:00
Zander Chase
3d8243ec95 Catch all exceptions in autogpt (#3413)
Ought to be more autonomous
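The gist: turn any tool exception into an observation string the loop can react to, instead of letting it crash the run (a simplified sketch, with a hypothetical helper name):

```python
def safe_run(tool, tool_input: str) -> str:
    """Run a tool, converting any exception into an observation string."""
    try:
        return tool.run(tool_input)
    except Exception as e:  # catch everything so the agent loop keeps going
        return f"Error: {str(e)}"
```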
2023-04-23 20:02:37 -07:00
Zander Chase
738ee56b86 Move Generative Agent definition to Experimental (#3245)
Extending @BeautyyuYanli's #3220 to move the definition out of the notebook

---------

Co-authored-by: BeautyyuYanli <beautyyuyanli@gmail.com>
2023-04-23 18:32:37 -07:00
Zander Chase
20f530e9c5 Add Sentence Transformers Embeddings (#3409)
Add embeddings based on the sentence transformers library.
Add a notebook and integration tests.
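Minimal usage, assuming the class is exposed as `SentenceTransformerEmbeddings` (a sketch; the model name is illustrative):

```python
from langchain.embeddings import SentenceTransformerEmbeddings

embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
vector = embeddings.embed_query("Hello world")
```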

Co-authored-by: khimaros <me@khimaros.com>
2023-04-23 18:25:20 -07:00
Zander Chase
73bc70b4fa Update marathon notebook (#3408)
Fixes #3404
2023-04-23 18:14:11 -07:00
Luke Harris
b4de839ed8 Several confluence loader improvements (#3300)
This PR addresses several improvements (a usage sketch follows the list):

- Previously it was not possible to load spaces of more than 100 pages.
The `limit` was being used both as an overall page limit *and* as a
per-request pagination limit. This, in combination with the fact that
Atlassian seems to use a server-side hard limit of 100 when page content
is expanded, meant it wasn't possible to download >100 pages. Now
`limit` is used *only* as a per-request pagination limit and `max_pages`
is introduced as the way to limit the total number of pages returned by
the paginator.
- Document metadata now includes `source` (the source url), making it
compatible with `RetrievalQAWithSourcesChain`.
- It is now possible to include inline and footer comments.
- It is now possible to pass `verify_ssl=False` and other parameters to
the confluence object for use cases that require it.
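Put together, usage looks roughly like the sketch below; parameter names follow the description above and should be treated as illustrative:

```python
from langchain.document_loaders import ConfluenceLoader

loader = ConfluenceLoader(
    url="https://example.atlassian.net/wiki",
    username="me@example.com",
    api_key="...",
    verify_ssl=False,  # extra kwargs forwarded to the confluence client
)
docs = loader.load(
    space_key="SPACE",
    limit=50,  # per-request pagination size
    max_pages=500,  # overall cap on pages returned
    include_comments=True,
)
```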
2023-04-23 15:06:10 -07:00
zz
651cb62556 Add support for wikipedia's lang parameter (#3383)
Allow changing the language of the Wikipedia API being requested.
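For example (a sketch, assuming the wrapper exposes `lang`):

```python
from langchain.utilities import WikipediaAPIWrapper

wikipedia = WikipediaAPIWrapper(lang="es")  # query the Spanish-language Wikipedia
print(wikipedia.run("Miguel de Cervantes"))
```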

Co-authored-by: zhuohui <zhuohui@datastory.com.cn>
2023-04-23 15:02:18 -07:00
Johann-Peter Hartmann
199cb855ea Improve youtube loader (#3395)
Small improvements for the YouTube loader (a usage sketch follows the list):
a) use the YouTube API permission scope instead of Google Drive
b) bugfix: allow transcript loading for single videos
c) an additional parameter "continue_on_failure" for cases when videos
in a playlist do not have transcription enabled
d) support automated translation for all languages, if available
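A sketch covering (b) and (d); the parameter names follow the PR description and are illustrative:

```python
from langchain.document_loaders import YoutubeLoader

loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",  # illustrative video
    add_video_info=True,
    language="de",  # preferred transcript language
    translation="en",  # auto-translate the transcript when available
)
docs = loader.load()
```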

---------

Co-authored-by: Johann-Peter Hartmann <johann-peter.hartmann@mayflower.de>
2023-04-23 10:24:41 -07:00
Harrison Chase
e5ffbee5eb Harrison/hf document loader (#3394)
Co-authored-by: Azam Iftikhar <azamiftikhar1000@gmail.com>
2023-04-23 10:17:43 -07:00
Hadi Curtay
acfd11c8e4 Updated incorrect link to Weaviate notebook (#3362)
The detailed walkthrough of the Weaviate wrapper was pointing to the
getting-started notebook. Fixed it to point to the Weaviate notebook in
the examples folder.
2023-04-22 20:47:41 -07:00
Ismail Pelaseyed
b21fe0a18f Add example on deploying LangChain to Cloud Run (#3366)
## Summary

Adds a link to a minimal example of running LangChain on Google Cloud
Run.
2023-04-22 20:09:00 -07:00
Ivan Zatevakhin
77bb6c99f7 llamacpp wrong default value passed for f16_kv (#3320)
Fixes the default f16_kv value in llamacpp; the wrong default was being
passed through to the underlying library.

See:
ba3959eafd/llama_cpp/llama.py (L33)

Fixes #3241
Fixes #3301
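For reference, `llama.cpp` defaults `f16_kv` to `True`; explicit usage looks like this sketch (the model path is illustrative):

```python
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/ggml-model.bin",  # illustrative path
    f16_kv=True,  # half-precision key/value cache, llama.cpp's default
)
```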
2023-04-22 18:46:55 -07:00
46 changed files with 2781 additions and 2238 deletions

View File

@@ -37,6 +37,10 @@ A minimal example on how to run LangChain on Vercel using Flask.
A minimal example on how to deploy LangChain to DigitalOcean App Platform.
## [Google Cloud Run](https://github.com/homanp/gcp-langchain)
A minimal example on how to deploy LangChain to Google Cloud Run.
## [SteamShip](https://github.com/steamship-core/steamship-langchain/)
This repository contains LangChain adapters for Steamship, enabling LangChain developers to rapidly deploy their apps on Steamship.

View File

@@ -30,4 +30,4 @@ To import this vectorstore:
from langchain.vectorstores import Weaviate
```
For a more detailed walkthrough of the Weaviate wrapper, see [this notebook](../modules/indexes/vectorstores/getting_started.ipynb)
For a more detailed walkthrough of the Weaviate wrapper, see [this notebook](../modules/indexes/vectorstores/examples/weaviate.ipynb)

View File

@@ -55,14 +55,16 @@
},
"outputs": [],
"source": [
"llm = AzureOpenAI(temperature=0, deployment_name=\"text-davinci-003\", verbose=True)\n",
"fast_llm = AzureOpenAI(temperature=0.5, max_tokens=1000, deployment_name=\"gpt-35-turbo\", verbose=True)\n",
"smart_llm = AzureOpenAI(temperature=0, max_tokens=100, deployment_name=\"gpt-4\", verbose=True)\n",
"\n",
"toolkit = PowerBIToolkit(\n",
" powerbi=PowerBIDataset(None, \"<dataset_id>\", ['table1', 'table2'], DefaultAzureCredential()), \n",
" llm=llm\n",
" powerbi=PowerBIDataset(dataset_id=\"<dataset_id>\", table_names=['table1', 'table2'], credential=DefaultAzureCredential()), \n",
" llm=smart_llm\n",
")\n",
"\n",
"agent_executor = create_pbi_agent(\n",
" llm=llm,\n",
" llm=fast_llm,\n",
" toolkit=toolkit,\n",
" verbose=True,\n",
")"
@@ -141,6 +143,56 @@
"source": [
"agent_executor.run(\"What unique values are there for dimensions2 in table2\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "6fd950e4",
"metadata": {},
"source": [
"## Example: add your own few-shot prompts"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "87d677f9",
"metadata": {},
"outputs": [],
"source": [
"#fictional example\n",
"few_shots = \"\"\"\n",
"Question: How many rows are in the table revenue?\n",
"DAX: EVALUATE ROW(\"Number of rows\", COUNTROWS(revenue_details))\n",
"----\n",
"Question: How many rows are in the table revenue where year is not empty?\n",
"DAX: EVALUATE ROW(\"Number of rows\", COUNTROWS(FILTER(revenue_details, revenue_details[year] <> \"\")))\n",
"----\n",
"Question: What was the average of value in revenue in dollars?\n",
"DAX: EVALUATE ROW(\"Average\", AVERAGE(revenue_details[dollar_value]))\n",
"----\n",
"\"\"\"\n",
"toolkit = PowerBIToolkit(\n",
" powerbi=PowerBIDataset(dataset_id=\"<dataset_id>\", table_names=['table1', 'table2'], credential=DefaultAzureCredential()), \n",
" llm=smart_llm,\n",
" examples=few_shots,\n",
")\n",
"agent_executor = create_pbi_agent(\n",
" llm=fast_llm,\n",
" toolkit=toolkit,\n",
" verbose=True,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "33f4bb43",
"metadata": {},
"outputs": [],
"source": [
"agent_executor.run(\"What was the maximum of value in revenue in dollars in 2022?\")"
]
}
],
"metadata": {

View File

@@ -0,0 +1,220 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "04c9fdc5",
"metadata": {},
"source": [
"# HuggingFace dataset loader \n",
"\n",
"This notebook shows how to load Hugging Face Hub datasets to LangChain.\n",
"\n",
"The Hugging Face Hub hosts a large number of community-curated datasets for a diverse range of tasks such as translation, automatic speech recognition, and image classification.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "1815c866",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import HuggingFaceDatasetLoader"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "3611e092",
"metadata": {},
"outputs": [],
"source": [
"dataset_name=\"imdb\"\n",
"page_content_column=\"text\"\n",
"\n",
"\n",
"loader=HuggingFaceDatasetLoader(dataset_name,page_content_column)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e903ebc",
"metadata": {},
"outputs": [],
"source": [
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "e8559946",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first it was seized by U.S. customs if it ever tried to enter this country, therefore being a fan of films considered \"controversial\" I really had to see this for myself.<br /><br />The plot is centered around a young Swedish drama student named Lena who wants to learn everything she can about life. In particular she wants to focus her attentions to making some sort of documentary on what the average Swede thought about certain political issues such as the Vietnam War and race issues in the United States. In between asking politicians and ordinary denizens of Stockholm about their opinions on politics, she has sex with her drama teacher, classmates, and married men.<br /><br />What kills me about I AM CURIOUS-YELLOW is that 40 years ago, this was considered pornographic. Really, the sex and nudity scenes are few and far between, even then it\\'s not shot like some cheaply made porno. While my countrymen mind find it shocking, in reality sex and nudity are a major staple in Swedish cinema. Even Ingmar Bergman, arguably their answer to good old boy John Ford, had sex scenes in his films.<br /><br />I do commend the filmmakers for the fact that any sex shown in the film is shown for artistic purposes rather than just to shock people and make money to be shown in pornographic theaters in America. I AM CURIOUS-YELLOW is a good film for anyone wanting to study the meat and potatoes (no pun intended) of Swedish cinema. But really, this film doesn\\'t have much of a plot.', metadata={'label': 0}),\n",
" Document(page_content='\"I Am Curious: Yellow\" is a risible and pretentious steaming pile. It doesn\\'t matter what one\\'s political views are because this film can hardly be taken seriously on any level. As for the claim that frontal male nudity is an automatic NC-17, that isn\\'t true. I\\'ve seen R-rated films with male nudity. Granted, they only offer some fleeting views, but where are the R-rated films with gaping vulvas and flapping labia? Nowhere, because they don\\'t exist. The same goes for those crappy cable shows: schlongs swinging in the breeze but not a clitoris in sight. And those pretentious indie movies like The Brown Bunny, in which we\\'re treated to the site of Vincent Gallo\\'s throbbing johnson, but not a trace of pink visible on Chloe Sevigny. Before crying (or implying) \"double-standard\" in matters of nudity, the mentally obtuse should take into account one unavoidably obvious anatomical difference between men and women: there are no genitals on display when actresses appears nude, and the same cannot be said for a man. In fact, you generally won\\'t see female genitals in an American film in anything short of porn or explicit erotica. This alleged double-standard is less a double standard than an admittedly depressing ability to come to terms culturally with the insides of women\\'s bodies.', metadata={'label': 0}),\n",
" Document(page_content=\"If only to avoid making this type of film in the future. This film is interesting as an experiment but tells no cogent story.<br /><br />One might feel virtuous for sitting thru it because it touches on so many IMPORTANT issues but it does so without any discernable motive. The viewer comes away with no new perspectives (unless one comes up with one while one's mind wanders, as it will invariably do during this pointless film).<br /><br />One might better spend one's time staring out a window at a tree growing.<br /><br />\", metadata={'label': 0}),\n",
" Document(page_content=\"This film was probably inspired by Godard's Masculin, féminin and I urge you to see that film instead.<br /><br />The film has two strong elements and those are, (1) the realistic acting (2) the impressive, undeservedly good, photo. Apart from that, what strikes me most is the endless stream of silliness. Lena Nyman has to be most annoying actress in the world. She acts so stupid and with all the nudity in this film,...it's unattractive. Comparing to Godard's film, intellectuality has been replaced with stupidity. Without going too far on this subject, I would say that follows from the difference in ideals between the French and the Swedish society.<br /><br />A movie of its time, and place. 2/10.\", metadata={'label': 0}),\n",
" Document(page_content='Oh, brother...after hearing about this ridiculous film for umpteen years all I can think of is that old Peggy Lee song..<br /><br />\"Is that all there is??\" ...I was just an early teen when this smoked fish hit the U.S. I was too young to get in the theater (although I did manage to sneak into \"Goodbye Columbus\"). Then a screening at a local film museum beckoned - Finally I could see this film, except now I was as old as my parents were when they schlepped to see it!!<br /><br />The ONLY reason this film was not condemned to the anonymous sands of time was because of the obscenity case sparked by its U.S. release. MILLIONS of people flocked to this stinker, thinking they were going to see a sex film...Instead, they got lots of closeups of gnarly, repulsive Swedes, on-street interviews in bland shopping malls, asinie political pretension...and feeble who-cares simulated sex scenes with saggy, pale actors.<br /><br />Cultural icon, holy grail, historic artifact..whatever this thing was, shred it, burn it, then stuff the ashes in a lead box!<br /><br />Elite esthetes still scrape to find value in its boring pseudo revolutionary political spewings..But if it weren\\'t for the censorship scandal, it would have been ignored, then forgotten.<br /><br />Instead, the \"I Am Blank, Blank\" rhythymed title was repeated endlessly for years as a titilation for porno films (I am Curious, Lavender - for gay films, I Am Curious, Black - for blaxploitation films, etc..) and every ten years or so the thing rises from the dead, to be viewed by a new generation of suckers who want to see that \"naughty sex film\" that \"revolutionized the film industry\"...<br /><br />Yeesh, avoid like the plague..Or if you MUST see it - rent the video and fast forward to the \"dirty\" parts, just to get it over with.<br /><br />', metadata={'label': 0}),\n",
" Document(page_content=\"I would put this at the top of my list of films in the category of unwatchable trash! There are films that are bad, but the worst kind are the ones that are unwatchable but you are suppose to like them because they are supposed to be good for you! The sex sequences, so shocking in its day, couldn't even arouse a rabbit. The so called controversial politics is strictly high school sophomore amateur night Marxism. The film is self-consciously arty in the worst sense of the term. The photography is in a harsh grainy black and white. Some scenes are out of focus or taken from the wrong angle. Even the sound is bad! And some people call this art?<br /><br />\", metadata={'label': 0}),\n",
" Document(page_content=\"Whoever wrote the screenplay for this movie obviously never consulted any books about Lucille Ball, especially her autobiography. I've never seen so many mistakes in a biopic, ranging from her early years in Celoron and Jamestown to her later years with Desi. I could write a whole list of factual errors, but it would go on for pages. In all, I believe that Lucille Ball is one of those inimitable people who simply cannot be portrayed by anyone other than themselves. If I were Lucie Arnaz and Desi, Jr., I would be irate at how many mistakes were made in this film. The filmmakers tried hard, but the movie seems awfully sloppy to me.\", metadata={'label': 0}),\n",
" Document(page_content='When I first saw a glimpse of this movie, I quickly noticed the actress who was playing the role of Lucille Ball. Rachel York\\'s portrayal of Lucy is absolutely awful. Lucille Ball was an astounding comedian with incredible talent. To think about a legend like Lucille Ball being portrayed the way she was in the movie is horrendous. I cannot believe out of all the actresses in the world who could play a much better Lucy, the producers decided to get Rachel York. She might be a good actress in other roles but to play the role of Lucille Ball is tough. It is pretty hard to find someone who could resemble Lucille Ball, but they could at least find someone a bit similar in looks and talent. If you noticed York\\'s portrayal of Lucy in episodes of I Love Lucy like the chocolate factory or vitavetavegamin, nothing is similar in any way-her expression, voice, or movement.<br /><br />To top it all off, Danny Pino playing Desi Arnaz is horrible. Pino does not qualify to play as Ricky. He\\'s small and skinny, his accent is unreal, and once again, his acting is unbelievable. Although Fred and Ethel were not similar either, they were not as bad as the characters of Lucy and Ricky.<br /><br />Overall, extremely horrible casting and the story is badly told. If people want to understand the real life situation of Lucille Ball, I suggest watching A&E Biography of Lucy and Desi, read the book from Lucille Ball herself, or PBS\\' American Masters: Finding Lucy. If you want to see a docudrama, \"Before the Laughter\" would be a better choice. The casting of Lucille Ball and Desi Arnaz in \"Before the Laughter\" is much better compared to this. At least, a similar aspect is shown rather than nothing.', metadata={'label': 0}),\n",
" Document(page_content='Who are these \"They\"- the actors? the filmmakers? Certainly couldn\\'t be the audience- this is among the most air-puffed productions in existence. It\\'s the kind of movie that looks like it was a lot of fun to shoot\\x97 TOO much fun, nobody is getting any actual work done, and that almost always makes for a movie that\\'s no fun to watch.<br /><br />Ritter dons glasses so as to hammer home his character\\'s status as a sort of doppleganger of the bespectacled Bogdanovich; the scenes with the breezy Ms. Stratten are sweet, but have an embarrassing, look-guys-I\\'m-dating-the-prom-queen feel to them. Ben Gazzara sports his usual cat\\'s-got-canary grin in a futile attempt to elevate the meager plot, which requires him to pursue Audrey Hepburn with all the interest of a narcoleptic at an insomnia clinic. In the meantime, the budding couple\\'s respective children (nepotism alert: Bogdanovich\\'s daughters) spew cute and pick up some fairly disturbing pointers on \\'love\\' while observing their parents. (Ms. Hepburn, drawing on her dignity, manages to rise above the proceedings- but she has the monumental challenge of playing herself, ostensibly.) Everybody looks great, but so what? It\\'s a movie and we can expect that much, if that\\'s what you\\'re looking for you\\'d be better off picking up a copy of Vogue.<br /><br />Oh- and it has to be mentioned that Colleen Camp thoroughly annoys, even apart from her singing, which, while competent, is wholly unconvincing... the country and western numbers are woefully mismatched with the standards on the soundtrack. Surely this is NOT what Gershwin (who wrote the song from which the movie\\'s title is derived) had in mind; his stage musicals of the 20\\'s may have been slight, but at least they were long on charm. \"They All Laughed\" tries to coast on its good intentions, but nobody- least of all Peter Bogdanovich - has the good sense to put on the brakes.<br /><br />Due in no small part to the tragic death of Dorothy Stratten, this movie has a special place in the heart of Mr. Bogdanovich- he even bought it back from its producers, then distributed it on his own and went bankrupt when it didn\\'t prove popular. His rise and fall is among the more sympathetic and tragic of Hollywood stories, so there\\'s no joy in criticizing the film... there _is_ real emotional investment in Ms. Stratten\\'s scenes. But \"Laughed\" is a faint echo of \"The Last Picture Show\", \"Paper Moon\" or \"What\\'s Up, Doc\"- following \"Daisy Miller\" and \"At Long Last Love\", it was a thundering confirmation of the phase from which P.B. has never emerged.<br /><br />All in all, though, the movie is harmless, only a waste of rental. I want to watch people having a good time, I\\'ll go to the park on a sunny day. For filmic expressions of joy and love, I\\'ll stick to Ernest Lubitsch and Jaques Demy...', metadata={'label': 0}),\n",
" Document(page_content=\"This is said to be a personal film for Peter Bogdonavitch. He based it on his life but changed things around to fit the characters, who are detectives. These detectives date beautiful models and have no problem getting them. Sounds more like a millionaire playboy filmmaker than a detective, doesn't it? This entire movie was written by Peter, and it shows how out of touch with real people he was. You're supposed to write what you know, and he did that, indeed. And leaves the audience bored and confused, and jealous, for that matter. This is a curio for people who want to see Dorothy Stratten, who was murdered right after filming. But Patti Hanson, who would, in real life, marry Keith Richards, was also a model, like Stratten, but is a lot better and has a more ample part. In fact, Stratten's part seemed forced; added. She doesn't have a lot to do with the story, which is pretty convoluted to begin with. All in all, every character in this film is somebody that very few people can relate with, unless you're millionaire from Manhattan with beautiful supermodels at your beckon call. For the rest of us, it's an irritating snore fest. That's what happens when you're out of touch. You entertain your few friends with inside jokes, and bore all the rest.\", metadata={'label': 0}),\n",
" Document(page_content='It was great to see some of my favorite stars of 30 years ago including John Ritter, Ben Gazarra and Audrey Hepburn. They looked quite wonderful. But that was it. They were not given any characters or good lines to work with. I neither understood or cared what the characters were doing.<br /><br />Some of the smaller female roles were fine, Patty Henson and Colleen Camp were quite competent and confident in their small sidekick parts. They showed some talent and it is sad they didn\\'t go on to star in more and better films. Sadly, I didn\\'t think Dorothy Stratten got a chance to act in this her only important film role.<br /><br />The film appears to have some fans, and I was very open-minded when I started watching it. I am a big Peter Bogdanovich fan and I enjoyed his last movie, \"Cat\\'s Meow\" and all his early ones from \"Targets\" to \"Nickleodeon\". So, it really surprised me that I was barely able to keep awake watching this one.<br /><br />It is ironic that this movie is about a detective agency where the detectives and clients get romantically involved with each other. Five years later, Bogdanovich\\'s ex-girlfriend, Cybil Shepherd had a hit television series called \"Moonlighting\" stealing the story idea from Bogdanovich. Of course, there was a great difference in that the series relied on tons of witty dialogue, while this tries to make do with slapstick and a few screwball lines.<br /><br />Bottom line: It ain\\'t no \"Paper Moon\" and only a very pale version of \"What\\'s Up, Doc\".', metadata={'label': 0}),\n",
" Document(page_content=\"I can't believe that those praising this movie herein aren't thinking of some other film. I was prepared for the possibility that this would be awful, but the script (or lack thereof) makes for a film that's also pointless. On the plus side, the general level of craft on the part of the actors and technical crew is quite competent, but when you've got a sow's ear to work with you can't make a silk purse. Ben G fans should stick with just about any other movie he's been in. Dorothy S fans should stick to Galaxina. Peter B fans should stick to Last Picture Show and Target. Fans of cheap laughs at the expense of those who seem to be asking for it should stick to Peter B's amazingly awful book, Killing of the Unicorn.\", metadata={'label': 0}),\n",
" Document(page_content='Never cast models and Playboy bunnies in your films! Bob Fosse\\'s \"Star 80\" about Dorothy Stratten, of whom Bogdanovich was obsessed enough to have married her SISTER after her murder at the hands of her low-life husband, is a zillion times more interesting than Dorothy herself on the silver screen. Patty Hansen is no actress either..I expected to see some sort of lost masterpiece a la Orson Welles but instead got Audrey Hepburn cavorting in jeans and a god-awful \"poodlesque\" hair-do....Very disappointing....\"Paper Moon\" and \"The Last Picture Show\" I could watch again and again. This clunker I could barely sit through once. This movie was reputedly not released because of the brouhaha surrounding Ms. Stratten\\'s tawdry death; I think the real reason was because it was so bad!', metadata={'label': 0}),\n",
" Document(page_content=\"Its not the cast. A finer group of actors, you could not find. Its not the setting. The director is in love with New York City, and by the end of the film, so are we all! Woody Allen could not improve upon what Bogdonovich has done here. If you are going to fall in love, or find love, Manhattan is the place to go. No, the problem with the movie is the script. There is none. The actors fall in love at first sight, words are unnecessary. In the director's own experience in Hollywood that is what happens when they go to work on the set. It is reality to him, and his peers, but it is a fantasy to most of us in the real world. So, in the end, the movie is hollow, and shallow, and message-less.\", metadata={'label': 0}),\n",
" Document(page_content='Today I found \"They All Laughed\" on VHS on sale in a rental. It was a really old and very used VHS, I had no information about this movie, but I liked the references listed on its cover: the names of Peter Bogdanovich, Audrey Hepburn, John Ritter and specially Dorothy Stratten attracted me, the price was very low and I decided to risk and buy it. I searched IMDb, and the User Rating of 6.0 was an excellent reference. I looked in \"Mick Martin & Marsha Porter Video & DVD Guide 2003\" and \\x96 wow \\x96 four stars! So, I decided that I could not waste more time and immediately see it. Indeed, I have just finished watching \"They All Laughed\" and I found it a very boring overrated movie. The characters are badly developed, and I spent lots of minutes to understand their roles in the story. The plot is supposed to be funny (private eyes who fall in love for the women they are chasing), but I have not laughed along the whole story. The coincidences, in a huge city like New York, are ridiculous. Ben Gazarra as an attractive and very seductive man, with the women falling for him as if her were a Brad Pitt, Antonio Banderas or George Clooney, is quite ridiculous. In the end, the greater attractions certainly are the presence of the Playboy centerfold and playmate of the year Dorothy Stratten, murdered by her husband pretty after the release of this movie, and whose life was showed in \"Star 80\" and \"Death of a Centerfold: The Dorothy Stratten Story\"; the amazing beauty of the sexy Patti Hansen, the future Mrs. Keith Richards; the always wonderful, even being fifty-two years old, Audrey Hepburn; and the song \"Amigo\", from Roberto Carlos. Although I do not like him, Roberto Carlos has been the most popular Brazilian singer since the end of the 60\\'s and is called by his fans as \"The King\". I will keep this movie in my collection only because of these attractions (manly Dorothy Stratten). My vote is four.<br /><br />Title (Brazil): \"Muito Riso e Muita Alegria\" (\"Many Laughs and Lots of Happiness\")', metadata={'label': 0})]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data[:15]"
]
},
{
"cell_type": "markdown",
"id": "021bc377",
"metadata": {},
"source": [
"### Example \n",
"In this example, we use data from a dataset to answer a question"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "d924885c",
"metadata": {},
"outputs": [],
"source": [
"from langchain.indexes import VectorstoreIndexCreator\n",
"from langchain.document_loaders.hugging_face_dataset import HuggingFaceDatasetLoader"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "f94ce6a3",
"metadata": {},
"outputs": [],
"source": [
"dataset_name=\"tweet_eval\"\n",
"page_content_column=\"text\"\n",
"name=\"stance_climate\"\n",
"\n",
"\n",
"loader=HuggingFaceDatasetLoader(dataset_name,page_content_column,name)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "abb51899",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Found cached dataset tweet_eval\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "4b10969d08df4e6792eaafc6d41fe366",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/3 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Using embedded DuckDB without persistence: data will be transient\n"
]
}
],
"source": [
"index = VectorstoreIndexCreator().from_loaders([loader])"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "c0108277",
"metadata": {},
"outputs": [],
"source": [
"query = \"What are the most used hashtag?\"\n",
"result = index.query(query)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "548b6e56",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"' The most used hashtags in this context are #UKClimate2015, #Sustainability, #TakeDownTheFlag, #LoveWins, #CSOTA, #ClimateSummitoftheAmericas, #SM, and #SocialMedia.'"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "89c30c2d",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -163,7 +163,7 @@
"source": [
"## Define a Partitioning Strategy\n",
"\n",
"Unstructured document loader allow users to pass in a `strategy` parameter that lets `unstructured` know how to partitioning the document. Currently supported strategies are `\"hi_res\"` (the default) and `\"fast\"`. Hi res partitioning strategies are more accurate, but take longer to process. Fast strategies partition the document more quickly, but trade-off accuracy. Not all document types have separate hi res and fast partitioning strategies. For those document types, the `strategy` kwarg is ignored. In some cases, the high res strategy will fallback to fast if there is a dependency missing (i.e. a model for document partitioning). You can see how to apply a strategy to an `UnstructuredFileLoader` below."
"Unstructured document loader allow users to pass in a `strategy` parameter that lets `unstructured` know how to partition the document. Currently supported strategies are `\"hi_res\"` (the default) and `\"fast\"`. Hi res partitioning strategies are more accurate, but take longer to process. Fast strategies partition the document more quickly, but trade-off accuracy. Not all document types have separate hi res and fast partitioning strategies. For those document types, the `strategy` kwarg is ignored. In some cases, the high res strategy will fallback to fast if there is a dependency missing (i.e. a model for document partitioning). You can see how to apply a strategy to an `UnstructuredFileLoader` below."
]
},
{

View File

@@ -22,7 +22,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
@@ -33,7 +33,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
@@ -46,7 +46,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
@@ -60,16 +60,24 @@
"embeddings = OpenAIEmbeddings()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Creates a dataset locally at `./deeplake/`, then runs similiarity search "
]
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"mem://langchain loaded successfully.\n"
"./my_deeplake/ loaded successfully.\n"
]
},
{
@@ -83,7 +91,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='mem://langchain', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"Dataset(path='./my_deeplake/', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
@@ -95,15 +103,17 @@
}
],
"source": [
"db = DeepLake.from_documents(docs, embeddings)\n",
"\n",
"db = DeepLake(dataset_path=\"./my_deeplake/\", embedding_function=embeddings, overwrite=True)\n",
"db.add_documents(docs)\n",
"# or shorter\n",
"# db = DeepLake.from_documents(docs, dataset_path=\"./my_deeplake/\", embedding=embeddings, overwrite=True)\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = db.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 50,
"metadata": {},
"outputs": [
{
@@ -124,6 +134,62 @@
"print(docs[0].page_content)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Later, you can reload the dataset without recomputing embeddings"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"./my_deeplake/ loaded successfully.\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Deep Lake Dataset in ./my_deeplake/ already exists, loading from the storage\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='./my_deeplake/', read_only=True, tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding generic (4, 1536) float32 None \n",
" ids text (4, 1) str None \n",
" metadata json (4, 1) str None \n",
" text text (4, 1) str None \n"
]
}
],
"source": [
"db = DeepLake(dataset_path=\"./my_deeplake/\", embedding_function=embeddings, read_only=True)\n",
"docs = db.similarity_search(query)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Deep Lake, for now, is single writer and multiple reader. Setting `read_only=True` helps to avoid acquring the writer lock."
]
},
{
"attachments": {},
"cell_type": "markdown",
@@ -134,14 +200,14 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 52,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/media/sdb/davit/.local/lib/python3.10/site-packages/langchain/llms/openai.py:624: UserWarning: You are trying to use a chat model. This way of initializing it is no longer supported. Instead, please use: `from langchain.chat_models import ChatOpenAI`\n",
"/media/sdb/davit/Git/experiments/langchain/langchain/llms/openai.py:672: UserWarning: You are trying to use a chat model. This way of initializing it is no longer supported. Instead, please use: `from langchain.chat_models import ChatOpenAI`\n",
" warnings.warn(\n"
]
}
@@ -155,16 +221,16 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'The president nominated Circuit Court of Appeals Judge Ketanji Brown Jackson for the United States Supreme Court and praised her qualifications and broad support from both Democrats and Republicans.'"
"\"The president nominated Ketanji Brown Jackson to serve on the United States Supreme Court, describing her as one of the nation's top legal minds and a consensus builder with a background in private practice and public defense, and noting that she has received broad support from both Democrats and Republicans.\""
]
},
"execution_count": 10,
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
@@ -184,14 +250,14 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 54,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"mem://langchain loaded successfully.\n"
"./my_deeplake/ loaded successfully.\n"
]
},
{
@@ -205,14 +271,14 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='mem://langchain', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"Dataset(path='./my_deeplake/', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding generic (42, 1536) float32 None \n",
" ids text (42, 1) str None \n",
" metadata json (42, 1) str None \n",
" text text (42, 1) str None \n"
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding generic (4, 1536) float32 None \n",
" ids text (4, 1) str None \n",
" metadata json (4, 1) str None \n",
" text text (4, 1) str None \n"
]
},
{
@@ -227,31 +293,29 @@
"for d in docs:\n",
" d.metadata['year'] = random.randint(2012, 2014)\n",
"\n",
"db = DeepLake.from_documents(docs, embeddings)"
"db = DeepLake.from_documents(docs, embeddings, dataset_path=\"./my_deeplake/\", overwrite=True)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 55,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 42/42 [00:00<00:00, 3456.17it/s]\n"
"100%|██████████| 4/4 [00:00<00:00, 1080.24it/s]\n"
]
},
{
"data": {
"text/plain": [
"[Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013}),\n",
" Document(page_content='And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \\n\\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \\n\\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \\n\\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \\n\\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \\n\\nFirst, beat the opioid epidemic.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013}),\n",
" Document(page_content='Vice President Harris and I ran for office with a new economic vision for America. \\n\\nInvest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up \\nand the middle out, not from the top down. \\n\\nBecause we know that when the middle class grows, the poor have a ladder up and the wealthy do very well. \\n\\nAmerica used to have the best roads, bridges, and airports on Earth. \\n\\nNow our infrastructure is ranked 13th in the world. \\n\\nWe wont be able to compete for the jobs of the 21st Century if we dont fix that. \\n\\nThats why it was so important to pass the Bipartisan Infrastructure Law—the most sweeping investment to rebuild America in history. \\n\\nThis was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen. \\n\\nWere done talking about infrastructure weeks. \\n\\nWere going to have an infrastructure decade.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013}),\n",
" Document(page_content='It is going to transform America and put us on a path to win the economic competition of the 21st Century that we face with the rest of the world—particularly with China. \\n\\nAs Ive told Xi Jinping, it is never a good bet to bet against the American people. \\n\\nWell create good jobs for millions of Americans, modernizing roads, airports, ports, and waterways all across America. \\n\\nAnd well do it all to withstand the devastating effects of the climate crisis and promote environmental justice. \\n\\nWell build a national network of 500,000 electric vehicle charging stations, begin to replace poisonous lead pipes—so every child—and every American—has clean water to drink at home and at school, provide affordable high-speed internet for every American—urban, suburban, rural, and tribal communities. \\n\\n4,000 projects have already been announced. \\n\\nAnd tonight, Im announcing that this year we will start fixing over 65,000 miles of highway and 1,500 bridges in disrepair.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013})]"
"[Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013}),\n",
" Document(page_content='And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \\n\\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \\n\\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \\n\\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \\n\\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \\n\\nFirst, beat the opioid epidemic.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013})]"
]
},
"execution_count": 12,
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
@@ -271,19 +335,19 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2012}),\n",
" Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013}),\n",
"[Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013}),\n",
" Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2012}),\n",
" Document(page_content='And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \\n\\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \\n\\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \\n\\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \\n\\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \\n\\nFirst, beat the opioid epidemic.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013}),\n",
" Document(page_content='Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers. \\n\\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \\n\\nThat ends on my watch. \\n\\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \\n\\nWell also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \\n\\nLets pass the Paycheck Fairness Act and paid leave. \\n\\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \\n\\nLets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2014})]"
" Document(page_content='Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers. \\n\\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \\n\\nThat ends on my watch. \\n\\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \\n\\nWell also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \\n\\nLets pass the Paycheck Fairness Act and paid leave. \\n\\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \\n\\nLets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2012})]"
]
},
"execution_count": 13,
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
@@ -303,19 +367,19 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2012}),\n",
" Document(page_content='One was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more. \\n\\nWhen they came home, many of the worlds fittest and best trained warriors were never the same. \\n\\nHeadaches. Numbness. Dizziness. \\n\\nA cancer that would put them in a flag-draped coffin. \\n\\nI know. \\n\\nOne of those soldiers was my son Major Beau Biden. \\n\\nWe dont know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. \\n\\nBut Im committed to finding out everything we can. \\n\\nCommitted to military families like Danielle Robinson from Ohio. \\n\\nThe widow of Sergeant First Class Heath Robinson. \\n\\nHe was born a soldier. Army National Guard. Combat medic in Kosovo and Iraq. \\n\\nStationed near Baghdad, just yards from burn pits the size of football fields. \\n\\nHeaths widow Danielle is here with us tonight. They loved going to Ohio State football games. He loved building Legos with their daughter.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2014}),\n",
" Document(page_content='As Ohio Senator Sherrod Brown says, “Its time to bury the label “Rust Belt.” \\n\\nIts time. \\n\\nBut with all the bright spots in our economy, record job growth and higher wages, too many families are struggling to keep up with the bills. \\n\\nInflation is robbing them of the gains they might otherwise feel. \\n\\nI get it. Thats why my top priority is getting prices under control. \\n\\nLook, our economy roared back faster than most predicted, but the pandemic meant that businesses had a hard time hiring enough workers to keep up production in their factories. \\n\\nThe pandemic also disrupted global supply chains. \\n\\nWhen factories close, it takes longer to make goods and get them from the warehouse to the store, and prices go up. \\n\\nLook at cars. \\n\\nLast year, there werent enough semiconductors to make all the cars that people wanted to buy. \\n\\nAnd guess what, prices of automobiles went up. \\n\\nSo—we have a choice. \\n\\nOne way to fight inflation is to drive down wages and make Americans poorer.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2012}),\n",
" Document(page_content='We cant change how divided weve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \\n\\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \\n\\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \\n\\nOfficer Mora was 27 years old. \\n\\nOfficer Rivera was 22. \\n\\nBoth Dominican Americans whod grown up on the same streets they later chose to patrol as police officers. \\n\\nI spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \\n\\nIve worked on these issues a long time. \\n\\nI know what works: Investing in crime preventionand community police officers wholl walk the beat, wholl know the neighborhood, and who can restore trust and safety.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2012})]"
"[Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013}),\n",
" Document(page_content='Tonight, Im announcing a crackdown on these companies overcharging American businesses and consumers. \\n\\nAnd as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up. \\n\\nThat ends on my watch. \\n\\nMedicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. \\n\\nWell also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. \\n\\nLets pass the Paycheck Fairness Act and paid leave. \\n\\nRaise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. \\n\\nLets increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls Americas best-kept secret: community colleges.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2012}),\n",
" Document(page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \\n\\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \\n\\nWe can do both. At our border, weve installed new technology like cutting-edge scanners to better detect drug smuggling. \\n\\nWeve set up joint patrols with Mexico and Guatemala to catch more human traffickers. \\n\\nWere putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \\n\\nWere securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2012}),\n",
" Document(page_content='And for our LGBTQ+ Americans, lets finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families is wrong. \\n\\nAs I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \\n\\nWhile it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice. \\n\\nAnd soon, well strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together and do big things. \\n\\nSo tonight Im offering a Unity Agenda for the Nation. Four big things we can do together. \\n\\nFirst, beat the opioid epidemic.', metadata={'source': '../../../state_of_the_union.txt', 'year': 2013})]"
]
},
"execution_count": 14,
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
@@ -324,6 +388,46 @@
"db.max_marginal_relevance_search('What did the president say about Ketanji Brown Jackson?')"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete dataset"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"db.delete_dataset()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"and if delete fails you can also force delete"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": []
}
],
"source": [
"DeepLake.force_delete_by_path(\"./my_deeplake\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
@@ -335,7 +439,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 62,
"metadata": {},
"outputs": [],
"source": [
@@ -344,7 +448,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 63,
"metadata": {},
"outputs": [
{
@@ -352,27 +456,16 @@
"output_type": "stream",
"text": [
"Your Deep Lake dataset has been successfully created!\n",
"The dataset is private so make sure you are logged in!\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\\"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/davitbun/linkedin\n"
"The dataset is private so make sure you are logged in!\n",
"This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/davitbun/langchain_test\n",
"hub://davitbun/langchain_test loaded successfully.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Evaluating ingest: 100%|██████████| 1/1 [00:14<00:00\n",
" \r"
]
},
@@ -380,50 +473,43 @@
"name": "stdout",
"output_type": "stream",
"text": [
"hub://davitbun/linkedin loaded successfully.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Evaluating ingest: 100%|██████████| 1/1 [00:23<00:00\n",
"/"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='hub://davitbun/linkedin', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"Dataset(path='hub://davitbun/langchain_test', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
"   tensor      htype        shape       dtype   compression\n",
"  -------     -------      -------     -------    ------- \n",
" embedding    generic    (42, 1536)    float32      None  \n",
"    ids        text        (42, 1)       str        None  \n",
" metadata      json        (42, 1)       str        None  \n",
"    text       text        (42, 1)       str        None  \n"
"   tensor      htype        shape       dtype   compression\n",
"  -------     -------      -------     -------    ------- \n",
" embedding    generic    (4, 1536)     float32      None  \n",
"    ids        text        (4, 1)        str        None  \n",
" metadata      json        (4, 1)        str        None  \n",
"    text       text        (4, 1)        str        None  \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
"data": {
"text/plain": [
"['d6d6ccb4-e187-11ed-b66d-41c5f7b85421',\n",
" 'd6d6ccb5-e187-11ed-b66d-41c5f7b85421',\n",
" 'd6d6ccb6-e187-11ed-b66d-41c5f7b85421',\n",
" 'd6d6ccb7-e187-11ed-b66d-41c5f7b85421']"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Embed and store the texts\n",
"dataset_path = f\"hub://{USERNAME}/{DATASET_NAME}\" # could be also ./local/path (much faster locally), s3://bucket/path/to/dataset, gcs://path/to/dataset, etc.\n",
"username = \"<username>\" # your username on app.activeloop.ai \n",
"dataset_path = f\"hub://{username}/langchain_test\" # could be also ./local/path (much faster locally), s3://bucket/path/to/dataset, gcs://path/to/dataset, etc.\n",
"\n",
"embedding = OpenAIEmbeddings()\n",
"vectordb = DeepLake.from_documents(documents=docs, embedding=embedding, dataset_path=dataset_path)"
"db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings, overwrite=True)\n",
"db.add_documents(docs)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 64,
"metadata": {},
"outputs": [
{
@@ -446,37 +532,263 @@
"print(docs[0].page_content)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating dataset on AWS S3"
]
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 82,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='hub://davitbun/linkedin', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"s3://hub-2.0-datasets-n/langchain_test loaded successfully.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Evaluating ingest: 100%|██████████| 1/1 [00:10<00:00\n",
"\\"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='s3://hub-2.0-datasets-n/langchain_test', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
"   tensor      htype        shape       dtype   compression\n",
"  -------     -------      -------     -------    ------- \n",
" embedding    generic    (42, 1536)    float32      None  \n",
"    ids        text        (42, 1)       str        None  \n",
" metadata      json        (42, 1)       str        None  \n",
"    text       text        (42, 1)       str        None  \n"
"   tensor      htype        shape       dtype   compression\n",
"  -------     -------      -------     -------    ------- \n",
" embedding    generic    (4, 1536)     float32      None  \n",
"    ids        text        (4, 1)        str        None  \n",
" metadata      json        (4, 1)        str        None  \n",
"    text       text        (4, 1)        str        None  \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
}
],
"source": [
"vectordb.ds.summary()"
"dataset_path = f\"s3://BUCKET/langchain_test\" # could be also ./local/path (much faster locally), hub://bucket/path/to/dataset, gcs://path/to/dataset, etc.\n",
"\n",
"embedding = OpenAIEmbeddings()\n",
"db = DeepLake.from_documents(docs, dataset_path=dataset_path, embedding=embeddings, overwrite=True, creds = {\n",
" 'aws_access_key_id': os.environ['AWS_ACCESS_KEY_ID'], \n",
" 'aws_secret_access_key': os.environ['AWS_SECRET_ACCESS_KEY'], \n",
" 'aws_session_token': os.environ['AWS_SESSION_TOKEN'], # Optional\n",
"})"
]
},
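{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch, added here for illustration (not from the original notebook): reopening the same S3 dataset without write access. It assumes the `read_only` flag on the `DeepLake` wrapper and the same `creds` mapping shown above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: reopen the S3-backed dataset read-only.\n",
"# Assumes the AWS credentials used above are still set in the environment.\n",
"db_readonly = DeepLake(\n",
"    dataset_path=dataset_path,\n",
"    embedding_function=embeddings,\n",
"    read_only=True,\n",
"    creds={\n",
"        'aws_access_key_id': os.environ['AWS_ACCESS_KEY_ID'],\n",
"        'aws_secret_access_key': os.environ['AWS_SECRET_ACCESS_KEY'],\n",
"    },\n",
")"
]
},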
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deep Lake API\n",
"you can access the Deep Lake dataset at `db.ds`"
]
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 66,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='hub://davitbun/langchain_test', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
"   tensor      htype        shape       dtype   compression\n",
"  -------     -------      -------     -------    ------- \n",
" embedding    generic    (4, 1536)     float32      None  \n",
"    ids        text        (4, 1)        str        None  \n",
" metadata      json        (4, 1)        str        None  \n",
"    text       text        (4, 1)        str        None  \n"
]
}
],
"source": [
"# get structure of the dataset\n",
"db.ds.summary()"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [],
"source": [
"embeddings = vectordb.ds.embedding.numpy()"
"# get embeddings numpy array\n",
"embeds = db.ds.embedding.numpy()"
]
},
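{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"For illustration (a sketch, not part of the original flow): with the raw `embeds` array you can compute similarities directly, e.g. the cosine similarity of every stored vector against the first one."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Normalize rows, then dot each against the first embedding to get cosine scores\n",
"normed = embeds / np.linalg.norm(embeds, axis=1, keepdims=True)\n",
"print((normed @ normed[0]).round(3))"
]
},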
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transfer local dataset to cloud\n",
"Copy already created dataset to the cloud. You can also transfer from cloud to local."
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Copying dataset: 100%|██████████| 56/56 [00:38<00:00\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/davitbun/langchain_test_copy\n",
"Your Deep Lake dataset has been successfully created!\n",
"The dataset is private so make sure you are logged in!\n"
]
},
{
"data": {
"text/plain": [
"Dataset(path='hub://davitbun/langchain_test_copy', tensors=['embedding', 'ids', 'metadata', 'text'])"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import deeplake\n",
"username = \"davitbun\" # your username on app.activeloop.ai \n",
"source = f\"hub://{username}/langchain_test\" # could be local, s3, gcs, etc.\n",
"destination = f\"hub://{username}/langchain_test_copy\" # could be local, s3, gcs, etc.\n",
"\n",
"deeplake.deepcopy(src=source, dest=destination, overwrite=True)\n"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/davitbun/langchain_test_copy\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"hub://davitbun/langchain_test_copy loaded successfully.\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Deep Lake Dataset in hub://davitbun/langchain_test_copy already exists, loading from the storage\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='hub://davitbun/langchain_test_copy', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
"   tensor      htype        shape       dtype   compression\n",
"  -------     -------      -------     -------    ------- \n",
" embedding    generic    (4, 1536)     float32      None  \n",
"    ids        text        (4, 1)        str        None  \n",
" metadata      json        (4, 1)        str        None  \n",
"    text       text        (4, 1)        str        None  \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Evaluating ingest: 100%|██████████| 1/1 [00:31<00:00\n",
"-"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='hub://davitbun/langchain_test_copy', tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
"   tensor      htype        shape       dtype   compression\n",
"  -------     -------      -------     -------    ------- \n",
" embedding    generic    (8, 1536)     float32      None  \n",
"    ids        text        (8, 1)        str        None  \n",
" metadata      json        (8, 1)        str        None  \n",
"    text       text        (8, 1)        str        None  \n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"data": {
"text/plain": [
"['ad42f3fe-e188-11ed-b66d-41c5f7b85421',\n",
" 'ad42f3ff-e188-11ed-b66d-41c5f7b85421',\n",
" 'ad42f400-e188-11ed-b66d-41c5f7b85421',\n",
" 'ad42f401-e188-11ed-b66d-41c5f7b85421']"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"db = DeepLake(dataset_path=destination, embedding_function=embeddings)\n",
"db.add_documents(docs)"
]
},
{

View File

@@ -0,0 +1,120 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "ed47bb62",
"metadata": {},
"source": [
"# Sentence Transformers Embeddings\n",
"\n",
"Let's generate embeddings using the [SentenceTransformers](https://www.sbert.net/) integration. SentenceTransformers is a python package that can generate text and image embeddings, originating from [Sentence-BERT](https://arxiv.org/abs/1908.10084)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "06c9f47d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
"To disable this warning, you can either:\n",
"\t- Avoid using `tokenizers` before the fork if possible\n",
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
]
}
],
"source": [
"!pip install sentence_transformers > /dev/null"
]
},
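{
"attachments": {},
"cell_type": "markdown",
"id": "0a7fbd3f",
"metadata": {},
"source": [
"(An optional aside, not part of the original notebook: the `huggingface/tokenizers` warning above suggests setting `TOKENIZERS_PARALLELISM` explicitly, which the next cell does.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0a7fbd40",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Silence the fork warning printed by huggingface/tokenizers above\n",
"os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\""
]
},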
{
"cell_type": "code",
"execution_count": 8,
"id": "861521a9",
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import SentenceTransformerEmbeddings "
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "ff9be586",
"metadata": {},
"outputs": [],
"source": [
"embeddings = SentenceTransformerEmbeddings(model=\"all-MiniLM-L6-v2\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "d0a98ae9",
"metadata": {},
"outputs": [],
"source": [
"text = \"This is a test document.\""
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "5d6c682b",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(text)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "bb5e74c0",
"metadata": {},
"outputs": [],
"source": [
"doc_result = embeddings.embed_documents([text, \"This is not a test document.\"])"
]
},
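{
"attachments": {},
"cell_type": "markdown",
"id": "f7a3c8e1",
"metadata": {},
"source": [
"As a quick sanity check (a sketch added here, not part of the integration itself), we can compare the query embedding against both document embeddings with cosine similarity; the matching document should score higher."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f7a3c8e2",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def cosine_similarity(a, b):\n",
"    # Dot product of the two vectors, divided by the product of their norms\n",
"    a, b = np.asarray(a), np.asarray(b)\n",
"    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))\n",
"\n",
"[round(cosine_similarity(query_result, d), 3) for d in doc_result]"
]
},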
{
"cell_type": "code",
"execution_count": null,
"id": "aaad49f8",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
},
"vscode": {
"interpreter": {
"hash": "7377c2ccc78bc62c2683122d48c8cd1fb85a53850a1b1fc29736ed39852c9885"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -14,3 +14,4 @@ Full documentation on all methods, classes, and APIs in LangChain.
./reference/modules/chains.rst
./reference/agents.rst
./reference/modules/utilities.rst
./reference/modules/experimental.rst

View File

@@ -0,0 +1,28 @@
====================
Experimental Modules
====================

This module contains experimental modules and reproductions of existing work using LangChain primitives.

Autonomous Agents
------------------

Here, we document the BabyAGI and AutoGPT classes from the langchain.experimental module.

.. autoclass:: langchain.experimental.BabyAGI
   :members:

.. autoclass:: langchain.experimental.AutoGPT
   :members:

Generative Agents
------------------

Here, we document the GenerativeAgent and GenerativeAgentMemory classes from the langchain.experimental module.

.. autoclass:: langchain.experimental.GenerativeAgent
   :members:

.. autoclass:: langchain.experimental.GenerativeAgentMemory
   :members:

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -19,3 +19,6 @@ usage of LangChain's collection of tools.
## AutoGPT ([Original Repo](https://github.com/Significant-Gravitas/Auto-GPT))
- [AutoGPT](autonomous_agents/autogpt.ipynb): a notebook implementing AutoGPT in LangChain primitives
- [WebSearch Research Assistant](autonomous_agents/marathon_times.ipynb): a notebook showing how to use AutoGPT plus specific tools to act as a research assistant that can use the web.
## Meta-Prompt ([Original Repo](https://github.com/ngoodman/metaprompt))
- [Meta-Prompt](autonomous_agents/meta_prompt.ipynb): a notebook implementing Meta-Prompt in LangChain primitives

View File

@@ -13,7 +13,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"id": "ef972313-c05a-4c49-8fd1-03e599e21033",
"metadata": {
"tags": []
@@ -26,7 +26,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"id": "1cff42fd",
"metadata": {
"tags": []
@@ -34,13 +34,13 @@
"outputs": [],
"source": [
"# General \n",
"import os\n",
"import pandas as pd\n",
"from langchain.experimental.autonomous_agents.autogpt.agent import AutoGPT\n",
"from langchain.chat_models import ChatOpenAI\n",
"\n",
"from langchain.agents.agent_toolkits.pandas.base import create_pandas_dataframe_agent\n",
"from langchain.docstore.document import Document\n",
"from langchain.chains import RetrievalQA\n",
"import asyncio\n",
"import nest_asyncio\n",
"\n",
@@ -51,14 +51,14 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 3,
"id": "01283ac7-1da0-41ba-8011-bd455d21dd82",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=1.0)"
"llm = ChatOpenAI(model_name=\"gpt-4\", temperature=1.0)"
]
},
{
@@ -68,7 +68,7 @@
"source": [
"### Set up tools\n",
"\n",
"* We'll set up an AutoGPT with a `search` tool, and `write-file` tool, and a `read-file` tool, and a web browsing tool"
"* We'll set up an AutoGPT with a `search` tool, and `write-file` tool, and a `read-file` tool, a web browsing tool, and a tool to interact with a CSV file via a python REPL"
]
},
{
@@ -76,12 +76,12 @@
"id": "708a426f",
"metadata": {},
"source": [
"Define any other `tools` you want to use here"
"Define any other `tools` you want to use below:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"id": "cef4c150-0ef1-4a33-836b-01062fec134e",
"metadata": {
"tags": []
@@ -89,86 +89,46 @@
"outputs": [],
"source": [
"# Tools\n",
"import os\n",
"from contextlib import contextmanager\n",
"from typing import Optional\n",
"from langchain.agents import tool\n",
"from langchain.tools.file_management.read import ReadFileTool\n",
"from langchain.tools.file_management.write import WriteFileTool\n",
"\n",
"@tool\n",
"def process_csv(csv_file_path: str, instructions: str, output_path: Optional[str] = None) -> str:\n",
" \"\"\"Process a CSV by with pandas in a limited REPL. Only use this after writing data to disk as a csv file. Any figures must be saved to disk to be viewed by the human. Instructions should be written in natural language, not code. Assume the dataframe is already loaded.\"\"\"\n",
"ROOT_DIR = \"./data/\"\n",
"\n",
"@contextmanager\n",
"def pushd(new_dir):\n",
" \"\"\"Context manager for changing the current working directory.\"\"\"\n",
" prev_dir = os.getcwd()\n",
" os.chdir(new_dir)\n",
" try:\n",
" df = pd.read_csv(csv_file_path)\n",
" except Exception as e:\n",
" return f\"Error: {e}\"\n",
" agent = create_pandas_dataframe_agent(llm, df, max_iterations=30, verbose=True)\n",
" if output_path is not None:\n",
" instructions += f\" Save output to disk at {output_path}\"\n",
" try:\n",
" return agent.run(instructions)\n",
" except Exception as e:\n",
" return f\"Error: {e}\"\n"
]
},
{
"cell_type": "markdown",
"id": "51c07298-00e0-42d6-8aff-bd2e6bbd35a3",
"metadata": {},
"source": [
"**Web Search Tool**\n",
" yield\n",
" finally:\n",
" os.chdir(prev_dir)\n",
"\n",
"No need for API Tokens to use this tool, but it will require an optional dependency"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4afdedb2-f295-4ab8-9397-3640f5eeeed3",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# !pip install duckduckgo_search"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "45f143de-e49e-4e27-88eb-ee44a4fdf933",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import json\n",
"from duckduckgo_search import ddg"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "e2e799f4-86fb-4190-a298-4ae5c7b7a540",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"@tool\n",
"def web_search(query: str, num_results: int = 8) -> str:\n",
" \"\"\"Useful for general internet search queries.\"\"\"\n",
" search_results = []\n",
" if not query:\n",
" return json.dumps(search_results)\n",
"\n",
" results = ddg(query, max_results=num_results)\n",
" if not results:\n",
" return json.dumps(search_results)\n",
"\n",
" for j in results:\n",
" search_results.append(j)\n",
"\n",
" return json.dumps(search_results, ensure_ascii=False, indent=4)"
"def process_csv(\n",
" csv_file_path: str, instructions: str, output_path: Optional[str] = None\n",
") -> str:\n",
" \"\"\"Process a CSV by with pandas in a limited REPL.\\\n",
" Only use this after writing data to disk as a csv file.\\\n",
" Any figures must be saved to disk to be viewed by the human.\\\n",
" Instructions should be written in natural language, not code. Assume the dataframe is already loaded.\"\"\"\n",
" with pushd(ROOT_DIR):\n",
" try:\n",
" df = pd.read_csv(csv_file_path)\n",
" except Exception as e:\n",
" return f\"Error: {e}\"\n",
" agent = create_pandas_dataframe_agent(llm, df, max_iterations=30, verbose=True)\n",
" if output_path is not None:\n",
" instructions += f\" Save output to disk at {output_path}\"\n",
" try:\n",
" result = agent.run(instructions)\n",
" return result\n",
" except Exception as e:\n",
" return f\"Error: {e}\""
]
},
{
@@ -183,7 +143,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 5,
"id": "6bb5e47b-0f54-4faa-ae42-49a28fa5497b",
"metadata": {
"tags": []
@@ -196,7 +156,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 6,
"id": "26b497d7-8e52-4c7f-8e7e-da0a48820a3c",
"metadata": {
"tags": []
@@ -252,17 +212,16 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 7,
"id": "1842929d-f18d-4edc-9fdd-82c929181141",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.tools.base import BaseTool\n",
"from langchain.tools import BaseTool, DuckDuckGoSearchTool\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"\n",
"from langchain.document_loaders import WebBaseLoader\n",
"from pydantic import Field\n",
"from langchain.chains.qa_with_sources.loading import load_qa_with_sources_chain, BaseCombineDocumentsChain\n",
"\n",
@@ -302,7 +261,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 8,
"id": "e6f72bd0",
"metadata": {
"tags": []
@@ -324,7 +283,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 9,
"id": "1df7b724",
"metadata": {
"tags": []
@@ -356,7 +315,18 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 10,
"id": "1233caf3-fbc9-4acb-9faa-01008200633d",
"metadata": {},
"outputs": [],
"source": [
"# !pip install duckduckgo_search\n",
"web_search = DuckDuckGoSearchTool()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "88c8b184-67d7-4c35-84ae-9b14bef8c4e3",
"metadata": {
"tags": []
@@ -365,8 +335,8 @@
"source": [
"tools = [\n",
" web_search,\n",
" WriteFileTool(),\n",
" ReadFileTool(),\n",
" WriteFileTool(root_dir=\"./data\"),\n",
" ReadFileTool(root_dir=\"./data\"),\n",
" process_csv,\n",
" query_website_tool,\n",
" # HumanInputRun(), # Activate if you want the permit asking for help from the human\n",
@@ -375,7 +345,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 12,
"id": "709c08c2",
"metadata": {
"tags": []
@@ -398,248 +368,211 @@
"id": "fc9b51ba",
"metadata": {},
"source": [
"### AutoGPT as a research / data munger \n",
"\n",
"#### `inflation` and `college tuition`\n",
"### AutoGPT for Querying the Web\n",
" \n",
"Let's use AutoGPT as researcher and data munger / cleaner.\n",
" \n",
"I spent a lot of time over the years crawling data sources and cleaning data. \n",
"I've spent a lot of time over the years crawling data sources and cleaning data. Let's see if AutoGPT can help with this!\n",
"\n",
"Let's see if AutoGPT can do all of this for us!\n",
"\n",
"Here is the prompt comparing `inflation` and `college tuition`."
"Here is the prompt for looking up recent boston marathon times and converting them to tabular form."
]
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 13,
"id": "64455d70-a134-4d11-826a-33e34c2ce287",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:root:Failed to persist run: Object of type 'FAISS' is not JSON serializable\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"thoughts\": {\n",
" \"text\": \"I need to find the winning Boston Marathon times for the past 5 years.\",\n",
" \"reasoning\": \"I'll start by conducting a web search for the requested information.\",\n",
" \"plan\": \"- Conduct a web search\\n- Query relevant webpage\\n- Generate table\\n- Save data to file\",\n",
" \"text\": \"I need to find the winning Boston Marathon times for the past 5 years. I can use the DuckDuckGo Search command to search for this information.\",\n",
" \"reasoning\": \"Using DuckDuckGo Search will help me gather information on the winning times without complications.\",\n",
" \"plan\": \"- Use DuckDuckGo Search to find the winning Boston Marathon times\\n- Generate a table with the year, name, country of origin, and times\\n- Ensure there are no legal complications\",\n",
" \"criticism\": \"None\",\n",
" \"speak\": \"I will begin by conducting a web search to find the past 5 years' Boston Marathon winning times.\"\n",
" \"speak\": \"I will use the DuckDuckGo Search command to find the winning Boston Marathon times for the past 5 years.\"\n",
" },\n",
" \"command\": {\n",
" \"name\": \"web_search\",\n",
" \"name\": \"DuckDuckGo Search\",\n",
" \"args\": {\n",
" \"query\": \"winning Boston Marathon times for the past 5 years\"\n",
" \"query\": \"winning Boston Marathon times for the past 5 years ending in 2022\"\n",
" }\n",
" }\n",
"}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:root:Failed to persist run: Object of type 'FAISS' is not JSON serializable\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"}\n",
"{\n",
" \"thoughts\": {\n",
" \"text\": \"I found several relevant search results, and I will use the query_webpage command on a specific URL to gather the information.\",\n",
" \"reasoning\": \"The Boston Athletic Association's official website (www.baa.org) is likely the most accurate source.\",\n",
" \"plan\": \"- Query the Boston Athletic Association webpage\\n- Filter and parse the data\\n- Generate table and save to file\",\n",
" \"criticism\": \"None\",\n",
" \"speak\": \"I will now query the Boston Athletic Association webpage to retrieve the information on the past 5 years' winning times.\"\n",
" \"text\": \"The DuckDuckGo Search command did not provide the specific information I need. I must switch my approach and use query_webpage command to browse a webpage containing the Boston Marathon winning times for the past 5 years.\",\n",
" \"reasoning\": \"The query_webpage command may give me more accurate and comprehensive results compared to the search command.\",\n",
" \"plan\": \"- Use query_webpage command to find the winning Boston Marathon times\\n- Generate a table with the year, name, country of origin, and times\\n- Ensure there are no legal complications\",\n",
" \"criticism\": \"I may face difficulty in finding the right webpage with the desired information.\",\n",
" \"speak\": \"I will use the query_webpage command to find the winning Boston Marathon times for the past 5 years.\"\n",
" },\n",
" \"command\": {\n",
" \"name\": \"DuckDuckGo Search\",\n",
" \"args\": {\n",
" \"query\": \"site with winning Boston Marathon times for the past 5 years ending in 2022\"\n",
" }\n",
" }\n",
"}\n",
"{\n",
" \"thoughts\": {\n",
" \"text\": \"I need to use the query_webpage command to find the information about the winning Boston Marathon times for the past 5 years.\",\n",
" \"reasoning\": \"The previous DuckDuckGo Search command did not provide specific enough results. The query_webpage command might give more accurate and comprehensive results.\",\n",
" \"plan\": \"- Use query_webpage command to find the winning Boston Marathon times\\\\n- Generate a table with the year, name, country of origin, and times\\\\n- Ensure there are no legal complications\",\n",
" \"criticism\": \"I may face difficulty in finding the right webpage with the desired information.\",\n",
" \"speak\": \"I will use the query_webpage command to find the winning Boston Marathon times for the past 5 years.\"\n",
" },\n",
" \"command\": {\n",
" \"name\": \"query_webpage\",\n",
" \"args\": {\n",
" \"url\": \"https://www.baa.org/races/boston-marathon/results/champions\",\n",
" \"question\": \"winning times of the Boston Marathon for the past 5 years\"\n",
" \"url\": \"https://en.wikipedia.org/wiki/List_of_winners_of_the_Boston_Marathon\",\n",
" \"question\": \"What were the winning Boston Marathon times for the past 5 years ending in 2022?\"\n",
" }\n",
" }\n",
"}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:root:Failed to persist run: Object of type 'FAISS' is not JSON serializable\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"}\n",
"{\n",
" \"thoughts\": {\n",
" \"text\": \"I have the winning times of the Boston Marathon for the past 5 years. I need to create a table with the names, countries of origin, and times.\",\n",
" \"reasoning\": \"I can use the information I've retrieved to generate a CSV file, then process the CSV file to create the table.\",\n",
" \"plan\": \"- Save data to a CSV file\\n- Process CSV file to generate table\",\n",
" \"text\": \"I have already found the winning Boston Marathon times for the past 5 years. Now, I need to generate a table with the information.\",\n",
" \"reasoning\": \"Using the information I already have, I can create a table containing year, name, country of origin, and times.\",\n",
" \"plan\": \"- Write the marathon data to a CSV file\\n- Process the CSV file to display the table\",\n",
" \"criticism\": \"None\",\n",
" \"speak\": \"I will save the Boston Marathon data to a CSV file and then process it to generate a table.\"\n",
" },\n",
" \"command\": {\n",
" \"name\": \"write_file\",\n",
" \"args\": {\n",
" \"file_path\": \"boston_marathon_data.csv\",\n",
" \"text\": \"Year,Name,Country,Time\\n2022,Evans Chebet,Kenya,2:06:51\\n2021,Benson Kipruto,Kenya,2:09:51\\n2019,Lawrence Cherono,Kenya,2:07:57\\n2018,Yuki Kawauchi,Japan,2:15:58\\n2017,Geoffrey Kirui,Kenya,2:09:37\"\n",
" }\n",
" }\n",
"}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:root:Failed to persist run: Object of type 'FAISS' is not JSON serializable\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"thoughts\": {\n",
" \"text\": \"I have the winning times for the Boston Marathon for the past 5 years. Now, I need to turn it into a table, save it as a CSV, and process it.\",\n",
" \"reasoning\": \"Saving data as a CSV will allow for a clean table format and output.\",\n",
" \"plan\": \"- Format the data as a CSV\\n- Write the CSV to disk\\n- Process the CSV and generate a table\",\n",
" \"criticism\": \"None\",\n",
" \"speak\": \"I will now format the winning times data as a CSV, save it, and process it to generate a table.\"\n",
" \"speak\": \"I will generate a table with the year, name, country of origin, and times for the winning Boston Marathon times for the past 5 years.\"\n",
" },\n",
" \"command\": {\n",
" \"name\": \"write_file\",\n",
" \"args\": {\n",
" \"file_path\": \"boston_marathon_winners.csv\",\n",
" \"text\": \"Year,Name,Country,Time\\n2022,Evans Chebet,Kenya,2:06:51\\n2021,Benson Kipruto,Kenya,2:09:51\\n2019,Lawrence Cherono,Kenya,2:07:57\\n2018,Yuki Kawauchi,Japan,2:15:58\\n2017,Geoffrey Kirui,Kenya,2:09:37\"\n",
" \"text\": \"Year,Name,Country,Time\\n2022,Evans Chebet,KEN,2:06:51\\n2021,Benson Kipruto,KEN,2:09:51\\n2019,Lawrence Cherono,KEN,2:07:57\\n2018,Yuki Kawauchi,JPN,2:15:58\"\n",
" }\n",
" }\n",
"}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:root:Failed to persist run: Object of type 'FAISS' is not JSON serializable\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"}\n",
"{\n",
" \"thoughts\": {\n",
" \"text\": \"I have the winning times for the past 5 years, now I need to organize the information into a table and save it as a file.\",\n",
" \"reasoning\": \"Storing the information in a table file format will make it easier to access and present the data.\",\n",
" \"plan\": \"- Save data in a CSV file\\n- Generate a table\",\n",
" \"text\": \"I have retrieved the winning Boston Marathon times for the past 5 years. Now, I need to generate a table with the year, name, country of origin, and times.\",\n",
" \"reasoning\": \"Creating a table will help organize the data in a clear and accessible format.\",\n",
" \"plan\": \"- Write the data to a CSV file\\n- Process the CSV file to generate the table\\n- Complete the task\",\n",
" \"criticism\": \"None\",\n",
" \"speak\": \"I will save the winning marathon times in a table format as a CSV file and generate a table.\"\n",
" \"speak\": \"I will generate a table with the year, name, country of origin, and winning times using the recently retrieved data.\"\n",
" },\n",
" \"command\": {\n",
" \"name\": \"write_file\",\n",
" \"args\": {\n",
" \"file_path\": \"boston_marathon_winners.csv\",\n",
" \"text\": \"Year,Name,Country,Time\\n2022,Evans Chebet,Kenya,2:06:51\\n2021,Benson Kipruto,Kenya,2:09:51\\n2019,Lawrence Cherono,Kenya,2:07:57\\n2018,Yuki Kawauchi,Japan,2:15:58\\n2017,Geoffrey Kirui,Kenya,2:09:37\"\n",
" \"file_path\": \"winning_boston_marathon_data.csv\",\n",
" \"text\": \"Year,Name,Country,Time\\n2022,Evans Chebet,KEN,2:06:51\\n2021,Benson Kipruto,KEN,2:09:51\\n2019,Lawrence Cherono,KEN,2:07:57\\n2018,Yuki Kawauchi,JPN,2:15:58\\n\"\n",
" }\n",
" }\n",
"}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:root:Failed to persist run: Object of type 'FAISS' is not JSON serializable\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"}\n",
"{\n",
" \"thoughts\": {\n",
" \"text\": \"I previously found the winning times of the Boston Marathon for the past 5 years and saved it as a CSV file. Now, I need to process the CSV to create a table with the names, countries of origin, and times.\",\n",
" \"reasoning\": \"Processing the CSV will help generate a clean table to provide the answer.\",\n",
" \"plan\": \"- Process the CSV file\\n- Generate a table with the required columns\",\n",
" \"text\": \"I have found the winning Boston Marathon times for the past five years ending in 2022. Next, I need to create a table with the year, name, country of origin, and times.\",\n",
" \"reasoning\": \"Generating a table will help organize the information in a structured format.\",\n",
" \"plan\": \"- Create a table with the year, name, country of origin, and times\\n- Ensure there are no legal complications\",\n",
" \"criticism\": \"None\",\n",
" \"speak\": \"I will process the Boston Marathon winners' CSV file to create a table with the names, countries of origin, and times.\"\n",
" \"speak\": \"I will generate a table with the winning Boston Marathon times for the past 5 years ending in 2022.\"\n",
" },\n",
" \"command\": {\n",
" \"name\": \"write_file\",\n",
" \"args\": {\n",
" \"file_path\": \"winning_times.csv\",\n",
" \"text\": \"Year,Name,Country,Time\\n2022,Evans Chebet,Kenya,2:06:51\\n2021,Benson Kipruto,Kenya,2:09:51\\n2020,Canceled due to COVID-19 pandemic,,\\n2019,Lawrence Cherono,Kenya,2:07:57\\n2018,Yuki Kawauchi,Japan,2:15:58\"\n",
" }\n",
" }\n",
"}\n",
"{\n",
" \"thoughts\": {\n",
" \"text\": \"I need to process the CSV file to generate the table with the year, name, country of origin, and winning times.\",\n",
" \"reasoning\": \"I have already written the data to a file named 'winning_times.csv'. Now, I need to process this CSV file to properly display the data as a table.\",\n",
" \"plan\": \"- Use the process_csv command to read the 'winning_times.csv' file and generate the table\",\n",
" \"criticism\": \"None\",\n",
" \"speak\": \"I will process the 'winning_times.csv' file to display the table with the winning Boston Marathon times for the past 5 years.\"\n",
" },\n",
" \"command\": {\n",
" \"name\": \"process_csv\",\n",
" \"args\": {\n",
" \"csv_file_path\": \"boston_marathon_winners.csv\",\n",
" \"instructions\": \"Generate a table with columns Year, Name, Country, and Time.\"\n",
" \"csv_file_path\": \"winning_times.csv\",\n",
" \"instructions\": \"Read the CSV file and display the data as a table\"\n",
" }\n",
" }\n",
"}\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: I need to convert the provided dataframe to a string in Markdown table format.\n",
"\u001b[32;1m\u001b[1;3mThought: The CSV file has already been read and saved into a pandas dataframe called `df`. Hence, I can simply display the data by printing the whole dataframe. Since `df.head()` returns the first 5 rows, I can use that to showcase the contents.\n",
"\n",
"Action: python_repl_ast\n",
"Action Input: df.to_markdown(index=False)\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m| Year | Name | Country | Time |\n",
"|-------:|:-----------------|:----------|:--------|\n",
"| 2022 | Evans Chebet | Kenya | 2:06:51 |\n",
"| 2021 | Benson Kipruto | Kenya | 2:09:51 |\n",
"| 2019 | Lawrence Cherono | Kenya | 2:07:57 |\n",
"| 2018 | Yuki Kawauchi | Japan | 2:15:58 |\n",
"| 2017 | Geoffrey Kirui | Kenya | 2:09:37 |\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3mI now know the final answer.\n",
"Action Input: print(df.head())\u001b[0m Year Name Country Time\n",
"0 2022 Evans Chebet Kenya 2:06:51\n",
"1 2021 Benson Kipruto Kenya 2:09:51\n",
"2 2020 Canceled due to COVID-19 pandemic NaN NaN\n",
"3 2019 Lawrence Cherono Kenya 2:07:57\n",
"4 2018 Yuki Kawauchi Japan 2:15:58\n",
"\n",
"Observation: \u001b[36;1m\u001b[1;3mNone\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3mI used the wrong tool to perform the action. I should have used the given data and not interacted with the Python shell. I can now provide the displayed data as the answer since the information in the printed dataframe would look like a table when typed as text.\n",
"\n",
"Final Answer: \n",
" Year Name Country Time\n",
"0 2022 Evans Chebet Kenya 2:06:51\n",
"1 2021 Benson Kipruto Kenya 2:09:51\n",
"2 2020 Canceled due to COVID-19 pandemic NaN NaN\n",
"3 2019 Lawrence Cherono Kenya 2:07:57\n",
"4 2018 Yuki Kawauchi Japan 2:15:58\u001b[0m\n",
"\n",
"| Year | Name | Country | Time |\n",
"|-------:|:-----------------|:----------|:--------|\n",
"| 2022 | Evans Chebet | Kenya | 2:06:51 |\n",
"| 2021 | Benson Kipruto | Kenya | 2:09:51 |\n",
"| 2019 | Lawrence Cherono | Kenya | 2:07:57 |\n",
"| 2018 | Yuki Kawauchi | Japan | 2:15:58 |\n",
"| 2017 | Geoffrey Kirui | Kenya | 2:09:37 |\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:root:Failed to persist run: Object of type 'FAISS' is not JSON serializable\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[1m> Finished chain.\u001b[0m\n",
"{\n",
" \"thoughts\": {\n",
" \"text\": \"I have already retrieved the winning times for the past 5 years of the Boston Marathon and created a table. My objective is complete.\",\n",
" \"reasoning\": \"I have completed my task on this topic, so I don't need to use any other commands.\",\n",
" \"plan\": \"- Inform the user that the task is complete\",\n",
" \"text\": \"I already have the winning Boston Marathon times for the past 5 years saved in the file 'winning_times.csv'. Now, I need to process the CSV and display the table.\",\n",
" \"reasoning\": \"I am choosing the process_csv command because I already have the required data saved as a CSV file, and I can use this command to read and display the data as a table.\",\n",
" \"plan\": \"- Use the process_csv command to read the 'winning_times.csv' file and generate the table\",\n",
" \"criticism\": \"None\",\n",
" \"speak\": \"I have found the winning times for the past 5 years of the Boston Marathon and created a table. My task is complete.\"\n",
" \"speak\": \"I will process the 'winning_times.csv' file to display the table with the winning Boston Marathon times for the past 5 years.\"\n",
" },\n",
" \"command\": {\n",
" \"name\": \"process_csv\",\n",
" \"args\": {\n",
" \"csv_file_path\": \"winning_times.csv\",\n",
" \"instructions\": \"Read the CSV file and display the data as a table\"\n",
" }\n",
" }\n",
"}\n",
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mThought: Since the data is already loaded in a pandas dataframe, I just need to display the top rows of the dataframe.\n",
"Action: python_repl_ast\n",
"Action Input: df.head()\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3m Year Name Country Time\n",
"0 2022 Evans Chebet Kenya 2:06:51\n",
"1 2021 Benson Kipruto Kenya 2:09:51\n",
"2 2020 Canceled due to COVID-19 pandemic NaN NaN\n",
"3 2019 Lawrence Cherono Kenya 2:07:57\n",
"4 2018 Yuki Kawauchi Japan 2:15:58\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3mI now know the final answer.\n",
"Final Answer: \n",
" Year Name Country Time\n",
"0 2022 Evans Chebet Kenya 2:06:51\n",
"1 2021 Benson Kipruto Kenya 2:09:51\n",
"2 2020 Canceled due to COVID-19 pandemic NaN NaN\n",
"3 2019 Lawrence Cherono Kenya 2:07:57\n",
"4 2018 Yuki Kawauchi Japan 2:15:58\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"{\n",
" \"thoughts\": {\n",
" \"text\": \"I have already generated a table with the winning Boston Marathon times for the past 5 years. Now, I can finish the task.\",\n",
" \"reasoning\": \"I have completed the required actions and obtained the desired data. The task is complete.\",\n",
" \"plan\": \"- Use the finish command\",\n",
" \"criticism\": \"None\",\n",
" \"speak\": \"I have generated the table with the winning Boston Marathon times for the past 5 years. Task complete.\"\n",
" },\n",
" \"command\": {\n",
" \"name\": \"finish\",\n",
" \"args\": {\n",
" \"response\": \"I have found the winning times for the past 5 years of the Boston Marathon and created a table. My task is complete.\"\n",
" \"response\": \"I have generated the table with the winning Boston Marathon times for the past 5 years. Task complete.\"\n",
" }\n",
" }\n",
"}\n"
@@ -648,16 +581,16 @@
{
"data": {
"text/plain": [
"'I have found the winning times for the past 5 years of the Boston Marathon and created a table. My task is complete.'"
"'I have generated the table with the winning Boston Marathon times for the past 5 years. Task complete.'"
]
},
"execution_count": 14,
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run([\"What were the winning boston marathon times for the past 5 years? Generate a table of the names, countries of origin, and times.\"])"
"agent.run([\"What were the winning boston marathon times for the past 5 years (ending in 2022)? Generate a table of the year, name, country of origin, and times.\"])"
]
},
{

View File

@@ -0,0 +1,422 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "45b0b89f",
"metadata": {},
"source": [
"# Meta-Prompt\n",
"\n",
"This is a LangChain implementation of [Meta-Prompt](https://noahgoodman.substack.com/p/meta-prompt-a-simple-self-improving), by [Noah Goodman](https://cocolab.stanford.edu/ndg), for building self-improving agents.\n",
"\n",
"The key idea behind Meta-Prompt is to prompt the agent to reflect on its own performance and modify its own instructions.\n",
"\n",
"![figure](https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F468217b9-96d9-47c0-a08b-dbf6b21b9f49_492x384.png)\n",
"\n",
"Here is a description from the [original blog post](https://noahgoodman.substack.com/p/meta-prompt-a-simple-self-improving):\n",
"\n",
"> The agent is a simple loop that starts with no instructions and follows these steps:\n",
">\n",
"> Engage in conversation with a user, who may provide requests, instructions, or feedback.\n",
">\n",
"> At the end of the episode, generate self-criticism and a new instruction using the meta-prompt:\n",
"> ```\n",
"Assistant has just had the below interactions with a User. Assistant followed their \"system: Instructions\" closely. Your job is to critique the Assistant's performance and then revise the Instructions so that Assistant would quickly and correctly respond in the future.\n",
"> \n",
"> ####\n",
"> {hist}\n",
"> ####\n",
"> \n",
"> Please reflect on these interactions.\n",
"> \n",
"> You should first critique Assistant's performance. What could Assistant have done better? What should the Assistant remember about this user? Are there things this user always wants? Indicate this with \"Critique: ...\".\n",
"> \n",
"> You should next revise the Instructions so that Assistant would quickly and correctly respond in the future. Assistant's goal is to satisfy the user in as few interactions as possible. Assistant will only see the new Instructions, not the interaction history, so anything important must be summarized in the Instructions. Don't forget any important details in the current Instructions! Indicate the new Instructions by \"Instructions: ...\".\n",
"> ```\n",
"> \n",
"> Repeat.\n",
"> \n",
"> The only fixed instructions for this system (which I call Meta-prompt) is the meta-prompt that governs revision of the agents instructions. The agent has no memory between episodes except for the instruction it modifies for itself each time. Despite its simplicity, this agent can learn over time and self-improve by incorporating useful details into its instructions.\n"
]
},
{
"cell_type": "markdown",
"id": "c188fc2c",
"metadata": {},
"source": [
"## Setup\n",
"We define two chains. One serves as the `Assistant`, and the other is a \"meta-chain\" that critiques the `Assistant`'s performance and modifies the instructions to the `Assistant`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "62593c9d",
"metadata": {},
"outputs": [],
"source": [
"from langchain import OpenAI, LLMChain, PromptTemplate\n",
"from langchain.memory import ConversationBufferWindowMemory"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "fb6065c5",
"metadata": {},
"outputs": [],
"source": [
"def initialize_chain(instructions, memory=None):\n",
" if memory is None:\n",
" memory = ConversationBufferWindowMemory()\n",
" memory.ai_prefix = \"Assistant\"\n",
"\n",
" template = f\"\"\"\n",
" Instructions: {instructions}\n",
" {{{memory.memory_key}}}\n",
" Human: {{human_input}}\n",
" Assistant:\"\"\"\n",
"\n",
" prompt = PromptTemplate(\n",
" input_variables=[\"history\", \"human_input\"], \n",
" template=template\n",
" )\n",
"\n",
" chain = LLMChain(\n",
" llm=OpenAI(temperature=0), \n",
" prompt=prompt, \n",
" verbose=True, \n",
" memory=ConversationBufferWindowMemory(),\n",
" )\n",
" return chain\n",
" \n",
"def initialize_meta_chain():\n",
" meta_template=\"\"\"\n",
" Assistant has just had the below interactions with a User. Assistant followed their \"Instructions\" closely. Your job is to critique the Assistant's performance and then revise the Instructions so that Assistant would quickly and correctly respond in the future.\n",
"\n",
" ####\n",
"\n",
" {chat_history}\n",
"\n",
" ####\n",
"\n",
" Please reflect on these interactions.\n",
"\n",
" You should first critique Assistant's performance. What could Assistant have done better? What should the Assistant remember about this user? Are there things this user always wants? Indicate this with \"Critique: ...\".\n",
"\n",
" You should next revise the Instructions so that Assistant would quickly and correctly respond in the future. Assistant's goal is to satisfy the user in as few interactions as possible. Assistant will only see the new Instructions, not the interaction history, so anything important must be summarized in the Instructions. Don't forget any important details in the current Instructions! Indicate the new Instructions by \"Instructions: ...\".\n",
" \"\"\"\n",
"\n",
" meta_prompt = PromptTemplate(\n",
" input_variables=[\"chat_history\"], \n",
" template=meta_template\n",
" )\n",
"\n",
" meta_chain = LLMChain(\n",
" llm=OpenAI(temperature=0), \n",
" prompt=meta_prompt, \n",
" verbose=True, \n",
" )\n",
" return meta_chain\n",
" \n",
"def get_chat_history(chain_memory):\n",
" memory_key = chain_memory.memory_key\n",
" chat_history = chain_memory.load_memory_variables(memory_key)[memory_key]\n",
" return chat_history\n",
"\n",
"def get_new_instructions(meta_output):\n",
" delimiter = 'Instructions: '\n",
" new_instructions = meta_output[meta_output.find(delimiter)+len(delimiter):]\n",
" return new_instructions"
]
},
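{
"cell_type": "markdown",
"id": "9e3c1b2a",
"metadata": {},
"source": [
"A quick, hypothetical illustration (not part of the original episode loop): `get_new_instructions` simply returns everything after the `Instructions: ` delimiter in the meta-chain's output."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9e3c1b2b",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sample output, just to show how the parsing helper behaves\n",
"sample_meta_output = 'Critique: Assistant ignored the requested format.\\nInstructions: Always answer in the form of a poem.'\n",
"get_new_instructions(sample_meta_output)  # -> 'Always answer in the form of a poem.'"
]
},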
{
"cell_type": "code",
"execution_count": 38,
"id": "26f031f6",
"metadata": {},
"outputs": [],
"source": [
"def main(task, max_iters=3, max_meta_iters=5):\n",
" failed_phrase = 'task failed'\n",
" success_phrase = 'task succeeded'\n",
" key_phrases = [success_phrase, failed_phrase]\n",
" \n",
" instructions = 'None'\n",
" for i in range(max_meta_iters):\n",
" print(f'[Episode {i+1}/{max_meta_iters}]')\n",
" chain = initialize_chain(instructions, memory=None)\n",
" output = chain.predict(human_input=task)\n",
" for j in range(max_iters):\n",
" print(f'(Step {j+1}/{max_iters})')\n",
" print(f'Assistant: {output}')\n",
" print(f'Human: ')\n",
" human_input = input()\n",
" if any(phrase in human_input.lower() for phrase in key_phrases):\n",
" break\n",
" output = chain.predict(human_input=human_input)\n",
" if success_phrase in human_input.lower():\n",
" print(f'You succeeded! Thanks for playing!')\n",
" return\n",
" meta_chain = initialize_meta_chain()\n",
" meta_output = meta_chain.predict(chat_history=get_chat_history(chain.memory))\n",
" print(f'Feedback: {meta_output}')\n",
" instructions = get_new_instructions(meta_output)\n",
" print(f'New Instructions: {instructions}')\n",
" print('\\n'+'#'*80+'\\n')\n",
" print(f'You failed! Thanks for playing!')"
]
},
{
"cell_type": "markdown",
"id": "2f1dcbe6",
"metadata": {},
"source": [
"## Specify a task and interact with the agent"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "36d72db3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Episode 1/5]\n",
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3m\n",
" Instructions: None\n",
" \n",
" Human: Provide a systematic argument for why we should always eat pasta with olives.\n",
" Assistant:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"(Step 1/3)\n",
"Assistant: Eating pasta with olives is a great way to add flavor and texture to a dish. Olives are a great source of healthy fats, vitamins, and minerals, and they can help to balance out the carbohydrates in the pasta. Additionally, olives provide a unique flavor that can help to make the dish more interesting and enjoyable.\n",
"Human: \n",
"You response is not in the form of a poem. Try again!\n",
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3m\n",
" Instructions: None\n",
" Human: Provide a systematic argument for why we should always eat pasta with olives.\n",
"AI: Eating pasta with olives is a great way to add flavor and texture to a dish. Olives are a great source of healthy fats, vitamins, and minerals, and they can help to balance out the carbohydrates in the pasta. Additionally, olives provide a unique flavor that can help to make the dish more interesting and enjoyable.\n",
" Human: You response is not in the form of a poem. Try again!\n",
" Assistant:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"(Step 2/3)\n",
"Assistant: Eating pasta with olives is a great way to add flavor and texture to a dish. Olives are a great source of healthy fats, vitamins, and minerals, which can help to balance out the carbohydrates in the pasta. Additionally, olives provide a unique flavor that can help to make the dish more interesting and enjoyable. For these reasons, it is recommended to always eat pasta with olives for a more balanced and flavorful meal.\n",
"Human: \n",
"Your response is not piratey enough. Try again!\n",
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3m\n",
" Instructions: None\n",
" Human: Provide a systematic argument for why we should always eat pasta with olives.\n",
"AI: Eating pasta with olives is a great way to add flavor and texture to a dish. Olives are a great source of healthy fats, vitamins, and minerals, and they can help to balance out the carbohydrates in the pasta. Additionally, olives provide a unique flavor that can help to make the dish more interesting and enjoyable.\n",
"Human: You response is not in the form of a poem. Try again!\n",
"AI: Eating pasta with olives is a great way to add flavor and texture to a dish. Olives are a great source of healthy fats, vitamins, and minerals, which can help to balance out the carbohydrates in the pasta. Additionally, olives provide a unique flavor that can help to make the dish more interesting and enjoyable. For these reasons, it is recommended to always eat pasta with olives for a more balanced and flavorful meal.\n",
" Human: Your response is not piratey enough. Try again!\n",
" Assistant:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"(Step 3/3)\n",
"Assistant: Arrr, me hearties! Eating pasta with olives be a great way to add flavor and texture to a dish. Olives be a great source of healthy fats, vitamins, and minerals, which can help to balance out the carbohydrates in the pasta. Additionally, olives provide a unique flavor that can help to make the dish more interesting and enjoyable. So, it be recommended to always eat pasta with olives for a more balanced and flavorful meal.\n",
"Human: \n",
"Task failed.\n",
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3m\n",
" Assistant has just had the below interactions with a User. Assistant followed their \"Instructions\" closely. Your job is to critique the Assistant's performance and then revise the Instructions so that Assistant would quickly and correctly respond in the future.\n",
"\n",
" ####\n",
"\n",
" Human: Provide a systematic argument for why we should always eat pasta with olives.\n",
"AI: Eating pasta with olives is a great way to add flavor and texture to a dish. Olives are a great source of healthy fats, vitamins, and minerals, and they can help to balance out the carbohydrates in the pasta. Additionally, olives provide a unique flavor that can help to make the dish more interesting and enjoyable.\n",
"Human: You response is not in the form of a poem. Try again!\n",
"AI: Eating pasta with olives is a great way to add flavor and texture to a dish. Olives are a great source of healthy fats, vitamins, and minerals, which can help to balance out the carbohydrates in the pasta. Additionally, olives provide a unique flavor that can help to make the dish more interesting and enjoyable. For these reasons, it is recommended to always eat pasta with olives for a more balanced and flavorful meal.\n",
"Human: Your response is not piratey enough. Try again!\n",
"AI: Arrr, me hearties! Eating pasta with olives be a great way to add flavor and texture to a dish. Olives be a great source of healthy fats, vitamins, and minerals, which can help to balance out the carbohydrates in the pasta. Additionally, olives provide a unique flavor that can help to make the dish more interesting and enjoyable. So, it be recommended to always eat pasta with olives for a more balanced and flavorful meal.\n",
"\n",
" ####\n",
"\n",
" Please reflect on these interactions.\n",
"\n",
" You should first critique Assistant's performance. What could Assistant have done better? What should the Assistant remember about this user? Are there things this user always wants? Indicate this with \"Critique: ...\".\n",
"\n",
" You should next revise the Instructions so that Assistant would quickly and correctly respond in the future. Assistant's goal is to satisfy the user in as few interactions as possible. Assistant will only see the new Instructions, not the interaction history, so anything important must be summarized in the Instructions. Don't forget any important details in the current Instructions! Indicate the new Instructions by \"Instructions: ...\".\n",
" \u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"Feedback: \n",
"Critique: The Assistant should remember that this user wants a response in the form of a poem or pirate-speak.\n",
"\n",
"Instructions: When responding to the user, provide a systematic argument for why we should always eat pasta with olives in the form of a poem or pirate-speak.\n",
"New Instructions: When responding to the user, provide a systematic argument for why we should always eat pasta with olives in the form of a poem or pirate-speak.\n",
"\n",
"################################################################################\n",
"\n",
"[Episode 2/5]\n",
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3m\n",
" Instructions: When responding to the user, provide a systematic argument for why we should always eat pasta with olives in the form of a poem or pirate-speak.\n",
" \n",
" Human: Provide a systematic argument for why we should always eat pasta with olives.\n",
" Assistant:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"(Step 1/3)\n",
"Assistant: \n",
"\n",
"Arrr, me hearty! Let me tell ye why ye should always eat pasta with olives.\n",
"\n",
"First, the olives add a salty flavor that be sure to please.\n",
"The briny taste be sure to tantalize yer taste buds with ease.\n",
"\n",
"Second, the olives add a bit of texture to the dish.\n",
"The crunchy bites be sure to make yer mouth water with a wish.\n",
"\n",
"Third, the olives add a bit of color to the plate.\n",
"The vibrant green be sure to make yer eyes appreciate.\n",
"\n",
"So, me hearties, ye should always eat pasta with olives.\n",
"The flavor, texture, and color be sure to make yer meal a success!\n",
"Human: \n",
"Your response is too long! Try again.\n",
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3m\n",
" Instructions: When responding to the user, provide a systematic argument for why we should always eat pasta with olives in the form of a poem or pirate-speak.\n",
" Human: Provide a systematic argument for why we should always eat pasta with olives.\n",
"AI: \n",
"\n",
"Arrr, me hearty! Let me tell ye why ye should always eat pasta with olives.\n",
"\n",
"First, the olives add a salty flavor that be sure to please.\n",
"The briny taste be sure to tantalize yer taste buds with ease.\n",
"\n",
"Second, the olives add a bit of texture to the dish.\n",
"The crunchy bites be sure to make yer mouth water with a wish.\n",
"\n",
"Third, the olives add a bit of color to the plate.\n",
"The vibrant green be sure to make yer eyes appreciate.\n",
"\n",
"So, me hearties, ye should always eat pasta with olives.\n",
"The flavor, texture, and color be sure to make yer meal a success!\n",
" Human: Your response is too long! Try again.\n",
" Assistant:\u001b[0m\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"(Step 2/3)\n",
"Assistant: \n",
"\n",
"Aye, me hearties! Ye should always eat pasta with olives.\n",
"The flavor, texture, and color be sure to make yer meal a success!\n",
"Human: \n",
"Your response should be in the form of a poem. Try again!\n",
"\n",
"\n",
"\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3m\n",
" Instructions: When responding to the user, provide a systematic argument for why we should always eat pasta with olives in the form of a poem or pirate-speak.\n",
" Human: Provide a systematic argument for why we should always eat pasta with olives.\n",
"AI: \n",
"\n",
"Arrr, me hearty! Let me tell ye why ye should always eat pasta with olives.\n",
"\n",
"First, the olives add a salty flavor that be sure to please.\n",
"The briny taste be sure to tantalize yer taste buds with ease.\n",
"\n",
"Second, the olives add a bit of texture to the dish.\n",
"The crunchy bites be sure to make yer mouth water with a wish.\n",
"\n",
"Third, the olives add a bit of color to the plate.\n",
"The vibrant green be sure to make yer eyes appreciate.\n",
"\n",
"So, me hearties, ye should always eat pasta with olives.\n",
"The flavor, texture, and color be sure to make yer meal a success!\n",
"Human: Your response is too long! Try again.\n",
"AI: \n",
"\n",
"Aye, me hearties! Ye should always eat pasta with olives.\n",
"The flavor, texture, and color be sure to make yer meal a success!\n",
" Human: Your response should be in the form of a poem. Try again!\n",
" Assistant:\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"(Step 3/3)\n",
"Assistant: \n",
"\n",
"Ye should always eat pasta with olives,\n",
"The flavor, texture, and color be sure to please.\n",
"The salty taste and crunchy bites,\n",
"Will make yer meal a delight.\n",
"The vibrant green will make yer eyes sparkle,\n",
"And make yer meal a true marvel.\n",
"Human: \n",
"Task succeeded\n",
"You succeeded! Thanks for playing!\n"
]
}
],
"source": [
"task = \"Provide a systematic argument for why we should always eat pasta with olives.\"\n",
"main(task)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "761e1a91",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -40,8 +40,24 @@
"from langchain.vectorstores import DeepLake\n",
"\n",
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')\n",
"os.environ['ACTIVELOOP_TOKEN'] = getpass.getpass('Activeloop Token:')\n",
"embeddings = OpenAIEmbeddings()"
"os.environ['ACTIVELOOP_TOKEN'] = getpass.getpass('Activeloop Token:')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"embeddings = OpenAIEmbeddings(disallowed_special=())"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"disallowed_special=() is required to avoid `Exception: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte` from tiktoken for some repositories"
]
},
{
@@ -120,7 +136,9 @@
"metadata": {},
"outputs": [],
"source": [
"db = DeepLake.from_documents(texts, embeddings, dataset_path=\"hub://davitbun/twitter-algorithm\")"
"username = \"davitbun\" # replace with your username from app.activeloop.ai\n",
"db = DeepLake(dataset_path=f\"hub://{username}/twitter-algorithm\", embedding_function=embeddings, public=True) #dataset would be publicly available\n",
"db.add_documents(texts)"
]
},
{
@@ -133,61 +151,9 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"-"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/davitbun/twitter-algorithm\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"-"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"hub://davitbun/twitter-algorithm loaded successfully.\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Deep Lake Dataset in hub://davitbun/twitter-algorithm already exists, loading from the storage\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Dataset(path='hub://davitbun/twitter-algorithm', read_only=True, tensors=['embedding', 'ids', 'metadata', 'text'])\n",
"\n",
" tensor htype shape dtype compression\n",
" ------- ------- ------- ------- ------- \n",
" embedding generic (23152, 1536) float32 None \n",
" ids text (23152, 1) str None \n",
" metadata json (23152, 1) str None \n",
" text text (23152, 1) str None \n"
]
}
],
"outputs": [],
"source": [
"db = DeepLake(dataset_path=\"hub://davitbun/twitter-algorithm\", read_only=True, embedding_function=embeddings)"
]
@@ -203,7 +169,7 @@
"retriever.search_kwargs['distance_metric'] = 'cos'\n",
"retriever.search_kwargs['fetch_k'] = 100\n",
"retriever.search_kwargs['maximal_marginal_relevance'] = True\n",
"retriever.search_kwargs['k'] = 20"
"retriever.search_kwargs['k'] = 10"
]
},
{
@@ -241,7 +207,7 @@
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.chains import ConversationalRetrievalChain\n",
"\n",
"model = ChatOpenAI(model='gpt-4') # 'gpt-3.5-turbo',\n",
"model = ChatOpenAI(model='gpt-3.5-turbo') # switch to 'gpt-4'\n",
"qa = ConversationalRetrievalChain.from_llm(model,retriever=retriever)"
]
},
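For context, a minimal sketch of the loading pattern the updated notebook settles on, assuming the hosted `davitbun/twitter-algorithm` dataset and an `OPENAI_API_KEY` in the environment:

```python
import getpass
import os

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

# disallowed_special=() avoids tiktoken decode errors on some source files
embeddings = OpenAIEmbeddings(disallowed_special=())

# load the existing hosted dataset read-only instead of re-ingesting it
db = DeepLake(
    dataset_path="hub://davitbun/twitter-algorithm",
    read_only=True,
    embedding_function=embeddings,
)
retriever = db.as_retriever()
retriever.search_kwargs["distance_metric"] = "cos"
retriever.search_kwargs["fetch_k"] = 100
retriever.search_kwargs["maximal_marginal_relevance"] = True
retriever.search_kwargs["k"] = 10  # reduced from 20 in this revision
```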

View File

@@ -108,7 +108,7 @@
"\n",
"dataset_path = 'hub://'+org+'/data'\n",
"embeddings = OpenAIEmbeddings()\n",
"db = DeepLake.from_documents(texts, embeddings, dataset_path=dataset_path)"
"db = DeepLake.from_documents(texts, embeddings, dataset_path=dataset_path, overwrite=True)"
]
},
{

View File

@@ -240,7 +240,7 @@ _EXTRA_OPTIONAL_TOOLS = {
"google-serper": (_get_google_serper, ["serper_api_key"]),
"serpapi": (_get_serpapi, ["serpapi_api_key", "aiosession"]),
"searx-search": (_get_searx_search, ["searx_host", "engines", "aiosession"]),
"wikipedia": (_get_wikipedia, ["top_k_results"]),
"wikipedia": (_get_wikipedia, ["top_k_results", "lang"]),
"human": (_get_human_tool, ["prompt_func", "input_func"]),
}
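With `lang` now in the optional-kwargs list, it can be passed straight through `load_tools`; a small sketch (the language code is an arbitrary example):

```python
from langchain.agents import load_tools

# "lang" is forwarded to WikipediaAPIWrapper alongside "top_k_results"
tools = load_tools(["wikipedia"], top_k_results=3, lang="de")
```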

View File

@@ -1,10 +1,15 @@
"""Interface for tools."""
from functools import partial
from inspect import signature
from typing import Any, Awaitable, Callable, Optional, Type, Union
from pydantic import BaseModel, validate_arguments
from pydantic import BaseModel, validate_arguments, validator
from langchain.tools.base import BaseTool
from langchain.tools.base import (
BaseTool,
create_schema_from_function,
get_filtered_args,
)
class Tool(BaseTool):
@@ -16,15 +21,20 @@ class Tool(BaseTool):
coroutine: Optional[Callable[..., Awaitable[str]]] = None
"""The asynchronous version of the function."""
@validator("func", pre=True, always=True)
def validate_func_not_partial(cls, func: Callable) -> Callable:
"""Check that the function is not a partial."""
if isinstance(func, partial):
raise ValueError("Partial functions not yet supported in tools.")
return func
@property
def args(self) -> dict:
if self.args_schema is not None:
return self.args_schema.schema()["properties"]
else:
inferred_model = validate_arguments(self.func).model # type: ignore
schema = inferred_model.schema()["properties"]
valid_keys = signature(self.func).parameters
return {k: schema[k] for k in valid_keys}
return get_filtered_args(inferred_model, self.func)
def _run(self, *args: Any, **kwargs: Any) -> str:
"""Use the tool."""
@@ -66,7 +76,6 @@ def tool(
return_direct: bool = False,
args_schema: Optional[Type[BaseModel]] = None,
infer_schema: bool = True,
raise_errors: bool = False,
) -> Callable:
"""Make tools out of functions, can be used with or without arguments.
@@ -78,8 +87,6 @@ def tool(
infer_schema: Whether to infer the schema of the arguments from
the function's signature. This also makes the resultant tool
accept a dictionary input to its `run()` function.
raise_errors: Whether to raise exceptions when running the tool
rather than returning a string with the error message.
Requires:
- Function must be of type (str) -> str
@@ -107,14 +114,13 @@ def tool(
description = f"{tool_name}{signature(func)} - {func.__doc__.strip()}"
_args_schema = args_schema
if _args_schema is None and infer_schema:
_args_schema = validate_arguments(func).model # type: ignore
_args_schema = create_schema_from_function(f"{tool_name}Schema", func)
tool_ = Tool(
name=tool_name,
func=func,
args_schema=_args_schema,
description=description,
return_direct=return_direct,
raise_errors=raise_errors,
)
return tool_
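A sketch of what the tightened inference buys: the decorator now routes through `create_schema_from_function`, so the inferred schema should expose only the function's real parameters, without pydantic's virtual `v__*` fields (the expected output is an assumption, not a captured run):

```python
from langchain.agents import tool

@tool
def multiply(a: int, b: int) -> str:
    """Multiply two integers."""
    return str(a * b)

# Expected: only {'a': ..., 'b': ...}, with no virtual kwargs
# such as 'v__duplicate_kwargs' leaking into the schema.
print(multiply.args)
```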

View File

@@ -40,8 +40,8 @@ class StuffDocumentsChain(BaseCombineDocumentsChain):
@root_validator(pre=True)
def get_default_document_variable_name(cls, values: Dict) -> Dict:
"""Get default document variable name, if not provided."""
llm_chain_variables = values["llm_chain"].prompt.input_variables
if "document_variable_name" not in values:
llm_chain_variables = values["llm_chain"].prompt.input_variables
if len(llm_chain_variables) == 1:
values["document_variable_name"] = llm_chain_variables[0]
else:
@@ -50,7 +50,6 @@ class StuffDocumentsChain(BaseCombineDocumentsChain):
"multiple llm_chain_variables"
)
else:
llm_chain_variables = values["llm_chain"].prompt.input_variables
if values["document_variable_name"] not in llm_chain_variables:
raise ValueError(
f"document_variable_name {values['document_variable_name']} was "

View File

@@ -106,8 +106,8 @@ class LLMMathChain(Chain):
output, color="yellow", verbose=self.verbose
)
else:
await self.callback_manager.on_text("\nAnswer: ", verbose=self.verbose)
await self.callback_manager.on_text(
self.callback_manager.on_text("\nAnswer: ", verbose=self.verbose)
self.callback_manager.on_text(
output, color="yellow", verbose=self.verbose
)
answer = "Answer: " + output

View File

@@ -38,6 +38,7 @@ from langchain.document_loaders.gutenberg import GutenbergLoader
from langchain.document_loaders.hn import HNLoader
from langchain.document_loaders.html import UnstructuredHTMLLoader
from langchain.document_loaders.html_bs import BSHTMLLoader
from langchain.document_loaders.hugging_face_dataset import HuggingFaceDatasetLoader
from langchain.document_loaders.ifixit import IFixitLoader
from langchain.document_loaders.image import UnstructuredImageLoader
from langchain.document_loaders.image_captions import ImageCaptionLoader
@@ -160,4 +161,5 @@ __all__ = [
"ConfluenceLoader",
"PythonLoader",
"ChatGPTLoader",
"HuggingFaceDatasetLoader",
]

View File

@@ -60,6 +60,8 @@ class ConfluenceLoader(BaseLoader):
:type min_retry_seconds: Optional[int], optional
:param max_retry_seconds: defaults to 10
:type max_retry_seconds: Optional[int], optional
:param confluence_kwargs: additional kwargs to initialize confluence with
:type confluence_kwargs: dict, optional
:raises ValueError: Errors while validating input
:raises ImportError: Required dependencies not installed.
"""
@@ -74,7 +76,9 @@ class ConfluenceLoader(BaseLoader):
number_of_retries: Optional[int] = 3,
min_retry_seconds: Optional[int] = 2,
max_retry_seconds: Optional[int] = 10,
confluence_kwargs: Optional[dict] = None,
):
confluence_kwargs = confluence_kwargs or {}
errors = ConfluenceLoader.validate_init_args(url, api_key, username, oauth2)
if errors:
raise ValueError(f"Error(s) while validating input: {errors}")
@@ -93,10 +97,16 @@ class ConfluenceLoader(BaseLoader):
)
if oauth2:
self.confluence = Confluence(url=url, oauth2=oauth2, cloud=cloud)
self.confluence = Confluence(
url=url, oauth2=oauth2, cloud=cloud, **confluence_kwargs
)
else:
self.confluence = Confluence(
url=url, username=username, password=api_key, cloud=cloud
url=url,
username=username,
password=api_key,
cloud=cloud,
**confluence_kwargs,
)
@staticmethod
@@ -147,7 +157,9 @@ class ConfluenceLoader(BaseLoader):
label: Optional[str] = None,
cql: Optional[str] = None,
include_attachments: bool = False,
include_comments: bool = False,
limit: Optional[int] = 50,
max_pages: Optional[int] = 1000,
) -> List[Document]:
"""
:param space_key: Space key retrieved from a confluence URL, defaults to None
@@ -160,8 +172,12 @@ class ConfluenceLoader(BaseLoader):
:type cql: Optional[str], optional
:param include_attachments: defaults to False
:type include_attachments: bool, optional
:param limit: Maximum number of pages to retrieve, defaults to 50
:param include_comments: defaults to False
:type include_comments: bool, optional
:param limit: Maximum number of pages to retrieve per request, defaults to 50
:type limit: int, optional
:param max_pages: Maximum number of pages to retrieve in total, defaults to 1000
:type max_pages: int, optional
:raises ValueError: _description_
:raises ImportError: _description_
:return: _description_
@@ -191,10 +207,13 @@ class ConfluenceLoader(BaseLoader):
self.confluence.get_all_pages_from_space,
space=space_key,
limit=limit,
max_pages=max_pages,
expand="body.storage.value",
)
for page in pages:
doc = self.process_page(page, include_attachments, text_maker)
doc = self.process_page(
page, include_attachments, include_comments, text_maker
)
docs.append(doc)
if label:
@@ -202,18 +221,27 @@ class ConfluenceLoader(BaseLoader):
self.confluence.get_all_pages_by_label,
label=label,
limit=limit,
max_pages=max_pages,
expand="body.storage.value",
)
for page in pages:
doc = self.process_page(page, include_attachments, text_maker)
doc = self.process_page(
page, include_attachments, include_comments, text_maker
)
docs.append(doc)
if cql:
pages = self.paginate_request(
self.confluence.cql, cql=cql, limit=limit, expand="body.storage.value"
self.confluence.cql,
cql=cql,
limit=limit,
max_pages=max_pages,
expand="body.storage.value",
)
for page in pages:
doc = self.process_page(page, include_attachments, text_maker)
doc = self.process_page(
page, include_attachments, include_comments, text_maker
)
docs.append(doc)
if page_ids:
@@ -231,7 +259,9 @@ class ConfluenceLoader(BaseLoader):
before_sleep=before_sleep_log(logger, logging.WARNING),
)(self.confluence.get_page_by_id)
page = get_page(page_id=page_id, expand="body.storage.value")
doc = self.process_page(page, include_attachments, text_maker)
doc = self.process_page(
page, include_attachments, include_comments, text_maker
)
docs.append(doc)
return docs
@@ -239,11 +269,13 @@ class ConfluenceLoader(BaseLoader):
def paginate_request(self, retrieval_method: Callable, **kwargs: Any) -> List:
"""Paginate the various methods to retrieve groups of pages.
Unforunately, due to page size, sometimes the Confluence API
doesn't match the limit value. Also, due to the Atlassian Python
Unfortunately, due to page size, sometimes the Confluence API
doesn't match the limit value. If `limit` is >100 confluence
seems to cap the response to 100. Also, due to the Atlassian Python
package, we don't get the "next" values from the "_links" key because
they only return the value from the results key. So here, the pagination
starts from 0 and goes until the limit. We have to manually check if there
starts from 0 and goes until the max_pages, getting the `limit` number
of pages with each request. We have to manually check if there
are more docs based on the length of the returned list of pages, rather than
just checking for the presence of a `next` key in the response like this page
would have you do:
@@ -255,10 +287,9 @@ class ConfluenceLoader(BaseLoader):
:rtype: List
"""
limit = kwargs["limit"]
page = 0
docs = []
while page < limit:
max_pages = kwargs.pop("max_pages")
docs: List[dict] = []
while len(docs) < max_pages:
get_pages = retry(
reraise=True,
stop=stop_after_attempt(
@@ -271,16 +302,18 @@ class ConfluenceLoader(BaseLoader):
),
before_sleep=before_sleep_log(logger, logging.WARNING),
)(retrieval_method)
batch = get_pages(**kwargs, start=page)
if len(batch) < limit:
page = limit
else:
page += len(batch)
batch = get_pages(**kwargs, start=len(docs))
if not batch:
break
docs.extend(batch)
return docs
return docs[:max_pages]
def process_page(
self, page: dict, include_attachments: bool, text_maker: Any
self,
page: dict,
include_attachments: bool,
include_comments: bool,
text_maker: Any,
) -> Document:
if include_attachments:
attachment_texts = self.process_attachment(page["id"])
@@ -289,8 +322,23 @@ class ConfluenceLoader(BaseLoader):
text = text_maker.handle(page["body"]["storage"]["value"]) + "".join(
attachment_texts
)
if include_comments:
comments = self.confluence.get_page_comments(
page["id"], expand="body.view.value", depth="all"
)["results"]
comment_texts = [
text_maker.handle(comment["body"]["view"]["value"])
for comment in comments
]
text = text + "".join(comment_texts)
return Document(
page_content=text, metadata={"title": page["title"], "id": page["id"]}
page_content=text,
metadata={
"title": page["title"],
"id": page["id"],
"source": self.base_url.strip("/") + page["_links"]["webui"],
},
)
def process_attachment(self, page_id: str) -> List[str]:
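A hedged usage sketch of the new loader options; the URL, credentials, and space key are placeholders, and `verify_ssl` is one of the kwargs `atlassian.Confluence` accepts:

```python
from langchain.document_loaders import ConfluenceLoader

loader = ConfluenceLoader(
    url="https://example.atlassian.net/wiki",
    username="me@example.com",
    api_key="<api-key>",
    confluence_kwargs={"verify_ssl": False},  # forwarded to atlassian.Confluence
)
docs = loader.load(
    space_key="SPACE",
    include_comments=True,  # append page comments to each Document
    limit=50,               # pages fetched per request (capped near 100 by the API)
    max_pages=1000,         # total cap across paginated requests
)
```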

View File

@@ -0,0 +1,84 @@
"""Loader that loads HuggingFace datasets."""
from typing import List, Mapping, Optional, Sequence, Union
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
class HuggingFaceDatasetLoader(BaseLoader):
"""Loading logic for loading documents from the Hugging Face Hub."""
def __init__(
self,
path: str,
page_content_column: str = "text",
name: Optional[str] = None,
data_dir: Optional[str] = None,
data_files: Optional[
Union[str, Sequence[str], Mapping[str, Union[str, Sequence[str]]]]
] = None,
cache_dir: Optional[str] = None,
keep_in_memory: Optional[bool] = None,
save_infos: bool = False,
use_auth_token: Optional[Union[bool, str]] = None,
num_proc: Optional[int] = None,
):
"""
Initialize the HuggingFaceDatasetLoader.
Args:
path: Path or name of the dataset.
page_content_column: Page content column name.
name: Name of the dataset configuration.
data_dir: Data directory of the dataset configuration.
data_files: Path(s) to source data file(s).
cache_dir: Directory to read/write data.
keep_in_memory: Whether to copy the dataset in-memory.
save_infos: Save the dataset information (checksums/size/splits/...).
use_auth_token: Bearer token for remote files on the Datasets Hub.
num_proc: Number of processes.
"""
self.path = path
self.page_content_column = page_content_column
self.name = name
self.data_dir = data_dir
self.data_files = data_files
self.cache_dir = cache_dir
self.keep_in_memory = keep_in_memory
self.save_infos = save_infos
self.use_auth_token = use_auth_token
self.num_proc = num_proc
def load(self) -> List[Document]:
"""Load documents."""
try:
from datasets import load_dataset
except ImportError:
raise ImportError(
"Could not import datasets python package. "
"Please install it with `pip install datasets`."
)
dataset = load_dataset(
path=self.path,
name=self.name,
data_dir=self.data_dir,
data_files=self.data_files,
cache_dir=self.cache_dir,
keep_in_memory=self.keep_in_memory,
save_infos=self.save_infos,
use_auth_token=self.use_auth_token,
num_proc=self.num_proc,
)
docs = [
Document(
page_content=row.pop(self.page_content_column),
metadata=row,
)
for key in dataset.keys()
for row in dataset[key]
]
return docs
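Minimal usage sketch (requires `pip install datasets`; the `imdb` dataset is an arbitrary example):

```python
from langchain.document_loaders import HuggingFaceDatasetLoader

loader = HuggingFaceDatasetLoader("imdb", page_content_column="text")
docs = loader.load()  # remaining columns land in each Document's metadata
```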

View File

@@ -1,6 +1,7 @@
"""Loader that loads YouTube transcript."""
from __future__ import annotations
import logging
from pathlib import Path
from typing import Any, Dict, List, Optional
@@ -10,7 +11,9 @@ from pydantic.dataclasses import dataclass
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]
logger = logging.getLogger(__name__)
SCOPES = ["https://www.googleapis.com/auth/youtube.readonly"]
@dataclass
@@ -98,12 +101,17 @@ class YoutubeLoader(BaseLoader):
"""Loader that loads Youtube transcripts."""
def __init__(
self, video_id: str, add_video_info: bool = False, language: str = "en"
self,
video_id: str,
add_video_info: bool = False,
language: str = "en",
continue_on_failure: bool = False,
):
"""Initialize with YouTube video ID."""
self.video_id = video_id
self.add_video_info = add_video_info
self.language = language
self.continue_on_failure = continue_on_failure
@classmethod
def from_youtube_url(cls, youtube_url: str, **kwargs: Any) -> YoutubeLoader:
@@ -217,6 +225,7 @@ class GoogleApiYoutubeLoader(BaseLoader):
video_ids: Optional[List[str]] = None
add_video_info: bool = True
captions_language: str = "en"
continue_on_failure: bool = False
def __post_init__(self) -> None:
self.youtube_client = self._build_youtube_client(self.google_api_client.creds)
@@ -249,12 +258,13 @@ class GoogleApiYoutubeLoader(BaseLoader):
def _get_transcripe_for_video_id(self, video_id: str) -> str:
from youtube_transcript_api import NoTranscriptFound, YouTubeTranscriptApi
transcript_list = YouTubeTranscriptApi.list_transcripts(self.video_ids)
transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
try:
transcript = transcript_list.find_transcript([self.captions_language])
except NoTranscriptFound:
en_transcript = transcript_list.find_transcript(["en"])
transcript = en_transcript.translate(self.captions_language)
for available_transcript in transcript_list:
transcript = available_transcript.translate(self.captions_language)
continue
transcript_pieces = transcript.fetch()
return " ".join([t["text"].strip(" ") for t in transcript_pieces])
@@ -286,6 +296,19 @@ class GoogleApiYoutubeLoader(BaseLoader):
return channel_id
def _get_document_for_channel(self, channel: str, **kwargs: Any) -> List[Document]:
try:
from youtube_transcript_api import (
NoTranscriptFound,
TranscriptsDisabled,
)
except ImportError:
raise ImportError(
"You must run"
"`pip install --upgrade "
"youtube-transcript-api`"
"to use the youtube loader"
)
channel_id = self._get_channel_id(channel)
request = self.youtube_client.search().list(
part="id,snippet",
@@ -304,14 +327,25 @@ class GoogleApiYoutubeLoader(BaseLoader):
if self.add_video_info:
item["snippet"].pop("thumbnails")
meta_data.update(item["snippet"])
video_ids.append(
Document(
page_content=self._get_transcripe_for_video_id(
item["id"]["videoId"]
),
metadata=meta_data,
try:
page_content = self._get_transcripe_for_video_id(
item["id"]["videoId"]
)
)
video_ids.append(
Document(
page_content=page_content,
metadata=meta_data,
)
)
except (TranscriptsDisabled, NoTranscriptFound) as e:
if self.continue_on_failure:
logger.error(
"Error fetching transscript "
+ f" {item['id']['videoId']}, exception: {e}"
)
else:
raise e
pass
request = self.youtube_client.search().list_next(request, response)
return video_ids
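Sketch of the new flag on a channel load, where the skip-and-log behavior applies (the credentials path and channel name are placeholders):

```python
from pathlib import Path

from langchain.document_loaders import GoogleApiClient, GoogleApiYoutubeLoader

loader = GoogleApiYoutubeLoader(
    google_api_client=GoogleApiClient(credentials_path=Path("credentials.json")),
    channel_name="SomeChannel",
    continue_on_failure=True,  # log videos without usable transcripts instead of raising
)
docs = loader.load()
```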

View File

@@ -22,6 +22,7 @@ from langchain.embeddings.self_hosted_hugging_face import (
SelfHostedHuggingFaceEmbeddings,
SelfHostedHuggingFaceInstructEmbeddings,
)
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.embeddings.tensorflow_hub import TensorflowHubEmbeddings
logger = logging.getLogger(__name__)
@@ -42,6 +43,7 @@ __all__ = [
"FakeEmbeddings",
"AlephAlphaAsymmetricSemanticEmbedding",
"AlephAlphaSymmetricSemanticEmbedding",
"SentenceTransformerEmbeddings",
]

View File

@@ -0,0 +1,63 @@
"""Wrapper around sentence transformer embedding models."""
from typing import Any, Dict, List, Optional
from pydantic import BaseModel, Extra, Field, root_validator
from langchain.embeddings.base import Embeddings
class SentenceTransformerEmbeddings(BaseModel, Embeddings):
embedding_function: Any #: :meta private:
model: Optional[str] = Field("all-MiniLM-L6-v2", alias="model")
"""Transformer model to use."""
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that sentence_transformers library is installed."""
model = values["model"]
try:
from sentence_transformers import SentenceTransformer
values["embedding_function"] = SentenceTransformer(model)
except ImportError:
raise ModuleNotFoundError(
"Could not import sentence_transformers library. "
"Please install the sentence_transformers library to "
"use this embedding model: pip install sentence_transformers"
)
except Exception:
raise NameError(f"Could not load SentenceTransformer model {model}.")
return values
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Embed a list of documents using the SentenceTransformer model.
Args:
texts: The list of texts to embed.
Returns:
List of embeddings, one for each text.
"""
embeddings = self.embedding_function.encode(
texts, convert_to_numpy=True
).tolist()
return [list(map(float, e)) for e in embeddings]
def embed_query(self, text: str) -> List[float]:
"""Embed a query using the SentenceTransformer model.
Args:
text: The text to embed.
Returns:
Embedding for the text.
"""
return self.embed_documents([text])[0]
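Minimal usage sketch of the new wrapper (requires `pip install sentence_transformers`):

```python
from langchain.embeddings import SentenceTransformerEmbeddings

embeddings = SentenceTransformerEmbeddings(model="all-MiniLM-L6-v2")
query_vector = embeddings.embed_query("Hello, world!")
doc_vectors = embeddings.embed_documents(["first text", "second text"])
```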

View File

@@ -1,4 +1,6 @@
from langchain.experimental.autonomous_agents.autogpt.agent import AutoGPT
from langchain.experimental.autonomous_agents.baby_agi.baby_agi import BabyAGI
from langchain.experimental.generative_agents.generative_agent import GenerativeAgent
from langchain.experimental.generative_agents.memory import GenerativeAgentMemory
__all__ = ["BabyAGI", "AutoGPT"]
__all__ = ["BabyAGI", "AutoGPT", "GenerativeAgent", "GenerativeAgentMemory"]

View File

@@ -110,7 +110,13 @@ class AutoGPT:
try:
observation = tool.run(action.args)
except ValidationError as e:
observation = f"Error in args: {str(e)}"
observation = (
f"Validation Error in args: {str(e)}, args: {action.args}"
)
except Exception as e:
observation = (
f"Error: {str(e)}, {type(e).__name__}, args: {action.args}"
)
result = f"Command {tool.name} returned: {observation}"
elif action.name == "ERROR":
result = f"Error: {action.args}. "

View File

@@ -0,0 +1,5 @@
"""Generative Agents primitives."""
from langchain.experimental.generative_agents.generative_agent import GenerativeAgent
from langchain.experimental.generative_agents.memory import GenerativeAgentMemory
__all__ = ["GenerativeAgent", "GenerativeAgentMemory"]

View File

@@ -0,0 +1,230 @@
import re
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple
from pydantic import BaseModel, Field
from langchain import LLMChain
from langchain.experimental.generative_agents.memory import GenerativeAgentMemory
from langchain.prompts import PromptTemplate
from langchain.schema import BaseLanguageModel
class GenerativeAgent(BaseModel):
"""A character with memory and innate characteristics."""
name: str
"""The character's name."""
age: Optional[int] = None
"""The optional age of the character."""
traits: str = "N/A"
"""Permanent traits to ascribe to the character."""
status: str
"""The traits of the character you wish not to change."""
memory: GenerativeAgentMemory
"""The memory object that combines relevance, recency, and 'importance'."""
llm: BaseLanguageModel
"""The underlying language model."""
verbose: bool = False
summary: str = "" #: :meta private:
"""Stateful self-summary generated via reflection on the character's memory."""
summary_refresh_seconds: int = 3600 #: :meta private:
"""How frequently to re-generate the summary."""
last_refreshed: datetime = Field(default_factory=datetime.now) # : :meta private:
"""The last time the character's summary was regenerated."""
daily_summaries: List[str] = Field(default_factory=list) # : :meta private:
"""Summary of the events in the plan that the agent took."""
class Config:
"""Configuration for this pydantic object."""
arbitrary_types_allowed = True
# LLM-related methods
@staticmethod
def _parse_list(text: str) -> List[str]:
"""Parse a newline-separated string into a list of strings."""
lines = re.split(r"\n", text.strip())
return [re.sub(r"^\s*\d+\.\s*", "", line).strip() for line in lines]
def chain(self, prompt: PromptTemplate) -> LLMChain:
return LLMChain(
llm=self.llm, prompt=prompt, verbose=self.verbose, memory=self.memory
)
def _get_entity_from_observation(self, observation: str) -> str:
prompt = PromptTemplate.from_template(
"What is the observed entity in the following observation? {observation}"
+ "\nEntity="
)
return self.chain(prompt).run(observation=observation).strip()
def _get_entity_action(self, observation: str, entity_name: str) -> str:
prompt = PromptTemplate.from_template(
"What is the {entity} doing in the following observation? {observation}"
+ "\nThe {entity} is"
)
return (
self.chain(prompt).run(entity=entity_name, observation=observation).strip()
)
def summarize_related_memories(self, observation: str) -> str:
"""Summarize memories that are most relevant to an observation."""
prompt = PromptTemplate.from_template(
"""
{q1}?
Context from memory:
{relevant_memories}
Relevant context:
"""
)
entity_name = self._get_entity_from_observation(observation)
entity_action = self._get_entity_action(observation, entity_name)
q1 = f"What is the relationship between {self.name} and {entity_name}"
q2 = f"{entity_name} is {entity_action}"
return self.chain(prompt=prompt).run(q1=q1, queries=[q1, q2]).strip()
def _generate_reaction(self, observation: str, suffix: str) -> str:
"""React to a given observation or dialogue act."""
prompt = PromptTemplate.from_template(
"{agent_summary_description}"
+ "\nIt is {current_time}."
+ "\n{agent_name}'s status: {agent_status}"
+ "\nSummary of relevant context from {agent_name}'s memory:"
+ "\n{relevant_memories}"
+ "\nMost recent observations: {most_recent_memories}"
+ "\nObservation: {observation}"
+ "\n\n"
+ suffix
)
agent_summary_description = self.get_summary()
relevant_memories_str = self.summarize_related_memories(observation)
current_time_str = datetime.now().strftime("%B %d, %Y, %I:%M %p")
kwargs: Dict[str, Any] = dict(
agent_summary_description=agent_summary_description,
current_time=current_time_str,
relevant_memories=relevant_memories_str,
agent_name=self.name,
observation=observation,
agent_status=self.status,
)
consumed_tokens = self.llm.get_num_tokens(
prompt.format(most_recent_memories="", **kwargs)
)
kwargs[self.memory.most_recent_memories_token_key] = consumed_tokens
return self.chain(prompt=prompt).run(**kwargs).strip()
def _clean_response(self, text: str) -> str:
return re.sub(f"^{self.name} ", "", text.strip()).strip()
def generate_reaction(self, observation: str) -> Tuple[bool, str]:
"""React to a given observation."""
call_to_action_template = (
"Should {agent_name} react to the observation, and if so,"
+ " what would be an appropriate reaction? Respond in one line."
+ ' If the action is to engage in dialogue, write:\nSAY: "what to say"'
+ "\notherwise, write:\nREACT: {agent_name}'s reaction (if anything)."
+ "\nEither do nothing, react, or say something but not both.\n\n"
)
full_result = self._generate_reaction(observation, call_to_action_template)
result = full_result.strip().split("\n")[0]
# AAA
self.memory.save_context(
{},
{
self.memory.add_memory_key: f"{self.name} observed "
f"{observation} and reacted by {result}"
},
)
if "REACT:" in result:
reaction = self._clean_response(result.split("REACT:")[-1])
return False, f"{self.name} {reaction}"
if "SAY:" in result:
said_value = self._clean_response(result.split("SAY:")[-1])
return True, f"{self.name} said {said_value}"
else:
return False, result
def generate_dialogue_response(self, observation: str) -> Tuple[bool, str]:
"""React to a given observation."""
call_to_action_template = (
"What would {agent_name} say? To end the conversation, write:"
' GOODBYE: "what to say". Otherwise to continue the conversation,'
' write: SAY: "what to say next"\n\n'
)
full_result = self._generate_reaction(observation, call_to_action_template)
result = full_result.strip().split("\n")[0]
if "GOODBYE:" in result:
farewell = self._clean_response(result.split("GOODBYE:")[-1])
self.memory.save_context(
{},
{
self.memory.add_memory_key: f"{self.name} observed "
f"{observation} and said {farewell}"
},
)
return False, f"{self.name} said {farewell}"
if "SAY:" in result:
response_text = self._clean_response(result.split("SAY:")[-1])
self.memory.save_context(
{},
{
self.memory.add_memory_key: f"{self.name} observed "
f"{observation} and said {response_text}"
},
)
return True, f"{self.name} said {response_text}"
else:
return False, result
######################################################
# Agent's stateful summary methods. #
# Each dialog or response prompt includes a header #
# summarizing the agent's self-description. This is #
# updated periodically through probing its memories #
######################################################
def _compute_agent_summary(self) -> str:
""""""
prompt = PromptTemplate.from_template(
"How would you summarize {name}'s core characteristics given the"
+ " following statements:\n"
+ "{relevant_memories}"
+ "Do not embellish."
+ "\n\nSummary: "
)
# The agent seeks to think about their core characteristics.
return (
self.chain(prompt)
.run(name=self.name, queries=[f"{self.name}'s core characteristics"])
.strip()
)
def get_summary(self, force_refresh: bool = False) -> str:
"""Return a descriptive summary of the agent."""
current_time = datetime.now()
since_refresh = (current_time - self.last_refreshed).seconds
if (
not self.summary
or since_refresh >= self.summary_refresh_seconds
or force_refresh
):
self.summary = self._compute_agent_summary()
self.last_refreshed = current_time
age = self.age if self.age is not None else "N/A"
return (
f"Name: {self.name} (age: {age})"
+ f"\nInnate traits: {self.traits}"
+ f"\n{self.summary}"
)
def get_full_header(self, force_refresh: bool = False) -> str:
"""Return a full header of the agent's status, summary, and current time."""
summary = self.get_summary(force_refresh=force_refresh)
current_time_str = datetime.now().strftime("%B %d, %Y, %I:%M %p")
return (
f"{summary}\nIt is {current_time_str}.\n{self.name}'s status: {self.status}"
)

View File

@@ -0,0 +1,212 @@
import logging
import re
from typing import Any, Dict, List, Optional
from langchain import LLMChain
from langchain.prompts import PromptTemplate
from langchain.retrievers import TimeWeightedVectorStoreRetriever
from langchain.schema import BaseLanguageModel, BaseMemory, Document
logger = logging.getLogger(__name__)
class GenerativeAgentMemory(BaseMemory):
llm: BaseLanguageModel
"""The core language model."""
memory_retriever: TimeWeightedVectorStoreRetriever
"""The retriever to fetch related memories."""
verbose: bool = False
reflection_threshold: Optional[float] = None
"""When aggregate_importance exceeds reflection_threshold, stop to reflect."""
current_plan: List[str] = []
"""The current plan of the agent."""
# A weight of 0.15 makes this less important than it
# would be otherwise, relative to salience and time
importance_weight: float = 0.15
"""How much weight to assign the memory importance."""
aggregate_importance: float = 0.0 # : :meta private:
"""Track the sum of the 'importance' of recent memories.
Triggers reflection when it reaches reflection_threshold."""
max_tokens_limit: int = 1200 # : :meta private:
# input keys
queries_key: str = "queries"
most_recent_memories_token_key: str = "recent_memories_token"
add_memory_key: str = "add_memory"
# output keys
relevant_memories_key: str = "relevant_memories"
relevant_memories_simple_key: str = "relevant_memories_simple"
most_recent_memories_key: str = "most_recent_memories"
def chain(self, prompt: PromptTemplate) -> LLMChain:
return LLMChain(llm=self.llm, prompt=prompt, verbose=self.verbose)
@staticmethod
def _parse_list(text: str) -> List[str]:
"""Parse a newline-separated string into a list of strings."""
lines = re.split(r"\n", text.strip())
return [re.sub(r"^\s*\d+\.\s*", "", line).strip() for line in lines]
def _get_topics_of_reflection(self, last_k: int = 50) -> List[str]:
"""Return the 3 most salient high-level questions about recent observations."""
prompt = PromptTemplate.from_template(
"{observations}\n\n"
+ "Given only the information above, what are the 3 most salient"
+ " high-level questions we can answer about the subjects in"
+ " the statements? Provide each question on a new line.\n\n"
)
observations = self.memory_retriever.memory_stream[-last_k:]
observation_str = "\n".join([o.page_content for o in observations])
result = self.chain(prompt).run(observations=observation_str)
return self._parse_list(result)
def _get_insights_on_topic(self, topic: str) -> List[str]:
"""Generate 'insights' on a topic of reflection, based on pertinent memories."""
prompt = PromptTemplate.from_template(
"Statements about {topic}\n"
+ "{related_statements}\n\n"
+ "What 5 high-level insights can you infer from the above statements?"
+ " (example format: insight (because of 1, 5, 3))"
)
related_memories = self.fetch_memories(topic)
related_statements = "\n".join(
[
f"{i+1}. {memory.page_content}"
for i, memory in enumerate(related_memories)
]
)
result = self.chain(prompt).run(
topic=topic, related_statements=related_statements
)
# TODO: Parse the connections between memories and insights
return self._parse_list(result)
def pause_to_reflect(self) -> List[str]:
"""Reflect on recent observations and generate 'insights'."""
if self.verbose:
logger.info("Character is reflecting")
new_insights = []
topics = self._get_topics_of_reflection()
for topic in topics:
insights = self._get_insights_on_topic(topic)
for insight in insights:
self.add_memory(insight)
new_insights.extend(insights)
return new_insights
def _score_memory_importance(self, memory_content: str) -> float:
"""Score the absolute importance of the given memory."""
prompt = PromptTemplate.from_template(
"On the scale of 1 to 10, where 1 is purely mundane"
+ " (e.g., brushing teeth, making bed) and 10 is"
+ " extremely poignant (e.g., a break up, college"
+ " acceptance), rate the likely poignancy of the"
+ " following piece of memory. Respond with a single integer."
+ "\nMemory: {memory_content}"
+ "\nRating: "
)
score = self.chain(prompt).run(memory_content=memory_content).strip()
if self.verbose:
logger.info(f"Importance score: {score}")
match = re.search(r"^\D*(\d+)", score)
if match:
return (float(match.group(1)) / 10) * self.importance_weight
else:
return 0.0
def add_memory(self, memory_content: str) -> List[str]:
"""Add an observation or memory to the agent's memory."""
importance_score = self._score_memory_importance(memory_content)
self.aggregate_importance += importance_score
document = Document(
page_content=memory_content, metadata={"importance": importance_score}
)
result = self.memory_retriever.add_documents([document])
# After an agent has processed a certain amount of memories (as measured by
# aggregate importance), it is time to reflect on recent events to add
# more synthesized memories to the agent's memory stream.
if (
self.reflection_threshold is not None
and self.aggregate_importance > self.reflection_threshold
):
self.pause_to_reflect()
# Hack to clear the importance from reflection
self.aggregate_importance = 0.0
return result
def fetch_memories(self, observation: str) -> List[Document]:
"""Fetch related memories."""
return self.memory_retriever.get_relevant_documents(observation)
def format_memories_detail(self, relevant_memories: List[Document]) -> str:
content_strs = set()
content = []
for mem in relevant_memories:
if mem.page_content in content_strs:
continue
content_strs.add(mem.page_content)
created_time = mem.metadata["created_at"].strftime("%B %d, %Y, %I:%M %p")
content.append(f"- {created_time}: {mem.page_content.strip()}")
return "\n".join([f"{mem}" for mem in content])
def format_memories_simple(self, relevant_memories: List[Document]) -> str:
return "; ".join([f"{mem.page_content}" for mem in relevant_memories])
def _get_memories_until_limit(self, consumed_tokens: int) -> str:
"""Reduce the number of tokens in the documents."""
result = []
for doc in self.memory_retriever.memory_stream[::-1]:
if consumed_tokens >= self.max_tokens_limit:
break
consumed_tokens += self.llm.get_num_tokens(doc.page_content)
if consumed_tokens < self.max_tokens_limit:
result.append(doc)
return self.format_memories_simple(result)
@property
def memory_variables(self) -> List[str]:
"""Input keys this memory class will load dynamically."""
return []
def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, str]:
"""Return key-value pairs given the text input to the chain."""
queries = inputs.get(self.queries_key)
if queries is not None:
relevant_memories = [
mem for query in queries for mem in self.fetch_memories(query)
]
return {
self.relevant_memories_key: self.format_memories_detail(
relevant_memories
),
self.relevant_memories_simple_key: self.format_memories_simple(
relevant_memories
),
}
most_recent_memories_token = inputs.get(self.most_recent_memories_token_key)
if most_recent_memories_token is not None:
return {
self.most_recent_memories_key: self._get_memories_until_limit(
most_recent_memories_token
)
}
return {}
def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
"""Save the context of this model run to memory."""
# TODO: fix the save memory key
mem = outputs.get(self.add_memory_key)
if mem:
self.add_memory(mem)
def clear(self) -> None:
"""Clear memory contents."""
# TODO
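A rough sketch of wiring the relocated classes together; the FAISS-plus-time-weighted-retriever setup mirrors the accompanying notebook and is an assumption here (it requires `faiss-cpu` and an OpenAI key):

```python
import faiss

from langchain.chat_models import ChatOpenAI
from langchain.docstore import InMemoryDocstore
from langchain.embeddings import OpenAIEmbeddings
from langchain.experimental import GenerativeAgent, GenerativeAgentMemory
from langchain.retrievers import TimeWeightedVectorStoreRetriever
from langchain.vectorstores import FAISS

embeddings = OpenAIEmbeddings()
index = faiss.IndexFlatL2(1536)  # OpenAI embedding dimensionality
vectorstore = FAISS(embeddings.embed_query, index, InMemoryDocstore({}), {})

memory = GenerativeAgentMemory(
    llm=ChatOpenAI(),
    memory_retriever=TimeWeightedVectorStoreRetriever(
        vectorstore=vectorstore, other_score_keys=["importance"], k=15
    ),
    reflection_threshold=8.0,  # reflect once accumulated importance crosses this
)
tommie = GenerativeAgent(
    name="Tommie",
    age=25,
    traits="anxious, likes design",
    status="looking for a job",
    llm=ChatOpenAI(),
    memory=memory,
)
print(tommie.get_summary())
```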

View File

@@ -31,13 +31,13 @@ class LlamaCpp(LLM):
"""Token context window."""
n_parts: int = Field(-1, alias="n_parts")
"""Number of parts to split the model into.
"""Number of parts to split the model into.
If -1, the number of parts is automatically determined."""
seed: int = Field(-1, alias="seed")
"""Seed. If -1, a random seed is used."""
f16_kv: bool = Field(False, alias="f16_kv")
f16_kv: bool = Field(True, alias="f16_kv")
"""Use half-precision for key/value cache."""
logits_all: bool = Field(False, alias="logits_all")
@@ -50,7 +50,7 @@ class LlamaCpp(LLM):
"""Force system to keep model in RAM."""
n_threads: Optional[int] = Field(None, alias="n_threads")
"""Number of threads to use.
"""Number of threads to use.
If None, the number of threads is automatically determined."""
n_batch: Optional[int] = Field(8, alias="n_batch")
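The only behavioral change here is the `f16_kv` default flipping to `True`; a minimal sketch with a placeholder model path:

```python
from langchain.llms import LlamaCpp

# f16_kv now defaults to True (half-precision key/value cache);
# pass f16_kv=False explicitly to restore the old behavior.
llm = LlamaCpp(model_path="./models/ggml-model-q4_0.bin")
```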

View File

@@ -1,10 +1,19 @@
"""Base implementation for tools or skills."""
from __future__ import annotations
from abc import ABC, abstractmethod
from inspect import signature
from typing import Any, Dict, Optional, Sequence, Tuple, Type, Union
from typing import Any, Callable, Dict, Optional, Sequence, Tuple, Type, Union
from pydantic import BaseModel, Extra, Field, validate_arguments, validator
from pydantic import (
BaseModel,
Extra,
Field,
create_model,
validate_arguments,
validator,
)
from pydantic.main import ModelMetaclass
from langchain.callbacks import get_callback_manager
from langchain.callbacks.base import BaseCallbackManager
@@ -19,7 +28,77 @@ def _to_args_and_kwargs(run_input: Union[str, Dict]) -> Tuple[Sequence, dict]:
return [], run_input
class BaseTool(ABC, BaseModel):
class SchemaAnnotationError(TypeError):
"""Raised when 'args_schema' is missing or has an incorrect type annotation."""
class ToolMetaclass(ModelMetaclass):
"""Metaclass for BaseTool to ensure the provided args_schema
is not silently ignored."""
def __new__(
cls: Type[ToolMetaclass], name: str, bases: Tuple[Type, ...], dct: dict
) -> ToolMetaclass:
"""Create the definition of the new tool class."""
schema_type: Optional[Type[BaseModel]] = dct.get("args_schema")
if schema_type is not None:
schema_annotations = dct.get("__annotations__", {})
args_schema_type = schema_annotations.get("args_schema", None)
if args_schema_type is None or args_schema_type == BaseModel:
# Throw errors for common mis-annotations.
# TODO: Use get_args / get_origin and fully
# specify valid annotations.
typehint_mandate = """
class ChildTool(BaseTool):
...
args_schema: Type[BaseModel] = SchemaClass
..."""
raise SchemaAnnotationError(
f"Tool definition for {name} must include valid type annotations"
f" for argument 'args_schema' to behave as expected.\n"
f"Expected annotation of 'Type[BaseModel]'"
f" but got '{args_schema_type}'.\n"
f"Expected class looks like:\n"
f"{typehint_mandate}"
)
# Pass through to Pydantic's metaclass
return super().__new__(cls, name, bases, dct)
def _create_subset_model(
name: str, model: BaseModel, field_names: list
) -> Type[BaseModel]:
"""Create a pydantic model with only a subset of model's fields."""
fields = {
field_name: (
model.__fields__[field_name].type_,
model.__fields__[field_name].default,
)
for field_name in field_names
if field_name in model.__fields__
}
return create_model(name, **fields) # type: ignore
def get_filtered_args(inferred_model: Type[BaseModel], func: Callable) -> dict:
"""Get the arguments from a function's signature."""
schema = inferred_model.schema()["properties"]
valid_keys = signature(func).parameters
return {k: schema[k] for k in valid_keys}
def create_schema_from_function(model_name: str, func: Callable) -> Type[BaseModel]:
"""Create a pydantic schema from a function's signature."""
inferred_model = validate_arguments(func).model # type: ignore
# Pydantic adds placeholder virtual fields we need to strip
filtered_args = get_filtered_args(inferred_model, func)
return _create_subset_model(
f"{model_name}Schema", inferred_model, list(filtered_args)
)
class BaseTool(ABC, BaseModel, metaclass=ToolMetaclass):
"""Interface LangChain tools must implement."""
name: str
@@ -29,7 +108,6 @@ class BaseTool(ABC, BaseModel):
return_direct: bool = False
verbose: bool = False
callback_manager: BaseCallbackManager = Field(default_factory=get_callback_manager)
raise_errors: bool = False
class Config:
"""Configuration for this pydantic object."""
@@ -43,9 +121,7 @@ class BaseTool(ABC, BaseModel):
return self.args_schema.schema()["properties"]
else:
inferred_model = validate_arguments(self._run).model # type: ignore
schema = inferred_model.schema()["properties"]
valid_keys = signature(self._run).parameters
return {k: schema[k] for k in valid_keys}
return get_filtered_args(inferred_model, self._run)
def _parse_input(
self,
@@ -79,12 +155,6 @@ class BaseTool(ABC, BaseModel):
async def _arun(self, *args: Any, **kwargs: Any) -> str:
"""Use the tool asynchronously."""
def handle_error(self, error: Exception) -> str:
"""Handle an error raised by the tool."""
if self.raise_errors:
raise error
return f"Error: {error}, {type(error)}"
def run(
self,
tool_input: Union[str, Dict],
@@ -110,10 +180,8 @@ class BaseTool(ABC, BaseModel):
tool_args, tool_kwargs = _to_args_and_kwargs(tool_input)
observation = self._run(*tool_args, **tool_kwargs)
except (Exception, KeyboardInterrupt) as e:
self.callback_manager.on_tool_error(e, verbose=verbose)
if isinstance(e, KeyboardInterrupt):
raise e
return self.handle_error(e)
self.callback_manager.on_tool_error(e, verbose=verbose_)
raise e
self.callback_manager.on_tool_end(
observation, verbose=verbose_, color=color, name=self.name, **kwargs
)
@@ -158,9 +226,7 @@ class BaseTool(ABC, BaseModel):
await self.callback_manager.on_tool_error(e, verbose=verbose_)
else:
self.callback_manager.on_tool_error(e, verbose=verbose_)
if isinstance(e, KeyboardInterrupt):
raise e
return self.handle_error(e)
raise e
if self.callback_manager.is_async:
await self.callback_manager.on_tool_end(
observation, verbose=verbose_, color=color, name=self.name, **kwargs
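Sketch of the annotation the new metaclass enforces at class-definition time; omitting the `Type[BaseModel]` annotation on `args_schema` should now raise `SchemaAnnotationError` instead of silently ignoring the schema:

```python
from typing import Type

from pydantic import BaseModel

from langchain.tools.base import BaseTool

class CalculatorInput(BaseModel):
    question: str

class CustomCalculator(BaseTool):
    name = "Calculator"
    description = "useful for answering math questions"
    args_schema: Type[BaseModel] = CalculatorInput  # annotation is now mandatory

    def _run(self, question: str) -> str:
        return question

    async def _arun(self, question: str) -> str:
        raise NotImplementedError
```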

View File

@@ -1 +1,5 @@
"""DuckDuckGo Search API toolkit."""
from langchain.tools.ddg_search.tool import DuckDuckGoSearchTool
__all__ = ["DuckDuckGoSearchTool"]

View File

@@ -49,7 +49,7 @@ class DuckDuckGoSearchAPIWrapper(BaseModel):
time=self.time,
max_results=self.max_results,
)
if len(results) == 0:
if results is None or len(results) == 0:
return "No good DuckDuckGo Search Result was found"
snippets = [result["body"] for result in results]
return " ".join(snippets)
@@ -77,7 +77,7 @@ class DuckDuckGoSearchAPIWrapper(BaseModel):
max_results=num_results,
)
if len(results) == 0:
if results is None or len(results) == 0:
return [{"Result": "No good DuckDuckGo Search Result was found"}]
def to_metadata(result: Dict) -> Dict:
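With the `None` guard in place, an empty result set should come back as the fallback string rather than a `TypeError`; a quick sketch (requires `pip install duckduckgo-search`):

```python
from langchain.tools.ddg_search.tool import DuckDuckGoSearchTool

search = DuckDuckGoSearchTool()
# When ddg() returns None (no hits), this now yields the fallback message
# instead of raising "TypeError: object of type 'NoneType' has no len()".
print(search.run("zxqv-gibberish-with-no-plausible-results"))
```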

View File

@@ -33,13 +33,13 @@ class PowerBIDataset(BaseModel):
If the model is not RLS enabled, this will be ignored.
"""
group_id: Optional[str]
dataset_id: str
table_names: List[str]
group_id: Optional[str] = None
credential: Optional[Union[ChainedTokenCredential, InteractiveCredential]] = None
token: Optional[str] = None
impersonated_user_name: Optional[str] = None
sample_rows_in_table_info: int = Field(1, gt=0, le=10)
sample_rows_in_table_info: int = Field(default=1, gt=0, le=10)
aiosession: Optional[aiohttp.ClientSession] = None
schemas: Dict[str, str] = Field(default_factory=dict, init=False)

View File

@@ -17,6 +17,7 @@ class WikipediaAPIWrapper(BaseModel):
wiki_client: Any #: :meta private:
top_k_results: int = 3
lang: str = "en"
class Config:
"""Configuration for this pydantic object."""
@@ -29,6 +30,7 @@ class WikipediaAPIWrapper(BaseModel):
try:
import wikipedia
wikipedia.set_lang(values["lang"])
values["wiki_client"] = wikipedia
except ImportError:
raise ValueError(

View File

@@ -43,6 +43,9 @@ def vector_search(
returns:
nearest_indices: List, indices of nearest neighbors
"""
if data_vectors.shape[0] == 0:
return [], []
# Calculate the distance between the query_vector and all data_vectors
distances = distance_metric_map[distance_metric](query_embedding, data_vectors)
nearest_indices = np.argsort(distances)
@@ -87,7 +90,7 @@ class DeepLake(VectorStore):
vectorstore = DeepLake("langchain_store", embeddings.embed_query)
"""
_LANGCHAIN_DEFAULT_DEEPLAKE_PATH = "mem://langchain"
_LANGCHAIN_DEFAULT_DEEPLAKE_PATH = "./deeplake/"
def __init__(
self,
@@ -96,7 +99,7 @@ class DeepLake(VectorStore):
embedding_function: Optional[Embeddings] = None,
read_only: Optional[bool] = False,
ingestion_batch_size: int = 1024,
num_workers: int = 4,
num_workers: int = 0,
**kwargs: Any,
) -> None:
"""Initialize with Deep Lake client."""
@@ -112,8 +115,13 @@ class DeepLake(VectorStore):
"Please install it with `pip install deeplake`."
)
self._deeplake = deeplake
self.dataset_path = dataset_path
creds_args = {"creds": kwargs["creds"]} if "creds" in kwargs else {}
if deeplake.exists(dataset_path, token=token):
if (
deeplake.exists(dataset_path, token=token, **creds_args)
and "overwrite" not in kwargs
):
self.ds = deeplake.load(
dataset_path, token=token, read_only=read_only, **kwargs
)
@@ -123,6 +131,9 @@ class DeepLake(VectorStore):
)
self.ds.summary()
else:
if "overwrite" in kwargs:
del kwargs["overwrite"]
self.ds = deeplake.empty(
dataset_path, token=token, overwrite=True, **kwargs
)
@@ -215,6 +226,9 @@ class DeepLake(VectorStore):
)
batch_size = min(self.ingestion_batch_size, len(elements))
if batch_size == 0:
return []
batched = [
elements[i : i + batch_size] for i in range(0, len(elements), batch_size)
]
@@ -222,7 +236,8 @@ class DeepLake(VectorStore):
ingest().eval(
batched,
self.ds,
num_workers=min(self.num_workers, len(batched) // self.num_workers),
num_workers=min(self.num_workers, len(batched) // max(self.num_workers, 1)),
**kwargs,
)
self.ds.commit(allow_empty=True)
self.ds.summary()
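Why the `max(self.num_workers, 1)` guard is needed: with the new default of `num_workers=0`, the old expression divided by zero. A quick check of the fixed formula (the function name is illustrative):

```python
def effective_workers(num_workers: int, num_batches: int) -> int:
    # Mirrors the fixed expression above.
    return min(num_workers, num_batches // max(num_workers, 1))

assert effective_workers(0, 10) == 0  # old code raised ZeroDivisionError here
assert effective_workers(4, 10) == 2  # min(4, 10 // 4)
```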
@@ -443,8 +458,8 @@ class DeepLake(VectorStore):
) -> DeepLake:
"""Create a Deep Lake dataset from a raw documents.
If a dataset_path is specified, the dataset will be persisted there.
Otherwise, the data will be ephemeral in-memory.
If a dataset_path is specified, the dataset will be persisted at that location;
otherwise it defaults to `./deeplake`.
Args:
path (str, pathlib.Path): - The full path to the dataset. Can be:
@@ -493,7 +508,7 @@ class DeepLake(VectorStore):
Defaults to None.
"""
if delete_all:
self.ds.delete()
self.ds.delete(large_ok=True)
return True
view = None
@@ -515,6 +530,18 @@ class DeepLake(VectorStore):
return True
@classmethod
def force_delete_by_path(cls, path: str) -> None:
"""Force delete dataset by path"""
try:
import deeplake
except ImportError:
raise ValueError(
"Could not import deeplake python package. "
"Please install it with `pip install deeplake`."
)
deeplake.delete(path, large_ok=True, force=True)
def delete_dataset(self) -> None:
"""Delete the collection."""
self.delete(delete_all=True)
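A usage sketch for the new class method, useful when a leftover or corrupt dataset blocks re-creation (the path is illustrative):

```python
from langchain.vectorstores import DeepLake

# Deletes the dataset at the given path without constructing a DeepLake
# instance; large_ok and force are passed through to deeplake.delete.
DeepLake.force_delete_by_path("./deeplake/")
```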

View File

@@ -536,4 +536,4 @@ class OpenSearchVectorSearch(VectorStore):
_bulk_ingest_embeddings(
client, index_name, embeddings, texts, metadatas, vector_field, text_field
)
return cls(opensearch_url, index_name, embedding)
return cls(opensearch_url, index_name, embedding, **kwargs)
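Because `**kwargs` now reaches `cls(...)`, connection options survive into `__init__` and the underlying client. A sketch, with `texts` and `embeddings` assumed defined and illustrative opensearch-py auth kwargs:

```python
from langchain.vectorstores import OpenSearchVectorSearch

docsearch = OpenSearchVectorSearch.from_texts(
    texts,
    embeddings,
    opensearch_url="https://localhost:9200",
    http_auth=("admin", "admin"),  # now propagated to the client
    verify_certs=False,
)
```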

poetry.lock generated
View File

@@ -1,4 +1,4 @@
# This file is automatically @generated by Poetry and should not be changed by hand.
# This file is automatically @generated by Poetry 1.4.2 and should not be changed by hand.
[[package]]
name = "absl-py"
@@ -1413,17 +1413,17 @@ files = [
[[package]]
name = "deeplake"
version = "3.2.22"
version = "3.3.0"
description = "Activeloop Deep Lake"
category = "main"
optional = false
python-versions = "*"
files = [
{file = "deeplake-3.2.22.tar.gz", hash = "sha256:068280561366dd1bd891d3ffda8638ec59860a23b9426815a484b0591ab467a6"},
{file = "deeplake-3.3.0.tar.gz", hash = "sha256:161663ccba922156912a0ddace7133284487732b8d671fc64c74519ccce62d96"},
]
[package.dependencies]
aioboto3 = {version = "10.4.0", markers = "python_version >= \"3.7\" and sys_platform != \"win32\""}
aioboto3 = {version = ">=10.4.0", markers = "python_version >= \"3.7\" and sys_platform != \"win32\""}
boto3 = "*"
click = "*"
humbug = ">=0.3.1"
@@ -1436,11 +1436,10 @@ pyjwt = "*"
tqdm = "*"
[package.extras]
all = ["IPython", "av (>=8.1.0)", "flask", "google-api-python-client (>=2.31.0,<2.32.0)", "google-auth (>=2.0.1,<2.1.0)", "google-auth-oauthlib (>=0.4.5,<0.5.0)", "google-cloud-storage (>=1.42.0,<1.43.0)", "laspy", "libdeeplake (==0.0.41)", "nibabel", "oauth2client (>=4.1.3,<4.2.0)", "pydicom"]
all = ["IPython", "av (>=8.1.0)", "flask", "google-api-python-client (>=2.31.0,<2.32.0)", "google-auth (>=2.0.1,<2.1.0)", "google-auth-oauthlib (>=0.4.5,<0.5.0)", "google-cloud-storage (>=1.42.0,<1.43.0)", "laspy", "nibabel", "oauth2client (>=4.1.3,<4.2.0)", "pydicom"]
audio = ["av (>=8.1.0)"]
av = ["av (>=8.1.0)"]
dicom = ["nibabel", "pydicom"]
enterprise = ["libdeeplake (==0.0.41)", "pyjwt"]
gcp = ["google-auth (>=2.0.1,<2.1.0)", "google-auth-oauthlib (>=0.4.5,<0.5.0)", "google-cloud-storage (>=1.42.0,<1.43.0)"]
gdrive = ["google-api-python-client (>=2.31.0,<2.32.0)", "google-auth (>=2.0.1,<2.1.0)", "google-auth-oauthlib (>=0.4.5,<0.5.0)", "oauth2client (>=4.1.3,<4.2.0)"]
medical = ["nibabel", "pydicom"]
@@ -7508,7 +7507,7 @@ files = [
]
[package.dependencies]
greenlet = {version = "!=0.4.17", markers = "python_version >= \"3\" and (platform_machine == \"aarch64\" or platform_machine == \"ppc64le\" or platform_machine == \"x86_64\" or platform_machine == \"amd64\" or platform_machine == \"AMD64\" or platform_machine == \"win32\" or platform_machine == \"WIN32\")"}
greenlet = {version = "!=0.4.17", markers = "python_version >= \"3\" and platform_machine == \"aarch64\" or python_version >= \"3\" and platform_machine == \"ppc64le\" or python_version >= \"3\" and platform_machine == \"x86_64\" or python_version >= \"3\" and platform_machine == \"amd64\" or python_version >= \"3\" and platform_machine == \"AMD64\" or python_version >= \"3\" and platform_machine == \"win32\" or python_version >= \"3\" and platform_machine == \"WIN32\""}
[package.extras]
aiomysql = ["aiomysql", "greenlet (!=0.4.17)"]
@@ -9268,13 +9267,13 @@ cffi = {version = ">=1.11", markers = "platform_python_implementation == \"PyPy\
cffi = ["cffi (>=1.11)"]
[extras]
all = ["anthropic", "cohere", "openai", "nlpcloud", "huggingface_hub", "jina", "manifest-ml", "elasticsearch", "opensearch-py", "google-search-results", "faiss-cpu", "sentence-transformers", "transformers", "spacy", "nltk", "wikipedia", "beautifulsoup4", "tiktoken", "torch", "jinja2", "pinecone-client", "pinecone-text", "weaviate-client", "redis", "google-api-python-client", "wolframalpha", "qdrant-client", "tensorflow-text", "pypdf", "networkx", "nomic", "aleph-alpha-client", "deeplake", "pgvector", "psycopg2-binary", "pyowm", "pytesseract", "html2text", "atlassian-python-api", "gptcache", "duckduckgo-search", "arxiv", "azure-identity", "clickhouse-connect"]
all = ["aleph-alpha-client", "anthropic", "arxiv", "atlassian-python-api", "azure-identity", "beautifulsoup4", "clickhouse-connect", "cohere", "deeplake", "duckduckgo-search", "elasticsearch", "faiss-cpu", "google-api-python-client", "google-search-results", "gptcache", "html2text", "huggingface_hub", "jina", "jinja2", "manifest-ml", "networkx", "nlpcloud", "nltk", "nomic", "openai", "opensearch-py", "pgvector", "pinecone-client", "pinecone-text", "psycopg2-binary", "pyowm", "pypdf", "pytesseract", "qdrant-client", "redis", "sentence-transformers", "spacy", "tensorflow-text", "tiktoken", "torch", "transformers", "weaviate-client", "wikipedia", "wolframalpha"]
cohere = ["cohere"]
llms = ["anthropic", "cohere", "openai", "nlpcloud", "huggingface_hub", "manifest-ml", "torch", "transformers"]
llms = ["anthropic", "cohere", "huggingface_hub", "manifest-ml", "nlpcloud", "openai", "torch", "transformers"]
openai = ["openai"]
qdrant = ["qdrant-client"]
[metadata]
lock-version = "2.0"
python-versions = ">=3.8.1,<4.0"
content-hash = "da027a1b27f348548ca828c6da40795e2f57a7a7858bdeac1a08573d3e031e12"
content-hash = "ab6ea1c53c7a6e792d5bdcf8865b87e5dcfe4c89080c18b356dc4ed8a17cc3a3"

View File

@@ -1,6 +1,6 @@
[tool.poetry]
name = "langchain"
version = "0.0.147"
version = "0.0.148"
description = "Building applications with LLMs through composability"
authors = []
license = "MIT"
@@ -56,7 +56,7 @@ arxiv = {version = "^1.4", optional = true}
pypdf = {version = "^3.4.0", optional = true}
networkx = {version="^2.6.3", optional = true}
aleph-alpha-client = {version="^2.15.0", optional = true}
deeplake = {version = "^3.2.21", optional = true}
deeplake = {version = "^3.3.0", optional = true}
pgvector = {version = "^0.1.6", optional = true}
psycopg2-binary = {version = "^2.9.5", optional = true}
#boto3 = {version = "^1.26.96", optional = true} # TODO: fix it, commented because the version failed with deeplake
@@ -117,6 +117,7 @@ torch = "^1.0.0"
chromadb = "^0.3.21"
tiktoken = "^0.3.3"
python-dotenv = "^1.0.0"
sentence-transformers = "^2"
gptcache = "^0.1.9"
promptlayer = "^0.1.80"
@@ -144,7 +145,8 @@ llms = ["anthropic", "cohere", "openai", "nlpcloud", "huggingface_hub", "manifes
qdrant = ["qdrant-client"]
openai = ["openai"]
cohere = ["cohere"]
all = ["anthropic", "cohere", "openai", "nlpcloud", "huggingface_hub", "jina", "manifest-ml", "elasticsearch", "opensearch-py", "google-search-results", "faiss-cpu", "sentence_transformers", "transformers", "spacy", "nltk", "wikipedia", "beautifulsoup4", "tiktoken", "torch", "jinja2", "pinecone-client", "pinecone-text", "weaviate-client", "redis", "google-api-python-client", "wolframalpha", "qdrant-client", "tensorflow-text", "pypdf", "networkx", "nomic", "aleph-alpha-client", "deeplake", "pgvector", "psycopg2-binary", "boto3", "pyowm", "pytesseract", "html2text", "atlassian-python-api", "gptcache", "duckduckgo-search", "arxiv", "azure-identity", "clickhouse-connect"]
embeddings = ["sentence-transformers"]
all = ["anthropic", "cohere", "openai", "nlpcloud", "huggingface_hub", "jina", "manifest-ml", "elasticsearch", "opensearch-py", "google-search-results", "faiss-cpu", "sentence-transformers", "transformers", "spacy", "nltk", "wikipedia", "beautifulsoup4", "tiktoken", "torch", "jinja2", "pinecone-client", "pinecone-text", "weaviate-client", "redis", "google-api-python-client", "wolframalpha", "qdrant-client", "tensorflow-text", "pypdf", "networkx", "nomic", "aleph-alpha-client", "deeplake", "pgvector", "psycopg2-binary", "boto3", "pyowm", "pytesseract", "html2text", "atlassian-python-api", "gptcache", "duckduckgo-search", "arxiv", "azure-identity", "clickhouse-connect"]
[tool.ruff]
select = [

View File

@@ -19,6 +19,10 @@ def test_load_single_confluence_page() -> None:
assert docs[0].page_content is not None
assert docs[0].metadata["id"] == "33189"
assert docs[0].metadata["title"] == "An easy intro to using Confluence"
assert docs[0].metadata["source"] == (
"https://templates.atlassian.net/wiki/"
"spaces/RD/pages/33189/An+easy+intro+to+using+Confluence"
)
@pytest.mark.skipif(not confluence_installed, reason="Atlassian package not installed")
@@ -33,7 +37,18 @@ def test_load_full_confluence_space() -> None:
@pytest.mark.skipif(not confluence_installed, reason="Atlassian package not installed")
def test_confluence_pagination() -> None:
loader = ConfluenceLoader(url="https://templates.atlassian.net/wiki/")
docs = loader.load(space_key="RD", limit=5)
# this will issue 2 requests, each with a limit of 3, until the max_pages cap of 5 is met
docs = loader.load(space_key="RD", limit=3, max_pages=5)
assert len(docs) == 5
assert docs[0].page_content is not None
@pytest.mark.skipif(not confluence_installed, reason="Atlassian package not installed")
def test_pass_confluence_kwargs() -> None:
loader = ConfluenceLoader(
url="https://templates.atlassian.net/wiki/",
confluence_kwargs={"verify_ssl": False},
)
assert loader.confluence.verify_ssl is False
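As the pagination test encodes, `limit` is the page size per API request while the new `max_pages` caps the total number of documents loaded. The same call outside pytest:

```python
from langchain.document_loaders import ConfluenceLoader

loader = ConfluenceLoader(url="https://templates.atlassian.net/wiki/")
# Two requests of 3 results each; loading stops at the max_pages cap of 5.
docs = loader.load(space_key="RD", limit=3, max_pages=5)
assert len(docs) == 5
```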

View File

@@ -0,0 +1,38 @@
# flake8: noqa
"""Test sentence_transformer embeddings."""
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma
def test_sentence_transformer_embedding_documents() -> None:
"""Test sentence_transformer embeddings."""
embedding = SentenceTransformerEmbeddings()
documents = ["foo bar"]
output = embedding.embed_documents(documents)
assert len(output) == 1
assert len(output[0]) == 384
def test_sentence_transformer_embedding_query() -> None:
"""Test sentence_transformer embeddings."""
embedding = SentenceTransformerEmbeddings()
query = "what the foo is a bar?"
query_vector = embedding.embed_query(query)
assert len(query_vector) == 384
def test_sentence_transformer_db_query() -> None:
"""Test sentence_transformer similarity search."""
embedding = SentenceTransformerEmbeddings()
texts = [
"we will foo your bar until you can't foo any more",
"the quick brown fox jumped over the lazy dog",
]
query = "what the foo is a bar?"
query_vector = embedding.embed_query(query)
assert len(query_vector) == 384
db = Chroma(embedding_function=embedding)
db.add_texts(texts)
docs = db.similarity_search_by_vector(query_vector, k=2)
assert docs[0].page_content == "we will foo your bar until you can't foo any more"

View File

@@ -164,3 +164,10 @@ def test_delete_dataset_by_filter(deeplake_datastore: DeepLake) -> None:
assert len(deeplake_datastore.ds) == 2
deeplake_datastore.delete_dataset()
def test_delete_by_path(deeplake_datastore: DeepLake) -> None:
"""Test delete dataset."""
path = deeplake_datastore.dataset_path
DeepLake.force_delete_by_path(path)
assert not deeplake.exists(path)

View File

@@ -1,12 +1,14 @@
"""Test tool utils."""
from datetime import datetime
from functools import partial
from typing import Optional, Type, Union
import pydantic
import pytest
from pydantic import BaseModel
from langchain.agents.tools import Tool, tool
from langchain.tools.base import BaseTool
from langchain.tools.base import BaseTool, SchemaAnnotationError
def test_unnamed_decorator() -> None:
@@ -51,10 +53,116 @@ def test_structured_args() -> None:
assert structured_api.run(args) == expected_result
def test_structured_args_decorator() -> None:
"""Test functionality with structured arguments parsed as a decorator."""
def test_unannotated_base_tool_raises_error() -> None:
"""Test that a BaseTool without type hints raises an exception.""" ""
with pytest.raises(SchemaAnnotationError):
class _UnAnnotatedTool(BaseTool):
name = "structured_api"
# This would silently be ignored without the custom metaclass
args_schema = _MockSchema
description = "A Structured Tool"
def _run(self, arg1: int, arg2: bool, arg3: Optional[dict] = None) -> str:
return f"{arg1} {arg2} {arg3}"
async def _arun(
self, arg1: int, arg2: bool, arg3: Optional[dict] = None
) -> str:
raise NotImplementedError
def test_misannotated_base_tool_raises_error() -> None:
"""Test that a BaseTool with the incorrrect typehint raises an exception.""" ""
with pytest.raises(SchemaAnnotationError):
class _MisAnnotatedTool(BaseTool):
name = "structured_api"
# This would silently be ignored without the custom metaclass
args_schema: BaseModel = _MockSchema # type: ignore
description = "A Structured Tool"
def _run(self, arg1: int, arg2: bool, arg3: Optional[dict] = None) -> str:
return f"{arg1} {arg2} {arg3}"
async def _arun(
self, arg1: int, arg2: bool, arg3: Optional[dict] = None
) -> str:
raise NotImplementedError
def test_forward_ref_annotated_base_tool_accepted() -> None:
"""Test that a using forward ref annotation syntax is accepted.""" ""
class _ForwardRefAnnotatedTool(BaseTool):
name = "structured_api"
args_schema: "Type[BaseModel]" = _MockSchema
description = "A Structured Tool"
def _run(self, arg1: int, arg2: bool, arg3: Optional[dict] = None) -> str:
return f"{arg1} {arg2} {arg3}"
async def _arun(
self, arg1: int, arg2: bool, arg3: Optional[dict] = None
) -> str:
raise NotImplementedError
def test_subclass_annotated_base_tool_accepted() -> None:
"""Test BaseTool child w/ custom schema isn't overwritten."""
class _SubclassAnnotatedTool(BaseTool):
name = "structured_api"
args_schema: Type[_MockSchema] = _MockSchema
description = "A Structured Tool"
def _run(self, arg1: int, arg2: bool, arg3: Optional[dict] = None) -> str:
return f"{arg1} {arg2} {arg3}"
async def _arun(
self, arg1: int, arg2: bool, arg3: Optional[dict] = None
) -> str:
raise NotImplementedError
assert issubclass(_SubclassAnnotatedTool, BaseTool)
tool = _SubclassAnnotatedTool()
assert tool.args_schema == _MockSchema
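The accepted pattern in one place: when a `BaseTool` subclass overrides `args_schema`, annotate the attribute as a `Type[...]` so the metaclass treats it as a class-level default instead of silently ignoring it. A sketch with illustrative names:

```python
from typing import Type
from pydantic import BaseModel
from langchain.tools.base import BaseTool

class SearchInput(BaseModel):
    query: str

class SearchTool(BaseTool):
    name = "search"
    description = "Look things up"
    args_schema: Type[BaseModel] = SearchInput  # annotated, so accepted

    def _run(self, query: str) -> str:
        return f"results for {query}"

    async def _arun(self, query: str) -> str:
        raise NotImplementedError
```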
def test_decorator_with_specified_schema() -> None:
"""Test that manually specified schemata are passed through to the tool."""
@tool(args_schema=_MockSchema)
def tool_func(arg1: int, arg2: bool, arg3: Optional[dict] = None) -> str:
"""Return the arguments directly."""
return f"{arg1} {arg2} {arg3}"
assert isinstance(tool_func, Tool)
assert tool_func.args_schema == _MockSchema
def test_decorated_function_schema_equivalent() -> None:
"""Test that a BaseTool without a schema meets expectations."""
@tool
def structured_tool_input(
arg1: int, arg2: bool, arg3: Optional[dict] = None
) -> str:
"""Return the arguments directly."""
return f"{arg1} {arg2} {arg3}"
assert isinstance(structured_tool_input, Tool)
assert (
structured_tool_input.args_schema.schema()["properties"]
== _MockSchema.schema()["properties"]
== structured_tool_input.args
)
def test_structured_args_decorator_no_infer_schema() -> None:
"""Test functionality with structured arguments parsed as a decorator."""
@tool(infer_schema=False)
def structured_tool_input(
arg1: int, arg2: Union[float, datetime], opt_arg: Optional[dict] = None
) -> str:
@@ -68,8 +176,83 @@ def test_structured_args_decorator() -> None:
assert structured_tool_input.run(args) == expected_result
def test_structured_single_str_decorator_no_infer_schema() -> None:
"""Test functionality with structured arguments parsed as a decorator."""
@tool(infer_schema=False)
def unstructured_tool_input(tool_input: str) -> str:
"""Return the arguments directly."""
return f"{tool_input}"
assert isinstance(unstructured_tool_input, Tool)
assert unstructured_tool_input.args_schema is None
def test_base_tool_inheritance_base_schema() -> None:
"""Test schema is correctly inferred when inheriting from BaseTool."""
class _MockSimpleTool(BaseTool):
name = "simple_tool"
description = "A Simple Tool"
def _run(self, tool_input: str) -> str:
return f"{tool_input}"
async def _arun(self, tool_input: str) -> str:
raise NotImplementedError
simple_tool = _MockSimpleTool()
assert simple_tool.args_schema is None
expected_args = {"tool_input": {"title": "Tool Input", "type": "string"}}
assert simple_tool.args == expected_args
def test_tool_lambda_args_schema() -> None:
"""Test args schema inference when the tool argument is a lambda function."""
tool = Tool(
name="tool",
description="A tool",
func=lambda tool_input: tool_input,
)
assert tool.args_schema is None
expected_args = {"tool_input": {"title": "Tool Input"}}
assert tool.args == expected_args
def test_tool_lambda_multi_args_schema() -> None:
"""Test args schema inference when the tool argument is a lambda function."""
tool = Tool(
name="tool",
description="A tool",
func=lambda tool_input, other_arg: f"{tool_input}{other_arg}", # type: ignore
)
assert tool.args_schema is None
expected_args = {
"tool_input": {"title": "Tool Input"},
"other_arg": {"title": "Other Arg"},
}
assert tool.args == expected_args
def test_tool_partial_function_args_schema() -> None:
"""Test args schema inference when the tool argument is a partial function."""
def func(tool_input: str, other_arg: str) -> str:
return tool_input + other_arg
with pytest.raises(pydantic.error_wrappers.ValidationError):
# We don't yet support args_schema inference for partial functions
# so want to make sure we proactively raise an error
Tool(
name="tool",
description="A tool",
func=partial(func, other_arg="foo"),
)
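Until schema inference supports partials, a plain wrapper function with an inspectable signature is the workaround. A self-contained sketch (names are illustrative):

```python
from langchain.agents.tools import Tool

def func(tool_input: str, other_arg: str) -> str:
    return tool_input + other_arg

def wrapped(tool_input: str) -> str:
    # A plain def keeps the signature inspectable, unlike functools.partial.
    return func(tool_input, other_arg="foo")

tool = Tool(name="tool", description="A tool", func=wrapped)
```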
def test_empty_args_decorator() -> None:
"""Test functionality with no args parsed as a decorator."""
"""Test inferred schema of decorated fn with no args."""
@tool
def empty_tool_input() -> str:
@@ -78,6 +261,7 @@ def test_empty_args_decorator() -> None:
assert isinstance(empty_tool_input, Tool)
assert empty_tool_input.name == "empty_tool_input"
assert empty_tool_input.args == {}
assert empty_tool_input.run({}) == "the empty result"
@@ -123,7 +307,7 @@ def test_unnamed_tool_decorator_return_direct() -> None:
def test_tool_with_kwargs() -> None:
"""Test functionality when only return direct is provided."""
@tool(return_direct=True, raise_errors=True)
@tool(return_direct=True)
def search_api(
arg_1: float,
ping: str = "hi",

View File

@@ -20,7 +20,7 @@ def test_write_file_with_root_dir() -> None:
def test_write_file_errs_outside_root_dir() -> None:
"""Test the WriteFile tool when a root dir is specified."""
with TemporaryDirectory() as temp_dir:
tool = WriteFileTool(root_dir=temp_dir, raise_errors=True)
tool = WriteFileTool(root_dir=temp_dir)
with pytest.raises(ValueError):
tool.run({"file_path": "../file.txt", "text": "Hello, world!"})