Compare commits


15 Commits

Author SHA1 Message Date
Harrison Chase
56b850648f cr (#1436) 2023-03-04 08:38:56 -08:00
Harrison Chase
63a5614d23 Harrison/simple memory (#1435)
Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>
2023-03-04 08:15:52 -08:00
Harrison Chase
a1b9dfc099 Harrison/similarity search chroma (#1434)
Co-authored-by: shibuiwilliam <shibuiyusuke@gmail.com>
2023-03-04 08:10:15 -08:00
Peng Qu
68ce68f290 Fix an unusual issue that occurs when using OpenAIChat for llm_math (#1410)
Fixes an issue that occurs when using OpenAIChat for llm_math, following the way Mrkl handles the "Final Answer:" marker. I ran into it when trying OpenAIChat with llm_math on a question asked in Chinese: the model restates the question before answering, producing output like "\n\nQuestion: What is the
square of 29?\nAnswer: 841", which the previous parsing did not accept. Snapshot below:
![snapshot](https://user-images.githubusercontent.com/82029664/222642193-10ecca77-db7b-4759-bc46-32a8f8ddc48f.png)
2023-03-04 07:56:07 -08:00
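A minimal sketch of the more tolerant parsing the fix above describes; the real change lands in `LLMMathChain` (its diff appears further down), and the helper name here is purely illustrative:

```python
def extract_answer(llm_output: str) -> str:
    """Accept completions where "Answer:" is buried mid-text (e.g. when the
    model restates the question first) instead of raising a ValueError."""
    text = llm_output.strip()
    if text.startswith("Answer:"):
        return text
    if "Answer:" in text:
        # Keep only what follows the last "Answer:" marker.
        return "Answer: " + text.split("Answer:")[-1].strip()
    raise ValueError(f"unknown format from LLM: {text}")


print(extract_answer("\n\nQuestion: What is the square of 29?\nAnswer: 841"))
# -> Answer: 841
```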
Ikko Eltociear Ashimine
b8a7828d1f Update huggingface_datasets.ipynb (#1417)
HuggingFace -> Hugging Face
2023-03-04 00:22:31 -08:00
Kentaro Tanaka
6a4ee07e4f Fix type hint of 'vectorstore_cls' arg in SemanticSimilarityExampleSelector (#1427)
Hello! Thank you for the amazing library you've created!

While following the tutorial at [Using an example selector](https://langchain.readthedocs.io/en/latest/modules/prompts/examples/few_shot_examples.html#using-an-example-selector), I noticed that passing Chroma as an argument to from_examples results in a type hint error.

Error message (mypy):
```
Argument 3 to "from_examples" of "SemanticSimilarityExampleSelector" has incompatible type "Type[Chroma]"; expected "VectorStore"  [arg-type]mypy(error)
```

This pull request fixes the type hint and allows the VectorStore class
to be specified as an argument.
2023-03-04 00:20:18 -08:00
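A self-contained sketch of the distinction behind this fix; `VectorStore` and `Chroma` below are stand-ins rather than the real classes, and only the `vectorstore_cls` annotation is the point:

```python
from typing import Type


class VectorStore:          # stand-in for langchain.vectorstores.base.VectorStore
    ...


class Chroma(VectorStore):  # stand-in for langchain.vectorstores.Chroma
    ...


def from_examples_old(vectorstore_cls: VectorStore) -> None:
    """Old annotation: expects an *instance*, so mypy rejects passing the class."""


def from_examples_new(vectorstore_cls: Type[VectorStore]) -> None:
    """New annotation: expects the class itself, so passing Chroma type-checks."""


from_examples_new(Chroma)   # OK under mypy
from_examples_old(Chroma)   # mypy: incompatible type "Type[Chroma]"; expected "VectorStore"
```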
Tim Asp
23231d65a9 Add PyMuPDF PDF loader (#1426)
Different PDF libraries have different strengths and weaknesses. PyMuPDF does a good job of extracting the most content from the document, regardless of source quality, and it is extremely fast (especially compared to Unstructured).

https://pymupdf.readthedocs.io/en/latest/index.html
2023-03-03 20:59:28 -08:00
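A short usage sketch based on the loader added in this changeset; the file path is illustrative and `pymupdf` must be installed:

```python
from langchain.document_loaders import PyMuPDFLoader

# Any local PDF works; this path mirrors the example notebook.
loader = PyMuPDFLoader("example_data/layout-parser-paper.pdf")
docs = loader.load()           # one Document per page

print(len(docs))               # number of pages
print(docs[0].metadata)        # file_path, page_number, total_pages, plus PDF metadata
```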
blob42
3d54b05863 searx: add install instructions, update doc and notebooks (#1420)
- Added instructions on setting up a self-hosted SearxNG instance
- Added a notebook example with an agent
- Use `localhost:8888` as the example URL to stay consistent, since public
instances are not really usable.

Co-authored-by: blob42 <spike@w530>
2023-03-03 20:57:50 -08:00
Tim Asp
bca0935d90 [docs] fix minor import error (#1425) 2023-03-03 16:10:07 -08:00
Jon Luo
882f7964fb fix sql misinterpretation of % in query (#1408)
`%` is being misinterpreted by SQLAlchemy as a parameter marker, so any
`LIKE 'asdf%'` results in a value error with MySQL, MariaDB, and
possibly other backends. This is one way to fix it; the alternative is to
simply double up the `%`, as in `LIKE 'asdf%%'`, but this seemed cleaner in
terms of output.
Fixes #1383
2023-03-02 16:03:16 -08:00
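A hedged sketch of the approach the fix takes: route the statement through SQLAlchemy's `text()` construct instead of the raw driver call, so a literal `%` is not read as a parameter marker. SQLite is used here only to keep the snippet self-contained; the original failure shows up with MySQL/MariaDB drivers:

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")

with engine.begin() as connection:
    connection.execute(text("CREATE TABLE t (name TEXT)"))
    connection.execute(text("INSERT INTO t VALUES ('asdf123')"))

    # exec_driver_sql() hands the raw string to the DB-API driver, where a bare %
    # can be treated as a parameter marker; execute(text(...)) avoids that.
    rows = connection.execute(text("SELECT name FROM t WHERE name LIKE 'asdf%'")).fetchall()
    print(rows)   # [('asdf123',)]
```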
JonLuca De Caro
443992c4d5 [Docs] Add missing word from prompt docs (#1406)
The prompt in the first example of the quickstart guide was missing the word `for`.
2023-03-02 16:02:54 -08:00
Eugene Yurtsev
a83a371069 Minor documentation update in initialize_agent (#1397)
Updates the documentation in initialize_agent.

One thing that could benefit from further clarification is the responsibility
split between an AgentExecutor and an Agent; the documentation for
AgentExecutor does not spell it out. From the class attributes, it appears that
the executor has access to the tools, while the agent is only aware of the tool
names. Additional clarification on the AgentExecutor class would be beneficial.
2023-03-02 11:46:35 -08:00
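A small sketch of the split described above, assuming an OpenAI key is configured: `initialize_agent` hands back an `AgentExecutor` that owns the tools and drives the loop, while the wrapped agent only plans over the tool names.

```python
from langchain.agents import initialize_agent, load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)                      # requires OPENAI_API_KEY
tools = load_tools(["llm-math"], llm=llm)

# The executor holds the tools and runs the think/act loop.
agent_executor = initialize_agent(tools, llm, agent="zero-shot-react-description")

print(type(agent_executor).__name__)             # AgentExecutor
print([tool.name for tool in agent_executor.tools])
```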
Nuno Campos
499e76b199 Allow the regular openai class to be used for ChatGPT models (#1393)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-03-02 09:04:18 -08:00
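A hedged usage sketch mirroring the test added in this changeset (an `OPENAI_API_KEY` is assumed to be set):

```python
from langchain.llms import OpenAI

# After this change, constructing OpenAI with a chat model name transparently
# hands back an OpenAIChat instance (see the __new__ override in the diff below).
llm = OpenAI(model_name="gpt-3.5-turbo")
print(type(llm).__name__)   # OpenAIChat
print(llm("Say foo:"))
```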
Kacper Łukawski
8947797250 Return Cohere embeddings as lists of floats (#1394)
This PR fixes the types returned by Cohere embeddings. Currently, the Cohere
client returns instances of `cohere.embeddings.Embeddings`. Since the
transport layer relies on JSON, some numbers are represented as ints rather
than floats, which happens quite often. While that might not seem like an
issue, it breaks some pydantic models that require strict floats.
2023-03-02 09:02:10 -08:00
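A minimal sketch of the coercion this introduces; the nested list stands in for the Cohere response, where whole-valued components come back as ints over JSON:

```python
from typing import List


def to_float_embeddings(raw: List[List[float]]) -> List[List[float]]:
    """Force every component to float so downstream pydantic models with
    strict float fields do not reject whole-number values."""
    return [list(map(float, e)) for e in raw]


raw = [[1, 0.25, -3], [0.5, 2, 7]]       # ints sneak in over JSON
print(to_float_embeddings(raw))          # [[1.0, 0.25, -3.0], [0.5, 2.0, 7.0]]
```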
Jason Gill
1989e7d4c2 Update examples to prevent confusing missing _type warning (#1391)
The YAML and JSON examples of prompt serialization now give a strange
`No '_type' key found, defaulting to 'prompt'` message when you try to
run them yourself or copy the format of the files. The reason for this
harmless warning is that the _type key was not in the config files,
which means they are parsed as a standard prompt.

This could be confusing to new users (like it was confusing to me after
upgrading from 0.0.85 to 0.0.86+ for my few_shot prompts that needed a
_type added to the example_prompt config), so this update includes the
_type key just for clarity.

Obviously this is not critical as the warning is harmless, but it could
be confusing to track down or be interpreted as an error by a new user,
so this update should resolve that.
2023-03-02 07:39:57 -08:00
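A hedged sketch of the serialized prompt the warning is about, assuming `load_prompt` from `langchain.prompts` as the loader and an illustrative file name; without the `_type` key the loader falls back to a plain prompt and prints the warning:

```python
import json

from langchain.prompts import load_prompt

config = {
    "_type": "prompt",  # omit this and you get: No '_type' key found, defaulting to 'prompt'
    "input_variables": ["adjective", "content"],
    "template": "Tell me a {adjective} joke about {content}.",
}

with open("simple_prompt.json", "w") as f:
    json.dump(config, f)

prompt = load_prompt("simple_prompt.json")
print(prompt.format(adjective="funny", content="chickens"))
```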
39 changed files with 639 additions and 86 deletions

.gitignore vendored
View File

@@ -106,6 +106,7 @@ celerybeat.pid
# Environments
.env
.envrc
.venv
.venvs
env/

View File

@@ -5,21 +5,44 @@ It is broken into two parts: installation and setup, and then references to the
## Installation and Setup
- You can find a list of public SearxNG instances [here](https://searx.space/).
- It is recommended to use a self-hosted instance to avoid abuse of the public instances. Also note that public instances often have a limit on the number of requests.
- To run a self-hosted instance see [this page](https://searxng.github.io/searxng/admin/installation.html) for more information.
- To use the tool you need to provide the searx host url by:
1. passing the named parameter `searx_host` when creating the instance.
2. exporting the environment variable `SEARXNG_HOST`.
While it is possible to utilize the wrapper in conjunction with [public searx
instances](https://searx.space/), these instances frequently do not permit API
access (see the note on output format below) and limit the frequency of
requests. It is recommended to opt for a self-hosted instance instead.
### Self Hosted Instance:
See [this page](https://searxng.github.io/searxng/admin/installation.html) for installation instructions.
When you install SearxNG, the only active output format by default is the HTML format.
You need to activate the `json` format to use the API. This can be done by adding the following line to the `settings.yml` file:
```yaml
search:
  formats:
    - html
    - json
```
You can make sure that the API is working by issuing a curl request to the API endpoint:
`curl -kLX GET --data-urlencode q='langchain' -d format=json http://localhost:8888`
This should return a JSON object with the results.
## Wrappers
### Utility
To use the wrapper, we need to pass the host of the SearxNG instance to it by either:
1. passing the named parameter `searx_host` when creating the instance, or
2. exporting the environment variable `SEARXNG_HOST`.
You can use the wrapper to get results from a SearxNG instance.
```python
from langchain.utilities import SearxSearchWrapper
s = SearxSearchWrapper(searx_host="http://localhost:8888")
s.run("what is a large language model?")
```
### Tool
@@ -29,7 +52,7 @@ You can do this with:
```python
from langchain.agents import load_tools
tools = load_tools(["searx-search"], searx_host="https://searx.example.com")
tools = load_tools(["searx-search"], searx_host="http://localhost:8888")
```
For more information on this, see [this page](../modules/agents/tools.md)
For more information on tools, see [this page](../modules/agents/tools.md)

View File

@@ -66,7 +66,7 @@ llm = OpenAI(temperature=0.9)
We can now call it on some input!
```python
text = "What would be a good company name a company that makes colorful socks?"
text = "What would be a good company name for a company that makes colorful socks?"
print(llm(text))
```

View File

@@ -12,7 +12,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"id": "e6860c2d",
"metadata": {
"pycharm": {
@@ -28,7 +28,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 3,
"id": "dadbcfcd",
"metadata": {},
"outputs": [],
@@ -238,6 +238,92 @@
"source": [
"agent.run(\"What is the weather in Pomfret?\")"
]
},
{
"cell_type": "markdown",
"id": "eabad3af",
"metadata": {},
"source": [
"## SearxNG Meta Search Engine\n",
"\n",
"Here we will be using a self hosted SearxNG meta search engine."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b196c704",
"metadata": {},
"outputs": [],
"source": [
"tools = load_tools([\"searx-search\"], searx_host=\"http://localhost:8888\", llm=llm)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "9023eeaa",
"metadata": {},
"outputs": [],
"source": [
"agent = initialize_agent(tools, llm, agent=\"zero-shot-react-description\", verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "3aad92c1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m I should look up the current weather\n",
"Action: SearX Search\n",
"Action Input: \"weather in Pomfret\"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mMainly cloudy with snow showers around in the morning. High around 40F. Winds NNW at 5 to 10 mph. Chance of snow 40%. Snow accumulations less than one inch.\n",
"\n",
"10 Day Weather - Pomfret, MD As of 1:37 pm EST Today 49°/ 41° 52% Mon 27 | Day 49° 52% SE 14 mph Cloudy with occasional rain showers. High 49F. Winds SE at 10 to 20 mph. Chance of rain 50%....\n",
"\n",
"10 Day Weather - Pomfret, VT As of 3:51 am EST Special Weather Statement Today 39°/ 32° 37% Wed 01 | Day 39° 37% NE 4 mph Cloudy with snow showers developing for the afternoon. High 39F....\n",
"\n",
"Pomfret, CT ; Current Weather. 1:06 AM. 35°F · RealFeel® 32° ; TODAY'S WEATHER FORECAST. 3/3. 44°Hi. RealFeel® 50° ; TONIGHT'S WEATHER FORECAST. 3/3. 32°Lo.\n",
"\n",
"Pomfret, MD Forecast Today Hourly Daily Morning 41° 1% Afternoon 43° 0% Evening 35° 3% Overnight 34° 2% Don't Miss Finally, Heres Why We Get More Colds and Flu When Its Cold Coast-To-Coast...\n",
"\n",
"Pomfret, MD Weather Forecast | AccuWeather Current Weather 5:35 PM 35° F RealFeel® 36° RealFeel Shade™ 36° Air Quality Excellent Wind E 3 mph Wind Gusts 5 mph Cloudy More Details WinterCast...\n",
"\n",
"Pomfret, VT Weather Forecast | AccuWeather Current Weather 11:21 AM 23° F RealFeel® 27° RealFeel Shade™ 25° Air Quality Fair Wind ESE 3 mph Wind Gusts 7 mph Cloudy More Details WinterCast...\n",
"\n",
"Pomfret Center, CT Weather Forecast | AccuWeather Daily Current Weather 6:50 PM 39° F RealFeel® 36° Air Quality Fair Wind NW 6 mph Wind Gusts 16 mph Mostly clear More Details WinterCast...\n",
"\n",
"12:00 pm · Feels Like36° · WindN 5 mph · Humidity43% · UV Index3 of 10 · Cloud Cover65% · Rain Amount0 in ...\n",
"\n",
"Pomfret Center, CT Weather Conditions | Weather Underground star Popular Cities San Francisco, CA 49 °F Clear Manhattan, NY 37 °F Fair Schiller Park, IL (60176) warning39 °F Mostly Cloudy...\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
"Final Answer: The current weather in Pomfret is mainly cloudy with snow showers around in the morning. The temperature is around 40F with winds NNW at 5 to 10 mph. Chance of snow is 40%.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'The current weather in Pomfret is mainly cloudy with snow showers around in the morning. The temperature is around 40F with winds NNW at 5 to 10 mph. Chance of snow is 40%.'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"What is the weather in Pomfret\")"
]
}
],
"metadata": {
@@ -256,7 +342,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.9.11"
},
"vscode": {
"interpreter": {

View File

@@ -36,6 +36,25 @@
{
"cell_type": "code",
"execution_count": 1,
"id": "7a886879",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"cannot find .env file\n"
]
}
],
"source": [
"%load_ext dotenv\n",
"%dotenv"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "3f2f9b8c",
"metadata": {},
"outputs": [],
@@ -47,7 +66,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 3,
"id": "b8237d1a",
"metadata": {},
"outputs": [],
@@ -64,7 +83,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"id": "4a391730",
"metadata": {},
"outputs": [],
@@ -82,7 +101,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 5,
"id": "9368bd63",
"metadata": {},
"outputs": [],
@@ -94,7 +113,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 6,
"id": "d39e15f5",
"metadata": {},
"outputs": [
@@ -107,22 +126,20 @@
"\u001b[1m> Entering new SimpleSequentialChain chain...\u001b[0m\n",
"\u001b[36;1m\u001b[1;3m\n",
"\n",
"Tragedy at Sunset on the Beach follows the story of a young couple, Jack and Annie, who have just started to explore the possibility of a relationship together. After a day spent in the sun and sand, they decide to take a romantic stroll down the beach as the sun sets. \n",
"Tragedy at Sunset on the Beach is a story of a young couple, Jack and Sarah, who are in love and looking forward to their future together. On the night of their anniversary, they decide to take a walk on the beach at sunset. As they are walking, they come across a mysterious figure, who tells them that their love will be tested in the near future. \n",
"\n",
"However, their romantic evening quickly turns tragic when they stumble upon a body lying in the sand. As they approach to investigate, they are shocked to discover that it is Jack's long-lost brother, who has been missing for several years. \n",
"The figure then tells the couple that the sun will soon set, and with it, a tragedy will strike. If Jack and Sarah can stay together and pass the test, they will be granted everlasting love. However, if they fail, their love will be lost forever.\n",
"\n",
"The story follows Jack and Annie as they navigate their way through the tragedy and their newfound relationship. With the help of their friends, family, and the beach's inhabitants, Jack and Annie must come to terms with their deep-seated emotions and the reality of the situation. \n",
"\n",
"Ultimately, the play explores themes of family, love, and loss, as Jack and Annie's story unfolds against the beautiful backdrop of the beach at sunset.\u001b[0m\n",
"The play follows the couple as they struggle to stay together and battle the forces that threaten to tear them apart. Despite the tragedy that awaits them, they remain devoted to one another and fight to keep their love alive. In the end, the couple must decide whether to take a chance on their future together or succumb to the tragedy of the sunset.\u001b[0m\n",
"\u001b[33;1m\u001b[1;3m\n",
"\n",
"Tragedy at Sunset on the Beach is an emotionally complex tale of family, love, and loss. Told against the beautiful backdrop of a beach at sunset, the story follows Jack and Annie, a young couple just beginning to explore a relationship together. When they stumble upon the body of Jack's long-lost brother on the beach, they must face the reality of the tragedy and come to terms with their deep-seated emotions. \n",
"Tragedy at Sunset on the Beach is an emotionally gripping story of love, hope, and sacrifice. Through the story of Jack and Sarah, the audience is taken on a journey of self-discovery and the power of love to overcome even the greatest of obstacles. \n",
"\n",
"The playwright has crafted a heartfelt and thought-provoking story, one that probes into the depths of the human experience. The cast of characters is well-rounded and fully realized, and the dialogue is natural and emotional. The direction and choreography are top-notch, and the scenic design is breathtaking. \n",
"The play's talented cast brings the characters to life, allowing us to feel the depths of their emotion and the intensity of their struggle. With its compelling story and captivating performances, this play is sure to draw in audiences and leave them on the edge of their seats. \n",
"\n",
"Overall, Tragedy at Sunset on the Beach is a powerful and moving story about the fragility of life and the strength of love. It is sure to tug at your heartstrings and leave you with a newfound appreciation of life's precious moments. Highly recommended.\u001b[0m\n",
"The play's setting of the beach at sunset adds a touch of poignancy and romanticism to the story, while the mysterious figure serves to keep the audience enthralled. Overall, Tragedy at Sunset on the Beach is an engaging and thought-provoking play that is sure to leave audiences feeling inspired and hopeful.\u001b[0m\n",
"\n",
"\u001b[1m> Finished SimpleSequentialChain chain.\u001b[0m\n"
"\u001b[1m> Finished chain.\u001b[0m\n"
]
}
],
@@ -132,7 +149,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 7,
"id": "c6649a01",
"metadata": {},
"outputs": [
@@ -142,11 +159,11 @@
"text": [
"\n",
"\n",
"Tragedy at Sunset on the Beach is an emotionally complex tale of family, love, and loss. Told against the beautiful backdrop of a beach at sunset, the story follows Jack and Annie, a young couple just beginning to explore a relationship together. When they stumble upon the body of Jack's long-lost brother on the beach, they must face the reality of the tragedy and come to terms with their deep-seated emotions. \n",
"Tragedy at Sunset on the Beach is an emotionally gripping story of love, hope, and sacrifice. Through the story of Jack and Sarah, the audience is taken on a journey of self-discovery and the power of love to overcome even the greatest of obstacles. \n",
"\n",
"The playwright has crafted a heartfelt and thought-provoking story, one that probes into the depths of the human experience. The cast of characters is well-rounded and fully realized, and the dialogue is natural and emotional. The direction and choreography are top-notch, and the scenic design is breathtaking. \n",
"The play's talented cast brings the characters to life, allowing us to feel the depths of their emotion and the intensity of their struggle. With its compelling story and captivating performances, this play is sure to draw in audiences and leave them on the edge of their seats. \n",
"\n",
"Overall, Tragedy at Sunset on the Beach is a powerful and moving story about the fragility of life and the strength of love. It is sure to tug at your heartstrings and leave you with a newfound appreciation of life's precious moments. Highly recommended.\n"
"The play's setting of the beach at sunset adds a touch of poignancy and romanticism to the story, while the mysterious figure serves to keep the audience enthralled. Overall, Tragedy at Sunset on the Beach is an engaging and thought-provoking play that is sure to leave audiences feeling inspired and hopeful.\n"
]
}
],
@@ -167,7 +184,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 8,
"id": "02016a51",
"metadata": {},
"outputs": [],
@@ -185,7 +202,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 9,
"id": "8bd38cc2",
"metadata": {},
"outputs": [],
@@ -203,7 +220,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 10,
"id": "524523af",
"metadata": {},
"outputs": [],
@@ -220,7 +237,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 11,
"id": "3fd3a7be",
"metadata": {},
"outputs": [
@@ -231,14 +248,8 @@
"\n",
"\n",
"\u001b[1m> Entering new SequentialChain chain...\u001b[0m\n",
"\u001b[1mChain 0\u001b[0m:\n",
"{'synopsis': \" \\n\\nTragedy at Sunset on the Beach is a dark and gripping drama set in Victorian England. The play follows the story of two lovers, Emma and Edward, whose passionate relationship is threatened by the strict rules and regulations of the time.\\n\\nThe two are deeply in love, but Edward is from a wealthy family and Emma is from a lower class background. Despite the obstacles, the two are determined to be together and decide to elope.\\n\\nOn the night of their planned escape, Emma and Edward meet at the beach at sunset to declare their love for one another and begin a new life together. However, their plans are disrupted when Emma's father discovers their plan and appears on the beach with a gun.\\n\\nIn a heartbreaking scene, Emma's father orders Edward to leave, but Edward refuses and fights for their love. In a fit of rage, Emma's father shoots Edward, killing him instantly. \\n\\nThe tragedy of the play lies in the fact that Emma and Edward are denied their chance at a happy ending due to the rigid social conventions of Victorian England. The audience is left with a heavy heart as the play ends with Emma standing alone on the beach, mourning the loss of her beloved.\"}\n",
"\n",
"\u001b[1mChain 1\u001b[0m:\n",
"{'review': \"\\n\\nTragedy at Sunset on the Beach is an emotionally charged production that will leave audiences heartsick. The play follows the ill-fated love story of Emma and Edward, two star-crossed lovers whose passionate relationship is tragically thwarted by Victorian England's societal conventions. The performance is captivating from start to finish, as the audience is taken on an emotional rollercoaster of love, loss, and heartbreak.\\n\\nThe acting is powerful and sincere, and the performances of the two leads are particularly stirring. Emma and Edward are both portrayed with such tenderness and emotion that it's hard not to feel their pain as they fight for their forbidden love. The climactic scene, in which Edward is shot by Emma's father, is especially heartbreaking and will leave audience members on the edge of their seats.\\n\\nOverall, Tragedy at Sunset on the Beach is a powerful and moving work of theatre. It is a tragedy of impossible love, and a vivid reminder of the devastating consequences of social injustice. The play is sure to leave a lasting impression on anyone who experiences it.\"}\n",
"\n",
"\n",
"\u001b[1m> Finished SequentialChain chain.\u001b[0m\n"
"\u001b[1m> Finished chain.\u001b[0m\n"
]
}
],
@@ -246,10 +257,91 @@
"review = overall_chain({\"title\":\"Tragedy at sunset on the beach\", \"era\": \"Victorian England\"})"
]
},
{
"cell_type": "markdown",
"id": "d2fac817",
"metadata": {},
"source": [
"### Memory in Sequential Chains\n",
"Sometimes you may want to pass along some context to use in each step of the chain or in a later part of the chain, but maintaining and chaining together the input/output variables can quickly get messy. Using `SimpleMemory` is a convenient way to do manage this and clean up your chains.\n",
"\n",
"For example, using the previous playwright SequentialChain, lets say you wanted to include some context about date, time and location of the play, and using the generated synopsis and review, create some social media post text. You could add these new context variables as `input_variables`, or we can add a `SimpleMemory` to the chain to manage this context:"
]
},
{
"cell_type": "markdown",
"id": "b2cf3098",
"metadata": {},
"source": []
},
{
"cell_type": "code",
"execution_count": 12,
"id": "6b7b3a7a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new SequentialChain chain...\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'title': 'Tragedy at sunset on the beach',\n",
" 'era': 'Victorian England',\n",
" 'time': 'December 25th, 8pm PST',\n",
" 'location': 'Theater in the Park',\n",
" 'social_post_text': \"\\nSpend your Christmas night with us at Theater in the Park and experience the heartbreaking story of love and loss that is 'A Walk on the Beach'. Set in Victorian England, this romantic tragedy follows the story of Frances and Edward, a young couple whose love is tragically cut short. Don't miss this emotional and thought-provoking production that is sure to leave you in tears. #AWalkOnTheBeach #LoveAndLoss #TheaterInThePark #VictorianEngland\"}"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from langchain.chains import SequentialChain\n",
"from langchain.chains.base import SimpleMemory\n",
"\n",
"llm = OpenAI(temperature=.7)\n",
"template = \"\"\"You are a social media manager for a theater company. Given the title of play, the era it is set in, the date,time and location, the synopsis of the play, and the review of the play, it is your job to write a social media post for that play.\n",
"\n",
"Here is some context about the time and location of the play:\n",
"Date and Time: {time}\n",
"Location: {location}\n",
"\n",
"Play Synopsis:\n",
"{synopsis}\n",
"Review from a New York Times play critic of the above play:\n",
"{review}\n",
"\n",
"Social Media Post:\n",
"\"\"\"\n",
"prompt_template = PromptTemplate(input_variables=[\"synopsis\", \"review\", \"time\", \"location\"], template=template)\n",
"social_chain = LLMChain(llm=llm, prompt=prompt_template, output_key=\"social_post_text\")\n",
"\n",
"overall_chain = SequentialChain(\n",
" memory=SimpleMemory(memories={\"time\": \"December 25th, 8pm PST\", \"location\": \"Theater in the Park\"}),\n",
" chains=[synopsis_chain, review_chain, social_chain],\n",
" input_variables=[\"era\", \"title\"],\n",
" # Here we return multiple variables\n",
" output_variables=[\"social_post_text\"],\n",
" verbose=True)\n",
"\n",
"overall_chain({\"title\":\"Tragedy at sunset on the beach\", \"era\": \"Victorian England\"})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6be70d27",
"id": "ee9bc09c",
"metadata": {},
"outputs": [],
"source": []
@@ -271,7 +363,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.9.1"
}
},
"nbformat": 4,

View File

@@ -104,10 +104,10 @@
"Efficient Data AnnotationC u s t o m i z e d M o d e l T r a i n i n gModel Cust omizationDI A Model HubDI A Pipeline SharingCommunity PlatformLa y out Detection ModelsDocument Images \n",
"T h e C o r e L a y o u t P a r s e r L i b r a r yOCR ModuleSt or age & VisualizationLa y out Data Structur e\n",
"Fig. 1: The overall architecture of LayoutParser . For an input document image,\n",
"the core LayoutParser library provides a set of o\u000b",
"the core LayoutParser library provides a set of o\u000B",
"-the-shelf tools for layout\n",
"detection, OCR, visualization, and storage, backed by a carefully designed layout\n",
"data structure. LayoutParser also supports high level customization via e\u000ecient\n",
"data structure. LayoutParser also supports high level customization via e\u000Ecient\n",
"layout annotation and model training functions. These improve model accuracy\n",
"on the target samples. The community platform enables the easy sharing of DIA\n",
"models and whole digitization pipelines to promote reusability and reproducibility.\n",
@@ -128,10 +128,10 @@
"gure layouts) and\n",
"HJDataset [31](historical Japanese document layouts). A spectrum of models\n",
"trained on these datasets are currently available in the LayoutParser model zoo\n",
"to support di\u000b",
"to support di\u000B",
"erent use cases.\n",
"3 The Core LayoutParser Library\n",
"At the core of LayoutParser is an o\u000b",
"At the core of LayoutParser is an o\u000B",
"-the-shelf toolkit that streamlines DL-\n",
"based document image analysis. Five components support a simple interface\n",
"with comprehensive functionalities: 1) The layout detection models enable using\n",
@@ -266,13 +266,87 @@
"data = loader.load()"
]
},
{
"cell_type": "markdown",
"source": [
"## Using PyMuPDF\n",
"\n",
"This is the fastest of the PDF parsing options, and contains detailed metadata about the PDF and its pages, as well as returns one document per page."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 1,
"outputs": [],
"source": [
"from langchain.document_loaders import PyMuPDFLoader"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 2,
"outputs": [],
"source": [
"loader = PyMuPDFLoader(\"example_data/layout-parser-paper.pdf\")"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 3,
"outputs": [],
"source": [
"data = loader.load()"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 4,
"outputs": [
{
"data": {
"text/plain": "Document(page_content='LayoutParser: A Unified Toolkit for Deep\\nLearning Based Document Image Analysis\\nZejiang Shen1 (<28>), Ruochen Zhang2, Melissa Dell3, Benjamin Charles Germain\\nLee4, Jacob Carlson3, and Weining Li5\\n1 Allen Institute for AI\\nshannons@allenai.org\\n2 Brown University\\nruochen zhang@brown.edu\\n3 Harvard University\\n{melissadell,jacob carlson}@fas.harvard.edu\\n4 University of Washington\\nbcgl@cs.washington.edu\\n5 University of Waterloo\\nw422li@uwaterloo.ca\\nAbstract. Recent advances in document image analysis (DIA) have been\\nprimarily driven by the application of neural networks. Ideally, research\\noutcomes could be easily deployed in production and extended for further\\ninvestigation. However, various factors like loosely organized codebases\\nand sophisticated model configurations complicate the easy reuse of im-\\nportant innovations by a wide audience. Though there have been on-going\\nefforts to improve reusability and simplify deep learning (DL) model\\ndevelopment in disciplines like natural language processing and computer\\nvision, none of them are optimized for challenges in the domain of DIA.\\nThis represents a major gap in the existing toolkit, as DIA is central to\\nacademic research across a wide range of disciplines in the social sciences\\nand humanities. This paper introduces LayoutParser, an open-source\\nlibrary for streamlining the usage of DL in DIA research and applica-\\ntions. The core LayoutParser library comes with a set of simple and\\nintuitive interfaces for applying and customizing DL models for layout de-\\ntection, character recognition, and many other document processing tasks.\\nTo promote extensibility, LayoutParser also incorporates a community\\nplatform for sharing both pre-trained models and full document digiti-\\nzation pipelines. We demonstrate that LayoutParser is helpful for both\\nlightweight and large-scale digitization pipelines in real-word use cases.\\nThe library is publicly available at https://layout-parser.github.io.\\nKeywords: Document Image Analysis · Deep Learning · Layout Analysis\\n· Character Recognition · Open Source library · Toolkit.\\n1\\nIntroduction\\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\\ndocument image analysis (DIA) tasks including document image classification [11,\\narXiv:2103.15348v2 [cs.CV] 21 Jun 2021\\n', lookup_str='', metadata={'file_path': 'example_data/layout-parser-paper.pdf', 'page_number': 1, 'total_pages': 16, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'LaTeX with hyperref', 'producer': 'pdfTeX-1.40.21', 'creationDate': 'D:20210622012710Z', 'modDate': 'D:20210622012710Z', 'trapped': '', 'encryption': None}, lookup_index=0)"
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data[0]"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"Additionally, you can pass along any of the options from the [PyMuPDF documentation](https://pymupdf.readthedocs.io/en/latest/app1.html#plain-text/) as keyword arguments in the `load` call, and it will be pass along to the `get_text()` call."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"id": "7301c473",
"metadata": {},
"outputs": [],
"source": []
"source": [],
"metadata": {
"collapsed": false
}
}
],
"metadata": {

View File

@@ -21,7 +21,7 @@
"from langchain.embeddings.cohere import CohereEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores.elastic_vector_search import ElasticVectorSearch\n",
"from langchain.vectorstores import Chromaoma"
"from langchain.vectorstores import Chroma"
]
},
{
@@ -215,4 +215,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}

View File

@@ -89,6 +89,46 @@
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "18152965",
"metadata": {},
"source": [
"## Similarity search with score"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "72aaa9c8",
"metadata": {},
"outputs": [],
"source": [
"docs = db.similarity_search_with_score(query)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "d88e958e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen. \\n\\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while youre at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0),\n",
" 0.3913410007953644)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"docs[0]"
]
},
{
"cell_type": "markdown",
"id": "8061454b",

View File

@@ -1,4 +1,5 @@
{
"_type": "prompt",
"input_variables": ["input", "output"],
"template": "Input: {input}\nOutput: {output}"
}

View File

@@ -3,6 +3,7 @@
"input_variables": ["adjective"],
"prefix": "Write antonyms for the following words.",
"example_prompt": {
"_type": "prompt",
"input_variables": ["input", "output"],
"template": "Input: {input}\nOutput: {output}"
},

View File

@@ -4,6 +4,7 @@ input_variables:
prefix:
Write antonyms for the following words.
example_prompt:
_type: prompt
input_variables:
["input", "output"]
template:

View File

@@ -3,6 +3,7 @@
"input_variables": ["adjective"],
"prefix": "Write antonyms for the following words.",
"example_prompt": {
"_type": "prompt",
"input_variables": ["input", "output"],
"template": "Input: {input}\nOutput: {output}"
},

View File

@@ -4,6 +4,7 @@ input_variables:
prefix:
Write antonyms for the following words.
example_prompt:
_type: prompt
input_variables:
["input", "output"]
template:

View File

@@ -58,6 +58,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"_type: prompt\r\n",
"input_variables:\r\n",
" [\"adjective\", \"content\"]\r\n",
"template: \r\n",
@@ -108,6 +109,7 @@
"output_type": "stream",
"text": [
"{\r\n",
" \"_type\": \"prompt\",\r\n",
" \"input_variables\": [\"adjective\", \"content\"],\r\n",
" \"template\": \"Tell me a {adjective} joke about {content}.\"\r\n",
"}\r\n"
@@ -156,6 +158,7 @@
"output_type": "stream",
"text": [
"{\r\n",
" \"_type\": \"prompt\",\r\n",
" \"input_variables\": [\"adjective\", \"content\"],\r\n",
" \"template_path\": \"simple_template.txt\"\r\n",
"}\r\n"
@@ -279,6 +282,7 @@
"prefix: \r\n",
" Write antonyms for the following words.\r\n",
"example_prompt:\r\n",
" _type: prompt\r\n",
" input_variables:\r\n",
" [\"input\", \"output\"]\r\n",
" template:\r\n",
@@ -346,6 +350,7 @@
"prefix: \r\n",
" Write antonyms for the following words.\r\n",
"example_prompt:\r\n",
" _type: prompt\r\n",
" input_variables:\r\n",
" [\"input\", \"output\"]\r\n",
" template:\r\n",
@@ -413,6 +418,7 @@
" \"input_variables\": [\"adjective\"],\r\n",
" \"prefix\": \"Write antonyms for the following words.\",\r\n",
" \"example_prompt\": {\r\n",
" \"_type\": \"prompt\",\r\n",
" \"input_variables\": [\"input\", \"output\"],\r\n",
" \"template\": \"Input: {input}\\nOutput: {output}\"\r\n",
" },\r\n",
@@ -478,6 +484,7 @@
" \"input_variables\": [\"adjective\"],\r\n",
" \"prefix\": \"Write antonyms for the following words.\",\r\n",
" \"example_prompt\": {\r\n",
" \"_type\": \"prompt\",\r\n",
" \"input_variables\": [\"input\", \"output\"],\r\n",
" \"template\": \"Input: {input}\\nOutput: {output}\"\r\n",
" },\r\n",
@@ -542,6 +549,7 @@
"output_type": "stream",
"text": [
"{\r\n",
" \"_type\": \"prompt\",\r\n",
" \"input_variables\": [\"input\", \"output\"],\r\n",
" \"template\": \"Input: {input}\\nOutput: {output}\" \r\n",
"}\r\n"
@@ -622,7 +630,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.11.2"
},
"vscode": {
"interpreter": {

View File

@@ -1,4 +1,5 @@
{
"_type": "prompt",
"input_variables": ["adjective", "content"],
"template": "Tell me a {adjective} joke about {content}."
}

View File

@@ -1,3 +1,4 @@
_type: prompt
input_variables:
["adjective", "content"]
template:

View File

@@ -1,4 +1,5 @@
{
"_type": "prompt",
"input_variables": ["adjective", "content"],
"template_path": "simple_template.txt"
}

View File

@@ -5,9 +5,9 @@
"id": "3cadcf88",
"metadata": {},
"source": [
"# Using HuggingFace Datasets\n",
"# Using Hugging Face Datasets\n",
"\n",
"This example shows how to use HuggingFace datasets to evaluate models. Specifically, we show how to load examples to evaluate models on from HuggingFace's dataset package."
"This example shows how to use Hugging Face datasets to evaluate models. Specifically, we show how to load examples to evaluate models on from Hugging Face's dataset package."
]
},
{
@@ -60,7 +60,7 @@
"source": [
"## Examples\n",
"\n",
"Now we load a dataset from HuggingFace, and then convert it to a list of dictionaries for easier usage."
"Now we load a dataset from Hugging Face, and then convert it to a list of dictionaries for easier usage."
]
},
{

View File

@@ -17,12 +17,12 @@ def initialize_agent(
agent_kwargs: Optional[dict] = None,
**kwargs: Any,
) -> AgentExecutor:
"""Load agent given tools and LLM.
"""Load an agent executor given tools and LLM.
Args:
tools: List of tools this agent has access to.
llm: Language model to use as the agent.
agent: The agent to use. Valid options are:
agent: A string that specifies the agent type to use. Valid options are:
`zero-shot-react-description`
`react-docstore`
`self-ask-with-search`
@@ -32,10 +32,11 @@ def initialize_agent(
callback_manager: CallbackManager to use. Global callback manager is used if
not provided. Defaults to None.
agent_path: Path to serialized agent to use.
**kwargs: Additional key word arguments to pass to the agent.
agent_kwargs: Additional key word arguments to pass to the underlying agent
**kwargs: Additional key word arguments passed to the agent executor
Returns:
An agent.
An agent executor
"""
if agent is None and agent_path is None:
agent = "zero-shot-react-description"

View File

@@ -28,7 +28,10 @@ class Memory(BaseModel, ABC):
@abstractmethod
def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, str]:
"""Return key-value pairs given the text input to the chain."""
"""Return key-value pairs given the text input to the chain.
If None, return all memories
"""
@abstractmethod
def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
@@ -39,6 +42,29 @@ class Memory(BaseModel, ABC):
"""Clear memory contents."""
class SimpleMemory(Memory, BaseModel):
"""Simple memory for storing context or other bits of information that shouldn't
ever change between prompts.
"""
memories: Dict[str, Any] = dict()
@property
def memory_variables(self) -> List[str]:
return list(self.memories.keys())
def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, str]:
return self.memories
def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
"""Nothing should be saved or changed, my memory is set in stone."""
pass
def clear(self) -> None:
"""Nothing to clear, got a memory like a vault."""
pass
def _get_verbosity() -> bool:
return langchain.verbose

View File

@@ -276,6 +276,12 @@ class ConversationEntityMemory(Memory, BaseModel):
}
def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
if inputs is None:
raise ValueError(
"Inputs must be provided to save context from "
"ConversationEntityMemory."
)
"""Save context from this conversation to buffer."""
if self.input_key is None:
prompt_input_key = _get_prompt_input_key(inputs, self.memory_variables)

View File

@@ -131,10 +131,12 @@ class LLMChain(Chain, BaseModel):
]
def _call(self, inputs: Dict[str, Any]) -> Dict[str, str]:
return self.apply([inputs])[0]
known_values = self.prep_inputs(inputs.copy())
return self.apply([known_values])[0]
async def _acall(self, inputs: Dict[str, Any]) -> Dict[str, str]:
return (await self.aapply([inputs]))[0]
known_values = self.prep_inputs(inputs.copy())
return (await self.aapply([known_values]))[0]
def predict(self, **kwargs: Any) -> str:
"""Format prompt with kwargs and pass to LLM.

View File

@@ -62,6 +62,8 @@ class LLMMathChain(Chain, BaseModel):
answer = "Answer: " + output
elif t.startswith("Answer:"):
answer = t
elif "Answer:" in t:
answer = "Answer: " + t.split("Answer:")[-1]
else:
raise ValueError(f"unknown format from LLM: {t}")
return {self.output_key: answer}

View File

@@ -1,5 +1,4 @@
"""Chain pipeline where the outputs of one step feed directly into next."""
from typing import Dict, List
from pydantic import BaseModel, Extra, root_validator
@@ -9,7 +8,7 @@ from langchain.input import get_color_mapping
class SequentialChain(Chain, BaseModel):
"""Chain where the outputs of one step feed directly into next."""
"""Chain where the outputs of one chain feed directly into next."""
chains: List[Chain]
input_variables: List[str]
@@ -24,7 +23,7 @@ class SequentialChain(Chain, BaseModel):
@property
def input_keys(self) -> List[str]:
"""Expect input key.
"""Return expected input keys to the chain.
:meta private:
"""
@@ -43,7 +42,20 @@ class SequentialChain(Chain, BaseModel):
"""Validate that the correct inputs exist for all chains."""
chains = values["chains"]
input_variables = values["input_variables"]
known_variables = set(input_variables)
memory_keys = list()
if "memory" in values and values["memory"] is not None:
"""Validate that prompt input variables are consistent."""
memory_keys = values["memory"].memory_variables
if any(input_variables) in memory_keys:
overlapping_keys = input_variables & memory_keys
raise ValueError(
f"The the input key(s) {''.join(overlapping_keys)} are found "
f"in the Memory keys ({memory_keys}) - please use input and "
f"memory keys that don't overlap."
)
known_variables = set(input_variables + memory_keys)
for chain in chains:
missing_vars = set(chain.input_keys).difference(known_variables)
if missing_vars:
@@ -56,6 +68,7 @@ class SequentialChain(Chain, BaseModel):
raise ValueError(
f"Chain returned keys that already exist: {overlapping_keys}"
)
known_variables |= set(chain.output_keys)
if "output_variables" not in values:
@@ -70,6 +83,7 @@ class SequentialChain(Chain, BaseModel):
raise ValueError(
f"Expected output variables that were not found: {missing_vars}."
)
return values
def _call(self, inputs: Dict[str, str]) -> Dict[str, str]:

View File

@@ -24,7 +24,11 @@ from langchain.document_loaders.notion import NotionDirectoryLoader
from langchain.document_loaders.obsidian import ObsidianLoader
from langchain.document_loaders.online_pdf import OnlinePDFLoader
from langchain.document_loaders.paged_pdf import PagedPDFSplitter
from langchain.document_loaders.pdf import PDFMinerLoader, UnstructuredPDFLoader
from langchain.document_loaders.pdf import (
PDFMinerLoader,
PyMuPDFLoader,
UnstructuredPDFLoader,
)
from langchain.document_loaders.powerpoint import UnstructuredPowerPointLoader
from langchain.document_loaders.readthedocs import ReadTheDocsLoader
from langchain.document_loaders.roam import RoamLoader
@@ -78,6 +82,7 @@ __all__ = [
"AirbyteJSONLoader",
"OnlinePDFLoader",
"PDFMinerLoader",
"PyMuPDFLoader",
"TelegramChatLoader",
"SRTLoader",
"FacebookChatLoader",

View File

@@ -1,5 +1,5 @@
"""Loader that loads PDF files."""
from typing import List
from typing import Any, List, Optional
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
@@ -27,6 +27,7 @@ class PDFMinerLoader(BaseLoader):
"pdfminer package not found, please install it with "
"`pip install pdfminer.six`"
)
self.file_path = file_path
def load(self) -> List[Document]:
@@ -36,3 +37,37 @@ class PDFMinerLoader(BaseLoader):
text = extract_text(self.file_path)
metadata = {"source": self.file_path}
return [Document(page_content=text, metadata=metadata)]
class PyMuPDFLoader(BaseLoader):
"""Loader that uses PyMuPDF to load PDF files."""
def __init__(self, file_path: str):
"""Initialize with file path."""
try:
import fitz # noqa:F401
except ImportError:
raise ValueError(
"PyMuPDF package not found, please install it with "
"`pip install pymupdf`"
)
self.file_path = file_path
def load(self, **kwargs: Optional[Any]) -> List[Document]:
"""Load file."""
import fitz
doc = fitz.open(self.file_path) # open document
return [
Document(
page_content=page.get_text(**kwargs).encode("utf-8"),
metadata={
"file_path": self.file_path,
"page_number": page.number + 1,
"total_pages": len(doc),
}
| doc.metadata,
)
for page in doc
]

View File

@@ -64,7 +64,7 @@ class CohereEmbeddings(BaseModel, Embeddings):
embeddings = self.client.embed(
model=self.model, texts=texts, truncate=self.truncate
).embeddings
return embeddings
return [list(map(float, e)) for e in embeddings]
def embed_query(self, text: str) -> List[float]:
"""Call out to Cohere's embedding endpoint.
@@ -78,4 +78,4 @@ class CohereEmbeddings(BaseModel, Embeddings):
embedding = self.client.embed(
model=self.model, texts=[text], truncate=self.truncate
).embeddings[0]
return embedding
return list(map(float, embedding))

View File

@@ -161,6 +161,12 @@ class BaseOpenAI(BaseLLM, BaseModel):
streaming: bool = False
"""Whether to stream the results or not."""
def __new__(cls, **data: Any) -> Union[OpenAIChat, BaseOpenAI]: # type: ignore
"""Initialize the OpenAI object."""
if data.get("model_name", "").startswith("gpt-3.5-turbo"):
return OpenAIChat(**data)
return super().__new__(cls)
class Config:
"""Configuration for this pydantic object."""

View File

@@ -1,7 +1,7 @@
"""Example selector that selects examples based on SemanticSimilarity."""
from __future__ import annotations
from typing import Any, Dict, List, Optional
from typing import Any, Dict, List, Optional, Type
from pydantic import BaseModel, Extra
@@ -65,7 +65,7 @@ class SemanticSimilarityExampleSelector(BaseExampleSelector, BaseModel):
cls,
examples: List[dict],
embeddings: Embeddings,
vectorstore_cls: VectorStore,
vectorstore_cls: Type[VectorStore],
k: int = 4,
input_keys: Optional[List[str]] = None,
**vectorstore_cls_kwargs: Any,
@@ -131,7 +131,7 @@ class MaxMarginalRelevanceExampleSelector(SemanticSimilarityExampleSelector, Bas
cls,
examples: List[dict],
embeddings: Embeddings,
vectorstore_cls: VectorStore,
vectorstore_cls: Type[VectorStore],
k: int = 4,
input_keys: Optional[List[str]] = None,
fetch_k: int = 20,

View File

@@ -3,7 +3,7 @@ from __future__ import annotations
from typing import Any, Iterable, List, Optional
from sqlalchemy import MetaData, create_engine, inspect, select
from sqlalchemy import MetaData, create_engine, inspect, select, text
from sqlalchemy.engine import Engine
from sqlalchemy.exc import ProgrammingError, SQLAlchemyError
from sqlalchemy.schema import CreateTable
@@ -177,7 +177,7 @@ class SQLDatabase:
with self._engine.begin() as connection:
if self._schema is not None:
connection.exec_driver_sql(f"SET search_path TO {self._schema}")
cursor = connection.exec_driver_sql(command)
cursor = connection.execute(text(command))
if cursor.returns_rows:
if fetch == "all":
result = cursor.fetchall()

View File

@@ -1,7 +1,13 @@
"""Chain that calls SearxNG meta search API.
SearxNG is a privacy-friendly free metasearch engine that aggregates results from
multiple search engines and databases.
`multiple search engines
<https://docs.searxng.org/admin/engines/configured_engines.html>`_ and databases and
supports the `OpenSearch
<https://github.com/dewitt/opensearch/blob/master/opensearch-1-1-draft-6.md>`_
specification.
More details on the installation instructions `here <../../ecosystem/searx.html>`_.
For the search API refer to https://docs.searxng.org/dev/search_api.html
@@ -176,7 +182,7 @@ class SearxSearchWrapper(BaseModel):
.. code-block:: python
from langchain.utilities import SearxSearchWrapper
searx = SearxSearchWrapper(searx_host="https://searx.example.com")
searx = SearxSearchWrapper(searx_host="http://localhost:8888")
Example with SSL disabled:
.. code-block:: python
@@ -184,7 +190,7 @@ class SearxSearchWrapper(BaseModel):
from langchain.utilities import SearxSearchWrapper
# note the unsecure parameter is not needed if you pass the url scheme as
# http
searx = SearxSearchWrapper(searx_host="http://searx.example.com",
searx = SearxSearchWrapper(searx_host="http://localhost:8888",
unsecure=True)

View File

@@ -3,7 +3,7 @@ from __future__ import annotations
import logging
import uuid
from typing import Any, Dict, Iterable, List, Optional
from typing import Any, Dict, Iterable, List, Optional, Tuple
from langchain.docstore.document import Document
from langchain.embeddings.base import Embeddings
@@ -116,6 +116,27 @@ class Chroma(VectorStore):
Returns:
List[Document]: List of documents most similar to the query text.
"""
docs_and_scores = self.similarity_search_with_score(query, k)
return [doc for doc, _ in docs_and_scores]
def similarity_search_with_score(
self,
query: str,
k: int = 4,
filter: Optional[Dict[str, str]] = None,
**kwargs: Any,
) -> List[Tuple[Document, float]]:
"""Run similarity search with Chroma with distance.
Args:
query (str): Query text to search for.
k (int): Number of results to return. Defaults to 4.
filter (Optional[Dict[str, str]]): Filter by metadata. Defaults to None.
Returns:
List[Tuple[Document, float]]: List of documents most similar to the query
text with distance in float.
"""
if self._embedding_function is None:
results = self._collection.query(
query_texts=[query], n_results=k, where=filter
@@ -129,8 +150,12 @@ class Chroma(VectorStore):
docs = [
# TODO: Chroma can do batch querying,
# we shouldn't hard code to the 1st result
Document(page_content=result[0], metadata=result[1])
for result in zip(results["documents"][0], results["metadatas"][0])
(Document(page_content=result[0], metadata=result[1]), result[2])
for result in zip(
results["documents"][0],
results["metadatas"][0],
results["distances"][0],
)
]
return docs

View File

@@ -1,6 +1,6 @@
[tool.poetry]
name = "langchain"
version = "0.0.99"
version = "0.0.101"
description = "Building applications with LLMs through composability"
authors = []
license = "MIT"
@@ -96,7 +96,7 @@ playwright = "^1.28.0"
[tool.poetry.extras]
llms = ["anthropic", "cohere", "openai", "nlpcloud", "huggingface_hub", "manifest-ml", "torch", "transformers"]
all = ["anthropic", "cohere", "openai", "nlpcloud", "huggingface_hub", "manifest-ml", "elasticsearch", "opensearch-py", "google-search-results", "faiss-cpu", "sentence_transformers", "transformers", "spacy", "nltk", "wikipedia", "beautifulsoup4", "tiktoken", "torch", "jinja2", "pinecone-client", "weaviate-client", "redis", "google-api-python-client", "wolframalpha", "qdrant-client", "tensorflow-text", "pypdf", "networkx", "nomic"]
all = ["anthropic", "cohere", "openai", "nlpcloud", "huggingface_hub", "manifest-ml", "elasticsearch", "opensearch-py", "google-search-results", "faiss-cpu", "sentence_transformers", "transformers", "spacy", "nltk", "wikipedia", "beautifulsoup4", "tiktoken", "torch", "jinja2", "pinecone-client", "weaviate-client", "redis", "google-api-python-client", "wolframalpha", "qdrant-client", "tensorflow-text", "pypdf", "networkx", "nomic", "aleph-alpha-client", "deeplake", ]
[tool.ruff]
select = [

View File

@@ -0,0 +1,46 @@
from pathlib import Path
from langchain.document_loaders import (
PDFMinerLoader,
PyMuPDFLoader,
UnstructuredPDFLoader,
)
def test_unstructured_pdf_loader() -> None:
"""Test unstructured loader."""
file_path = Path(__file__).parent.parent / "examples/hello.pdf"
loader = UnstructuredPDFLoader(str(file_path))
docs = loader.load()
assert len(docs) == 1
def test_pdfminer_loader() -> None:
"""Test PDFMiner loader."""
file_path = Path(__file__).parent.parent / "examples/hello.pdf"
loader = PDFMinerLoader(str(file_path))
docs = loader.load()
assert len(docs) == 1
file_path = Path(__file__).parent.parent / "examples/layout-parser-paper.pdf"
loader = PDFMinerLoader(str(file_path))
docs = loader.load()
assert len(docs) == 1
def test_pymupdf_loader() -> None:
"""Test PyMuPDF loader."""
file_path = Path(__file__).parent.parent / "examples/hello.pdf"
loader = PyMuPDFLoader(str(file_path))
docs = loader.load()
assert len(docs) == 1
file_path = Path(__file__).parent.parent / "examples/layout-parser-paper.pdf"
loader = PyMuPDFLoader(str(file_path))
docs = loader.load()
assert len(docs) == 16

View File

@@ -144,6 +144,13 @@ async def test_openai_async_streaming_callback() -> None:
assert isinstance(result, LLMResult)
def test_openai_chat_wrong_class() -> None:
"""Test OpenAIChat with wrong class still works."""
llm = OpenAI(model_name="gpt-3.5-turbo")
output = llm("Say foo:")
assert isinstance(output, str)
def test_openai_chat() -> None:
"""Test OpenAIChat."""
llm = OpenAIChat(max_tokens=10)

View File

@@ -28,6 +28,20 @@ def test_chroma_with_metadatas() -> None:
assert output == [Document(page_content="foo", metadata={"page": "0"})]
def test_chroma_with_metadatas_with_scores() -> None:
"""Test end to end construction and search."""
texts = ["foo", "bar", "baz"]
metadatas = [{"page": str(i)} for i in range(len(texts))]
docsearch = Chroma.from_texts(
collection_name="test_collection",
texts=texts,
embedding=FakeEmbeddings(),
metadatas=metadatas,
)
output = docsearch.similarity_search_with_score("foo", k=1)
assert output == [(Document(page_content="foo", metadata={"page": "0"}), 1.0)]
def test_chroma_with_persistence() -> None:
"""Test end to end construction and search, with persistence."""
chroma_persist_dir = "./tests/persist_dir"

View File

@@ -1,5 +1,5 @@
"""Test logic on base chain class."""
from typing import Any, Dict, List
from typing import Any, Dict, List, Optional
import pytest
from pydantic import BaseModel
@@ -17,7 +17,9 @@ class FakeMemory(Memory, BaseModel):
"""Return baz variable."""
return ["baz"]
def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, str]:
def load_memory_variables(
self, inputs: Optional[Dict[str, Any]] = None
) -> Dict[str, str]:
"""Return baz variable."""
return {"baz": "foo"}

View File

@@ -0,0 +1,11 @@
from langchain.chains.base import SimpleMemory
def test_simple_memory() -> None:
"""Test SimpleMemory."""
memory = SimpleMemory(memories={"baz": "foo"})
output = memory.load_memory_variables({})
assert output == {"baz": "foo"}
assert ["baz"] == memory.memory_variables

View File

@@ -4,7 +4,7 @@ from typing import Dict, List
import pytest
from pydantic import BaseModel
from langchain.chains.base import Chain
from langchain.chains.base import Chain, SimpleMemory
from langchain.chains.sequential import SequentialChain, SimpleSequentialChain
@@ -56,6 +56,19 @@ def test_sequential_usage_multiple_inputs() -> None:
assert output == expected_output
def test_sequential_usage_memory() -> None:
"""Test sequential usage with memory."""
memory = SimpleMemory(memories={"zab": "rab"})
chain_1 = FakeChain(input_variables=["foo"], output_variables=["bar"])
chain_2 = FakeChain(input_variables=["bar"], output_variables=["baz"])
chain = SequentialChain(
memory=memory, chains=[chain_1, chain_2], input_variables=["foo"]
)
output = chain({"foo": "123"})
expected_output = {"baz": "123foofoo", "foo": "123", "zab": "rab"}
assert output == expected_output
def test_sequential_usage_multiple_outputs() -> None:
"""Test sequential usage on multiple output chains."""
chain_1 = FakeChain(input_variables=["foo"], output_variables=["bar", "test"])