Seeing a lot of issues in Discord in which the LLM is not using the
correct LIMIT clause for different SQL dialects. ie, it's using `LIMIT`
for mssql instead of `TOP`, or instead of `ROWNUM` for Oracle, etc.
I think this could be due to us specifying the LIMIT statement in the
example rows portion of `table_info`. So the LLM is seeing the `LIMIT`
statement used in the prompt.
Since we can't specify each dialect's method here, I think it's fine to
just replace the `SELECT... LIMIT 3;` statement with `3 rows from
table_name table:`, and wrap everything in a block comment directly
following the `CREATE` statement. The Rajkumar et al paper wrapped the
example rows and `SELECT` statement in a block comment as well anyway.
Thoughts @fpingham?
`OnlinePDFLoader` and `PagedPDFSplitter` lived separate from the rest of
the pdf loaders.
Because they're all similar, I propose moving all to `pdy.py` and the
same docs/examples page.
Additionally, `PagedPDFSplitter` naming doesn't match the pattern the
rest of the loaders follow, so I renamed to `PyPDFLoader` and had it
inherit from `BasePDFLoader` so it can now load from remote file
sources.
Provide shared memory capability for the Agent.
Inspired by #1293 .
## Problem
If both Agent and Tools (i.e., LLMChain) use the same memory, both of
them will save the context. It can be annoying in some cases.
## Solution
Create a memory wrapper that ignores the save and clear, thereby
preventing updates from Agent or Tools.
Simple CSV document loader which wraps `csv` reader, and preps the file
with a single `Document` per row.
The column header is prepended to each value for context which is useful
for context with embedding and semantic search
This PR adds additional evaluation metrics for data-augmented QA,
resulting in a report like this at the end of the notebook:

The score calculation is based on the
[Critique](https://docs.inspiredco.ai/critique/) toolkit, an API-based
toolkit (like OpenAI) that has minimal dependencies, so it should be
easy for people to run if they choose.
The code could further be simplified by actually adding a chain that
calls Critique directly, but that probably should be saved for another
PR if necessary. Any comments or change requests are welcome!
This pull request proposes an update to the Lightweight wrapper
library's documentation. The current documentation provides an example
of how to use the library's requests.run method, as follows:
requests.run("https://www.google.com"). However, this example does not
work for the 0.0.102 version of the library.
Testing:
The changes have been tested locally to ensure they are working as
intended.
Thank you for considering this pull request.
Different PDF libraries have different strengths and weaknesses. PyMuPDF
does a good job at extracting the most amount of content from the doc,
regardless of the source quality, extremely fast (especially compared to
Unstructured).
https://pymupdf.readthedocs.io/en/latest/index.html
- Added instructions on setting up self hosted searx
- Add notebook example with agent
- Use `localhost:8888` as example url to stay consistent since public
instances are not really usable.
Co-authored-by: blob42 <spike@w530>
The YAML and JSON examples of prompt serialization now give a strange
`No '_type' key found, defaulting to 'prompt'` message when you try to
run them yourself or copy the format of the files. The reason for this
harmless warning is that the _type key was not in the config files,
which means they are parsed as a standard prompt.
This could be confusing to new users (like it was confusing to me after
upgrading from 0.0.85 to 0.0.86+ for my few_shot prompts that needed a
_type added to the example_prompt config), so this update includes the
_type key just for clarity.
Obviously this is not critical as the warning is harmless, but it could
be confusing to track down or be interpreted as an error by a new user,
so this update should resolve that.
This PR:
- Increases `qdrant-client` version to 1.0.4
- Introduces custom content and metadata keys (as requested in #1087)
- Moves all the `QdrantClient` parameters into the method parameters to
simplify code completion