community[patch]: DuckDB VS - expose similarity, improve performance of from_texts (#20971)

Three fixes to the DuckDB vector store:
- unify the defaults in the constructor and `from_texts` (users no longer
have to specify `vector_key`).
- include the search similarity in the output metadata (fixes #20969)
- significantly improve the performance of `from_documents`
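
To illustrate the second fix, here is a minimal sketch in plain Python of what "similarity in the output metadata" can look like from the caller's side. It is not the vector store's actual code: the `Doc` class, the `search_with_similarity` helper, and the `similarity_score` metadata key are hypothetical names chosen for this example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a returned document."""
    text: str
    metadata: dict = field(default_factory=dict)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_with_similarity(query_vec, rows, k=2, key="similarity_score"):
    """Score every stored row and copy the score into each result's metadata.

    `key` is an assumed metadata key, not necessarily the one the library uses.
    """
    scored = []
    for text, vec, metadata in rows:
        score = cosine_similarity(query_vec, vec)
        # Merge into a new dict so the stored metadata is not mutated.
        scored.append(Doc(text, {**metadata, key: score}))
    scored.sort(key=lambda d: d.metadata[key], reverse=True)
    return scored[:k]


rows = [
    ("alpha", [1.0, 0.0], {"source": "a"}),
    ("beta", [0.0, 1.0], {"source": "b"}),
    ("gamma", [1.0, 1.0], {"source": "c"}),
]
results = search_with_similarity([1.0, 0.0], rows, k=2)
for doc in results:
    print(doc.text, doc.metadata)
```

With the score carried in the metadata, callers can filter or rank results without a separate scoring call.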

Dependencies: added Pandas to speed up `from_documents`.
I considered loading the data through CSV or JSON instead, but I expect
trouble loading JSON values that way, and both options require writing
the data to disk first.
The poetry file for langchain-community already declares a dependency
on Pandas.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
This commit is contained in:
Jan Soubusta
2024-05-25 00:17:52 +02:00
committed by GitHub
parent 42207f5bef
commit cccc8fbe2f
2 changed files with 60 additions and 22 deletions


@@ -14,7 +14,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"! pip install duckdb langchain-community"
+"! pip install duckdb langchain langchain-community langchain-openai"
 ]
 },
 {
@@ -86,7 +86,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "Python 3",
+"display_name": "Python 3 (ipykernel)",
 "language": "python",
 "name": "python3"
 },
@@ -100,9 +100,9 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.12.2"
+"version": "3.9.1"
 }
 },
 "nbformat": 4,
-"nbformat_minor": 2
+"nbformat_minor": 4
 }