Mirror of https://github.com/hwchase17/langchain.git, synced 2026-02-07 17:50:35 +00:00

Compare commits: v0.0.226...vwp/embedd (1 commit)

| Author | SHA1 | Date |
|---|---|---|
|  | f3f89e0535 |  |

@@ -16,6 +16,22 @@
{%- set development_attrs = '' %}
{%- endif %}

{# title, link, link_attrs #}
{%- set drop_down_navigation = [
  ('Getting Started', pathto('getting_started'), ''),
  ('Tutorial', pathto('tutorial/index'), ''),
  ("What's new", pathto('whats_new/v' + version), ''),
  ('Glossary', pathto('glossary'), ''),
  ('Development', development_link, development_attrs),
  ('FAQ', pathto('faq'), ''),
  ('Support', pathto('support'), ''),
  ('Related packages', pathto('related_projects'), ''),
  ('Roadmap', pathto('roadmap'), ''),
  ('Governance', pathto('governance'), ''),
  ('About us', pathto('about'), ''),
  ('GitHub', 'https://github.com/scikit-learn/scikit-learn', ''),
  ('Other Versions and Download', 'https://scikit-learn.org/dev/versions.html', '')]
-%}

<nav id="navbar" class="{{ nav_bar_class }} navbar navbar-expand-md navbar-light bg-light py-0">
  <div class="container-fluid {{ top_container_cls }} px-0">

@@ -1,124 +0,0 @@
# Tutorials

⛓ icon marks a new addition [last update 2023-07-05]

---------------------

### DeepLearning.AI courses
by [Harrison Chase](https://github.com/hwchase17) and [Andrew Ng](https://en.wikipedia.org/wiki/Andrew_Ng)
- [LangChain for LLM Application Development](https://learn.deeplearning.ai/langchain)
- ⛓ [LangChain Chat with Your Data](https://learn.deeplearning.ai/langchain-chat-with-your-data)

### Handbook
[LangChain AI Handbook](https://www.pinecone.io/learn/langchain/) By **James Briggs** and **Francisco Ingham**

### Short Tutorials
[LangChain Crash Course - Build apps with language models](https://youtu.be/LbT1yp6quS8) by [Patrick Loeber](https://www.youtube.com/@patloeber)

[LangChain Crash Course: Build an AutoGPT app in 25 minutes](https://youtu.be/MlK6SIjcjE8) by [Nicholas Renotte](https://www.youtube.com/@NicholasRenotte)

[LangChain Explained in 13 Minutes | QuickStart Tutorial for Beginners](https://youtu.be/aywZrzNaKjs) by [Rabbitmetrics](https://www.youtube.com/@rabbitmetrics)

## Tutorials

### [LangChain for Gen AI and LLMs](https://www.youtube.com/playlist?list=PLIUOU7oqGTLieV9uTIFMm6_4PXg-hlN6F) by [James Briggs](https://www.youtube.com/@jamesbriggs)
- #1 [Getting Started with `GPT-3` vs. Open Source LLMs](https://youtu.be/nE2skSRWTTs)
- #2 [Prompt Templates for `GPT 3.5` and other LLMs](https://youtu.be/RflBcK0oDH0)
- #3 [LLM Chains using `GPT 3.5` and other LLMs](https://youtu.be/S8j9Tk0lZHU)
- [LangChain Data Loaders, Tokenizers, Chunking, and Datasets - Data Prep 101](https://youtu.be/eqOfr4AGLk8)
- #4 [Chatbot Memory for `Chat-GPT`, `Davinci` + other LLMs](https://youtu.be/X05uK0TZozM)
- #5 [Chat with OpenAI in LangChain](https://youtu.be/CnAgB3A5OlU)
- #6 [Fixing LLM Hallucinations with Retrieval Augmentation in LangChain](https://youtu.be/kvdVduIJsc8)
- #7 [LangChain Agents Deep Dive with `GPT 3.5`](https://youtu.be/jSP-gSEyVeI)
- #8 [Create Custom Tools for Chatbots in LangChain](https://youtu.be/q-HNphrWsDE)
- #9 [Build Conversational Agents with Vector DBs](https://youtu.be/H6bCqqw9xyI)
- [Using NEW `MPT-7B` in Hugging Face and LangChain](https://youtu.be/DXpk9K7DgMo)
- ⛓ [`MPT-30B` Chatbot with LangChain](https://youtu.be/pnem-EhT6VI)

### [LangChain 101](https://www.youtube.com/playlist?list=PLqZXAkvF1bPNQER9mLmDbntNfSpzdDIU5) by [Greg Kamradt (Data Indy)](https://www.youtube.com/@DataIndependent)
- [What Is LangChain? - LangChain + `ChatGPT` Overview](https://youtu.be/_v_fgW2SkkQ)
- [Quickstart Guide](https://youtu.be/kYRB-vJFy38)
- [Beginner Guide To 7 Essential Concepts](https://youtu.be/2xxziIWmaSA)
- [Beginner Guide To 9 Use Cases](https://youtu.be/vGP4pQdCocw)
- [Agents Overview + Google Searches](https://youtu.be/Jq9Sf68ozk0)
- [`OpenAI` + `Wolfram Alpha`](https://youtu.be/UijbzCIJ99g)
- [Ask Questions On Your Custom (or Private) Files](https://youtu.be/EnT-ZTrcPrg)
- [Connect `Google Drive Files` To `OpenAI`](https://youtu.be/IqqHqDcXLww)
- [`YouTube Transcripts` + `OpenAI`](https://youtu.be/pNcQ5XXMgH4)
- [Question A 300 Page Book (w/ `OpenAI` + `Pinecone`)](https://youtu.be/h0DHDp1FbmQ)
- [Workaround `OpenAI's` Token Limit With Chain Types](https://youtu.be/f9_BWhCI4Zo)
- [Build Your Own OpenAI + LangChain Web App in 23 Minutes](https://youtu.be/U_eV8wfMkXU)
- [Working With The New `ChatGPT API`](https://youtu.be/e9P7FLi5Zy8)
- [OpenAI + LangChain Wrote Me 100 Custom Sales Emails](https://youtu.be/y1pyAQM-3Bo)
- [Structured Output From `OpenAI` (Clean Dirty Data)](https://youtu.be/KwAXfey-xQk)
- [Connect `OpenAI` To +5,000 Tools (LangChain + `Zapier`)](https://youtu.be/7tNm0yiDigU)
- [Use LLMs To Extract Data From Text (Expert Mode)](https://youtu.be/xZzvwR9jdPA)
- [Extract Insights From Interview Transcripts Using LLMs](https://youtu.be/shkMOHwJ4SM)
- [5 Levels Of LLM Summarizing: Novice to Expert](https://youtu.be/qaPMdcCqtWk)
- [Control Tone & Writing Style Of Your LLM Output](https://youtu.be/miBG-a3FuhU)
- [Build Your Own `AI Twitter Bot` Using LLMs](https://youtu.be/yLWLDjT01q8)
- [ChatGPT made my interview questions for me (`Streamlit` + LangChain)](https://youtu.be/zvoAMx0WKkw)
- [Function Calling via ChatGPT API - First Look With LangChain](https://youtu.be/0-zlUy7VUjg)
- ⛓ [Extract Topics From Video/Audio With LLMs (Topic Modeling w/ LangChain)](https://youtu.be/pEkxRQFNAs4)

### [LangChain How to and guides](https://www.youtube.com/playlist?list=PL8motc6AQftk1Bs42EW45kwYbyJ4jOdiZ) by [Sam Witteveen](https://www.youtube.com/@samwitteveenai)
- [LangChain Basics - LLMs & PromptTemplates with Colab](https://youtu.be/J_0qvRt4LNk)
- [LangChain Basics - Tools and Chains](https://youtu.be/hI2BY7yl_Ac)
- [`ChatGPT API` Announcement & Code Walkthrough with LangChain](https://youtu.be/phHqvLHCwH4)
- [Conversations with Memory (explanation & code walkthrough)](https://youtu.be/X550Zbz_ROE)
- [Chat with `Flan20B`](https://youtu.be/VW5LBavIfY4)
- [Using `Hugging Face Models` locally (code walkthrough)](https://youtu.be/Kn7SX2Mx_Jk)
- [`PAL` : Program-aided Language Models with LangChain code](https://youtu.be/dy7-LvDu-3s)
- [Building a Summarization System with LangChain and `GPT-3` - Part 1](https://youtu.be/LNq_2s_H01Y)
- [Building a Summarization System with LangChain and `GPT-3` - Part 2](https://youtu.be/d-yeHDLgKHw)
- [Microsoft's `Visual ChatGPT` using LangChain](https://youtu.be/7YEiEyfPF5U)
- [LangChain Agents - Joining Tools and Chains with Decisions](https://youtu.be/ziu87EXZVUE)
- [Comparing LLMs with LangChain](https://youtu.be/rFNG0MIEuW0)
- [Using `Constitutional AI` in LangChain](https://youtu.be/uoVqNFDwpX4)
- [Talking to `Alpaca` with LangChain - Creating an Alpaca Chatbot](https://youtu.be/v6sF8Ed3nTE)
- [Talk to your `CSV` & `Excel` with LangChain](https://youtu.be/xQ3mZhw69bc)
- [`BabyAGI`: Discover the Power of Task-Driven Autonomous Agents!](https://youtu.be/QBcDLSE2ERA)
- [Improve your `BabyAGI` with LangChain](https://youtu.be/DRgPyOXZ-oE)
- [Master `PDF` Chat with LangChain - Your essential guide to queries on documents](https://youtu.be/ZzgUqFtxgXI)
- [Using LangChain with `DuckDuckGO` `Wikipedia` & `PythonREPL` Tools](https://youtu.be/KerHlb8nuVc)
- [Building Custom Tools and Agents with LangChain (gpt-3.5-turbo)](https://youtu.be/biS8G8x8DdA)
- [LangChain Retrieval QA Over Multiple Files with `ChromaDB`](https://youtu.be/3yPBVii7Ct0)
- [LangChain Retrieval QA with Instructor Embeddings & `ChromaDB` for PDFs](https://youtu.be/cFCGUjc33aU)
- [LangChain + Retrieval Local LLMs for Retrieval QA - No OpenAI!!!](https://youtu.be/9ISVjh8mdlA)
- [`Camel` + LangChain for Synthetic Data & Market Research](https://youtu.be/GldMMK6-_-g)
- [Information Extraction with LangChain & `Kor`](https://youtu.be/SW1ZdqH0rRQ)
- [Converting a LangChain App from OpenAI to OpenSource](https://youtu.be/KUDn7bVyIfc)
- [Using LangChain `Output Parsers` to get what you want out of LLMs](https://youtu.be/UVn2NroKQCw)
- [Building a LangChain Custom Medical Agent with Memory](https://youtu.be/6UFtRwWnHws)
- [Understanding `ReACT` with LangChain](https://youtu.be/Eug2clsLtFs)
- [`OpenAI Functions` + LangChain : Building a Multi Tool Agent](https://youtu.be/4KXK6c6TVXQ)
- [What can you do with 16K tokens in LangChain?](https://youtu.be/z2aCZBAtWXs)
- [Tagging and Extraction - Classification using `OpenAI Functions`](https://youtu.be/a8hMgIcUEnE)
- ⛓ [HOW to Make Conversational Form with LangChain](https://youtu.be/IT93On2LB5k)

### [LangChain](https://www.youtube.com/playlist?list=PLVEEucA9MYhOu89CX8H3MBZqayTbcCTMr) by [Prompt Engineering](https://www.youtube.com/@engineerprompt)
- [LangChain Crash Course — All You Need to Know to Build Powerful Apps with LLMs](https://youtu.be/5-fc4Tlgmro)
- [Working with MULTIPLE `PDF` Files in LangChain: `ChatGPT` for your Data](https://youtu.be/s5LhRdh5fu4)
- [`ChatGPT` for YOUR OWN `PDF` files with LangChain](https://youtu.be/TLf90ipMzfE)
- [Talk to YOUR DATA without OpenAI APIs: LangChain](https://youtu.be/wrD-fZvT6UI)
- [Langchain: PDF Chat App (GUI) | ChatGPT for Your PDF FILES](https://youtu.be/RIWbalZ7sTo)
- [LangFlow: Build Chatbots without Writing Code](https://youtu.be/KJ-ux3hre4s)
- [LangChain: Giving Memory to LLMs](https://youtu.be/dxO6pzlgJiY)
- [BEST OPEN Alternative to `OPENAI's EMBEDDINGs` for Retrieval QA: LangChain](https://youtu.be/ogEalPMUCSY)

### LangChain by [Chat with data](https://www.youtube.com/@chatwithdata)
- [LangChain Beginner's Tutorial for `Typescript`/`Javascript`](https://youtu.be/bH722QgRlhQ)
- [`GPT-4` Tutorial: How to Chat With Multiple `PDF` Files (~1000 pages of Tesla's 10-K Annual Reports)](https://youtu.be/Ix9WIZpArm0)
- [`GPT-4` & LangChain Tutorial: How to Chat With A 56-Page `PDF` Document (w/`Pinecone`)](https://youtu.be/ih9PBGVVOO4)
- [LangChain & Supabase Tutorial: How to Build a ChatGPT Chatbot For Your Website](https://youtu.be/R2FMzcsmQY8)
- [LangChain Agents: Build Personal Assistants For Your Data (Q&A with Harrison Chase and Mayo Oshin)](https://youtu.be/gVkF8cwfBLI)

---------------------
⛓ icon marks a new addition [last update 2023-07-05]

@@ -1864,14 +1864,6 @@
    "source": "/en/latest/modules/models/llms/integrations/writer.html",
    "destination": "/docs/modules/model_io/models/llms/integrations/writer"
  },
  {
    "source": "/en/latest/modules/prompts/output_parsers.html",
    "destination": "/docs/modules/model_io/output_parsers/"
  },
  {
    "source": "/docs/modules/prompts/output_parsers.html",
    "destination": "/docs/modules/model_io/output_parsers/"
  },
  {
    "source": "/en/latest/modules/prompts/output_parsers/examples/datetime.html",
    "destination": "/docs/modules/model_io/output_parsers/datetime"

@@ -1,6 +1,6 @@
# YouTube videos
# YouTube tutorials

⛓ icon marks a new addition [last update 2023-06-20]
This is a collection of `LangChain` videos on `YouTube`.

### [Official LangChain YouTube channel](https://www.youtube.com/@LangChain)

@@ -9,6 +9,7 @@
- [LangChain and Weaviate with Harrison Chase and Bob van Luijt - Weaviate Podcast #36](https://youtu.be/lhby7Ql7hbk) by [Weaviate • Vector Database](https://www.youtube.com/@Weaviate)
- [LangChain Demo + Q&A with Harrison Chase](https://youtu.be/zaYTXQFR0_s?t=788) by [Full Stack Deep Learning](https://www.youtube.com/@FullStackDeepLearning)
- [LangChain Agents: Build Personal Assistants For Your Data (Q&A with Harrison Chase and Mayo Oshin)](https://youtu.be/gVkF8cwfBLI) by [Chat with data](https://www.youtube.com/@chatwithdata)
- ⛓️ [LangChain "Agents in Production" Webinar](https://youtu.be/k8GNCCs16F4) by [LangChain](https://www.youtube.com/@LangChain)

## Videos (sorted by views)

@@ -30,9 +31,6 @@
- [`Weaviate` + LangChain for LLM apps presented by Erika Cardenas](https://youtu.be/7AGj4Td5Lgw) by [`Weaviate` • Vector Database](https://www.youtube.com/@Weaviate)
- [Langchain Overview — How to Use Langchain & `ChatGPT`](https://youtu.be/oYVYIq0lOtI) by [Python In Office](https://www.youtube.com/@pythoninoffice6568)
- [Langchain Overview - How to Use Langchain & `ChatGPT`](https://youtu.be/oYVYIq0lOtI) by [Python In Office](https://www.youtube.com/@pythoninoffice6568)
- [LangChain Tutorials](https://www.youtube.com/watch?v=FuqdVNB_8c0&list=PL9V0lbeJ69brU-ojMpU1Y7Ic58Tap0Cw6) by [Edrick](https://www.youtube.com/@edrickdch):
  - [LangChain, Chroma DB, OpenAI Beginner Guide | ChatGPT with your PDF](https://youtu.be/FuqdVNB_8c0)
  - [LangChain 101: The Complete Beginner's Guide](https://youtu.be/P3MAbZ2eMUI)
- [Custom langchain Agent & Tools with memory. Turn any `Python function` into langchain tool with Gpt 3](https://youtu.be/NIG8lXk0ULg) by [echohive](https://www.youtube.com/@echohive)
- [LangChain: Run Language Models Locally - `Hugging Face Models`](https://youtu.be/Xxxuw4_iCzw) by [Prompt Engineering](https://www.youtube.com/@engineerprompt)
- [`ChatGPT` with any `YouTube` video using langchain and `chromadb`](https://youtu.be/TQZfB2bzVwU) by [echohive](https://www.youtube.com/@echohive)
@@ -48,68 +46,154 @@
- [Langchain + `Zapier` Agent](https://youtu.be/yribLAb-pxA) by [Merk](https://www.youtube.com/@merksworld)
- [Connecting the Internet with `ChatGPT` (LLMs) using Langchain And Answers Your Questions](https://youtu.be/9Y0TBC63yZg) by [Kamalraj M M](https://www.youtube.com/@insightbuilder)
- [Build More Powerful LLM Applications for Business's with LangChain (Beginners Guide)](https://youtu.be/sp3-WLKEcBg) by [No Code Blackbox](https://www.youtube.com/@nocodeblackbox)
- [LangFlow LLM Agent Demo for 🦜🔗LangChain](https://youtu.be/zJxDHaWt-6o) by [Cobus Greyling](https://www.youtube.com/@CobusGreylingZA)
- [Chatbot Factory: Streamline Python Chatbot Creation with LLMs and Langchain](https://youtu.be/eYer3uzrcuM) by [Finxter](https://www.youtube.com/@CobusGreylingZA)
- [LangChain Tutorial - ChatGPT mit eigenen Daten](https://youtu.be/0XDLyY90E2c) by [Coding Crashkurse](https://www.youtube.com/@codingcrashkurse6429)
- [Chat with a `CSV` | LangChain Agents Tutorial (Beginners)](https://youtu.be/tjeti5vXWOU) by [GoDataProf](https://www.youtube.com/@godataprof)
- [Introdução ao Langchain - #Cortes - Live DataHackers](https://youtu.be/fw8y5VRei5Y) by [Prof. João Gabriel Lima](https://www.youtube.com/@profjoaogabriellima)
- [LangChain: Level up `ChatGPT` !? | LangChain Tutorial Part 1](https://youtu.be/vxUGx8aZpDE) by [Code Affinity](https://www.youtube.com/@codeaffinitydev)
- [KI schreibt krasses Youtube Skript 😲😳 | LangChain Tutorial Deutsch](https://youtu.be/QpTiXyK1jus) by [SimpleKI](https://www.youtube.com/@simpleki)
- [Chat with Audio: Langchain, `Chroma DB`, OpenAI, and `Assembly AI`](https://youtu.be/Kjy7cx1r75g) by [AI Anytime](https://www.youtube.com/@AIAnytime)
- [QA over documents with Auto vector index selection with Langchain router chains](https://youtu.be/9G05qybShv8) by [echohive](https://www.youtube.com/@echohive)
- [Build your own custom LLM application with `Bubble.io` & Langchain (No Code & Beginner friendly)](https://youtu.be/O7NhQGu1m6c) by [No Code Blackbox](https://www.youtube.com/@nocodeblackbox)
- [Simple App to Question Your Docs: Leveraging `Streamlit`, `Hugging Face Spaces`, LangChain, and `Claude`!](https://youtu.be/X4YbNECRr7o) by [Chris Alexiuk](https://www.youtube.com/@chrisalexiuk)
- [LANGCHAIN AI- `ConstitutionalChainAI` + Databutton AI ASSISTANT Web App](https://youtu.be/5zIU6_rdJCU) by [Avra](https://www.youtube.com/@Avra_b)
- [LANGCHAIN AI AUTONOMOUS AGENT WEB APP - 👶 `BABY AGI` 🤖 with EMAIL AUTOMATION using `DATABUTTON`](https://youtu.be/cvAwOGfeHgw) by [Avra](https://www.youtube.com/@Avra_b)
- [The Future of Data Analysis: Using A.I. Models in Data Analysis (LangChain)](https://youtu.be/v_LIcVyg5dk) by [Absent Data](https://www.youtube.com/@absentdata)
- [Memory in LangChain | Deep dive (python)](https://youtu.be/70lqvTFh_Yg) by [Eden Marco](https://www.youtube.com/@EdenMarco)
- [9 LangChain UseCases | Beginner's Guide | 2023](https://youtu.be/zS8_qosHNMw) by [Data Science Basics](https://www.youtube.com/@datasciencebasics)
- [Use Large Language Models in Jupyter Notebook | LangChain | Agents & Indexes](https://youtu.be/JSe11L1a_QQ) by [Abhinaw Tiwari](https://www.youtube.com/@AbhinawTiwariAT)
- [How to Talk to Your Langchain Agent | `11 Labs` + `Whisper`](https://youtu.be/N4k459Zw2PU) by [VRSEN](https://www.youtube.com/@vrsen)
- [LangChain Deep Dive: 5 FUN AI App Ideas To Build Quickly and Easily](https://youtu.be/mPYEPzLkeks) by [James NoCode](https://www.youtube.com/@jamesnocode)
- [BEST OPEN Alternative to OPENAI's EMBEDDINGs for Retrieval QA: LangChain](https://youtu.be/ogEalPMUCSY) by [Prompt Engineering](https://www.youtube.com/@engineerprompt)
- [LangChain 101: Models](https://youtu.be/T6c_XsyaNSQ) by [Mckay Wrigley](https://www.youtube.com/@realmckaywrigley)
- [LangChain with JavaScript Tutorial #1 | Setup & Using LLMs](https://youtu.be/W3AoeMrg27o) by [Leon van Zyl](https://www.youtube.com/@leonvanzyl)
- [LangChain Overview & Tutorial for Beginners: Build Powerful AI Apps Quickly & Easily (ZERO CODE)](https://youtu.be/iI84yym473Q) by [James NoCode](https://www.youtube.com/@jamesnocode)
- [LangChain In Action: Real-World Use Case With Step-by-Step Tutorial](https://youtu.be/UO699Szp82M) by [Rabbitmetrics](https://www.youtube.com/@rabbitmetrics)
- [Summarizing and Querying Multiple Papers with LangChain](https://youtu.be/p_MQRWH5Y6k) by [Automata Learning Lab](https://www.youtube.com/@automatalearninglab)
- [Using Langchain (and `Replit`) through `Tana`, ask `Google`/`Wikipedia`/`Wolfram Alpha` to fill out a table](https://youtu.be/Webau9lEzoI) by [Stian Håklev](https://www.youtube.com/@StianHaklev)
- [Langchain PDF App (GUI) | Create a ChatGPT For Your `PDF` in Python](https://youtu.be/wUAUdEw5oxM) by [Alejandro AO - Software & Ai](https://www.youtube.com/@alejandro_ao)
- [Auto-GPT with LangChain 🔥 | Create Your Own Personal AI Assistant](https://youtu.be/imDfPmMKEjM) by [Data Science Basics](https://www.youtube.com/@datasciencebasics)
- [Create Your OWN Slack AI Assistant with Python & LangChain](https://youtu.be/3jFXRNn2Bu8) by [Dave Ebbelaar](https://www.youtube.com/@daveebbelaar)
- [How to Create LOCAL Chatbots with GPT4All and LangChain [Full Guide]](https://youtu.be/4p1Fojur8Zw) by [Liam Ottley](https://www.youtube.com/@LiamOttley)
- [Build a `Multilingual PDF` Search App with LangChain, `Cohere` and `Bubble`](https://youtu.be/hOrtuumOrv8) by [Menlo Park Lab](https://www.youtube.com/@menloparklab)
- [Building a LangChain Agent (code-free!) Using `Bubble` and `Flowise`](https://youtu.be/jDJIIVWTZDE) by [Menlo Park Lab](https://www.youtube.com/@menloparklab)
- [Build a LangChain-based Semantic PDF Search App with No-Code Tools Bubble and Flowise](https://youtu.be/s33v5cIeqA4) by [Menlo Park Lab](https://www.youtube.com/@menloparklab)
- [LangChain Memory Tutorial | Building a ChatGPT Clone in Python](https://youtu.be/Cwq91cj2Pnc) by [Alejandro AO - Software & Ai](https://www.youtube.com/@alejandro_ao)
- [ChatGPT For Your DATA | Chat with Multiple Documents Using LangChain](https://youtu.be/TeDgIDqQmzs) by [Data Science Basics](https://www.youtube.com/@datasciencebasics)
- [`Llama Index`: Chat with Documentation using URL Loader](https://youtu.be/XJRoDEctAwA) by [Merk](https://www.youtube.com/@merksworld)
- [Using OpenAI, LangChain, and `Gradio` to Build Custom GenAI Applications](https://youtu.be/1MsmqMg3yUc) by [David Hundley](https://www.youtube.com/@dkhundley)
- [LangChain, Chroma DB, OpenAI Beginner Guide | ChatGPT with your PDF](https://youtu.be/FuqdVNB_8c0)
- ⛓ [Build AI chatbot with custom knowledge base using OpenAI API and GPT Index](https://youtu.be/vDZAZuaXf48) by [Irina Nik](https://www.youtube.com/@irina_nik)
- ⛓ [Build Your Own Auto-GPT Apps with LangChain (Python Tutorial)](https://youtu.be/NYSWn1ipbgg) by [Dave Ebbelaar](https://www.youtube.com/@daveebbelaar)
- ⛓ [Chat with Multiple `PDFs` | LangChain App Tutorial in Python (Free LLMs and Embeddings)](https://youtu.be/dXxQ0LR-3Hg) by [Alejandro AO - Software & Ai](https://www.youtube.com/@alejandro_ao)
- ⛓ [Chat with a `CSV` | `LangChain Agents` Tutorial (Beginners)](https://youtu.be/tjeti5vXWOU) by [Alejandro AO - Software & Ai](https://www.youtube.com/@alejandro_ao)
- ⛓ [Create Your Own ChatGPT with `PDF` Data in 5 Minutes (LangChain Tutorial)](https://youtu.be/au2WVVGUvc8) by [Liam Ottley](https://www.youtube.com/@LiamOttley)
- ⛓ [Using ChatGPT with YOUR OWN Data. This is magical. (LangChain OpenAI API)](https://youtu.be/9AXP7tCI9PI) by [TechLead](https://www.youtube.com/@TechLead)
- ⛓ [Build a Custom Chatbot with OpenAI: `GPT-Index` & LangChain | Step-by-Step Tutorial](https://youtu.be/FIDv6nc4CgU) by [Fabrikod](https://www.youtube.com/@fabrikod)
- ⛓ [`Flowise` is an open source no-code UI visual tool to build 🦜🔗LangChain applications](https://youtu.be/CovAPtQPU0k) by [Cobus Greyling](https://www.youtube.com/@CobusGreylingZA)
- ⛓ [LangChain & GPT 4 For Data Analysis: The `Pandas` Dataframe Agent](https://youtu.be/rFQ5Kmkd4jc) by [Rabbitmetrics](https://www.youtube.com/@rabbitmetrics)
- ⛓ [`GirlfriendGPT` - AI girlfriend with LangChain](https://youtu.be/LiN3D1QZGQw) by [Toolfinder AI](https://www.youtube.com/@toolfinderai)
- ⛓ [`PrivateGPT`: Chat to your FILES OFFLINE and FREE [Installation and Tutorial]](https://youtu.be/G7iLllmx4qc) by [Prompt Engineering](https://www.youtube.com/@engineerprompt)
- ⛓ [How to build with Langchain 10x easier | ⛓️ LangFlow & `Flowise`](https://youtu.be/Ya1oGL7ZTvU) by [AI Jason](https://www.youtube.com/@AIJasonZ)
- ⛓ [Getting Started With LangChain In 20 Minutes- Build Celebrity Search Application](https://youtu.be/_FpT1cwcSLg) by [Krish Naik](https://www.youtube.com/@krishnaik06)
- ⛓️ [LangFlow LLM Agent Demo for 🦜🔗LangChain](https://youtu.be/zJxDHaWt-6o) by [Cobus Greyling](https://www.youtube.com/@CobusGreylingZA)
- ⛓️ [Chatbot Factory: Streamline Python Chatbot Creation with LLMs and Langchain](https://youtu.be/eYer3uzrcuM) by [Finxter](https://www.youtube.com/@CobusGreylingZA)
- ⛓️ [LangChain Tutorial - ChatGPT mit eigenen Daten](https://youtu.be/0XDLyY90E2c) by [Coding Crashkurse](https://www.youtube.com/@codingcrashkurse6429)
- ⛓️ [Chat with a `CSV` | LangChain Agents Tutorial (Beginners)](https://youtu.be/tjeti5vXWOU) by [GoDataProf](https://www.youtube.com/@godataprof)
- ⛓️ [Introdução ao Langchain - #Cortes - Live DataHackers](https://youtu.be/fw8y5VRei5Y) by [Prof. João Gabriel Lima](https://www.youtube.com/@profjoaogabriellima)
- ⛓️ [LangChain: Level up `ChatGPT` !? | LangChain Tutorial Part 1](https://youtu.be/vxUGx8aZpDE) by [Code Affinity](https://www.youtube.com/@codeaffinitydev)
- ⛓️ [KI schreibt krasses Youtube Skript 😲😳 | LangChain Tutorial Deutsch](https://youtu.be/QpTiXyK1jus) by [SimpleKI](https://www.youtube.com/@simpleki)
- ⛓️ [Chat with Audio: Langchain, `Chroma DB`, OpenAI, and `Assembly AI`](https://youtu.be/Kjy7cx1r75g) by [AI Anytime](https://www.youtube.com/@AIAnytime)
- ⛓️ [QA over documents with Auto vector index selection with Langchain router chains](https://youtu.be/9G05qybShv8) by [echohive](https://www.youtube.com/@echohive)
- ⛓️ [Build your own custom LLM application with `Bubble.io` & Langchain (No Code & Beginner friendly)](https://youtu.be/O7NhQGu1m6c) by [No Code Blackbox](https://www.youtube.com/@nocodeblackbox)
- ⛓️ [Simple App to Question Your Docs: Leveraging `Streamlit`, `Hugging Face Spaces`, LangChain, and `Claude`!](https://youtu.be/X4YbNECRr7o) by [Chris Alexiuk](https://www.youtube.com/@chrisalexiuk)
- ⛓️ [LANGCHAIN AI- `ConstitutionalChainAI` + Databutton AI ASSISTANT Web App](https://youtu.be/5zIU6_rdJCU) by [Avra](https://www.youtube.com/@Avra_b)
- ⛓️ [LANGCHAIN AI AUTONOMOUS AGENT WEB APP - 👶 `BABY AGI` 🤖 with EMAIL AUTOMATION using `DATABUTTON`](https://youtu.be/cvAwOGfeHgw) by [Avra](https://www.youtube.com/@Avra_b)
- ⛓️ [The Future of Data Analysis: Using A.I. Models in Data Analysis (LangChain)](https://youtu.be/v_LIcVyg5dk) by [Absent Data](https://www.youtube.com/@absentdata)
- ⛓️ [Memory in LangChain | Deep dive (python)](https://youtu.be/70lqvTFh_Yg) by [Eden Marco](https://www.youtube.com/@EdenMarco)
- ⛓️ [9 LangChain UseCases | Beginner's Guide | 2023](https://youtu.be/zS8_qosHNMw) by [Data Science Basics](https://www.youtube.com/@datasciencebasics)
- ⛓️ [Use Large Language Models in Jupyter Notebook | LangChain | Agents & Indexes](https://youtu.be/JSe11L1a_QQ) by [Abhinaw Tiwari](https://www.youtube.com/@AbhinawTiwariAT)
- ⛓️ [How to Talk to Your Langchain Agent | `11 Labs` + `Whisper`](https://youtu.be/N4k459Zw2PU) by [VRSEN](https://www.youtube.com/@vrsen)
- ⛓️ [LangChain Deep Dive: 5 FUN AI App Ideas To Build Quickly and Easily](https://youtu.be/mPYEPzLkeks) by [James NoCode](https://www.youtube.com/@jamesnocode)
- ⛓️ [BEST OPEN Alternative to OPENAI's EMBEDDINGs for Retrieval QA: LangChain](https://youtu.be/ogEalPMUCSY) by [Prompt Engineering](https://www.youtube.com/@engineerprompt)
- ⛓️ [LangChain 101: Models](https://youtu.be/T6c_XsyaNSQ) by [Mckay Wrigley](https://www.youtube.com/@realmckaywrigley)
- ⛓️ [LangChain with JavaScript Tutorial #1 | Setup & Using LLMs](https://youtu.be/W3AoeMrg27o) by [Leon van Zyl](https://www.youtube.com/@leonvanzyl)
- ⛓️ [LangChain Overview & Tutorial for Beginners: Build Powerful AI Apps Quickly & Easily (ZERO CODE)](https://youtu.be/iI84yym473Q) by [James NoCode](https://www.youtube.com/@jamesnocode)
- ⛓️ [LangChain In Action: Real-World Use Case With Step-by-Step Tutorial](https://youtu.be/UO699Szp82M) by [Rabbitmetrics](https://www.youtube.com/@rabbitmetrics)
- ⛓️ [Summarizing and Querying Multiple Papers with LangChain](https://youtu.be/p_MQRWH5Y6k) by [Automata Learning Lab](https://www.youtube.com/@automatalearninglab)
- ⛓️ [Using Langchain (and `Replit`) through `Tana`, ask `Google`/`Wikipedia`/`Wolfram Alpha` to fill out a table](https://youtu.be/Webau9lEzoI) by [Stian Håklev](https://www.youtube.com/@StianHaklev)
- ⛓️ [Langchain PDF App (GUI) | Create a ChatGPT For Your `PDF` in Python](https://youtu.be/wUAUdEw5oxM) by [Alejandro AO - Software & Ai](https://www.youtube.com/@alejandro_ao)
- ⛓️ [Auto-GPT with LangChain 🔥 | Create Your Own Personal AI Assistant](https://youtu.be/imDfPmMKEjM) by [Data Science Basics](https://www.youtube.com/@datasciencebasics)
- ⛓️ [Create Your OWN Slack AI Assistant with Python & LangChain](https://youtu.be/3jFXRNn2Bu8) by [Dave Ebbelaar](https://www.youtube.com/@daveebbelaar)
- ⛓️ [How to Create LOCAL Chatbots with GPT4All and LangChain [Full Guide]](https://youtu.be/4p1Fojur8Zw) by [Liam Ottley](https://www.youtube.com/@LiamOttley)
- ⛓️ [Build a `Multilingual PDF` Search App with LangChain, `Cohere` and `Bubble`](https://youtu.be/hOrtuumOrv8) by [Menlo Park Lab](https://www.youtube.com/@menloparklab)
- ⛓️ [Building a LangChain Agent (code-free!) Using `Bubble` and `Flowise`](https://youtu.be/jDJIIVWTZDE) by [Menlo Park Lab](https://www.youtube.com/@menloparklab)
- ⛓️ [Build a LangChain-based Semantic PDF Search App with No-Code Tools Bubble and Flowise](https://youtu.be/s33v5cIeqA4) by [Menlo Park Lab](https://www.youtube.com/@menloparklab)
- ⛓️ [LangChain Memory Tutorial | Building a ChatGPT Clone in Python](https://youtu.be/Cwq91cj2Pnc) by [Alejandro AO - Software & Ai](https://www.youtube.com/@alejandro_ao)
- ⛓️ [ChatGPT For Your DATA | Chat with Multiple Documents Using LangChain](https://youtu.be/TeDgIDqQmzs) by [Data Science Basics](https://www.youtube.com/@datasciencebasics)
- ⛓️ [`Llama Index`: Chat with Documentation using URL Loader](https://youtu.be/XJRoDEctAwA) by [Merk](https://www.youtube.com/@merksworld)
- ⛓️ [Using OpenAI, LangChain, and `Gradio` to Build Custom GenAI Applications](https://youtu.be/1MsmqMg3yUc) by [David Hundley](https://www.youtube.com/@dkhundley)
- ⛓️ [LangChain, Chroma DB, OpenAI Beginner Guide | ChatGPT with your PDF](https://youtu.be/FuqdVNB_8c0)
- [LangChain Crash Course: Build an AutoGPT app in 25 minutes](https://youtu.be/MlK6SIjcjE8) by [Nicholas Renotte](https://www.youtube.com/@NicholasRenotte)
- [LangChain Crash Course - Build apps with language models](https://youtu.be/LbT1yp6quS8) by [Patrick Loeber](https://www.youtube.com/@patloeber)
- [LangChain Explained in 13 Minutes | QuickStart Tutorial for Beginners](https://youtu.be/aywZrzNaKjs) by [Rabbitmetrics](https://www.youtube.com/@rabbitmetrics)

## Tutorial Series

### [Prompt Engineering and LangChain](https://www.youtube.com/watch?v=muXbPpG_ys4&list=PLEJK-H61Xlwzm5FYLDdKt_6yibO33zoMW) by [Venelin Valkov](https://www.youtube.com/@venelin_valkov)

⛓ icon marks a new addition [last update 2023-05-15]

### DeepLearning.AI course
⛓ [LangChain for LLM Application Development](https://learn.deeplearning.ai/langchain) by Harrison Chase presented by [Andrew Ng](https://en.wikipedia.org/wiki/Andrew_Ng)

### Handbook
[LangChain AI Handbook](https://www.pinecone.io/learn/langchain/) By **James Briggs** and **Francisco Ingham**

### Tutorials
[LangChain Tutorials](https://www.youtube.com/watch?v=FuqdVNB_8c0&list=PL9V0lbeJ69brU-ojMpU1Y7Ic58Tap0Cw6) by [Edrick](https://www.youtube.com/@edrickdch):
- ⛓ [LangChain, Chroma DB, OpenAI Beginner Guide | ChatGPT with your PDF](https://youtu.be/FuqdVNB_8c0)
- ⛓ [LangChain 101: The Complete Beginner's Guide](https://youtu.be/P3MAbZ2eMUI)

[LangChain Crash Course: Build an AutoGPT app in 25 minutes](https://youtu.be/MlK6SIjcjE8) by [Nicholas Renotte](https://www.youtube.com/@NicholasRenotte)

[LangChain Crash Course - Build apps with language models](https://youtu.be/LbT1yp6quS8) by [Patrick Loeber](https://www.youtube.com/@patloeber)

[LangChain Explained in 13 Minutes | QuickStart Tutorial for Beginners](https://youtu.be/aywZrzNaKjs) by [Rabbitmetrics](https://www.youtube.com/@rabbitmetrics)

### [LangChain for Gen AI and LLMs](https://www.youtube.com/playlist?list=PLIUOU7oqGTLieV9uTIFMm6_4PXg-hlN6F) by [James Briggs](https://www.youtube.com/@jamesbriggs):
- #1 [Getting Started with `GPT-3` vs. Open Source LLMs](https://youtu.be/nE2skSRWTTs)
- #2 [Prompt Templates for `GPT 3.5` and other LLMs](https://youtu.be/RflBcK0oDH0)
- #3 [LLM Chains using `GPT 3.5` and other LLMs](https://youtu.be/S8j9Tk0lZHU)
- #4 [Chatbot Memory for `Chat-GPT`, `Davinci` + other LLMs](https://youtu.be/X05uK0TZozM)
- #5 [Chat with OpenAI in LangChain](https://youtu.be/CnAgB3A5OlU)
- ⛓ #6 [Fixing LLM Hallucinations with Retrieval Augmentation in LangChain](https://youtu.be/kvdVduIJsc8)
- ⛓ #7 [LangChain Agents Deep Dive with GPT 3.5](https://youtu.be/jSP-gSEyVeI)
- ⛓ #8 [Create Custom Tools for Chatbots in LangChain](https://youtu.be/q-HNphrWsDE)
- ⛓ #9 [Build Conversational Agents with Vector DBs](https://youtu.be/H6bCqqw9xyI)

### [LangChain 101](https://www.youtube.com/playlist?list=PLqZXAkvF1bPNQER9mLmDbntNfSpzdDIU5) by [Data Independent](https://www.youtube.com/@DataIndependent):
- [What Is LangChain? - LangChain + `ChatGPT` Overview](https://youtu.be/_v_fgW2SkkQ)
- [Quickstart Guide](https://youtu.be/kYRB-vJFy38)
- [Beginner Guide To 7 Essential Concepts](https://youtu.be/2xxziIWmaSA)
- [`OpenAI` + `Wolfram Alpha`](https://youtu.be/UijbzCIJ99g)
- [Ask Questions On Your Custom (or Private) Files](https://youtu.be/EnT-ZTrcPrg)
- [Connect `Google Drive Files` To `OpenAI`](https://youtu.be/IqqHqDcXLww)
- [`YouTube Transcripts` + `OpenAI`](https://youtu.be/pNcQ5XXMgH4)
- [Question A 300 Page Book (w/ `OpenAI` + `Pinecone`)](https://youtu.be/h0DHDp1FbmQ)
- [Workaround `OpenAI's` Token Limit With Chain Types](https://youtu.be/f9_BWhCI4Zo)
- [Build Your Own OpenAI + LangChain Web App in 23 Minutes](https://youtu.be/U_eV8wfMkXU)
- [Working With The New `ChatGPT API`](https://youtu.be/e9P7FLi5Zy8)
- [OpenAI + LangChain Wrote Me 100 Custom Sales Emails](https://youtu.be/y1pyAQM-3Bo)
- [Structured Output From `OpenAI` (Clean Dirty Data)](https://youtu.be/KwAXfey-xQk)
- [Connect `OpenAI` To +5,000 Tools (LangChain + `Zapier`)](https://youtu.be/7tNm0yiDigU)
- [Use LLMs To Extract Data From Text (Expert Mode)](https://youtu.be/xZzvwR9jdPA)
- ⛓ [Extract Insights From Interview Transcripts Using LLMs](https://youtu.be/shkMOHwJ4SM)
- ⛓ [5 Levels Of LLM Summarizing: Novice to Expert](https://youtu.be/qaPMdcCqtWk)

### [LangChain How to and guides](https://www.youtube.com/playlist?list=PL8motc6AQftk1Bs42EW45kwYbyJ4jOdiZ) by [Sam Witteveen](https://www.youtube.com/@samwitteveenai):
- [LangChain Basics - LLMs & PromptTemplates with Colab](https://youtu.be/J_0qvRt4LNk)
- [LangChain Basics - Tools and Chains](https://youtu.be/hI2BY7yl_Ac)
- [`ChatGPT API` Announcement & Code Walkthrough with LangChain](https://youtu.be/phHqvLHCwH4)
- [Conversations with Memory (explanation & code walkthrough)](https://youtu.be/X550Zbz_ROE)
- [Chat with `Flan20B`](https://youtu.be/VW5LBavIfY4)
- [Using `Hugging Face Models` locally (code walkthrough)](https://youtu.be/Kn7SX2Mx_Jk)
- [`PAL` : Program-aided Language Models with LangChain code](https://youtu.be/dy7-LvDu-3s)
- [Building a Summarization System with LangChain and `GPT-3` - Part 1](https://youtu.be/LNq_2s_H01Y)
- [Building a Summarization System with LangChain and `GPT-3` - Part 2](https://youtu.be/d-yeHDLgKHw)
- [Microsoft's `Visual ChatGPT` using LangChain](https://youtu.be/7YEiEyfPF5U)
- [LangChain Agents - Joining Tools and Chains with Decisions](https://youtu.be/ziu87EXZVUE)
- [Comparing LLMs with LangChain](https://youtu.be/rFNG0MIEuW0)
- [Using `Constitutional AI` in LangChain](https://youtu.be/uoVqNFDwpX4)
- [Talking to `Alpaca` with LangChain - Creating an Alpaca Chatbot](https://youtu.be/v6sF8Ed3nTE)
- [Talk to your `CSV` & `Excel` with LangChain](https://youtu.be/xQ3mZhw69bc)
- [`BabyAGI`: Discover the Power of Task-Driven Autonomous Agents!](https://youtu.be/QBcDLSE2ERA)
- [Improve your `BabyAGI` with LangChain](https://youtu.be/DRgPyOXZ-oE)
- ⛓ [Master `PDF` Chat with LangChain - Your essential guide to queries on documents](https://youtu.be/ZzgUqFtxgXI)
- ⛓ [Using LangChain with `DuckDuckGO` `Wikipedia` & `PythonREPL` Tools](https://youtu.be/KerHlb8nuVc)
- ⛓ [Building Custom Tools and Agents with LangChain (gpt-3.5-turbo)](https://youtu.be/biS8G8x8DdA)
- ⛓ [LangChain Retrieval QA Over Multiple Files with `ChromaDB`](https://youtu.be/3yPBVii7Ct0)
- ⛓ [LangChain Retrieval QA with Instructor Embeddings & `ChromaDB` for PDFs](https://youtu.be/cFCGUjc33aU)
- ⛓ [LangChain + Retrieval Local LLMs for Retrieval QA - No OpenAI!!!](https://youtu.be/9ISVjh8mdlA)

### [LangChain](https://www.youtube.com/playlist?list=PLVEEucA9MYhOu89CX8H3MBZqayTbcCTMr) by [Prompt Engineering](https://www.youtube.com/@engineerprompt):
- [LangChain Crash Course — All You Need to Know to Build Powerful Apps with LLMs](https://youtu.be/5-fc4Tlgmro)
- [Working with MULTIPLE `PDF` Files in LangChain: `ChatGPT` for your Data](https://youtu.be/s5LhRdh5fu4)
- [`ChatGPT` for YOUR OWN `PDF` files with LangChain](https://youtu.be/TLf90ipMzfE)
- [Talk to YOUR DATA without OpenAI APIs: LangChain](https://youtu.be/wrD-fZvT6UI)
- ⛓️ [CHATGPT For WEBSITES: Custom ChatBOT](https://youtu.be/RBnuhhmD21U)

### LangChain by [Chat with data](https://www.youtube.com/@chatwithdata)
- [LangChain Beginner's Tutorial for `Typescript`/`Javascript`](https://youtu.be/bH722QgRlhQ)
- [`GPT-4` Tutorial: How to Chat With Multiple `PDF` Files (~1000 pages of Tesla's 10-K Annual Reports)](https://youtu.be/Ix9WIZpArm0)
- [`GPT-4` & LangChain Tutorial: How to Chat With A 56-Page `PDF` Document (w/`Pinecone`)](https://youtu.be/ih9PBGVVOO4)
- ⛓ [LangChain & Supabase Tutorial: How to Build a ChatGPT Chatbot For Your Website](https://youtu.be/R2FMzcsmQY8)

### [Get SH\*T Done with Prompt Engineering and LangChain](https://www.youtube.com/watch?v=muXbPpG_ys4&list=PLEJK-H61Xlwzm5FYLDdKt_6yibO33zoMW) by [Venelin Valkov](https://www.youtube.com/@venelin_valkov)
- [Getting Started with LangChain: Load Custom Data, Run OpenAI Models, Embeddings and `ChatGPT`](https://www.youtube.com/watch?v=muXbPpG_ys4)
- [Loaders, Indexes & Vectorstores in LangChain: Question Answering on `PDF` files with `ChatGPT`](https://www.youtube.com/watch?v=FQnvfR8Dmr0)
- [LangChain Models: `ChatGPT`, `Flan Alpaca`, `OpenAI Embeddings`, Prompt Templates & Streaming](https://www.youtube.com/watch?v=zy6LiK5F5-s)
- [LangChain Chains: Use `ChatGPT` to Build Conversational Agents, Summaries and Q&A on Text With LLMs](https://www.youtube.com/watch?v=h1tJZQPcimM)
- [Analyze Custom CSV Data with `GPT-4` using Langchain](https://www.youtube.com/watch?v=Ew3sGdX8at4)
- [Build ChatGPT Chatbots with LangChain Memory: Understanding and Implementing Memory in Conversations](https://youtu.be/CyuUlf54wTs)

- ⛓ [Build ChatGPT Chatbots with LangChain Memory: Understanding and Implementing Memory in Conversations](https://youtu.be/CyuUlf54wTs)

---------------------
⛓ icon marks a new addition [last update 2023-06-20]
⛓ icon marks a new addition [last update 2023-05-15]

@@ -2,121 +2,203 @@
"cells": [
  {
    "cell_type": "markdown",
    "id": "944e4194",
    "metadata": {},
    "source": [
      "# Arthur"
      "# Arthur LangChain integration"
    ]
  },
  {
    "cell_type": "markdown",
    "id": "b1ccdfe8",
    "metadata": {},
    "source": [
      "[Arthur](https://arthur.ai) is a model monitoring and observability platform.\n",
      "[Arthur](https://www.arthur.ai/) is a model monitoring and observability platform.\n",
      "\n",
      "The following guide shows how to run a registered chat LLM with the Arthur callback handler to automatically log model inferences to Arthur.\n",
      "This notebook shows how to register LLMs (chat and non-chat) as models with the Arthur platform. Then we show how to set up langchain LLMs with an Arthur callback that will automatically log model inferences to Arthur.\n",
      "\n",
      "If you do not have a model currently onboarded to Arthur, visit our [onboarding guide for generative text models](https://docs.arthur.ai/user-guide/walkthroughs/model-onboarding/generative_text_onboarding.html). For more information about how to use the Arthur SDK, visit our [docs](https://docs.arthur.ai/)."
      "For more information about how to use the Arthur SDK, visit our [docs](http://docs.arthur.ai), in particular our [model onboarding guide](https://docs.arthur.ai/user-guide/walkthroughs/model-onboarding/index.html)"
    ]
  },
  {
    "cell_type": "code",
    "execution_count": 2,
    "metadata": {
      "id": "y8ku6X96sebl"
    },
    "execution_count": 21,
    "id": "961c6691",
    "metadata": {},
    "outputs": [],
    "source": [
      "from langchain.callbacks import ArthurCallbackHandler\n",
      "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
      "from langchain.chat_models import ChatOpenAI\n",
      "from langchain.schema import HumanMessage"
    ]
  },
  {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
      "Place Arthur credentials here"
      "from langchain.chat_models import ChatOpenAI, ChatAnthropic\n",
      "from langchain.schema import HumanMessage\n",
      "from langchain.llms import OpenAI, Cohere, HuggingFacePipeline"
    ]
  },
  {
    "cell_type": "code",
    "execution_count": 3,
    "metadata": {
      "id": "Me3prhqjsoqz"
    },
    "id": "a23d1963",
    "metadata": {},
    "outputs": [],
    "source": [
      "arthur_url = \"https://app.arthur.ai\"\n",
      "arthur_login = \"your-arthur-login-username-here\"\n",
      "arthur_model_id = \"your-arthur-model-id-here\""
      "from arthurai import ArthurAI\n",
      "from arthurai.common.constants import InputType, OutputType, Stage, ValueType\n",
      "from arthurai.core.attributes import ArthurAttribute, AttributeCategory"
    ]
  },
  {
    "cell_type": "markdown",
    "id": "4d1b90c0",
    "metadata": {},
    "source": [
      "Create Langchain LLM with Arthur callback handler"
      "# ArthurModel for chatbot with only input text and output text attributes"
    ]
  },
  {
    "cell_type": "markdown",
    "id": "1a4a4a8a",
    "metadata": {},
    "source": [
      "Connect to Arthur client"
    ]
  },
  {
    "cell_type": "code",
    "execution_count": 4,
    "metadata": {
      "id": "9Hq9snQasynA"
    },
    "id": "f49e9b79",
    "metadata": {},
    "outputs": [],
    "source": [
"def make_langchain_chat_llm(chat_model=):\n",
|
||||
" return ChatOpenAI(\n",
|
||||
" streaming=True,\n",
|
||||
" temperature=0.1,\n",
|
||||
" callbacks=[\n",
|
||||
" StreamingStdOutCallbackHandler(),\n",
|
||||
" ArthurCallbackHandler.from_credentials(\n",
|
||||
" arthur_model_id, \n",
|
||||
" arthur_url=arthur_url, \n",
|
||||
" arthur_login=arthur_login)\n",
|
||||
" ])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Please enter password for admin: ········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chatgpt = make_langchain_chat_llm()"
|
||||
"arthur_url = \"https://app.arthur.ai\"\n",
|
||||
"arthur_login = \"your-username-here\"\n",
|
||||
"arthur = ArthurAI(url=arthur_url, login=arthur_login)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "aXRyj50Ls8eP"
|
||||
},
|
||||
"id": "c6e063bf",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Running the chat LLM with this `run` function will save the chat history in an ongoing list so that the conversation can reference earlier messages and log each response to the Arthur platform. You can view the history of this model's inferences on your [model dashboard page](https://app.arthur.ai/).\n",
|
||||
"Before you can register model inferences to Arthur, you must have a registered model with an ID in the Arthur platform. We will provide this ID to the ArthurCallbackHandler.\n",
|
||||
"\n",
|
||||
"Enter `q` to quit the run loop"
|
||||
"You can register a model with Arthur here in the notebook using this `register_chat_llm()` function. This function returns the ID of the model saved to the platform. To use the function, uncomment `arthur_model_chatbot_id = register_chat_llm()` in the cell below."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"id": "4taWSbN-s31Y"
|
||||
},
|
||||
"execution_count": 5,
|
||||
"id": "31b17b5e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def run(llm):\n",
|
||||
"def register_chat_llm():\n",
|
||||
"\n",
|
||||
" arthur_model = arthur.model(\n",
|
||||
" display_name=\"LangChainChat\",\n",
|
||||
" input_type=InputType.NLP,\n",
|
||||
" output_type=OutputType.TokenSequence\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" arthur_model._add_attribute_to_model(ArthurAttribute(\n",
|
||||
" name=\"my_input_text\",\n",
|
||||
" stage=Stage.ModelPipelineInput,\n",
|
||||
" value_type=ValueType.Unstructured_Text,\n",
|
||||
" categorical=True,\n",
|
||||
" is_unique=True\n",
|
||||
" ))\n",
|
||||
" arthur_model._add_attribute_to_model(ArthurAttribute(\n",
|
||||
" name=\"my_output_text\",\n",
|
||||
" stage=Stage.PredictedValue,\n",
|
||||
" value_type=ValueType.Unstructured_Text,\n",
|
||||
" categorical=True,\n",
|
||||
" is_unique=False,\n",
|
||||
" ))\n",
|
||||
" \n",
|
||||
" return arthur_model.save()\n",
|
||||
"# arthur_model_chatbot_id = register_chat_llm()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0d1d1e60",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Alternatively, you can set the `arthur_model_chatbot_id` variable to be the ID of your model on your [model dashboard](https://app.arthur.ai/)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "cdfa02c8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"arthur_model_chatbot_id = \"your-model-id-here\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "58be5234",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This function creates a Langchain chat LLM with the ArthurCallbackHandler to log inferences to Arthur. We provide our `arthur_model_chatbot_id`, as well as the Arthur url and login we are using."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "448a8fee",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def make_langchain_chat_llm(chat_model=ChatOpenAI):\n",
|
||||
" if chat_model not in [ChatOpenAI, ChatAnthropic]:\n",
|
||||
" raise ValueError(\"For this notebook, use one of the chat models imported from langchain.chat_models\")\n",
|
||||
" return chat_model(\n",
|
||||
" streaming=True, \n",
|
||||
" temperature=0.1,\n",
|
||||
" callbacks=[\n",
|
||||
" StreamingStdOutCallbackHandler(), \n",
|
||||
" ArthurCallbackHandler.from_credentials(arthur_model_chatbot_id, arthur_url=arthur_url, arthur_login=arthur_login)\n",
|
||||
" ])\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "17c182da",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "2dfc00ed",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chat_llm = make_langchain_chat_llm()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "139291f2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Run the chatbot (it will save the chat history in the `history` list so that the conversation can reference earlier messages)\n",
|
||||
"\n",
|
||||
"Type `q` to quit"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "7480a443",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def run_langchain_chat_llm(llm):\n",
|
||||
" history = []\n",
|
||||
" while True:\n",
|
||||
" user_input = input(\"\\n>>> input >>>\\n>>>: \")\n",
|
||||
@@ -127,54 +209,238 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {
|
||||
"id": "MEx8nWJps-EG"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
">>> input >>>\n",
|
||||
">>>: What is a callback handler?\n",
|
||||
"A callback handler, also known as a callback function or callback method, is a piece of code that is executed in response to a specific event or condition. It is commonly used in programming languages that support event-driven or asynchronous programming paradigms.\n",
|
||||
"\n",
|
||||
"The purpose of a callback handler is to provide a way for developers to define custom behavior that should be executed when a certain event occurs. Instead of waiting for a result or blocking the execution, the program registers a callback function and continues with other tasks. When the event is triggered, the callback function is invoked, allowing the program to respond accordingly.\n",
|
||||
"\n",
|
||||
"Callback handlers are commonly used in various scenarios, such as handling user input, responding to network requests, processing asynchronous operations, and implementing event-driven architectures. They provide a flexible and modular way to handle events and decouple different components of a system.\n",
|
||||
">>> input >>>\n",
|
||||
">>>: What do I need to do to get the full benefits of this\n",
|
||||
"To get the full benefits of using a callback handler, you should consider the following:\n",
|
||||
"\n",
|
||||
"1. Understand the event or condition: Identify the specific event or condition that you want to respond to with a callback handler. This could be user input, network requests, or any other asynchronous operation.\n",
|
||||
"\n",
|
||||
"2. Define the callback function: Create a function that will be executed when the event or condition occurs. This function should contain the desired behavior or actions you want to take in response to the event.\n",
|
||||
"\n",
|
||||
"3. Register the callback function: Depending on the programming language or framework you are using, you may need to register or attach the callback function to the appropriate event or condition. This ensures that the callback function is invoked when the event occurs.\n",
|
||||
"\n",
|
||||
"4. Handle the callback: Implement the necessary logic within the callback function to handle the event or condition. This could involve updating the user interface, processing data, making further requests, or triggering other actions.\n",
|
||||
"\n",
|
||||
"5. Consider error handling: It's important to handle any potential errors or exceptions that may occur within the callback function. This ensures that your program can gracefully handle unexpected situations and prevent crashes or undesired behavior.\n",
|
||||
"\n",
|
||||
"6. Maintain code readability and modularity: As your codebase grows, it's crucial to keep your callback handlers organized and maintainable. Consider using design patterns or architectural principles to structure your code in a modular and scalable way.\n",
|
||||
"\n",
|
||||
"By following these steps, you can leverage the benefits of callback handlers, such as asynchronous and event-driven programming, improved responsiveness, and modular code design.\n",
|
||||
">>> input >>>\n",
|
||||
">>>: q\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"execution_count": 10,
|
||||
"id": "6868ce71",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"run(chatgpt)"
|
||||
"run_langchain_chat_llm(chat_llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a0be7d01",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# ArthurModel with input text, output text, token likelihoods, finish reason, and amount of token usage attributes"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1ee4b741",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This function registers an LLM with additional metadata attributes to log to Arthur with each inference\n",
|
||||
"\n",
|
||||
"As above, you can register your callback handler for an LLM using this function here in the notebook or by pasting the ID of an already-registered model from your [model dashboard](https://app.arthur.ai/)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "e671836c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def register_llm():\n",
|
||||
"\n",
|
||||
" arthur_model = arthur.model(\n",
|
||||
" display_name=\"LangChainLLM\",\n",
|
||||
" input_type=InputType.NLP,\n",
|
||||
" output_type=OutputType.TokenSequence\n",
|
||||
" )\n",
|
||||
" arthur_model._add_attribute_to_model(ArthurAttribute(\n",
|
||||
" name=\"my_input_text\",\n",
|
||||
" stage=Stage.ModelPipelineInput,\n",
|
||||
" value_type=ValueType.Unstructured_Text,\n",
|
||||
" categorical=True,\n",
|
||||
" is_unique=True\n",
|
||||
" ))\n",
|
||||
" arthur_model._add_attribute_to_model(ArthurAttribute(\n",
|
||||
" name=\"my_output_text\",\n",
|
||||
" stage=Stage.PredictedValue,\n",
|
||||
" value_type=ValueType.Unstructured_Text,\n",
|
||||
" categorical=True,\n",
|
||||
" is_unique=False,\n",
|
||||
" token_attribute_link=\"my_output_likelihoods\"\n",
|
||||
" ))\n",
|
||||
" arthur_model._add_attribute_to_model(ArthurAttribute(\n",
|
||||
" name=\"my_output_likelihoods\",\n",
|
||||
" stage=Stage.PredictedValue,\n",
|
||||
" value_type=ValueType.TokenLikelihoods,\n",
|
||||
" token_attribute_link=\"my_output_text\"\n",
|
||||
" ))\n",
|
||||
" arthur_model._add_attribute_to_model(ArthurAttribute(\n",
|
||||
" name=\"finish_reason\",\n",
|
||||
" stage=Stage.NonInputData,\n",
|
||||
" value_type=ValueType.String,\n",
|
||||
" categorical=True,\n",
|
||||
" categories=[\n",
|
||||
" AttributeCategory(value='stop'),\n",
|
||||
" AttributeCategory(value='length'),\n",
|
||||
" AttributeCategory(value='content_filter'),\n",
|
||||
" AttributeCategory(value='null')\n",
|
||||
" ]\n",
|
||||
" ))\n",
|
||||
" arthur_model._add_attribute_to_model(ArthurAttribute(\n",
|
||||
" name=\"prompt_tokens\",\n",
|
||||
" stage=Stage.NonInputData,\n",
|
||||
" value_type=ValueType.Integer\n",
|
||||
" ))\n",
|
||||
" arthur_model._add_attribute_to_model(ArthurAttribute(\n",
|
||||
" name=\"completion_tokens\",\n",
|
||||
" stage=Stage.NonInputData,\n",
|
||||
" value_type=ValueType.Integer\n",
|
||||
" ))\n",
|
||||
" arthur_model._add_attribute_to_model(ArthurAttribute(\n",
|
||||
" name=\"duration\",\n",
|
||||
" stage=Stage.NonInputData,\n",
|
||||
" value_type=ValueType.Float\n",
|
||||
" ))\n",
|
||||
" \n",
|
||||
" return arthur_model.save()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "2a6686f7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"arthur_model_llm_id = \"your-model-id-here\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2dcacb96",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"These functions create Langchain LLMs with the ArthurCallbackHandler to log inferences to Arthur.\n",
|
||||
"\n",
|
||||
"There are small differences in the underlying Langchain integrations with these libraries and the available metadata for model inputs & outputs"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"id": "34cf0072",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def make_langchain_openai_llm():\n",
|
||||
" return OpenAI(\n",
|
||||
" temperature=0.1,\n",
|
||||
" model_kwargs = {'logprobs': 3},\n",
|
||||
" callbacks=[\n",
|
||||
" ArthurCallbackHandler.from_credentials(arthur_model_llm_id, arthur_url=arthur_url, arthur_login=arthur_login)\n",
|
||||
" ])\n",
|
||||
"\n",
|
||||
"def make_langchain_cohere_llm():\n",
|
||||
" return Cohere(\n",
|
||||
" temperature=0.1,\n",
|
||||
" callbacks=[\n",
|
||||
" ArthurCallbackHandler.from_credentials(arthur_model_chatbot_id, arthur_url=arthur_url, arthur_login=arthur_login)\n",
|
||||
" ])\n",
|
||||
"\n",
|
||||
"def make_langchain_huggingface_llm():\n",
|
||||
" llm = HuggingFacePipeline.from_model_id(\n",
|
||||
" model_id=\"bert-base-uncased\", \n",
|
||||
" task=\"text-generation\", \n",
|
||||
" model_kwargs={\"temperature\":2.5, \"max_length\":64})\n",
|
||||
" llm.callbacks = [\n",
|
||||
" ArthurCallbackHandler.from_credentials(arthur_model_chatbot_id, arthur_url=arthur_url, arthur_login=arthur_login)\n",
|
||||
" ]\n",
|
||||
" return llm"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"id": "f40c3ce0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"openai_llm = make_langchain_openai_llm()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"id": "8476d531",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"cohere_llm = make_langchain_cohere_llm()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "7483b9d3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"huggingface_llm = make_langchain_huggingface_llm()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c17d8e86",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Run the LLM (each completion is independent, no chat history is saved as we were doing above with the chat llms)\n",
|
||||
"\n",
|
||||
"Type `q` to quit"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "72ee0790",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def run_langchain_llm(llm):\n",
|
||||
" while True:\n",
|
||||
" print(\"Type your text for completion:\\n\")\n",
|
||||
" user_input = input(\"\\n>>> input >>>\\n>>>: \")\n",
|
||||
" if user_input == 'q': break\n",
|
||||
" print(llm(user_input), \"\\n================\\n\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "fb864057",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"run_langchain_llm(openai_llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "e6673769",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"run_langchain_llm(cohere_llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "85541f1c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"run_langchain_llm(huggingface_llm)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
@@ -190,9 +456,9 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.11"
|
||||
"version": "3.10.8"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 1
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
|
||||
@@ -1,52 +0,0 @@
# Clarifai

>[Clarifai](https://clarifai.com) is one of the first deep learning platforms, founded in 2013. Clarifai provides an AI platform covering the full AI lifecycle for data exploration, data labeling, model training, evaluation, and inference around image, video, text, and audio data. In the LangChain ecosystem, as far as we're aware, Clarifai is the only provider that supports LLMs, embeddings, and a vector store in one production-scale platform, making it an excellent choice to operationalize your LangChain implementations.

## Installation and Setup
- Install the Python SDK:
```bash
pip install clarifai
```
[Sign up](https://clarifai.com/signup) for a Clarifai account, then get a personal access token to access the Clarifai API from your [security settings](https://clarifai.com/settings/security) and set it as an environment variable (`CLARIFAI_PAT`).


## Models

Clarifai provides thousands of AI models for many different use cases. You can [explore them here](https://clarifai.com/explore) to find the one most suited to your use case. These models include those created by other providers such as OpenAI, Anthropic, Cohere, AI21, etc., as well as state-of-the-art open-source models such as Falcon, InstructorXL, etc., so that you can build the best AI into your products. You'll find these organized by the creator's user_id and into projects we call applications, denoted by their app_id. Those IDs will be needed in addition to the model_id and optionally the version_id, so make note of all these IDs once you have found the best model for your use case!

Also note that, given there are many models for image, video, text, and audio understanding, you can build some interesting AI agents that utilize the variety of AI models as experts to understand those data types.

### LLMs

To find the selection of LLMs in the Clarifai platform you can select the text-to-text model type [here](https://clarifai.com/explore/models?filterData=%5B%7B%22field%22%3A%22model_type_id%22%2C%22value%22%3A%5B%22text-to-text%22%5D%7D%5D&page=1&perPage=24).

```python
from langchain.llms import Clarifai
llm = Clarifai(pat=CLARIFAI_PAT, user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID)
```
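The wrapper behaves like any other LangChain LLM, so a quick sanity check might look like the following minimal sketch, assuming the credentials and IDs above are set to valid values:

```python
# Hypothetical prompt; any string works here.
print(llm("Write a one-line description of Clarifai."))
```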
For more details, the docs on the Clarifai LLM wrapper provide a [detailed walkthrough](/docs/modules/model_io/models/llms/integrations/clarifai.html).


### Text Embedding Models

To find the selection of text embedding models in the Clarifai platform you can select the text-to-embedding model type [here](https://clarifai.com/explore/models?page=1&perPage=24&filterData=%5B%7B%22field%22%3A%22model_type_id%22%2C%22value%22%3A%5B%22text-embedder%22%5D%7D%5D).

There is a Clarifai Embedding model in LangChain, which you can access with:
```python
from langchain.embeddings import ClarifaiEmbeddings
embeddings = ClarifaiEmbeddings(pat=CLARIFAI_PAT, user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID)
```
For more details, the docs on the Clarifai Embeddings wrapper provide a [detailed walkthrough](/docs/modules/data_connection/text_embedding/integrations/clarifai.html).

## Vectorstore

Clarifai's vector DB was launched in 2016 and has been optimized to support live search queries. With workflows in the Clarifai platform, your data is automatically indexed by an embedding model, and optionally other models as well, to index that information in the DB for search. You can query the DB not only via the vectors but also filter by metadata matches, other AI-predicted concepts, and even do geo-coordinate search. Simply create an application, select the appropriate base workflow for your type of data, and upload it (through the API as [documented here](https://docs.clarifai.com/api-guide/data/create-get-update-delete) or the UIs at clarifai.com).

You can also add data directly from LangChain, and the auto-indexing will take place for you. You'll notice this is a little different from other vectorstores, where you need to provide an embedding model in their constructor and have LangChain coordinate getting the embeddings from text and writing those to the index. Not only is it more convenient, but it's much more scalable to use Clarifai's distributed cloud to do all the indexing in the background.

```python
from langchain.vectorstores import Clarifai
clarifai_vector_db = Clarifai.from_texts(user_id=USER_ID, app_id=APP_ID, texts=texts, pat=CLARIFAI_PAT, number_of_docs=NUMBER_OF_DOCS, metadatas=metadatas)
```
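Once texts are indexed, querying works like any other LangChain vector store; a minimal sketch, assuming the `clarifai_vector_db` object created above:

```python
# Hypothetical query; replace it with a question relevant to your indexed texts.
docs = clarifai_vector_db.similarity_search("What does Clarifai provide?")
print(docs[0].page_content)
```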
For more details, the docs on the Clarifai vector store provide a [detailed walkthrough](/docs/modules/data_connection/text_embedding/integrations/clarifai.html).
@@ -1,108 +0,0 @@
# CnosDB
> [CnosDB](https://github.com/cnosdb/cnosdb) is an open-source distributed time-series database with high performance, a high compression rate, and high ease of use.

## Installation and Setup

```bash
pip install cnos-connector
```

## Connecting to CnosDB
You can connect to CnosDB using the `SQLDatabase.from_cnosdb()` method.
### Syntax
```python
def SQLDatabase.from_cnosdb(url: str = "127.0.0.1:8902",
              user: str = "root",
              password: str = "",
              tenant: str = "cnosdb",
              database: str = "public")
```
Args:
1. url (str): The HTTP connection host name and port number of the CnosDB service, excluding "http://" or "https://", with a default value of "127.0.0.1:8902".
2. user (str): The username used to connect to the CnosDB service, with a default value of "root".
3. password (str): The password of the user connecting to the CnosDB service, with a default value of "".
4. tenant (str): The name of the tenant used to connect to the CnosDB service, with a default value of "cnosdb".
5. database (str): The name of the database in the CnosDB tenant.
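
For reference, a connection with explicit arguments might look like the sketch below; the host and credentials are placeholder assumptions, not values from this page.

```python
from langchain import SQLDatabase

# Hypothetical deployment details; substitute your own host and credentials.
db = SQLDatabase.from_cnosdb(
    url="192.168.1.10:8902",
    user="root",
    password="",
    tenant="cnosdb",
    database="public",
)
```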

## Examples

```python
# Connecting to CnosDB with the SQLDatabase wrapper
from cnosdb_connector import make_cnosdb_langchain_uri  # helper for building a connection URI, if needed
from langchain import SQLDatabase

db = SQLDatabase.from_cnosdb()
```
```python
# Creating an OpenAI Chat LLM wrapper
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")
```

### SQL Chain
This example demonstrates the use of the SQL Chain for answering a question over a CnosDB database.
```python
from langchain import SQLDatabaseChain

db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)

db_chain.run(
    "What is the average fa of test table that time between November 3, 2022 and November 4, 2022?"
)
```
```shell
> Entering new  chain...
What is the average fa of test table that time between November 3, 2022 and November 4, 2022?
SQLQuery:SELECT AVG(fa) FROM test WHERE time >= '2022-11-03' AND time < '2022-11-04'
SQLResult: [(2.0,)]
Answer:The average fa of the test table between November 3, 2022, and November 4, 2022, is 2.0.
> Finished chain.
```
### SQL Database Agent
This example demonstrates the use of the SQL Database Agent for answering questions over a CnosDB database.
```python
from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit

toolkit = SQLDatabaseToolkit(db=db, llm=llm)
agent = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)
```
```python
agent.run(
    "What is the average fa of test table that time between November 3, 2022 and November 4, 2022?"
)
```
```shell
> Entering new  chain...
Action: sql_db_list_tables
Action Input: ""
Observation: test
Thought:The relevant table is "test". I should query the schema of this table to see the column names.
Action: sql_db_schema
Action Input: "test"
Observation:
CREATE TABLE test (
    time TIMESTAMP,
    fa BIGINT
)

/*
3 rows from test table:
fa	time
1	2022-11-03T06:20:11
2	2022-11-03T06:20:11.000000001
3	2022-11-03T06:20:11.000000002
*/
Thought:The relevant column is "fa" in the "test" table. I can now construct the query to calculate the average "fa" between the specified time range.
Action: sql_db_query
Action Input: "SELECT AVG(fa) FROM test WHERE time >= '2022-11-03' AND time < '2022-11-04'"
Observation: [(2.0,)]
Thought:The average "fa" of the "test" table between November 3, 2022 and November 4, 2022 is 2.0.
Final Answer: 2.0

> Finished chain.
```
@@ -1,51 +0,0 @@
# DataForSEO

This page provides instructions on how to use the DataForSEO search APIs within LangChain.

## Installation and Setup

- Get a DataForSEO API Access login and password, and set them as environment variables (`DATAFORSEO_LOGIN` and `DATAFORSEO_PASSWORD` respectively). You can find them in your dashboard.

## Wrappers

### Utility

The DataForSEO utility wraps the API. To import this utility, use:

```python
from langchain.utilities import DataForSeoAPIWrapper
```

For a detailed walkthrough of this wrapper, see [this notebook](/docs/modules/agents/tools/integrations/dataforseo.ipynb).

### Tool

You can also load this wrapper as a Tool to use with an Agent:

```python
from langchain.agents import load_tools
tools = load_tools(["dataforseo-api-search"])
```
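
As a rough sketch of what comes next, the loaded tools can be handed to an agent. The OpenAI LLM, the zero-shot agent type, and the question below are assumptions for illustration, not part of the DataForSEO docs:

```python
from langchain.agents import initialize_agent, AgentType
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run("What are the top search results for LangChain?")
```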

## Example usage

```python
dataforseo = DataForSeoAPIWrapper(api_login="your_login", api_password="your_password")
result = dataforseo.run("Bill Gates")
print(result)
```

## Environment Variables

You can store your DataForSEO API Access login and password as environment variables. The wrapper will automatically check for these environment variables if no values are provided:

```python
import os

os.environ["DATAFORSEO_LOGIN"] = "your_login"
os.environ["DATAFORSEO_PASSWORD"] = "your_password"

dataforseo = DataForSeoAPIWrapper()
result = dataforseo.run("weather in Los Angeles")
print(result)
```
@@ -16,59 +16,3 @@ There exists a Jina Embeddings wrapper, which you can access with
```python
from langchain.embeddings import JinaEmbeddings
```
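
A minimal usage sketch; the token value and model name below are assumptions, so check the exact parameters against the walkthrough notebook linked next:

```python
# Hypothetical credentials/model; consult the notebook below for specifics.
embeddings = JinaEmbeddings(jina_auth_token="<your-token>", model_name="ViT-B-32::openai")
vector = embeddings.embed_query("hello, world")
```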
For a more detailed walkthrough of this, see [this notebook](/docs/modules/data_connection/text_embedding/integrations/jina.html)

## Deployment

[Langchain-serve](https://github.com/jina-ai/langchain-serve), powered by Jina, helps take LangChain apps to production with easy-to-use REST/WebSocket APIs and Slack bots.

### Usage

Install the package from PyPI.

```bash
pip install langchain-serve
```

Wrap your LangChain app with the `@serving` decorator.

```python
# app.py
from lcserve import serving

@serving
def ask(input: str) -> str:
    from langchain import LLMChain, OpenAI
    from langchain.agents import AgentExecutor, ZeroShotAgent

    tools = [...]  # list of tools
    prompt = ZeroShotAgent.create_prompt(
        tools, input_variables=["input", "agent_scratchpad"],
    )
    llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
    agent = ZeroShotAgent(
        llm_chain=llm_chain, allowed_tools=[tool.name for tool in tools]
    )
    agent_executor = AgentExecutor.from_agent_and_tools(
        agent=agent,
        tools=tools,
        verbose=True,
    )
    return agent_executor.run(input)
```

Deploy on Jina AI Cloud with `lc-serve deploy jcloud app`. Once deployed, we can send a POST request to the API endpoint to get a response.

```bash
curl -X 'POST' 'https://<your-app>.wolf.jina.ai/ask' \
  -d '{
  "input": "Your Question here?",
  "envs": {
    "OPENAI_API_KEY": "sk-***"
  }
}'
```

You can also self-host the app on your infrastructure with Docker-compose or Kubernetes. See [here](https://github.com/jina-ai/langchain-serve#-self-host-llm-apps-with-docker-compose-or-kubernetes) for more details.

Langchain-serve also lets you deploy apps with WebSocket APIs and Slack bots, on both [Jina AI Cloud](https://cloud.jina.ai/) and self-hosted infrastructure.

@@ -1,31 +0,0 @@
# Marqo

This page covers how to use the Marqo ecosystem within LangChain.

### **What is Marqo?**

Marqo is a tensor search engine that uses embeddings stored in in-memory HNSW indexes to achieve cutting-edge search speeds. Marqo can scale to hundred-million-document indexes with horizontal index sharding and allows for async and non-blocking data upload and search. Marqo uses the latest machine learning models from PyTorch, Hugging Face, OpenAI and more. You can start with a pre-configured model or bring your own. The built-in ONNX support and conversion allows for faster inference and higher throughput on both CPU and GPU.

Because Marqo includes its own inference, your documents can have a mix of text and images, and you can bring Marqo indexes with data from your other systems into the LangChain ecosystem without having to worry about your embeddings being compatible.

Deployment of Marqo is flexible: you can get started yourself with our Docker image or [contact us about our managed cloud offering!](https://www.marqo.ai/pricing)

To run Marqo locally with our Docker image, [see our getting started guide.](https://docs.marqo.ai/latest/)

## Installation and Setup
- Install the Python SDK with `pip install marqo`

## Wrappers

### VectorStore

There exists a wrapper around Marqo indexes, allowing you to use them within the vectorstore framework. Marqo lets you select from a range of models for generating embeddings and exposes some preprocessing configurations.

The Marqo vectorstore can also work with existing multimodal indexes where your documents have a mix of images and text; for more information refer to [our documentation](https://docs.marqo.ai/latest/#multi-modal-and-cross-modal-search). Note that instantiating the Marqo vectorstore with an existing multimodal index will disable the ability to add any new documents to it via the LangChain vectorstore `add_texts` method.

To import this vectorstore:
```python
from langchain.vectorstores import Marqo
```
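
A minimal usage sketch, assuming a locally running Marqo instance; the URL and index name below are placeholders, not values from this page:

```python
import marqo
from langchain.vectorstores import Marqo

client = marqo.Client(url="http://localhost:8882")  # assumed local deployment
docsearch = Marqo(client, index_name="langchain-demo")  # hypothetical index name
results = docsearch.similarity_search("What is tensor search?")
```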

For a more detailed walkthrough of the Marqo wrapper and some of its unique features, see [this notebook](../modules/data_connection/vectorstores/integrations/marqo.ipynb).
@@ -1,56 +0,0 @@
# TruLens

This page covers how to use [TruLens](https://trulens.org) to evaluate and track LLM apps built on LangChain.

## What is TruLens?

TruLens is an [open-source](https://github.com/truera/trulens) package that provides instrumentation and evaluation tools for large language model (LLM) based applications.

## Quick start

Once you've created your LLM chain, you can use TruLens for evaluation and tracking. TruLens has a number of [out-of-the-box Feedback Functions](https://www.trulens.org/trulens_eval/feedback_functions/), and is also an extensible framework for LLM evaluation.

```python
# Create feedback functions

from trulens_eval.feedback import Feedback, Huggingface, OpenAI

# Initialize the HuggingFace-based feedback function collection class:
hugs = Huggingface()
openai = OpenAI()

# Define a language match feedback function using HuggingFace.
lang_match = Feedback(hugs.language_match).on_input_output()
# By default this will check language match on the main app input and main app
# output.

# Question/answer relevance between overall question and answer.
qa_relevance = Feedback(openai.relevance).on_input_output()
# By default this will evaluate feedback on main app input and main app output.

# Toxicity of input
toxicity = Feedback(openai.toxicity).on_input()
```

After you've set up Feedback Function(s) for evaluating your LLM, you can wrap your application with TruChain to get detailed tracing, logging and evaluation of your LLM app.

```python
# Wrap your chain with TruChain (import added for completeness)
from trulens_eval import TruChain

truchain = TruChain(
    chain,
    app_id='Chain1_ChatApplication',
    feedbacks=[lang_match, qa_relevance, toxicity]
)
# Note: any `feedbacks` specified here will be evaluated and logged whenever the chain is used.
truchain("que hora es?")
```

Now you can explore your LLM-based application!

Doing so will help you understand how your LLM application is performing at a glance. As you iterate through new versions of your LLM application, you can compare their performance across all of the different quality metrics you've set up. You'll also be able to view evaluations at a record level, and explore the chain metadata for each record.

```python
from trulens_eval import Tru

tru = Tru()
tru.run_dashboard()  # open a Streamlit app to explore
```

For more information on TruLens, visit [trulens.org](https://www.trulens.org/)
@@ -24,7 +24,6 @@ Understanding these components is crucial when assessing serving systems. LangCh
- [BentoML](https://github.com/bentoml/BentoML)
- [OpenLLM](/docs/ecosystem/integrations/openllm.html)
- [Modal](/docs/ecosystem/integrations/modal.html)
- [Jina](/docs/ecosystem/integrations/jina.html#deployment)

These links will provide further information on each ecosystem, assisting you in finding the best fit for your LLM deployment needs.

@@ -51,10 +51,6 @@ A minimal example of how to deploy LangChain to [Fly.io](https://fly.io/) using

A minimal example of how to deploy LangChain to DigitalOcean App Platform.

## [CI/CD Google Cloud Build + Dockerfile + Serverless Google Cloud Run](https://github.com/g-emarco/github-assistant)

Boilerplate LangChain project on how to deploy to Google Cloud Run using Docker with a Cloud Build CI/CD pipeline.

## [Google Cloud Run](https://github.com/homanp/gcp-langchain)

A minimal example of how to deploy LangChain to Google Cloud Run.
@@ -65,7 +61,7 @@ This repository contains LangChain adapters for Steamship, enabling LangChain de

## [Langchain-serve](https://github.com/jina-ai/langchain-serve)

This repository allows users to deploy any LangChain app as REST/WebSocket APIs or as Slack bots with ease. Benefit from the scalability and serverless architecture of Jina AI Cloud, or deploy on-premise with Kubernetes.
This repository allows users to serve local chains and agents as RESTful, gRPC, or WebSocket APIs, thanks to [Jina](https://docs.jina.ai/). Deploy your chains & agents with ease and enjoy independent scaling, serverless and autoscaling APIs, as well as a Streamlit playground on Jina AI Cloud.

## [BentoML](https://github.com/ssheng/BentoChain)

@@ -9,7 +9,7 @@
"\n",
"Here we go over how to benchmark performance on a question answering task over a Paul Graham essay.\n",
"\n",
"It is highly recommended that you do any evaluation/benchmarking with tracing enabled. See [here](https://python.langchain.com/docs/modules/callbacks/how_to/tracing) for an explanation of what tracing is and how to set it up."
"It is highly recommended that you do any evaluation/benchmarking with tracing enabled. See [here](https://langchain.readthedocs.io/en/latest/tracing.html) for an explanation of what tracing is and how to set it up."
]
},
{

@@ -7,7 +7,7 @@
"source": [
"# SQL Database Agent\n",
"\n",
"This notebook showcases an agent designed to interact with SQL databases. The agent builds off of [SQLDatabaseChain](https://python.langchain.com/docs/modules/chains/popular/sqlite) and is designed to answer more general questions about a database, as well as recover from errors.\n",
"This notebook showcases an agent designed to interact with SQL databases. The agent builds off of [SQLDatabaseChain](https://langchain.readthedocs.io/en/latest/modules/chains/examples/sqlite.html) and is designed to answer more general questions about a database, as well as recover from errors.\n",
"\n",
"Note that, as this agent is in active development, all answers might not be correct. Additionally, it is not guaranteed that the agent won't perform DML statements on your database given certain questions. Be careful running it on sensitive data!\n",
"\n",

@@ -26,8 +26,8 @@
"source": [
"import os\n",
"\n",
"os.environ[\"BING_SUBSCRIPTION_KEY\"] = \"<key>\"\n",
"os.environ[\"BING_SEARCH_URL\"] = \"https://api.bing.microsoft.com/v7.0/search\""
"os.environ[\"BING_SUBSCRIPTION_KEY\"] = \"\"\n",
"os.environ[\"BING_SEARCH_URL\"] = \"\""
]
},
{

@@ -1,226 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# DataForSeo API Wrapper\n",
|
||||
"This notebook demonstrates how to use the DataForSeo API wrapper to obtain search engine results. The DataForSeo API allows users to retrieve SERP from most popular search engines like Google, Bing, Yahoo. It also allows to get SERPs from different search engine types like Maps, News, Events, etc.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.utilities import DataForSeoAPIWrapper"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setting up the API wrapper with your credentials\n",
|
||||
"You can obtain your API credentials by registering on the DataForSeo website."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"os.environ[\"DATAFORSEO_LOGIN\"] = \"your_api_access_username\"\n",
|
||||
"os.environ[\"DATAFORSEO_PASSWORD\"] = \"your_api_access_password\"\n",
|
||||
"\n",
|
||||
"wrapper = DataForSeoAPIWrapper()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The run method will return the first result snippet from one of the following elements: answer_box, knowledge_graph, featured_snippet, shopping, organic."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"wrapper.run(\"Weather in Los Angeles\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## The Difference Between `run` and `results`\n",
|
||||
"`run` and `results` are two methods provided by the `DataForSeoAPIWrapper` class.\n",
|
||||
"\n",
|
||||
"The `run` method executes the search and returns the first result snippet from the answer box, knowledge graph, featured snippet, shopping, or organic results. These elements are sorted by priority from highest to lowest.\n",
|
||||
"\n",
|
||||
"The `results` method returns a JSON response configured according to the parameters set in the wrapper. This allows for more flexibility in terms of what data you want to return from the API."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Getting Results as JSON\n",
|
||||
"You can customize the result types and fields you want to return in the JSON response. You can also set a maximum count for the number of top results to return."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"json_wrapper = DataForSeoAPIWrapper(\n",
|
||||
" json_result_types=[\"organic\", \"knowledge_graph\", \"answer_box\"],\n",
|
||||
" json_result_fields=[\"type\", \"title\", \"description\", \"text\"],\n",
|
||||
" top_count=3)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"json_wrapper.results(\"Bill Gates\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Customizing Location and Language\n",
|
||||
"You can specify the location and language of your search results by passing additional parameters to the API wrapper."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"customized_wrapper = DataForSeoAPIWrapper(\n",
|
||||
" top_count=10,\n",
|
||||
" json_result_types=[\"organic\", \"local_pack\"],\n",
|
||||
" json_result_fields=[\"title\", \"description\", \"type\"],\n",
|
||||
" params={\"location_name\": \"Germany\", \"language_code\": \"en\"})\n",
|
||||
"customized_wrapper.results(\"coffee near me\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Customizing the Search Engine\n",
|
||||
"You can also specify the search engine you want to use."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"customized_wrapper = DataForSeoAPIWrapper(\n",
|
||||
" top_count=10,\n",
|
||||
" json_result_types=[\"organic\", \"local_pack\"],\n",
|
||||
" json_result_fields=[\"title\", \"description\", \"type\"],\n",
|
||||
" params={\"location_name\": \"Germany\", \"language_code\": \"en\", \"se_name\": \"bing\"})\n",
|
||||
"customized_wrapper.results(\"coffee near me\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Customizing the Search Type\n",
|
||||
"The API wrapper also allows you to specify the type of search you want to perform. For example, you can perform a maps search."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"maps_search = DataForSeoAPIWrapper(\n",
|
||||
" top_count=10,\n",
|
||||
" json_result_fields=[\"title\", \"value\", \"address\", \"rating\", \"type\"],\n",
|
||||
" params={\"location_coordinate\": \"52.512,13.36,12z\", \"language_code\": \"en\", \"se_type\": \"maps\"})\n",
|
||||
"maps_search.results(\"coffee near me\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Integration with Langchain Agents\n",
|
||||
"You can use the `Tool` class from the `langchain.agents` module to integrate the `DataForSeoAPIWrapper` with a langchain agent. The `Tool` class encapsulates a function that the agent can call."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import Tool\n",
|
||||
"search = DataForSeoAPIWrapper(\n",
|
||||
" top_count=3,\n",
|
||||
" json_result_types=[\"organic\"],\n",
|
||||
" json_result_fields=[\"title\", \"description\", \"type\"])\n",
|
||||
"tool = Tool(\n",
|
||||
" name=\"google-search-answer\",\n",
|
||||
" description=\"My new answer tool\",\n",
|
||||
" func=search.run,\n",
|
||||
")\n",
|
||||
"json_tool = Tool(\n",
|
||||
" name=\"google-search-json\",\n",
|
||||
" description=\"My new json tool\",\n",
|
||||
" func=search.results,\n",
|
||||
")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.11"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
File diff suppressed because one or more lines are too long
@@ -56,8 +56,7 @@
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"os.environ[\"SERPER_API_KEY\"] = \"\"",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = \"\""
|
||||
"os.environ[\"SERPER_API_KEY\"] = \"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -78,7 +77,7 @@
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.schema import Document\n",
|
||||
"from typing import Any, List"
|
||||
"from typing import Any"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -97,8 +96,8 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"class SerperSearchRetriever(BaseRetriever):\n",
|
||||
"\n",
|
||||
" search: GoogleSerperAPIWrapper = None\n",
|
||||
" def __init__(self, search):\n",
|
||||
" self.search = search\n",
|
||||
"\n",
|
||||
" def _get_relevant_documents(self, query: str, *, run_manager: CallbackManagerForRetrieverRun, **kwargs: Any) -> List[Document]:\n",
|
||||
" return [Document(page_content=self.search.run(query))]\n",
|
||||
@@ -112,7 +111,7 @@
|
||||
" raise NotImplementedError()\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"retriever = SerperSearchRetriever(search=GoogleSerperAPIWrapper())"
|
||||
"retriever = SerperSearchRetriever(GoogleSerperAPIWrapper())"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -1,302 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d2777010",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# HugeGraph QA Chain\n",
|
||||
"\n",
|
||||
"This notebook shows how to use LLMs to provide a natural language interface to [HugeGraph](https://hugegraph.apache.org/cn/) database."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f26dcbe4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You will need to have a running HugeGraph instance.\n",
|
||||
"You can run a local docker container by running the executing the following script:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"docker run \\\n",
|
||||
" --name=graph \\\n",
|
||||
" -itd \\\n",
|
||||
" -p 8080:8080 \\\n",
|
||||
" hugegraph/hugegraph\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"If we want to connect HugeGraph in the application, we need to install python sdk:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"pip3 install hugegraph-python\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "d64a29f1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you are using the docker container, you need to wait a couple of second for the database to start, and then we need create schema and write graph data for the database."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "e53ab93e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from hugegraph.connection import PyHugeGraph\n",
|
||||
"\n",
|
||||
"client = PyHugeGraph(\"localhost\", \"8080\", user=\"admin\", pwd=\"admin\", graph=\"hugegraph\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b7c3a50e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First, we create the schema for a simple movie database:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "ef5372a8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'create EdgeLabel success, Detail: \"b\\'{\"id\":1,\"name\":\"ActedIn\",\"source_label\":\"Person\",\"target_label\":\"Movie\",\"frequency\":\"SINGLE\",\"sort_keys\":[],\"nullable_keys\":[],\"index_labels\":[],\"properties\":[],\"status\":\"CREATED\",\"ttl\":0,\"enable_label_index\":true,\"user_data\":{\"~create_time\":\"2023-07-04 10:48:47.908\"}}\\'\"'"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"\"\"\"schema\"\"\"\n",
|
||||
"schema = client.schema()\n",
|
||||
"schema.propertyKey(\"name\").asText().ifNotExist().create()\n",
|
||||
"schema.propertyKey(\"birthDate\").asText().ifNotExist().create()\n",
|
||||
"schema.vertexLabel(\"Person\").properties(\"name\", \"birthDate\").usePrimaryKeyId().primaryKeys(\"name\").ifNotExist().create()\n",
|
||||
"schema.vertexLabel(\"Movie\").properties(\"name\").usePrimaryKeyId().primaryKeys(\"name\").ifNotExist().create()\n",
|
||||
"schema.edgeLabel(\"ActedIn\").sourceLabel(\"Person\").targetLabel(\"Movie\").ifNotExist().create()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "016f7989",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Then we can insert some data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"id": "b7f4c370",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"1:Robert De Niro--ActedIn-->2:The Godfather Part II"
|
||||
]
|
||||
},
|
||||
"execution_count": 26,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"\"\"\"graph\"\"\"\n",
|
||||
"g = client.graph()\n",
|
||||
"g.addVertex(\"Person\", {\"name\": \"Al Pacino\", \"birthDate\": \"1940-04-25\"})\n",
|
||||
"g.addVertex(\"Person\", {\"name\": \"Robert De Niro\", \"birthDate\": \"1943-08-17\"})\n",
|
||||
"g.addVertex(\"Movie\", {\"name\": \"The Godfather\"})\n",
|
||||
"g.addVertex(\"Movie\", {\"name\": \"The Godfather Part II\"})\n",
|
||||
"g.addVertex(\"Movie\", {\"name\": \"The Godfather Coda The Death of Michael Corleone\"})\n",
|
||||
"\n",
|
||||
"g.addEdge(\"ActedIn\", \"1:Al Pacino\", \"2:The Godfather\", {})\n",
|
||||
"g.addEdge(\"ActedIn\", \"1:Al Pacino\", \"2:The Godfather Part II\", {})\n",
|
||||
"g.addEdge(\"ActedIn\", \"1:Al Pacino\", \"2:The Godfather Coda The Death of Michael Corleone\", {})\n",
|
||||
"g.addEdge(\"ActedIn\", \"1:Robert De Niro\", \"2:The Godfather Part II\", {})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5b8f7788",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Creating `HugeGraphQAChain`\n",
|
||||
"\n",
|
||||
"We can now create the `HugeGraph` and `HugeGraphQAChain`. To create the `HugeGraph` we simply need to pass the database object to the `HugeGraph` constructor."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"id": "f1f68fcf",
|
||||
"metadata": {
|
||||
"is_executing": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.chains import HugeGraphQAChain\n",
|
||||
"from langchain.graphs import HugeGraph"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 28,
|
||||
"id": "b86ebfa7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"graph = HugeGraph(\n",
|
||||
" username=\"admin\",\n",
|
||||
" password=\"admin\",\n",
|
||||
" address=\"localhost\",\n",
|
||||
" port=8080,\n",
|
||||
" graph=\"hugegraph\"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e262540b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Refresh graph schema information\n",
|
||||
"\n",
|
||||
"If the schema of database changes, you can refresh the schema information needed to generate Gremlin statements."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 29,
|
||||
"id": "134dd8d6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# graph.refresh_schema()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 30,
|
||||
"id": "e78b8e72",
|
||||
"metadata": {
|
||||
"ExecuteTime": {}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Node properties: [name: Person, primary_keys: ['name'], properties: ['name', 'birthDate'], name: Movie, primary_keys: ['name'], properties: ['name']]\n",
|
||||
"Edge properties: [name: ActedIn, properties: []]\n",
|
||||
"Relationships: ['Person--ActedIn-->Movie']\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(graph.get_schema)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5c27e813",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Querying the graph\n",
|
||||
"\n",
|
||||
"We can now use the graph Gremlin QA chain to ask question of the graph"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 31,
|
||||
"id": "3b23dead",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = HugeGraphQAChain.from_llm(\n",
|
||||
" ChatOpenAI(temperature=0), graph=graph, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 32,
|
||||
"id": "76aecc93",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"Generated gremlin:\n",
|
||||
"\u001b[32;1m\u001b[1;3mg.V().has('Movie', 'name', 'The Godfather').in('ActedIn').valueMap(true)\u001b[0m\n",
|
||||
"Full Context:\n",
|
||||
"\u001b[32;1m\u001b[1;3m[{'id': '1:Al Pacino', 'label': 'Person', 'name': ['Al Pacino'], 'birthDate': ['1940-04-25']}]\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Al Pacino played in The Godfather.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 32,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run(\"Who played in The Godfather?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "869f0258",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "venv",
|
||||
"language": "python",
|
||||
"name": "venv"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,300 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c94240f5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# GraphSparqlQAChain\n",
|
||||
"\n",
|
||||
"Graph databases are an excellent choice for applications based on network-like models. To standardize the syntax and semantics of such graphs, the W3C recommends Semantic Web Technologies, cp. [Semantic Web](https://www.w3.org/standards/semanticweb/). [SPARQL](https://www.w3.org/TR/sparql11-query/) serves as a query language analogously to SQL or Cypher for these graphs. This notebook demonstrates the application of LLMs as a natural language interface to a graph database by generating SPARQL.\\\n",
|
||||
"Disclaimer: To date, SPARQL query generation via LLMs is still a bit unstable. Be especially careful with UPDATE queries, which alter the graph."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "dbc0ee68",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"There are several sources you can run queries against, including files on the web, files you have available locally, SPARQL endpoints, e.g., [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page), and [triple stores](https://www.w3.org/wiki/LargeTripleStores)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "62812aad",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"is_executing": true
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.chains import GraphSparqlQAChain\n",
|
||||
"from langchain.graphs import RdfGraph"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "0928915d",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"is_executing": true
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"graph = RdfGraph(\n",
|
||||
" source_file=\"http://www.w3.org/People/Berners-Lee/card\",\n",
|
||||
" standard=\"rdf\",\n",
|
||||
" local_copy=\"test.ttl\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"Note that providing a `local_file` is necessary for storing changes locally if the source is read-only."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "58c1a8ea",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Refresh graph schema information\n",
|
||||
"If the schema of the database changes, you can refresh the schema information needed to generate SPARQL queries."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "4e3de44f",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"is_executing": true
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"graph.load_schema()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "1fe76ccd",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"In the following, each IRI is followed by the local name and optionally its description in parentheses. \n",
|
||||
"The RDF graph supports the following node types:\n",
|
||||
"<http://xmlns.com/foaf/0.1/PersonalProfileDocument> (PersonalProfileDocument, None), <http://www.w3.org/ns/auth/cert#RSAPublicKey> (RSAPublicKey, None), <http://www.w3.org/2000/10/swap/pim/contact#Male> (Male, None), <http://xmlns.com/foaf/0.1/Person> (Person, None), <http://www.w3.org/2006/vcard/ns#Work> (Work, None)\n",
|
||||
"The RDF graph supports the following relationships:\n",
|
||||
"<http://www.w3.org/2000/01/rdf-schema#seeAlso> (seeAlso, None), <http://purl.org/dc/elements/1.1/title> (title, None), <http://xmlns.com/foaf/0.1/mbox_sha1sum> (mbox_sha1sum, None), <http://xmlns.com/foaf/0.1/maker> (maker, None), <http://www.w3.org/ns/solid/terms#oidcIssuer> (oidcIssuer, None), <http://www.w3.org/2000/10/swap/pim/contact#publicHomePage> (publicHomePage, None), <http://xmlns.com/foaf/0.1/openid> (openid, None), <http://www.w3.org/ns/pim/space#storage> (storage, None), <http://xmlns.com/foaf/0.1/name> (name, None), <http://www.w3.org/2000/10/swap/pim/contact#country> (country, None), <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> (type, None), <http://www.w3.org/ns/solid/terms#profileHighlightColor> (profileHighlightColor, None), <http://www.w3.org/ns/pim/space#preferencesFile> (preferencesFile, None), <http://www.w3.org/2000/01/rdf-schema#label> (label, None), <http://www.w3.org/ns/auth/cert#modulus> (modulus, None), <http://www.w3.org/2000/10/swap/pim/contact#participant> (participant, None), <http://www.w3.org/2000/10/swap/pim/contact#street2> (street2, None), <http://www.w3.org/2006/vcard/ns#locality> (locality, None), <http://xmlns.com/foaf/0.1/nick> (nick, None), <http://xmlns.com/foaf/0.1/homepage> (homepage, None), <http://creativecommons.org/ns#license> (license, None), <http://xmlns.com/foaf/0.1/givenname> (givenname, None), <http://www.w3.org/2006/vcard/ns#street-address> (street-address, None), <http://www.w3.org/2006/vcard/ns#postal-code> (postal-code, None), <http://www.w3.org/2000/10/swap/pim/contact#street> (street, None), <http://www.w3.org/2003/01/geo/wgs84_pos#lat> (lat, None), <http://xmlns.com/foaf/0.1/primaryTopic> (primaryTopic, None), <http://www.w3.org/2006/vcard/ns#fn> (fn, None), <http://www.w3.org/2003/01/geo/wgs84_pos#location> (location, None), <http://usefulinc.com/ns/doap#developer> (developer, None), <http://www.w3.org/2000/10/swap/pim/contact#city> (city, None), <http://www.w3.org/2006/vcard/ns#region> (region, None), <http://xmlns.com/foaf/0.1/member> (member, None), <http://www.w3.org/2003/01/geo/wgs84_pos#long> (long, None), <http://www.w3.org/2000/10/swap/pim/contact#address> (address, None), <http://xmlns.com/foaf/0.1/family_name> (family_name, None), <http://xmlns.com/foaf/0.1/account> (account, None), <http://xmlns.com/foaf/0.1/workplaceHomepage> (workplaceHomepage, None), <http://purl.org/dc/terms/title> (title, None), <http://www.w3.org/ns/solid/terms#publicTypeIndex> (publicTypeIndex, None), <http://www.w3.org/2000/10/swap/pim/contact#office> (office, None), <http://www.w3.org/2000/10/swap/pim/contact#homePage> (homePage, None), <http://xmlns.com/foaf/0.1/mbox> (mbox, None), <http://www.w3.org/2000/10/swap/pim/contact#preferredURI> (preferredURI, None), <http://www.w3.org/ns/solid/terms#profileBackgroundColor> (profileBackgroundColor, None), <http://schema.org/owns> (owns, None), <http://xmlns.com/foaf/0.1/based_near> (based_near, None), <http://www.w3.org/2006/vcard/ns#hasAddress> (hasAddress, None), <http://xmlns.com/foaf/0.1/img> (img, None), <http://www.w3.org/2000/10/swap/pim/contact#assistant> (assistant, None), <http://xmlns.com/foaf/0.1/title> (title, None), <http://www.w3.org/ns/auth/cert#key> (key, None), <http://www.w3.org/ns/ldp#inbox> (inbox, None), <http://www.w3.org/ns/solid/terms#editableProfile> (editableProfile, None), <http://www.w3.org/2000/10/swap/pim/contact#postalCode> (postalCode, None), <http://xmlns.com/foaf/0.1/weblog> (weblog, None), <http://www.w3.org/ns/auth/cert#exponent> (exponent, None), 
<http://rdfs.org/sioc/ns#avatar> (avatar, None)\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"graph.get_schema"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "68a3c677",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Querying the graph\n",
|
||||
"\n",
|
||||
"Now, you can use the graph SPARQL QA chain to ask questions about the graph."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "7476ce98",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"is_executing": true
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = GraphSparqlQAChain.from_llm(\n",
|
||||
" ChatOpenAI(temperature=0), graph=graph, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "ef8ee27b",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"is_executing": true
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001B[1m> Entering new GraphSparqlQAChain chain...\u001B[0m\n",
|
||||
"Identified intent:\n",
|
||||
"\u001B[32;1m\u001B[1;3mSELECT\u001B[0m\n",
|
||||
"Generated SPARQL:\n",
|
||||
"\u001B[32;1m\u001B[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
|
||||
"SELECT ?homepage\n",
|
||||
"WHERE {\n",
|
||||
" ?person foaf:name \"Tim Berners-Lee\" .\n",
|
||||
" ?person foaf:workplaceHomepage ?homepage .\n",
|
||||
"}\u001B[0m\n",
|
||||
"Full Context:\n",
|
||||
"\u001B[32;1m\u001B[1;3m[]\u001B[0m\n",
|
||||
"\n",
|
||||
"\u001B[1m> Finished chain.\u001B[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\"Tim Berners-Lee's work homepage is http://www.w3.org/People/Berners-Lee/.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run(\"What is Tim Berners-Lee's work homepage?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "af4b3294",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Updating the graph\n",
|
||||
"\n",
|
||||
"Analogously, you can update the graph, i.e., insert triples, using natural language."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "fdf38841",
|
||||
"metadata": {
|
||||
"pycharm": {
|
||||
"is_executing": true
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001B[1m> Entering new GraphSparqlQAChain chain...\u001B[0m\n",
|
||||
"Identified intent:\n",
|
||||
"\u001B[32;1m\u001B[1;3mUPDATE\u001B[0m\n",
|
||||
"Generated SPARQL:\n",
|
||||
"\u001B[32;1m\u001B[1;3mPREFIX foaf: <http://xmlns.com/foaf/0.1/>\n",
|
||||
"INSERT {\n",
|
||||
" ?person foaf:workplaceHomepage <http://www.w3.org/foo/bar/> .\n",
|
||||
"}\n",
|
||||
"WHERE {\n",
|
||||
" ?person foaf:name \"Timothy Berners-Lee\" .\n",
|
||||
"}\u001B[0m\n",
|
||||
"\n",
|
||||
"\u001B[1m> Finished chain.\u001B[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Successfully inserted triples into the graph.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run(\"Save that the person with the name 'Timothy Berners-Lee' has a work homepage at 'http://www.w3.org/foo/bar/'\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5e0f7fc1",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's verify the results:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "f874171b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[(rdflib.term.URIRef('https://www.w3.org/'),),\n",
|
||||
" (rdflib.term.URIRef('http://www.w3.org/foo/bar/'),)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = (\n",
|
||||
" \"\"\"PREFIX foaf: <http://xmlns.com/foaf/0.1/>\\n\"\"\"\n",
|
||||
" \"\"\"SELECT ?hp\\n\"\"\"\n",
|
||||
" \"\"\"WHERE {\\n\"\"\"\n",
|
||||
" \"\"\" ?person foaf:name \"Timothy Berners-Lee\" . \\n\"\"\"\n",
|
||||
" \"\"\" ?person foaf:workplaceHomepage ?hp .\\n\"\"\"\n",
|
||||
" \"\"\"}\"\"\"\n",
|
||||
")\n",
|
||||
"graph.query(query)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "lc",
|
||||
"language": "python",
|
||||
"name": "lc"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,578 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "54ccb772",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Using OpenAI functions\n",
|
||||
"This walkthrough demonstrates how to incorporate OpenAI function-calling API's in a chain. We'll go over: \n",
|
||||
"1. How to use functions to get structured outputs from ChatOpenAI\n",
|
||||
"2. How to create a generic chain that uses (multiple) functions\n",
|
||||
"3. How to create a chain that actually executes the chosen function"
|
||||
]
|
||||
},
{
"cell_type": "code",
"execution_count": 1,
"id": "767ac575",
"metadata": {},
"outputs": [],
"source": [
"from typing import Optional\n",
"\n",
"from langchain.chains.openai_functions import (\n",
" create_openai_fn_chain, create_structured_output_chain\n",
")\n",
"from langchain.prompts import ChatPromptTemplate"
]
},
{
"cell_type": "markdown",
"id": "976b6496",
"metadata": {},
"source": [
"## Getting structured outputs\n",
"We can take advantage of OpenAI functions to try and force the model to return a particular kind of structured output. We'll use the `create_structured_output_chain` to create our chain, which takes the desired structured output either as a Pydantic object or as JsonSchema.\n",
"\n",
"See here for relevant [reference docs](https://api.python.langchain.com/en/latest/chains/langchain.chains.openai_functions.base.create_structured_output_chain.html)."
]
},
{
"cell_type": "markdown",
"id": "e052faae",
"metadata": {},
"source": [
"### Using Pydantic objects\n",
"When passing in Pydantic objects to structure our text, we need to make sure to have a docstring description for the class. It also helps to have descriptions for each of the object attributes."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "b459a33e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new  chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mHuman: Sally is 13\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'name': 'Sally', 'age': 13}"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from pydantic import BaseModel, Field\n",
"\n",
"class Person(BaseModel):\n",
" \"\"\"Identifying information about a person.\"\"\"\n",
" name: str = Field(..., description=\"The person's name\")\n",
" age: int = Field(..., description=\"The person's age\")\n",
" fav_food: Optional[str] = Field(None, description=\"The person's favorite food\")\n",
" \n",
"chain = create_structured_output_chain(Person, verbose=True)\n",
"chain.run(\"Sally is 13\")"
]
},
{
"cell_type": "markdown",
"id": "e3539936",
"metadata": {},
"source": [
"To extract arbitrarily many structured outputs of a given format, we can just create a wrapper Pydantic object that takes a sequence of the original object."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "4d8ea815",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new  chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mHuman: Sally is 13, Joey just turned 12 and loves spinach. Caroline is 10 years older than Sally, so she's 23.\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'people': [{'name': 'Sally', 'age': 13, 'fav_food': ''},\n",
" {'name': 'Joey', 'age': 12, 'fav_food': 'spinach'},\n",
" {'name': 'Caroline', 'age': 23, 'fav_food': ''}]}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from typing import Sequence\n",
"\n",
"class People(BaseModel):\n",
" \"\"\"Identifying information about all people in a text.\"\"\"\n",
" people: Sequence[Person] = Field(..., description=\"The people in the text\")\n",
" \n",
"chain = create_structured_output_chain(People, verbose=True)\n",
"chain.run(\"Sally is 13, Joey just turned 12 and loves spinach. Caroline is 10 years older than Sally, so she's 23.\")"
]
},
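Because `People` is plain Pydantic, the nested structure can be sanity-checked without any LLM call; a small sketch using only the classes defined above:

```python
# parse_obj raises a ValidationError if a nested Person entry is malformed.
people = People.parse_obj(
    {"people": [{"name": "Sally", "age": 13}, {"name": "Joey", "age": 12, "fav_food": "spinach"}]}
)
print(people.people[1].fav_food)  # -> spinach
```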
{
"cell_type": "markdown",
"id": "ea66e10e",
"metadata": {},
"source": [
"### Using JsonSchema\n",
"\n",
"We can also pass in JsonSchema instead of Pydantic objects to specify the desired structure. When we do this, our chain will output json corresponding to the properties described in the JsonSchema, instead of a Pydantic object."
|
||||
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "3484415e",
"metadata": {},
"outputs": [],
"source": [
"json_schema = {\n",
" \"title\": \"Person\",\n",
" \"description\": \"Identifying information about a person.\",\n",
" \"type\": \"object\",\n",
" \"properties\": {\n",
" \"name\": {\n",
" \"title\": \"Name\",\n",
" \"description\": \"The person's name\",\n",
" \"type\": \"string\"\n",
" },\n",
" \"age\": {\n",
" \"title\": \"Age\",\n",
" \"description\": \"The person's age\",\n",
" \"type\": \"integer\"\n",
" },\n",
" \"fav_food\": {\n",
" \"title\": \"Fav Food\",\n",
" \"description\": \"The person's favorite food\",\n",
" \"type\": \"string\"\n",
" }\n",
" },\n",
" \"required\": [\n",
" \"name\",\n",
" \"age\"\n",
" ]\n",
"}\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "be9b76b3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new  chain...\u001b[0m\n",
"Prompt after formatting:\n",
"\u001b[32;1m\u001b[1;3mHuman: Sally is 13\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"{'name': 'Sally', 'age': 13}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain = create_structured_output_chain(json_schema, verbose=True)\n",
"chain.run(\"Sally is 13\")"
]
},
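To check that a hand-written schema stays in step with its Pydantic counterpart, you can compare it to `Person.schema()`, or validate a sample output against it with the third-party `jsonschema` package (a sketch; assumes `jsonschema` is installed):

```python
# Hypothetical sanity check: the chain's dict output should satisfy json_schema.
import jsonschema

jsonschema.validate(instance={"name": "Sally", "age": 13}, schema=json_schema)  # no exception means it conforms
```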
{
"cell_type": "markdown",
"id": "12394696",
"metadata": {},
"source": [
"## Creating a generic OpenAI functions chain\n",
"To create a generic OpenAI functions chain, we can use the `create_openai_fn_chain` method. This is the same as `create_structured_output_chain` except that instead of taking a single output schema, it takes a sequence of function definitions.\n",
|
||||
"\n",
|
||||
"Functions can be passed in as:\n",
|
||||
"- dicts conforming to OpenAI functions spec,\n",
|
||||
"- Pydantic objects, in which case they should have docstring descriptions of the function they represent and descriptions for each of the parameters,\n",
|
||||
"- Python functions, in which case they should have docstring descriptions of the function and args, along with type hints.\n",
|
||||
"\n",
|
||||
"See here for relevant [reference docs](https://api.python.langchain.com/en/latest/chains/langchain.chains.openai_functions.base.create_openai_fn_chain.html)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ff19be25",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Using Pydantic objects"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "a4658ad8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"Prompt after formatting:\n",
|
||||
"\u001b[32;1m\u001b[1;3mHuman: Harry was a chubby brown beagle who loved chicken\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"RecordDog(name='Harry', color='brown', fav_food='chicken')"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"class RecordPerson(BaseModel):\n",
|
||||
" \"\"\"Record some identifying information about a pe.\"\"\"\n",
|
||||
" name: str = Field(..., description=\"The person's name\")\n",
|
||||
" age: int = Field(..., description=\"The person's age\")\n",
|
||||
" fav_food: Optional[str] = Field(None, description=\"The person's favorite food\")\n",
|
||||
"\n",
|
||||
"class RecordDog(BaseModel):\n",
|
||||
" \"\"\"Record some identifying information about a dog.\"\"\"\n",
|
||||
" name: str = Field(..., description=\"The dog's name\")\n",
|
||||
" color: str = Field(..., description=\"The dog's color\")\n",
|
||||
" fav_food: Optional[str] = Field(None, description=\"The dog's favorite food\")\n",
|
||||
"\n",
|
||||
"chain = create_openai_fn_chain([RecordPerson, RecordDog], verbose=True)\n",
|
||||
"chain.run(\"Harry was a chubby brown beagle who loved chicken\")"
|
||||
]
|
||||
},
|
||||
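To inspect the function definition the model actually receives for one of these classes, you can convert it with `convert_to_openai_function`, the same helper imported later in this notebook; a sketch:

```python
# Show the OpenAI function spec derived from the Pydantic class: a dict with
# "name", "description" (from the docstring), and "parameters" (a JSON schema).
from langchain.chains.openai_functions.base import convert_to_openai_function

print(convert_to_openai_function(RecordDog))
```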
{
"cell_type": "markdown",
"id": "df6d9147",
"metadata": {},
"source": [
"### Using Python functions\n",
"We can pass in functions as Pydantic objects, directly as OpenAI function dicts, or Python functions. To pass Python function in directly, we'll want to make sure our parameters have type hints, we have a docstring, and we use [Google Python style docstrings](https://google.github.io/styleguide/pyguide.html#doc-function-args) to describe the parameters.\n",
|
||||
"\n",
|
||||
"**NOTE**: To use Python functions, make sure the function arguments are of primitive types (str, float, int, bool) or that they are Pydantic objects."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 41,
|
||||
"id": "95ac5825",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"Prompt after formatting:\n",
|
||||
"\u001b[32;1m\u001b[1;3mHuman: The most important thing to remember about Tommy, my 12 year old, is that he'll do anything for apple pie.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'name': 'Tommy', 'age': 12, 'fav_food': {'food': 'apple pie'}}"
|
||||
]
|
||||
},
|
||||
"execution_count": 41,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"class OptionalFavFood(BaseModel):\n",
|
||||
" \"\"\"Either a food or null.\"\"\"\n",
|
||||
" food: Optional[str] = Field(None, description=\"Either the name of a food or null. Should be null if the food isn't known.\")\n",
|
||||
"\n",
|
||||
"def record_person(name: str, age: int, fav_food: OptionalFavFood) -> str:\n",
|
||||
" \"\"\"Record some basic identifying information about a person.\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" name: The person's name.\n",
|
||||
" age: The person's age in years.\n",
|
||||
" fav_food: An OptionalFavFood object that either contains the person's favorite food or a null value. Food should be null if it's not known.\n",
|
||||
" \"\"\"\n",
|
||||
" return f\"Recording person {name} of age {age} with favorite food {fav_food.food}!\"\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"chain = create_openai_fn_chain([record_person], verbose=True)\n",
|
||||
"chain.run(\"The most important thing to remember about Tommy, my 12 year old, is that he'll do anything for apple pie.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "403ea5dd",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If we pass in multiple Python functions or OpenAI functions, then the returned output will be of the form\n",
|
||||
"```python\n",
|
||||
"{\"name\": \"<<function_name>>\", \"arguments\": {<<function_arguments>>}}\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 42,
|
||||
"id": "8b0d11de",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"Prompt after formatting:\n",
|
||||
"\u001b[32;1m\u001b[1;3mHuman: I can't find my dog Henry anywhere, he's a small brown beagle. Could you send a message about him?\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'name': 'report_dog',\n",
|
||||
" 'arguments': {'name': 'Henry', 'color': 'brown', 'fav_food': {'food': None}}}"
|
||||
]
|
||||
},
|
||||
"execution_count": 42,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"def record_dog(name: str, color: str, fav_food: OptionalFavFood) -> str:\n",
|
||||
" \"\"\"Record some basic identifying information about a dog.\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" name: The dog's name.\n",
|
||||
" color: The dog's color.\n",
|
||||
" fav_food: An OptionalFavFood object that either contains the dog's favorite food or a null value. Food should be null if it's not known.\n",
|
||||
" \"\"\"\n",
|
||||
" return f\"Recording dog {name} of color {color} with favorite food {fav_food}!\"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"chain = create_openai_fn_chain([record_person, report_dog], verbose=True)\n",
|
||||
"chain.run(\"I can't find my dog Henry anywhere, he's a small brown beagle. Could you send a message about him?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4535ce33",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Creating a Chain that runs the chosen function\n",
|
||||
"We can go one step further and create a chain that actually executes the function chosen by the model."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 43,
|
||||
"id": "43b0dfe0",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import json\n",
|
||||
"import inspect\n",
|
||||
"from typing import Any, Callable, Dict, List, Optional\n",
|
||||
"\n",
|
||||
"from langchain.callbacks.manager import CallbackManagerForChainRun\n",
|
||||
"from langchain.chains.base import Chain\n",
|
||||
"from langchain.input import get_colored_text\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class FunctionExecutorChain(Chain):\n",
|
||||
" functions: Dict[str, Callable]\n",
|
||||
" output_key: str = \"output\"\n",
|
||||
" input_key: str = \"function\"\n",
|
||||
"\n",
|
||||
" @property\n",
|
||||
" def input_keys(self) -> List[str]:\n",
|
||||
" return [self.input_key]\n",
|
||||
"\n",
|
||||
" @property\n",
|
||||
" def output_keys(self) -> List[str]:\n",
|
||||
" return [self.output_key]\n",
|
||||
"\n",
|
||||
" def _call(\n",
|
||||
" self,\n",
|
||||
" inputs: Dict[str, Any],\n",
|
||||
" run_manager: Optional[CallbackManagerForChainRun] = None,\n",
|
||||
" ) -> Dict[str, Any]:\n",
|
||||
" \"\"\"Run the logic of this chain and return the output.\"\"\"\n",
|
||||
" _run_manager = run_manager or CallbackManagerForChainRun.get_noop_manager()\n",
|
||||
" name = inputs[\"function\"].pop(\"name\")\n",
|
||||
" args = inputs[\"function\"].pop(\"arguments\")\n",
|
||||
" _pretty_name = get_colored_text(name, \"green\")\n",
|
||||
" _pretty_args = get_colored_text(json.dumps(args, indent=2), \"green\")\n",
|
||||
" _text = f\"Calling function {_pretty_name} with arguments:\\n\" + _pretty_args\n",
|
||||
" _run_manager.on_text(_text)\n",
|
||||
" _args = {}\n",
|
||||
" function = self.functions[name]\n",
|
||||
" for arg_name, arg_type in inspect.getfullargspec(function).annotations.items():\n",
|
||||
" if isinstance(arg_type, type) and issubclass(arg_type, BaseModel):\n",
|
||||
" args[arg_name] = arg_type.parse_obj(args[arg_name])\n",
|
||||
" output = function(**args)\n",
|
||||
" return {self.output_key: output}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 44,
|
||||
"id": "b8391857",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"Calling function \u001b[32;1m\u001b[1;3mrecord_person\u001b[0m with arguments:\n",
|
||||
"\u001b[32;1m\u001b[1;3m{\n",
|
||||
" \"name\": \"Tommy\",\n",
|
||||
" \"age\": 12,\n",
|
||||
" \"fav_food\": {\n",
|
||||
" \"food\": \"apple pie\"\n",
|
||||
" }\n",
|
||||
"}\u001b[0m\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Recording person Tommy of age 12 with favorite food apple pie!'"
|
||||
]
|
||||
},
|
||||
"execution_count": 44,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chains import SequentialChain\n",
|
||||
"from langchain.chains.openai_functions.base import convert_to_openai_function\n",
|
||||
"\n",
|
||||
"functions = [record_person, record_dog]\n",
|
||||
"openai_functions = [convert_to_openai_function(f) for f in functions]\n",
|
||||
"fn_map = {\n",
|
||||
" openai_fn[\"name\"]: fn for openai_fn, fn in zip(openai_functions, functions)\n",
|
||||
"}\n",
|
||||
"llm_chain = create_openai_fn_chain(functions)\n",
|
||||
"exec_chain = FunctionExecutorChain(functions=fn_map, verbose=True)\n",
|
||||
"chain = SequentialChain(\n",
|
||||
" chains=[llm_chain, exec_chain],\n",
|
||||
" input_variables=llm_chain.input_keys,\n",
|
||||
" output_variables=[\"output\"],\n",
|
||||
" verbose=True\n",
|
||||
")\n",
|
||||
"chain.run(\"The most important thing to remember about Tommy, my 12 year old, is that he'll do anything for apple pie.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5f93686b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Other Chains using OpenAI functions\n",
|
||||
"\n",
|
||||
"There are a number of more specific chains that use OpenAI functions.\n",
|
||||
"- [Extraction](/docs/modules/chains/additional/extraction): very similar to structured output chain, intended for information/entity extraction specifically.\n",
|
||||
"- [Tagging](/docs/modules/chains/additional/tagging): tag inputs.\n",
|
||||
"- [OpenAPI](/docs/modules/chains/additional/openapi_openai): take an OpenAPI spec and create + execute valid requests against the API, using OpenAI functions under the hood.\n",
|
||||
"- [QA with citations](/docs/modules/chains/additional/qa_citations): use OpenAI functions ability to extract citations from text."
|
||||
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "93425c66",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "venv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
@@ -1,118 +0,0 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cube Semantic Layer"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook demonstrates the process of retrieving Cube's data model metadata in a format suitable for passing to LLMs as embeddings, thereby enhancing contextual information."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### About Cube"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"[Cube](https://cube.dev/) is the Semantic Layer for building data apps. It helps data engineers and application developers access data from modern data stores, organize it into consistent definitions, and deliver it to every application."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Cube’s data model provides structure and definitions that are used as a context for LLM to understand data and generate correct queries. LLM doesn’t need to navigate complex joins and metrics calculations because Cube abstracts those and provides a simple interface that operates on the business-level terminology, instead of SQL table and column names. This simplification helps LLM to be less error-prone and avoid hallucinations."
|
||||
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"`Cube Semantic Loader` requires 2 arguments:\n",
"| Input Parameter | Description |\n",
"| --- | --- |\n",
"| `cube_api_url` | The URL of your Cube's deployment REST API. Please refer to the [Cube documentation](https://cube.dev/docs/http-api/rest#configuration-base-path) for more information on configuring the base path. |\n",
"| `cube_api_token` | The authentication token generated based on your Cube's API secret. Please refer to the [Cube documentation](https://cube.dev/docs/security#generating-json-web-tokens-jwt) for instructions on generating JSON Web Tokens (JWT). |\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import jwt\n",
"from langchain.document_loaders import CubeSemanticLoader\n",
"\n",
"api_url = \"https://api-example.gcp-us-central1.cubecloudapp.dev/cubejs-api/v1/meta\"\n",
"cubejs_api_secret = \"api-secret-here\"\n",
"security_context = {}\n",
"# Read more about security context here: https://cube.dev/docs/security\n",
"api_token = jwt.encode(security_context, cubejs_api_secret, algorithm=\"HS256\")\n",
"\n",
"loader = CubeSemanticLoader(api_url, api_token)\n",
"\n",
"documents = loader.load()"
]
},
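Once loaded, these documents can be embedded and indexed like any other LangChain documents. A minimal downstream sketch, reusing APIs that appear elsewhere in this changeset (assumes an OpenAI API key is configured; the query string is illustrative):

```python
# Index the Cube metadata documents and retrieve columns relevant to a question.
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

vectordb = Chroma.from_documents(documents, OpenAIEmbeddings())
for doc in vectordb.similarity_search("total order amount", k=3):
    print(doc.metadata["table_name"], doc.metadata["column_name"])
```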
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Returns:\n",
"\n",
"A list of documents with the following attributes:\n",
"\n",
"- `page_content`\n",
"- `metadata`\n",
" - `table_name`\n",
" - `column_name`\n",
" - `column_data_type`\n",
" - `column_title`\n",
" - `column_description`"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"> page_content='table name: orders_view, column name: orders_view.total_amount, column data type: number, column title: Orders View Total Amount, column description: None' metadata={'table_name': 'orders_view', 'column_name': 'orders_view.total_amount', 'column_data_type': 'number', 'column_title': 'Orders View Total Amount', 'column_description': 'None'}"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -14,24 +14,31 @@
},
{
"cell_type": "code",
"execution_count": 2,
"id": "994d6c74",
"execution_count": 1,
"id": "c2f3f5f2",
"metadata": {},
"outputs": [],
"source": [
"# Build a sample vectorDB\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.document_loaders import WebBaseLoader\n",
"from langchain.document_loaders import PyPDFLoader\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"\n",
"# Load blog post\n",
"loader = WebBaseLoader(\"https://lilianweng.github.io/posts/2023-06-23-agent/\")\n",
"data = loader.load()\n",
"# Load PDF\n",
"path=\"path-to-files\"\n",
"loaders = [\n",
" PyPDFLoader(path+\"docs/cs229_lectures/MachineLearning-Lecture01.pdf\"),\n",
" PyPDFLoader(path+\"docs/cs229_lectures/MachineLearning-Lecture02.pdf\"),\n",
" PyPDFLoader(path+\"docs/cs229_lectures/MachineLearning-Lecture03.pdf\")\n",
"]\n",
"docs = []\n",
"for loader in loaders:\n",
" docs.extend(loader.load())\n",
" \n",
"# Split\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size = 500, chunk_overlap = 0)\n",
"splits = text_splitter.split_documents(data)\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1500, chunk_overlap = 150)\n",
"splits = text_splitter.split_documents(docs)\n",
"\n",
"# VectorDB\n",
"embedding = OpenAIEmbeddings()\n",
@@ -57,7 +64,8 @@
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.retrievers.multi_query import MultiQueryRetriever\n",
"question=\"What are the approaches to Task Decomposition?\"\n",
"question=\"What does the course say about regression?\"\n",
"num_queries=3\n",
"llm = ChatOpenAI(temperature=0)\n",
"retriever_from_llm = MultiQueryRetriever.from_llm(retriever=vectordb.as_retriever(),llm=llm)"
]
@@ -65,19 +73,6 @@
{
"cell_type": "code",
"execution_count": 4,
"id": "9e6d3b69",
"metadata": {},
"outputs": [],
"source": [
"# Set logging for the queries\n",
"import logging\n",
"logging.basicConfig()\n",
"logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "e5203612",
"metadata": {},
"outputs": [
@@ -85,22 +80,22 @@
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:langchain.retrievers.multi_query:Generated queries: ['1. How can Task Decomposition be approached?', '2. What are the different methods for Task Decomposition?', '3. What are the various approaches to decomposing tasks?']\n"
"INFO:root:Generated queries: [\"1. What is the course's perspective on regression?\", '2. How does the course discuss regression?', '3. What information does the course provide about regression?']\n"
]
},
{
"data": {
"text/plain": [
"5"
"6"
]
},
"execution_count": 5,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"unique_docs = retriever_from_llm.get_relevant_documents(query=question)\n",
"unique_docs = retriever_from_llm.get_relevant_documents(question=\"What does the course say about regression?\")\n",
"len(unique_docs)"
]
},
@@ -116,7 +111,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 5,
"id": "d9afb0ca",
"metadata": {},
"outputs": [],
@@ -156,12 +151,12 @@
"llm_chain = LLMChain(llm=llm,prompt=QUERY_PROMPT,output_parser=output_parser)\n",
" \n",
"# Other inputs\n",
"question=\"What are the approaches to Task Decomposition?\""
"question=\"What does the course say about regression?\""
]
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 6,
"id": "6660d7ee",
"metadata": {},
"outputs": [
@@ -169,16 +164,16 @@
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:langchain.retrievers.multi_query:Generated queries: [\"1. What is the course's perspective on regression?\", '2. Can you provide information on regression as discussed in the course?', '3. How does the course cover the topic of regression?', \"4. What are the course's teachings on regression?\", '5. In relation to the course, what is mentioned about regression?']\n"
"INFO:root:Generated queries: [\"1. What is the course's perspective on regression?\", '2. Can you provide information on regression as discussed in the course?', '3. How does the course cover the topic of regression?', \"4. What are the course's teachings on regression?\", '5. In relation to the course, what is mentioned about regression?']\n"
]
},
{
"data": {
"text/plain": [
"11"
"8"
]
},
"execution_count": 7,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -190,7 +185,7 @@
" parser_key=\"lines\") # \"lines\" is the key (attribute name) of the parsed output\n",
"\n",
"# Results\n",
"unique_docs = retriever.get_relevant_documents(query=\"What does the course say about regression?\")\n",
"unique_docs = retriever.get_relevant_documents(question=\"What does the course say about regression?\")\n",
"len(unique_docs)"
]
}
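Pulling the pieces from the hunks above together, the updated retriever setup reads roughly as follows (a consolidated sketch; assumes the `vectordb` built earlier and an OpenAI API key):

```python
# Log the LLM-generated query variants, then retrieve the union of results.
import logging

from langchain.chat_models import ChatOpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

retriever = MultiQueryRetriever.from_llm(
    retriever=vectordb.as_retriever(), llm=ChatOpenAI(temperature=0)
)
unique_docs = retriever.get_relevant_documents("What does the course say about regression?")
print(len(unique_docs))
```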
@@ -5,9 +5,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Amazon Kendra\n",
"# AWS Kendra\n",
"\n",
"> Amazon Kendra is an intelligent search service provided by Amazon Web Services (AWS). It utilizes advanced natural language processing (NLP) and machine learning algorithms to enable powerful search capabilities across various data sources within an organization. Kendra is designed to help users find the information they need quickly and accurately, improving productivity and decision-making.\n",
"> AWS Kendra is an intelligent search service provided by Amazon Web Services (AWS). It utilizes advanced natural language processing (NLP) and machine learning algorithms to enable powerful search capabilities across various data sources within an organization. Kendra is designed to help users find the information they need quickly and accurately, improving productivity and decision-making.\n",
"\n",
"> With Kendra, users can search across a wide range of content types, including documents, FAQs, knowledge bases, manuals, and websites. It supports multiple languages and can understand complex queries, synonyms, and contextual meanings to provide highly relevant search results."
]
@@ -17,7 +17,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using the Amazon Kendra Index Retriever"
"## Using the AWS Kendra Index Retriever"
]
},
{
@@ -1,208 +0,0 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "9597802c",
"metadata": {},
"source": [
"# Clarifai\n",
"\n",
">[Clarifai](https://www.clarifai.com/) is an AI Platform that provides the full AI lifecycle ranging from data exploration, data labeling, model training, evaluation, and inference.\n",
"\n",
"This example goes over how to use LangChain to interact with `Clarifai` [models](https://clarifai.com/explore/models). Text embedding models in particular can be found [here](https://clarifai.com/explore/models?page=1&perPage=24&filterData=%5B%7B%22field%22%3A%22model_type_id%22%2C%22value%22%3A%5B%22text-embedder%22%5D%7D%5D).\n",
"\n",
"To use Clarifai, you must have an account and a Personal Access Token (PAT) key. \n",
"[Check here](https://clarifai.com/settings/security) to get or create a PAT."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "2a773d8d",
"metadata": {},
"source": [
"# Dependencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "91ea14ce-831d-409a-a88f-30353acdabd1",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Install required dependencies\n",
"!pip install clarifai"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "426f1156",
"metadata": {},
"source": [
"# Imports\n",
"Here we will be setting the personal access token. You can find your PAT under [settings/security](https://clarifai.com/settings/security) in your Clarifai account."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "3f5dc9d7-65e3-4b5b-9086-3327d016cfe0",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"# Please login and get your API key from https://clarifai.com/settings/security \n",
"from getpass import getpass\n",
"\n",
"CLARIFAI_PAT = getpass()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "6fb585dd",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Import the required modules\n",
"from langchain.embeddings import ClarifaiEmbeddings\n",
"from langchain import PromptTemplate, LLMChain"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "16521ed2",
"metadata": {},
"source": [
"# Input\n",
"Create a prompt template to be used with the LLM Chain:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "035dea0f",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c8905eac",
"metadata": {},
"source": [
"# Setup\n",
"Set the user id and app id to the application in which the model resides. You can find a list of public models on https://clarifai.com/explore/models\n",
"\n",
"You will have to also initialize the model id and if needed, the model version id. Some models have many versions, you can choose the one appropriate for your task."
|
||||
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "1fe9bf15",
"metadata": {},
"outputs": [],
"source": [
"USER_ID = 'openai'\n",
"APP_ID = 'embed'\n",
"MODEL_ID = 'text-embedding-ada'\n",
"\n",
"# You can provide a specific model version as the model_version_id arg.\n",
"# MODEL_VERSION_ID = \"MODEL_VERSION_ID\""
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "3f3458d9",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Initialize a Clarifai embedding model\n",
"embeddings = ClarifaiEmbeddings(pat=CLARIFAI_PAT, user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "a641dbd9",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"text = \"This is a test document.\""
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "32b4d5f4-2b8e-4681-856f-19a3dd141ae4",
"metadata": {},
"outputs": [],
"source": [
"query_result = embeddings.embed_query(text)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "47076457-1880-48ac-970f-872ead6f0d94",
"metadata": {},
"outputs": [],
"source": [
"doc_result = embeddings.embed_documents([text])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
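As a quick sanity check on the returned vectors (a sketch; the dimensionality depends on the chosen model):

```python
# Embedding the same text as a query and as a document should be nearly identical.
import numpy as np

q = np.array(query_result)
d = np.array(doc_result[0])
print(len(q), float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d))))  # dim, cosine ~ 1.0
```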
@@ -9,7 +9,7 @@
"# Elasticsearch\n",
"Walkthrough of how to generate embeddings using a hosted embedding model in Elasticsearch\n",
"\n",
"The easiest way to instantiate the `ElasticsearchEmbeddings` class is either\n",
"The easiest way to instantiate the `ElasticsearchEmebddings` class is either\n",
"- using the `from_credentials` constructor if you are using Elastic Cloud\n",
"- or using the `from_es_connection` constructor with any Elasticsearch cluster"
],
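For reference, a sketch of the first constructor path named above (all credential values are placeholders, the model id is an example, and the embedding model must already be deployed in your Elasticsearch cluster):

```python
# Instantiate ElasticsearchEmbeddings against Elastic Cloud via from_credentials.
from langchain.embeddings import ElasticsearchEmbeddings

embeddings = ElasticsearchEmbeddings.from_credentials(
    model_id="sentence-transformers__msmarco-minilm-l-12-v3",  # example deployed model id
    es_cloud_id="<your-cloud-id>",
    es_user="<username>",
    es_password="<password>",
)
```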
@@ -10,32 +10,14 @@
"\n",
">`OpenSearch` helps you develop high quality, maintenance-free, and high performance intelligent search services to provide your users with high search efficiency and accuracy.\n",
"\n",
">`OpenSearch` provides the vector search feature. In specific scenarios, especially test question search and image search scenarios, you can use the vector search feature together with the multimodal search feature to improve the accuracy of search results.\n",
">`OpenSearch` provides the vector search feature. In specific scenarios, especially test question search and image search scenarios, you can use the vector search feature together with the multimodal search feature to improve the accuracy of search results. This topic describes the syntax and usage notes of vector indexes.\n",
"\n",
"This notebook shows how to use functionality related to the `Alibaba Cloud OpenSearch Vector Search Edition`.\n",
"To run, you should have an [OpenSearch Vector Search Edition](https://opensearch.console.aliyun.com) instance up and running:\n",
"\n",
"Read the [help document](https://www.alibabacloud.com/help/en/opensearch/latest/vector-search) to quickly familiarize yourself with and configure an OpenSearch Vector Search Edition instance."
"Read the [help document](https://www.alibabacloud.com/help/en/opensearch/latest/vector-search) to quickly familiarize yourself with and configure an OpenSearch Vector Search Edition instance.\n"
]
},
{
"cell_type": "markdown",
"source": [
"After the instance is up and running, follow these steps to split documents, get embeddings, connect to the alibaba cloud opensearch instance, index documents, and perform vector retrieval."
|
||||
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"We need to install the following Python packages first."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 1,
@@ -47,29 +29,10 @@
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"import os\n",
"import getpass\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")\n"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
"After completing the configuration, follow these steps to connect to the instance, index documents, and perform vector retrieval."
]
},
{
"cell_type": "code",
@@ -97,7 +60,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Split documents and get embeddings."
"Split documents and get embeddings by call OpenAI API"
|
||||
]
},
{
@@ -160,7 +123,7 @@
" \"id\": \"id\", # The id field name mapping of index document.\n",
" \"document\": \"document\", # The text field name mapping of index document.\n",
" \"embedding\": \"embedding\", # The embedding field name mapping of index document.\n",
" \"name_of_the_metadata_specified_during_search\": \"opensearch_metadata_field_name,=\", # The metadata field name mapping of index document, could specify multiple, The value field contains mapping name and operator, the operator would be used when executing metadata filter query.\n",
" \"metadata_x\": \"metadata_x,=\", # The metadata field name mapping of index document, could specify multiple, The value field contains mapping name and operator, the operator would be used when executing metadata filter query.\n",
" },\n",
")\n",
"\n",
@@ -176,10 +139,7 @@
"# \"id\": \"id\",\n",
"# \"document\": \"document\",\n",
"# \"embedding\": \"embedding\",\n",
"# \"metadata_a\": \"metadata_a,=\" #The value field contains mapping name and operator, the operator would be used when executing metadata filter query\n",
"# \"metadata_b\": \"metadata_b,>\"\n",
"# \"metadata_c\": \"metadata_c,<\"\n",
"# \"metadata_else\": \"metadata_else,=\"\n",
"# \"metadata\": \"metadata,=\" #The value field contains mapping name and operator, the operator would be used when executing metadata filter query\n",
"# })"
]
},
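The mapping convention shown in this hunk: each metadata entry maps a search-time name to `"stored_field_name,operator"`, where the operator (`=`, `>`, `<`) is applied when that field is used in a metadata filter. A sketch with illustrative names:

```python
# Illustrative field_name_mapping for the Alibaba Cloud OpenSearch vector store.
field_name_mapping = {
    "id": "id",                    # id field of the index document
    "document": "document",        # text field
    "embedding": "embedding",      # vector field
    "metadata_a": "metadata_a,=",  # equality filter on metadata_a
    "metadata_b": "metadata_b,>",  # greater-than filter on metadata_b
}
```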
@@ -291,7 +251,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Query and retrieve data with metadata.\n"
"Query and retrieve data with metadata\n"
]
},
{
@@ -347,4 +307,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
}
}
@@ -8,12 +8,12 @@
"source": [
"# Clarifai\n",
"\n",
">[Clarifai](https://www.clarifai.com/) is an AI Platform that provides the full AI lifecycle ranging from data exploration, data labeling, model training, evaluation, and inference. A Clarifai application can be used as a vector database after uploading inputs. \n",
">[Clarifai](https://www.clarifai.com/) is an AI Platform that provides the full AI lifecycle ranging from data exploration, data labeling, model building and inference. A Clarifai application can be used as a vector database after uploading inputs. \n",
"\n",
"This notebook shows how to use functionality related to the `Clarifai` vector database.\n",
"\n",
"To use Clarifai, you must have an account and a Personal Access Token (PAT) key. \n",
"[Check here](https://clarifai.com/settings/security) to get or create a PAT."
"To use Clarifai, you must have an account and a Personal Access Token key. \n",
"Here are the [installation instructions](https://clarifai.com/settings/security)."
]
},
{
@@ -58,7 +58,7 @@
"# Please login and get your API key from https://clarifai.com/settings/security \n",
"from getpass import getpass\n",
"\n",
"CLARIFAI_PAT = getpass()"
"CLARIFAI_PAT_KEY = getpass()"
]
},
{
@@ -92,7 +92,7 @@
"metadata": {},
"source": [
"# Setup\n",
"Set up the user id and app id where the text data will be uploaded. Note: when creating that application please select an appropriate base workflow for indexing your text documents such as the Language-Understanding workflow.\n",
"Set up the user id and app id where the text data will be uploaded. \n",
"\n",
"You will have to first create an account on [Clarifai](https://clarifai.com/login) and then create an application."
]
@@ -139,7 +139,7 @@
"metadata": {},
"outputs": [],
"source": [
"clarifai_vector_db = Clarifai.from_texts(user_id=USER_ID, app_id=APP_ID, texts=texts, pat=CLARIFAI_PAT, number_of_docs=NUMBER_OF_DOCS, metadatas = metadatas)"
"clarifai_vector_db = Clarifai.from_texts(user_id=USER_ID, app_id=APP_ID, texts=texts, pat=CLARIFAI_PAT_KEY, number_of_docs=NUMBER_OF_DOCS, metadatas = metadatas)"
]
},
{
@@ -1,575 +0,0 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
"source": [
"# Marqo\n",
"\n",
"This notebook shows how to use functionality related to the Marqo vectorstore.\n",
"\n",
|
||||
">[Marqo](https://www.marqo.ai/) is an open-source vector search engine. Marqo allows you to store and query multimodal data such as text and images. Marqo creates the vectors for you using a huge selection of opensource models, you can also provide your own finetuned models and Marqo will handle the loading and inference for you.\n",
|
||||
"\n",
|
||||
"To run this notebook with our docker image please run the following commands first to get Marqo:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"docker pull marqoai/marqo:latest\n",
|
||||
"docker rm -f marqo\n",
|
||||
"docker run --name marqo -it --privileged -p 8882:8882 --add-host host.docker.internal:host-gateway marqoai/marqo:latest\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "aac9563e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install marqo"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "5d1489ec",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import Marqo\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "a3c3999a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"loader = TextLoader('../../../state_of_the_union.txt')\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "6e104aee",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Index langchain-demo exists.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import marqo \n",
|
||||
"\n",
|
||||
"# initialize marqo\n",
|
||||
"marqo_url = \"http://localhost:8882\" # if using marqo cloud replace with your endpoint (console.marqo.ai)\n",
|
||||
"marqo_api_key = \"\" # if using marqo cloud replace with your api key (console.marqo.ai)\n",
|
||||
"\n",
|
||||
"client = marqo.Client(url=marqo_url, api_key=marqo_api_key)\n",
|
||||
"\n",
|
||||
"index_name = \"langchain-demo\"\n",
|
||||
"\n",
|
||||
"docsearch = Marqo.from_documents(docs, index_name=index_name)\n",
|
||||
"\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"result_docs = docsearch.similarity_search(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "9c608226",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(result_docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "98704b27",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n",
|
||||
"0.68647254\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"result_docs = docsearch.similarity_search_with_score(query)\n",
|
||||
"print(result_docs[0][0].page_content, result_docs[0][1], sep=\"\\n\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "eb3395b6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Additional features\n",
|
||||
"\n",
|
||||
"One of the powerful features of Marqo as a vectorstore is that you can use indexes created externally. For example:\n",
|
||||
"\n",
|
||||
"+ If you had a database of image and text pairs from another application, you can simply just use it in langchain with the Marqo vectorstore. Note that bringing your own multimodal indexes will disable the `add_texts` method.\n",
|
||||
"\n",
|
||||
"+ If you had a database of text documents, you can bring it into the langchain framework and add more texts through `add_texts`.\n",
|
||||
"\n",
|
||||
"The documents that are returned are customised by passing your own function to the `page_content_builder` callback in the search methods."
|
||||
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "35b99fef",
"metadata": {},
"source": [
"#### Multimodal Example"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "a359ed74",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'errors': False,\n",
" 'processingTimeMs': 2090.2822139996715,\n",
" 'index_name': 'langchain-multimodal-demo',\n",
" 'items': [{'_id': 'aa92fc1c-1fb2-4d86-b027-feb507c419f7',\n",
" 'result': 'created',\n",
" 'status': 201},\n",
" {'_id': '5142c258-ef9f-4bf2-a1a6-2307280173a0',\n",
" 'result': 'created',\n",
" 'status': 201}]}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\n",
"# use a new index\n",
"index_name = \"langchain-multimodal-demo\"\n",
"\n",
"# incase the demo is re-run\n",
|
||||
"try:\n",
|
||||
" client.delete_index(index_name)\n",
|
||||
"except Exception:\n",
|
||||
" print(f\"Creating {index_name}\")\n",
|
||||
" \n",
|
||||
"# This index could have been created by another system\n",
|
||||
"settings = {\"treat_urls_and_pointers_as_images\": True, \"model\": \"ViT-L/14\"}\n",
|
||||
"client.create_index(index_name, **settings)\n",
|
||||
"client.index(index_name).add_documents(\n",
|
||||
" [ \n",
|
||||
" # image of a bus\n",
|
||||
" {\n",
|
||||
" \"caption\": \"Bus\",\n",
|
||||
" \"image\": \"https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image4.jpg\"\n",
|
||||
" },\n",
|
||||
" # image of a plane\n",
|
||||
" { \n",
|
||||
" \"caption\": \"Plane\", \n",
|
||||
" \"image\": \"https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image2.jpg\"\n",
|
||||
" }\n",
|
||||
" ],\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "368d1fab",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def get_content(res):\n",
|
||||
" \"\"\"Helper to format Marqo's documents into text to be used as page_content\"\"\"\n",
|
||||
" return f\"{res['caption']}: {res['image']}\"\n",
|
||||
"\n",
|
||||
"docsearch = Marqo(client, index_name, page_content_builder=get_content)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"query = \"vehicles that fly\"\n",
|
||||
"doc_results = docsearch.similarity_search(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "eef4edf9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Plane: https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image2.jpg\n",
|
||||
"Bus: https://raw.githubusercontent.com/marqo-ai/marqo/mainline/examples/ImageSearchGuide/data/image4.jpg\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"for doc in doc_results:\n",
|
||||
" print(doc.page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "c255f603",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Text only example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "9e9a2b20",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'errors': False,\n",
|
||||
" 'processingTimeMs': 139.2144540004665,\n",
|
||||
" 'index_name': 'langchain-byo-index-demo',\n",
|
||||
" 'items': [{'_id': '27c05a1c-b8a9-49a5-ae73-fbf1eb51dc3f',\n",
|
||||
" 'result': 'created',\n",
|
||||
" 'status': 201},\n",
|
||||
" {'_id': '6889afe0-e600-43c1-aa3b-1d91bf6db274',\n",
|
||||
" 'result': 'created',\n",
|
||||
" 'status': 201}]}"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"\n",
|
||||
"# use a new index\n",
|
||||
"index_name = \"langchain-byo-index-demo\"\n",
|
||||
"\n",
|
||||
"# incase the demo is re-run\n",
|
||||
"try:\n",
|
||||
" client.delete_index(index_name)\n",
|
||||
"except Exception:\n",
|
||||
" print(f\"Creating {index_name}\")\n",
|
||||
"\n",
|
||||
"# This index could have been created by another system\n",
|
||||
"client.create_index(index_name)\n",
|
||||
"client.index(index_name).add_documents(\n",
|
||||
" [ \n",
|
||||
" {\n",
|
||||
" \"Title\": \"Smartphone\",\n",
|
||||
" \"Description\": \"A smartphone is a portable computer device that combines mobile telephone \"\n",
|
||||
" \"functions and computing functions into one unit.\",\n",
|
||||
" },\n",
|
||||
" { \n",
|
||||
" \"Title\": \"Telephone\",\n",
|
||||
" \"Description\": \"A telephone is a telecommunications device that permits two or more users to\"\n",
|
||||
" \"conduct a conversation when they are too far apart to be easily heard directly.\",\n",
|
||||
" }\n",
|
||||
" ],\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "b2943ea9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['9986cc72-adcd-4080-9d74-265c173a9ec3']"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Note text indexes retain the ability to use add_texts despite different field names in documents\n",
|
||||
"# this is because the page_content_builder callback lets you handle these document fields as required\n",
|
||||
"\n",
|
||||
"def get_content(res):\n",
|
||||
" \"\"\"Helper to format Marqo's documents into text to be used as page_content\"\"\"\n",
|
||||
" if 'text' in res:\n",
|
||||
" return res['text']\n",
|
||||
" return res['Description']\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"docsearch = Marqo(client, index_name, page_content_builder=get_content)\n",
|
||||
"\n",
|
||||
"docsearch.add_texts([\"This is a document that is about elephants\"])\n"
|
||||
]
|
||||
},
|
||||
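As a hedged sketch (the field names below are the ones used in this notebook, not a fixed schema), a single `page_content_builder` can cover several document shapes at once:

```python
def get_content(res: dict) -> str:
    """Format a Marqo document into text, covering the schemas used above."""
    for field in ("text", "Description", "caption"):
        if field in res:
            return res[field]
    # Fall back to a readable dump of the remaining non-internal fields
    return ", ".join(f"{k}: {v}" for k, v in res.items() if not k.startswith("_"))
```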
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "851450e9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"A smartphone is a portable computer device that combines mobile telephone functions and computing functions into one unit.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = \"modern communications devices\"\n",
|
||||
"doc_results = docsearch.similarity_search(query)\n",
|
||||
"\n",
|
||||
"print(doc_results[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"id": "9a438fec",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"This is a document that is about elephants\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = \"elephants\"\n",
|
||||
"doc_results = docsearch.similarity_search(query, page_content_builder=get_content)\n",
|
||||
"\n",
|
||||
"print(doc_results[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "0d04c9d4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Weighted Queries\n",
|
||||
"\n",
|
||||
"We also expose marqos weighted queries which are a powerful way to compose complex semantic searches."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "d42ba0d6",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"A smartphone is a portable computer device that combines mobile telephone functions and computing functions into one unit.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = {\"communications devices\" : 1.0}\n",
|
||||
"doc_results = docsearch.similarity_search(query)\n",
|
||||
"print(doc_results[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "b5918a16",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"A telephone is a telecommunications device that permits two or more users toconduct a conversation when they are too far apart to be easily heard directly.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = {\"communications devices\" : 1.0, \"technology post 2000\": -1.0}\n",
|
||||
"doc_results = docsearch.similarity_search(query)\n",
|
||||
"print(doc_results[0].page_content)"
|
||||
]
|
||||
},
|
||||
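A further hedged sketch (the query terms and weights here are illustrative, not from the notebook): several weighted terms can be combined in one query, with positive weights boosting a concept and negative weights penalizing it.

```python
query = {
    "portable computing": 1.0,    # strongly boost
    "wired communication": 0.5,   # mildly boost
    "technology pre 1990": -0.5,  # penalize older technology
}
doc_results = docsearch.similarity_search(query, k=2)
for doc in doc_results:
    print(doc.page_content)
```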
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "2d026aa0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Question Answering with Sources\n",
|
||||
"\n",
|
||||
"This section shows how to use Marqo as part of a `RetrievalQAWithSourcesChain`. Marqo will perform the searches for information in the sources."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "e4ca223c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"OpenAI API Key:········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from langchain.chains import RetrievalQAWithSourcesChain\n",
|
||||
"from langchain import OpenAI\n",
|
||||
"\n",
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "5c6e45f9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"with open(\"../../../state_of_the_union.txt\") as f:\n",
|
||||
" state_of_the_union = f.read()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"texts = text_splitter.split_text(state_of_the_union)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "70a7f320",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Index langchain-qa-with-retrieval exists.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"index_name = \"langchain-qa-with-retrieval\"\n",
|
||||
"docsearch = Marqo.from_documents(docs, index_name=index_name)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"id": "b3b008a4",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = RetrievalQAWithSourcesChain.from_chain_type(\n",
|
||||
" OpenAI(temperature=0), chain_type=\"stuff\", retriever=docsearch.as_retriever()\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
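Optionally (an assumption, not shown in the notebook), the number of documents the retriever feeds to the LLM can be capped via `search_kwargs`:

```python
retriever = docsearch.as_retriever(search_kwargs={"k": 3})
chain = RetrievalQAWithSourcesChain.from_chain_type(
    OpenAI(temperature=0), chain_type="stuff", retriever=retriever
)
```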
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "e1457716",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'answer': ' The president honored Justice Breyer, thanking him for his service and noting that he is a retiring Justice of the United States Supreme Court.\\n',\n",
|
||||
" 'sources': '../../../state_of_the_union.txt'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain(\n",
|
||||
" {\"question\": \"What did the president say about Justice Breyer\"},\n",
|
||||
" return_only_outputs=True,\n",
|
||||
")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,7 +1,6 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "683953b3",
|
||||
"metadata": {},
|
||||
@@ -45,7 +44,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "457ace44-1d95-4001-9dd5-78811ab208ad",
|
||||
"metadata": {},
|
||||
@@ -65,7 +63,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "1f3ecc42",
|
||||
"metadata": {},
|
||||
@@ -133,7 +130,7 @@
|
||||
"# initialize MongoDB python client\n",
|
||||
"client = MongoClient(MONGODB_ATLAS_CLUSTER_URI)\n",
|
||||
"\n",
|
||||
"db_name = \"langchain_db\"\n",
|
||||
"db_name = \"lanchain_db\"\n",
|
||||
"collection_name = \"langchain_col\"\n",
|
||||
"collection = client[db_name][collection_name]\n",
|
||||
"index_name = \"langchain_demo\"\n",
|
||||
@@ -159,7 +156,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "851a2ec9-9390-49a4-8412-3e132c9f789d",
|
||||
"metadata": {},
|
||||
@@ -187,14 +183,14 @@
|
||||
"db_name = \"langchain_db\"\n",
|
||||
"collection_name = \"langchain_col\"\n",
|
||||
"collection = client[db_name][collection_name]\n",
|
||||
"index_name = \"langchain_demo\"\n",
|
||||
"index_name = \"langchain_index\"\n",
|
||||
"\n",
|
||||
"# initialize vector store\n",
|
||||
"vectorStore = MongoDBAtlasVectorSearch(\n",
|
||||
" collection, OpenAIEmbeddings(), index_name=index_name\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# perform a similarity search between a query and the ingested documents\n",
|
||||
"# perform a similarity search between the embedding of the query and the embeddings of the documents\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = vectorStore.similarity_search(query)\n",
|
||||
"\n",
|
||||
|
||||
@@ -1,338 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "1292f057",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# pg_hnsw\n",
|
||||
"\n",
|
||||
"> [pg_embedding](https://github.com/knizhnik/hnsw) is an open-source vector similarity search for `Postgres` that uses Hierarchical Navigable Small Worlds for approximate nearest neighbor search.\n",
|
||||
"\n",
|
||||
"It supports:\n",
|
||||
"- exact and approximate nearest neighbor search using HNSW\n",
|
||||
"- L2 distance\n",
|
||||
"\n",
|
||||
"This notebook shows how to use the Postgres vector database (`PGEmbedding`).\n",
|
||||
"\n",
|
||||
"> The PGEmbedding integration creates the pg_embedding extension for you, but you run the following Postgres query to add it:\n",
|
||||
"```sql\n",
|
||||
"CREATE EXTENSION embedding;\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a6214221",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Pip install necessary package\n",
|
||||
"!pip install openai\n",
|
||||
"!pip install psycopg2-binary\n",
|
||||
"!pip install tiktoken"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "b2e49694",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Add the OpenAI API Key to the environment variables to use `OpenAIEmbeddings`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "1dcc8d99",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"OpenAI API Key:········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "9719ea68",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"## Loading Environment Variables\n",
|
||||
"from typing import List, Tuple"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "dfd1f38d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import PGEmbedding\n",
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"from langchain.docstore.document import Document"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "8fab8cc2",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Database Url:········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"os.environ[\"DATABASE_URL\"] = getpass.getpass(\"Database Url:\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "bef17115",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = TextLoader(\"state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()\n",
|
||||
"connection_string = os.environ.get(\"DATABASE_URL\")\n",
|
||||
"collection_name = \"state_of_the_union\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "743abfaa",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"db = PGEmbedding.from_documents(\n",
|
||||
" embedding=embeddings,\n",
|
||||
" documents=docs,\n",
|
||||
" collection_name=collection_name,\n",
|
||||
" connection_string=connection_string,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs_with_score: List[Tuple[Document, float]] = db.similarity_search_with_score(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "41ce4c4e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"for doc, score in docs_with_score:\n",
|
||||
" print(\"-\" * 80)\n",
|
||||
" print(\"Score: \", score)\n",
|
||||
" print(doc.page_content)\n",
|
||||
" print(\"-\" * 80)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "7ef7b052",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Working with vectorstore in Postgres"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "939151f7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Uploading a vectorstore in PG "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 32,
|
||||
"id": "595ac511",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"db = PGEmbedding.from_documents(\n",
|
||||
" embedding=embeddings,\n",
|
||||
" documents=docs,\n",
|
||||
" collection_name=collection_name,\n",
|
||||
" connection_string=connection_string,\n",
|
||||
" pre_delete_collection=False,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "f9510e6b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create HNSW Index\n",
|
||||
"By default, the extension performs a sequential scan search, with 100% recall. You might consider creating an HNSW index for approximate nearest neighbor (ANN) search to speed up `similarity_search_with_score` execution time. To create the HNSW index on your vector column, use a `create_hnsw_index` function:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "2d1981fa",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"PGEmbedding.create_hnsw_index(\n",
|
||||
" max_elements=10000, dims=1536, m=8, ef_construction=16, ef_search=16\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "7adacf29",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The function above is equivalent to running the below SQL query:\n",
|
||||
"```sql\n",
|
||||
"CREATE INDEX ON vectors USING hnsw(vec) WITH (maxelements=10000, dims=1536, m=3, efconstruction=16, efsearch=16);\n",
|
||||
"```\n",
|
||||
"The HNSW index options used in the statement above include:\n",
|
||||
"\n",
|
||||
"- maxelements: Defines the maximum number of elements indexed. This is a required parameter. The example shown above has a value of 3. A real-world example would have a much large value, such as 1000000. An \"element\" refers to a data point (a vector) in the dataset, which is represented as a node in the HNSW graph. Typically, you would set this option to a value able to accommodate the number of rows in your in your dataset.\n",
|
||||
"- dims: Defines the number of dimensions in your vector data. This is a required parameter. A small value is used in the example above. If you are storing data generated using OpenAI's text-embedding-ada-002 model, which supports 1536 dimensions, you would define a value of 1536, for example.\n",
|
||||
"- m: Defines the maximum number of bi-directional links (also referred to as \"edges\") created for each node during graph construction.\n",
|
||||
"The following additional index options are supported:\n",
|
||||
"\n",
|
||||
"- efConstruction: Defines the number of nearest neighbors considered during index construction. The default value is 32.\n",
|
||||
"- efsearch: Defines the number of nearest neighbors considered during index search. The default value is 32.\n",
|
||||
"For information about how you can configure these options to influence the HNSW algorithm, refer to [Tuning the HNSW algorithm](https://neon-next-git-dprice-hnsw-extension-neondatabase.vercel.app/docs/extensions/hnsw#tuning-the-hnsw-algorithm)."
|
||||
]
|
||||
},
|
||||
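Putting those options together, here is a hedged sketch (the sizing numbers are illustrative assumptions, not recommendations) of an index for roughly one million ada-002 embeddings:

```python
PGEmbedding.create_hnsw_index(
    max_elements=1_000_000,  # must cover the number of rows you expect to index
    dims=1536,               # must match the embedding model (ada-002 here)
    m=16,                    # bi-directional links created per node
    ef_construction=64,      # neighbors considered while building the graph
    ef_search=64,            # neighbors considered while querying
)
```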
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "528893fb",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Retrieving a vectorstore in PG"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "b6162b1c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"store = PGEmbedding(\n",
|
||||
" connection_string=connection_string,\n",
|
||||
" embedding_function=embeddings,\n",
|
||||
" collection_name=collection_name,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"retriever = store.as_retriever()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "1a5fedb1",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"VectorStoreRetriever(vectorstore=<langchain.vectorstores.pghnsw.HNSWVectoreStore object at 0x121d3c8b0>, search_type='similarity', search_kwargs={})"
|
||||
]
|
||||
},
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"retriever"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "0cefc938",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"db1 = PGEmbedding.from_existing_index(\n",
|
||||
" embedding=embeddings,\n",
|
||||
" collection_name=collection_name,\n",
|
||||
" pre_delete_collection=False,\n",
|
||||
" connection_string=connection_string,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs_with_score: List[Tuple[Document, float]] = db1.similarity_search_with_score(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "85cde495",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"for doc, score in docs_with_score:\n",
|
||||
" print(\"-\" * 80)\n",
|
||||
" print(\"Score: \", score)\n",
|
||||
" print(doc.page_content)\n",
|
||||
" print(\"-\" * 80)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,7 +1,6 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "683953b3",
|
||||
"metadata": {},
|
||||
@@ -52,7 +51,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "320af802-9271-46ee-948f-d2453933d44b",
|
||||
"metadata": {},
|
||||
@@ -138,30 +136,6 @@
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "86a4b96b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Adding More Text to an Existing Index\n",
|
||||
"\n",
|
||||
"More text can embedded and upserted to an existing Pinecone index using the `add_texts` function\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "38a7a60e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"index = pinecone.Index(\"langchain-demo\")\n",
|
||||
"vectorstore = Pinecone(index, embeddings.embed_query, \"text\")\n",
|
||||
"\n",
|
||||
"vectorstore.add_texts(\"More text!\")"
|
||||
]
|
||||
},
|
||||
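As a hedged extension of the cell above (the metadata and ids are illustrative assumptions), extra texts can be upserted with metadata and explicit ids so they can be filtered or overwritten later:

```python
vectorstore.add_texts(
    ["Even more text!"],
    metadatas=[{"source": "manual-upsert"}],
    ids=["manual-0"],
)
```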
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
|
||||
@@ -43,7 +43,7 @@
|
||||
"\n",
|
||||
" CREATE FUNCTION match_documents(query_embedding vector(1536), match_count int)\n",
|
||||
" RETURNS TABLE(\n",
|
||||
" id uuid,\n",
|
||||
" id bigint,\n",
|
||||
" content text,\n",
|
||||
" metadata jsonb,\n",
|
||||
" -- we return matched vectors to enable maximal marginal relevance searches\n",
|
||||
|
||||
@@ -23,7 +23,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install \"cassio>=0.0.7\""
|
||||
"!pip install \"cassio>=0.0.6\""
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -1,212 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Human input Chat Model\n",
|
||||
"\n",
|
||||
"Along with HumanInputLLM, LangChain also provides a pseudo Chat Model class that can be used for testing, debugging, or educational purposes. This allows you to mock out calls to the Chat Model and simulate how a human would respond if they received the messages.\n",
|
||||
"\n",
|
||||
"In this notebook, we go over how to use this.\n",
|
||||
"\n",
|
||||
"We start this with using the HumanInputChatModel in an agent."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models.human import HumanInputChatModel"
|
||||
]
|
||||
},
|
||||
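Before wiring it into an agent, the model can also be exercised directly. This is a hedged sketch assuming the standard chat-model call interface, which takes a list of messages and blocks until you type a reply:

```python
from langchain.schema import HumanMessage

chat = HumanInputChatModel()
# Prints the incoming messages, then waits for you to type the response.
response = chat([HumanMessage(content="Say hello")])
print(response.content)
```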
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Since we will use the `WikipediaQueryRun` tool in this notebook, you might need to install the `wikipedia` package if you haven't done so already."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/Users/mskim58/dev/research/chatbot/github/langchain/.venv/bin/python: No module named pip\n",
|
||||
"Note: you may need to restart the kernel to use updated packages.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%pip install wikipedia"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import load_tools\n",
|
||||
"from langchain.agents import initialize_agent\n",
|
||||
"from langchain.agents import AgentType"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tools = load_tools([\"wikipedia\"])\n",
|
||||
"llm = HumanInputChatModel()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = initialize_agent(\n",
|
||||
" tools, llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\n",
|
||||
" ======= start of message ======= \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"type: system\n",
|
||||
"data:\n",
|
||||
" content: \"Answer the following questions as best you can. You have access to the following tools:\\n\\nWikipedia: A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Input should be a search query.\\n\\nThe way you use the tools is by specifying a json blob.\\nSpecifically, this json should have a `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).\\n\\nThe only values that should be in the \\\"action\\\" field are: Wikipedia\\n\\nThe $JSON_BLOB should only contain a SINGLE action, do NOT return a list of multiple actions. Here is an example of a valid $JSON_BLOB:\\n\\n```\\n{\\n \\\"action\\\": $TOOL_NAME,\\n \\\"action_input\\\": $INPUT\\n}\\n```\\n\\nALWAYS use the following format:\\n\\nQuestion: the input question you must answer\\nThought: you should always think about what to do\\nAction:\\n```\\n$JSON_BLOB\\n```\\nObservation: the result of the action\\n... (this Thought/Action/Observation can repeat N times)\\nThought: I now know the final answer\\nFinal Answer: the final answer to the original input question\\n\\nBegin! Reminder to always use the exact characters `Final Answer` when responding.\"\n",
|
||||
" additional_kwargs: {}\n",
|
||||
"\n",
|
||||
"======= end of message ======= \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
" ======= start of message ======= \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"type: human\n",
|
||||
"data:\n",
|
||||
" content: 'What is Bocchi the Rock?\n",
|
||||
"\n",
|
||||
"\n",
|
||||
" '\n",
|
||||
" additional_kwargs: {}\n",
|
||||
" example: false\n",
|
||||
"\n",
|
||||
"======= end of message ======= \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[32;1m\u001b[1;3mAction:\n",
|
||||
"```\n",
|
||||
"{\n",
|
||||
" \"action\": \"Wikipedia\",\n",
|
||||
" \"action_input\": \"What is Bocchi the Rock?\"\n",
|
||||
"}\n",
|
||||
"```\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mPage: Bocchi the Rock!\n",
|
||||
"Summary: Bocchi the Rock! (ぼっち・ざ・ろっく!, Botchi Za Rokku!) is a Japanese four-panel manga series written and illustrated by Aki Hamaji. It has been serialized in Houbunsha's seinen manga magazine Manga Time Kirara Max since December 2017. Its chapters have been collected in five tankōbon volumes as of November 2022.\n",
|
||||
"An anime television series adaptation produced by CloverWorks aired from October to December 2022. The series has been praised for its writing, comedy, characters, and depiction of social anxiety, with the anime's visual creativity receiving acclaim.\n",
|
||||
"\n",
|
||||
"Page: Hitori Bocchi no Marumaru Seikatsu\n",
|
||||
"Summary: Hitori Bocchi no Marumaru Seikatsu (Japanese: ひとりぼっちの○○生活, lit. \"Bocchi Hitori's ____ Life\" or \"The ____ Life of Being Alone\") is a Japanese yonkoma manga series written and illustrated by Katsuwo. It was serialized in ASCII Media Works' Comic Dengeki Daioh \"g\" magazine from September 2013 to April 2021. Eight tankōbon volumes have been released. An anime television series adaptation by C2C aired from April to June 2019.\n",
|
||||
"\n",
|
||||
"Page: Kessoku Band (album)\n",
|
||||
"Summary: Kessoku Band (Japanese: 結束バンド, Hepburn: Kessoku Bando) is the debut studio album by Kessoku Band, a fictional musical group from the anime television series Bocchi the Rock!, released digitally on December 25, 2022, and physically on CD on December 28 by Aniplex. Featuring vocals from voice actresses Yoshino Aoyama, Sayumi Suzushiro, Saku Mizuno, and Ikumi Hasegawa, the album consists of 14 tracks previously heard in the anime, including a cover of Asian Kung-Fu Generation's \"Rockn' Roll, Morning Light Falls on You\", as well as newly recorded songs; nine singles preceded the album's physical release. Commercially, Kessoku Band peaked at number one on the Billboard Japan Hot Albums Chart and Oricon Albums Chart, and was certified gold by the Recording Industry Association of Japan.\n",
|
||||
"\n",
|
||||
"\u001b[0m\n",
|
||||
"Thought:\n",
|
||||
" ======= start of message ======= \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"type: system\n",
|
||||
"data:\n",
|
||||
" content: \"Answer the following questions as best you can. You have access to the following tools:\\n\\nWikipedia: A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Input should be a search query.\\n\\nThe way you use the tools is by specifying a json blob.\\nSpecifically, this json should have a `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).\\n\\nThe only values that should be in the \\\"action\\\" field are: Wikipedia\\n\\nThe $JSON_BLOB should only contain a SINGLE action, do NOT return a list of multiple actions. Here is an example of a valid $JSON_BLOB:\\n\\n```\\n{\\n \\\"action\\\": $TOOL_NAME,\\n \\\"action_input\\\": $INPUT\\n}\\n```\\n\\nALWAYS use the following format:\\n\\nQuestion: the input question you must answer\\nThought: you should always think about what to do\\nAction:\\n```\\n$JSON_BLOB\\n```\\nObservation: the result of the action\\n... (this Thought/Action/Observation can repeat N times)\\nThought: I now know the final answer\\nFinal Answer: the final answer to the original input question\\n\\nBegin! Reminder to always use the exact characters `Final Answer` when responding.\"\n",
|
||||
" additional_kwargs: {}\n",
|
||||
"\n",
|
||||
"======= end of message ======= \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
" ======= start of message ======= \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"type: human\n",
|
||||
"data:\n",
|
||||
" content: \"What is Bocchi the Rock?\\n\\nThis was your previous work (but I haven't seen any of it! I only see what you return as final answer):\\nAction:\\n```\\n{\\n \\\"action\\\": \\\"Wikipedia\\\",\\n \\\"action_input\\\": \\\"What is Bocchi the Rock?\\\"\\n}\\n```\\nObservation: Page: Bocchi the Rock!\\nSummary: Bocchi the Rock! (ぼっち・ざ・ろっく!, Botchi Za Rokku!) is a Japanese four-panel manga series written and illustrated by Aki Hamaji. It has been serialized in Houbunsha's seinen manga magazine Manga Time Kirara Max since December 2017. Its chapters have been collected in five tankōbon volumes as of November 2022.\\nAn anime television series adaptation produced by CloverWorks aired from October to December 2022. The series has been praised for its writing, comedy, characters, and depiction of social anxiety, with the anime's visual creativity receiving acclaim.\\n\\nPage: Hitori Bocchi no Marumaru Seikatsu\\nSummary: Hitori Bocchi no Marumaru Seikatsu (Japanese: ひとりぼっちの○○生活, lit. \\\"Bocchi Hitori's ____ Life\\\" or \\\"The ____ Life of Being Alone\\\") is a Japanese yonkoma manga series written and illustrated by Katsuwo. It was serialized in ASCII Media Works' Comic Dengeki Daioh \\\"g\\\" magazine from September 2013 to April 2021. Eight tankōbon volumes have been released. An anime television series adaptation by C2C aired from April to June 2019.\\n\\nPage: Kessoku Band (album)\\nSummary: Kessoku Band (Japanese: 結束バンド, Hepburn: Kessoku Bando) is the debut studio album by Kessoku Band, a fictional musical group from the anime television series Bocchi the Rock!, released digitally on December 25, 2022, and physically on CD on December 28 by Aniplex. Featuring vocals from voice actresses Yoshino Aoyama, Sayumi Suzushiro, Saku Mizuno, and Ikumi Hasegawa, the album consists of 14 tracks previously heard in the anime, including a cover of Asian Kung-Fu Generation's \\\"Rockn' Roll, Morning Light Falls on You\\\", as well as newly recorded songs; nine singles preceded the album's physical release. Commercially, Kessoku Band peaked at number one on the Billboard Japan Hot Albums Chart and Oricon Albums Chart, and was certified gold by the Recording Industry Association of Japan.\\n\\n\\nThought:\"\n",
|
||||
" additional_kwargs: {}\n",
|
||||
" example: false\n",
|
||||
"\n",
|
||||
"======= end of message ======= \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[32;1m\u001b[1;3mThis finally works.\n",
|
||||
"Final Answer: Bocchi the Rock! is a four-panel manga series and anime television series. The series has been praised for its writing, comedy, characters, and depiction of social anxiety, with the anime's visual creativity receiving acclaim.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'input': 'What is Bocchi the Rock?',\n",
|
||||
" 'output': \"Bocchi the Rock! is a four-panel manga series and anime television series. The series has been praised for its writing, comedy, characters, and depiction of social anxiety, with the anime's visual creativity receiving acclaim.\"}"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent(\"What is Bocchi the Rock?\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
@@ -52,7 +52,7 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content=\" J'aime la programmation.\", additional_kwargs={}, example=False)"
|
||||
"AIMessage(content=\" J'aime programmer. \", additional_kwargs={})"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
@@ -101,7 +101,7 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"LLMResult(generations=[[ChatGeneration(text=\" J'aime programmer.\", generation_info=None, message=AIMessage(content=\" J'aime programmer.\", additional_kwargs={}, example=False))]], llm_output={}, run=[RunInfo(run_id=UUID('8cc8fb68-1c35-439c-96a0-695036a93652'))])"
|
||||
"LLMResult(generations=[[ChatGeneration(text=\" J'aime la programmation.\", generation_info=None, message=AIMessage(content=\" J'aime la programmation.\", additional_kwargs={}))]], llm_output={})"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
@@ -125,13 +125,13 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" J'aime la programmation."
|
||||
" J'adore programmer."
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content=\" J'aime la programmation.\", additional_kwargs={}, example=False)"
|
||||
"AIMessage(content=\" J'adore programmer.\", additional_kwargs={})"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
@@ -151,7 +151,7 @@
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "c253883f",
|
||||
"id": "df45f59f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
@@ -173,7 +173,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -8,12 +8,9 @@
|
||||
"source": [
|
||||
"# Clarifai\n",
|
||||
"\n",
|
||||
">[Clarifai](https://www.clarifai.com/) is an AI Platform that provides the full AI lifecycle ranging from data exploration, data labeling, model training, evaluation, and inference.\n",
|
||||
">[Clarifai](https://www.clarifai.com/) is a AI Platform that provides the full AI lifecycle ranging from data exploration, data labeling, model building and inference.\n",
|
||||
"\n",
|
||||
"This example goes over how to use LangChain to interact with `Clarifai` [models](https://clarifai.com/explore/models). \n",
|
||||
"\n",
|
||||
"To use Clarifai, you must have an account and a Personal Access Token (PAT) key. \n",
|
||||
"[Check here](https://clarifai.com/settings/security) to get or create a PAT."
|
||||
"This example goes over how to use LangChain to interact with `Clarifai` [models](https://clarifai.com/explore/models)."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -45,7 +42,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Imports\n",
|
||||
"Here we will be setting the personal access token. You can find your PAT under [settings/security](https://clarifai.com/settings/security) in your Clarifai account."
|
||||
"Here we will be setting the personal access token. You can find your PAT under settings/security on the platform."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -55,25 +52,17 @@
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdin",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" ········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Please login and get your API key from https://clarifai.com/settings/security \n",
|
||||
"from getpass import getpass\n",
|
||||
"\n",
|
||||
"CLARIFAI_PAT = getpass()"
|
||||
"CLARIFAI_PAT_KEY = getpass()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 2,
|
||||
"id": "6fb585dd",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -92,12 +81,12 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Input\n",
|
||||
"Create a prompt template to be used with the LLM Chain:"
|
||||
"Create a prompt template to be used with the LLM Chain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 3,
|
||||
"id": "035dea0f",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -132,10 +121,10 @@
|
||||
"source": [
|
||||
"USER_ID = 'openai'\n",
|
||||
"APP_ID = 'chat-completion'\n",
|
||||
"MODEL_ID = 'GPT-3_5-turbo'\n",
|
||||
"MODEL_ID = 'chatgpt-3_5-turbo'\n",
|
||||
"\n",
|
||||
"# You can provide a specific model version as the model_version_id arg.\n",
|
||||
"# MODEL_VERSION_ID = \"MODEL_VERSION_ID\""
|
||||
"# You can provide a specific model version\n",
|
||||
"# model_version_id = \"MODEL_VERSION_ID\""
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -148,7 +137,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Initialize a Clarifai LLM\n",
|
||||
"clarifai_llm = Clarifai(pat=CLARIFAI_PAT, user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID)"
|
||||
"clarifai_llm = Clarifai(clarifai_pat_key=CLARIFAI_PAT_KEY, user_id=USER_ID, app_id=APP_ID, model_id=MODEL_ID)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -182,7 +171,7 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Justin Bieber was born on March 1, 1994. So, we need to figure out the Super Bowl winner for the 1994 season. The NFL season spans two calendar years, so the Super Bowl for the 1994 season would have taken place in early 1995. \\n\\nThe Super Bowl in question is Super Bowl XXIX, which was played on January 29, 1995. The game was won by the San Francisco 49ers, who defeated the San Diego Chargers by a score of 49-26. Therefore, the San Francisco 49ers won the Super Bowl in the year Justin Bieber was born.'"
|
||||
"'Justin Bieber was born on March 1, 1994. So, we need to look at the Super Bowl that was played in the year 1994. \\n\\nThe Super Bowl in 1994 was Super Bowl XXVIII (28). It was played on January 30, 1994, between the Dallas Cowboys and the Buffalo Bills. \\n\\nThe Dallas Cowboys won the Super Bowl in 1994, defeating the Buffalo Bills by a score of 30-13. \\n\\nTherefore, the Dallas Cowboys are the NFL team that won the Super Bowl in the year Justin Bieber was born.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
|
||||
@@ -32,7 +32,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
@@ -45,7 +45,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
@@ -64,20 +64,13 @@
|
||||
"source": [
|
||||
"### Specify Model\n",
|
||||
"\n",
|
||||
"To run locally, download a compatible ggml-formatted model. \n",
|
||||
" \n",
|
||||
"**Download option 1**: The [gpt4all page](https://gpt4all.io/index.html) has a useful `Model Explorer` section:\n",
|
||||
"To run locally, download a compatible ggml-formatted model. For more info, visit https://github.com/nomic-ai/gpt4all\n",
|
||||
"\n",
|
||||
"* Select a model of interest\n",
|
||||
"* Download using the UI and move the `.bin` to the `local_path` (noted below)\n",
|
||||
"For full installation instructions go [here](https://gpt4all.io/index.html).\n",
|
||||
"\n",
|
||||
"For more info, visit https://github.com/nomic-ai/gpt4all.\n",
|
||||
"The GPT4All Chat installer needs to decompress a 3GB LLM model during the installation process!\n",
|
||||
"\n",
|
||||
"--- \n",
|
||||
"\n",
|
||||
"**Download option 2**: Uncomment the below block to download a model. \n",
|
||||
"\n",
|
||||
"* You may want to update `url` to a new version, whih can be browsed using the [gpt4all page](https://gpt4all.io/index.html)."
|
||||
"Note that new models are uploaded regularly - check the link above for the most recent `.bin` URL"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -88,8 +81,22 @@
|
||||
"source": [
|
||||
"local_path = (\n",
|
||||
" \"./models/ggml-gpt4all-l13b-snoozy.bin\" # replace with your desired local file path\n",
|
||||
")\n",
|
||||
"\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Uncomment the below block to download a model. You may want to update `url` to a new version."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# import requests\n",
|
||||
"\n",
|
||||
"# from pathlib import Path\n",
|
||||
@@ -119,10 +126,8 @@
|
||||
"source": [
|
||||
"# Callbacks support token-wise streaming\n",
|
||||
"callbacks = [StreamingStdOutCallbackHandler()]\n",
|
||||
"\n",
|
||||
"# Verbose is required to pass to the callback manager\n",
|
||||
"llm = GPT4All(model=local_path, callbacks=callbacks, verbose=True)\n",
|
||||
"\n",
|
||||
"# If you want to use a custom model add the backend parameter\n",
|
||||
"# Check https://docs.gpt4all.io/gpt4all_python.html for supported backends\n",
|
||||
"llm = GPT4All(model=local_path, backend=\"gptj\", callbacks=callbacks, verbose=True)"
|
||||
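A hedged usage sketch follows (the prompt is borrowed from standard LangChain examples and assumes the `llm` object above): with the streaming callback attached, tokens print to stdout as they are generated.

```python
from langchain import PromptTemplate, LLMChain

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])

llm_chain = LLMChain(prompt=prompt, llm=llm)
llm_chain.run("What NFL team won the Super Bowl in the year Justin Bieber was born?")
```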
@@ -165,7 +170,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
"version": "3.11.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -1,26 +1,20 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "959300d4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Hugging Face Hub\n",
|
||||
"\n",
|
||||
">The [Hugging Face Hub](https://huggingface.co/docs/hub/index) is a platform with over 120k models, 20k datasets, and 50k demo apps (Spaces), all open source and publicly available, in an online platform where people can easily collaborate and build ML together.\n",
|
||||
"The [Hugging Face Hub](https://huggingface.co/docs/hub/index) is a platform with over 120k models, 20k datasets, and 50k demo apps (Spaces), all open source and publicly available, in an online platform where people can easily collaborate and build ML together.\n",
|
||||
"\n",
|
||||
"This example showcases how to connect to the `Hugging Face Hub` and use different models."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1ddafc6d-7d7c-48fa-838f-0e7f50895ce3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Installation and Setup"
|
||||
"This example showcases how to connect to the Hugging Face Hub."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "4c1b8450-5eaf-4d34-8341-2d785448a1ff",
|
||||
"metadata": {
|
||||
@@ -32,30 +26,22 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": null,
|
||||
"id": "d772b637-de00-4663-bd77-9bc96d798db2",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install huggingface_hub"
|
||||
"!pip install huggingface_hub > /dev/null"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": null,
|
||||
"id": "d597a792-354c-4ca5-b483-5965eec5d63d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdin",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" ········\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# get a token: https://huggingface.co/docs/api-inference/quicktour#get-your-api-token\n",
|
||||
"\n",
|
||||
@@ -66,7 +52,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": null,
|
||||
"id": "b8c5b88c-e4b8-4d0d-9a35-6e8f106452c2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -77,101 +63,108 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "84dd44c1-c428-41f3-a911-520281386c94",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Prepare Examples"
|
||||
"**Select a Model**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "3fe7d1d1-241d-426a-acff-e208f1088871",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain import HuggingFaceHub"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "6620f39b-3d32-4840-8931-ff7d2c3e47e8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain import PromptTemplate, LLMChain"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "44adc1a0-9c0a-4f1e-af5a-fe04222e78d7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"question = \"Who won the FIFA World Cup in the year 1994? \"\n",
|
||||
"\n",
|
||||
"template = \"\"\"Question: {question}\n",
|
||||
"\n",
|
||||
"Answer: Let's think step by step.\"\"\"\n",
|
||||
"\n",
|
||||
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ddaa06cf-95ec-48ce-b0ab-d892a7909693",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Examples\n",
|
||||
"\n",
|
||||
"Below are some examples of models you can access through the `Hugging Face Hub` integration."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4c16fded-70d1-42af-8bfa-6ddda9f0bc63",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Flan, by Google"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "39c7eeac-01c4-486b-9480-e828a9e73e78",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"repo_id = \"google/flan-t5-xxl\" # See https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads for some other options"
|
||||
"from langchain import HuggingFaceHub\n",
|
||||
"\n",
|
||||
"repo_id = \"google/flan-t5-xl\" # See https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads for some other options\n",
|
||||
"\n",
|
||||
"llm = HuggingFaceHub(repo_id=repo_id, model_kwargs={\"temperature\": 0, \"max_length\": 64})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": null,
|
||||
"id": "3acf0069",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"The FIFA World Cup was held in the year 1994. West Germany won the FIFA World Cup in 1994\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = HuggingFaceHub(repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64})\n",
|
||||
"from langchain import PromptTemplate, LLMChain\n",
|
||||
"\n",
|
||||
"template = \"\"\"Question: {question}\n",
|
||||
"\n",
|
||||
"Answer: Let's think step by step.\"\"\"\n",
|
||||
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])\n",
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
|
||||
"\n",
|
||||
"question = \"Who won the FIFA World Cup in the year 1994? \"\n",
|
||||
"\n",
|
||||
"print(llm_chain.run(question))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "ddaa06cf-95ec-48ce-b0ab-d892a7909693",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Examples\n",
|
||||
"\n",
|
||||
"Below are some examples of models you can access through the Hugging Face Hub integration."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "4fa9337e-ccb5-4c52-9b7c-1653148bc256",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### StableLM, by Stability AI\n",
|
||||
"\n",
|
||||
"See [Stability AI's](https://huggingface.co/stabilityai) organization page for a list of available models."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "36a1ce01-bd46-451f-8ee6-61c8f4bd665a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"repo_id = \"stabilityai/stablelm-tuned-alpha-3b\"\n",
|
||||
"# Others include stabilityai/stablelm-base-alpha-3b\n",
|
||||
"# as well as 7B parameter versions"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "b5654cea-60b0-4f40-ab34-06ba1eca810d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = HuggingFaceHub(repo_id=repo_id, model_kwargs={\"temperature\": 0, \"max_length\": 64})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "2f19d0dc-c987-433f-a8d6-b1214e8ee067",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Reuse the prompt and question from above.\n",
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
|
||||
"print(llm_chain.run(question))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "1a5c97af-89bc-4e59-95c1-223742a9160b",
|
||||
"metadata": {},
|
||||
@@ -183,40 +176,34 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": null,
|
||||
"id": "521fcd2b-8e38-4920-b407-5c7d330411c9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"repo_id = \"databricks/dolly-v2-3b\""
|
||||
"from langchain import HuggingFaceHub\n",
|
||||
"\n",
|
||||
"repo_id = \"databricks/dolly-v2-3b\"\n",
|
||||
"\n",
|
||||
"llm = HuggingFaceHub(repo_id=repo_id, model_kwargs={\"temperature\": 0, \"max_length\": 64})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": null,
|
||||
"id": "9907ec3a-fe0c-4543-81c4-d42f9453f16c",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" First of all, the world cup was won by the Germany. Then the Argentina won the world cup in 2022. So, the Argentina won the world cup in 1994.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Question: Who\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = HuggingFaceHub(repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64})\n",
|
||||
"# Reuse the prompt and question from above.\n",
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
|
||||
"print(llm_chain.run(question))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "03f6ae52-b5f9-4de6-832c-551cb3fa11ae",
|
||||
"metadata": {},
|
||||
@@ -228,14 +215,17 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"execution_count": null,
|
||||
"id": "257a091d-750b-4910-ac08-fe1c7b3fd98b",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"repo_id = \"Writer/camel-5b-hf\" # See https://huggingface.co/Writer for other options"
|
||||
"from langchain import HuggingFaceHub\n",
|
||||
"\n",
|
||||
"repo_id = \"Writer/camel-5b-hf\" # See https://huggingface.co/Writer for other options\n",
|
||||
"llm = HuggingFaceHub(repo_id=repo_id, model_kwargs={\"temperature\": 0, \"max_length\": 64})"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -245,74 +235,27 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = HuggingFaceHub(repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64})\n",
|
||||
"# Reuse the prompt and question from above.\n",
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
|
||||
"print(llm_chain.run(question))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "2bf838eb-1083-402f-b099-b07c452418c8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### XGen, by Salesforce\n",
|
||||
"\n",
|
||||
"See [more information](https://github.com/salesforce/xgen)."
|
||||
"**And many more!**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": null,
|
||||
"id": "18c78880-65d7-41d0-9722-18090efb60e9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"repo_id = \"Salesforce/xgen-7b-8k-base\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "1b1150b4-ec30-4674-849e-6a41b085aa2b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = HuggingFaceHub(repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64})\n",
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
|
||||
"print(llm_chain.run(question))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0aca9f9e-f333-449c-97b2-10d1dbf17e75",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Falcon, by Technology Innovation Institute (TII)\n",
|
||||
"\n",
|
||||
"See [more information](https://huggingface.co/tiiuae/falcon-40b)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "496b35ac-5ee2-4b68-a6ce-232608f56c03",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"repo_id = \"tiiuae/falcon-40b\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "ff2541ad-e394-4179-93c2-7ae9c4ca2a25",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = HuggingFaceHub(repo_id=repo_id, model_kwargs={\"temperature\": 0.5, \"max_length\": 64})\n",
|
||||
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
|
||||
"print(llm_chain.run(question))"
|
||||
]
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -331,7 +274,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.11.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -253,7 +253,7 @@ html_text = """

```python
html_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.HTML, chunk_size=60, chunk_overlap=0
language=Language.MARKDOWN, chunk_size=60, chunk_overlap=0
)
html_docs = html_splitter.create_documents([html_text])
html_docs
@@ -262,18 +262,19 @@ html_docs
<CodeOutputBlock lang="python">

```
[Document(page_content='<!DOCTYPE html>\n<html>', metadata={}),
Document(page_content='<head>\n <title>🦜️🔗 LangChain</title>', metadata={}),
Document(page_content='<style>\n body {\n font-family: Aria', metadata={}),
Document(page_content='l, sans-serif;\n }\n h1 {', metadata={}),
Document(page_content='color: darkblue;\n }\n </style>\n </head', metadata={}),
Document(page_content='>', metadata={}),
Document(page_content='<body>', metadata={}),
Document(page_content='<div>\n <h1>🦜️🔗 LangChain</h1>', metadata={}),
Document(page_content='<p>⚡ Building applications with LLMs through composability ⚡', metadata={}),
Document(page_content='</p>\n </div>', metadata={}),
Document(page_content='<div>\n As an open source project in a rapidly dev', metadata={}),
Document(page_content='eloping field, we are extremely open to contributions.', metadata={}),
[Document(page_content='<!DOCTYPE html>\n<html>\n <head>', metadata={}),
Document(page_content='<title>🦜️🔗 LangChain</title>\n <style>', metadata={}),
Document(page_content='body {', metadata={}),
Document(page_content='font-family: Arial, sans-serif;', metadata={}),
Document(page_content='}\n h1 {', metadata={}),
Document(page_content='color: darkblue;\n }', metadata={}),
Document(page_content='</style>\n </head>\n <body>\n <div>', metadata={}),
Document(page_content='<h1>🦜️🔗 LangChain</h1>', metadata={}),
Document(page_content='<p>⚡ Building applications with LLMs through', metadata={}),
Document(page_content='composability ⚡</p>', metadata={}),
Document(page_content='</div>\n <div>', metadata={}),
Document(page_content='As an open source project in a rapidly', metadata={}),
Document(page_content='developing field, we are extremely open to contributions.', metadata={}),
Document(page_content='</div>\n </body>\n</html>', metadata={})]
```

@@ -309,4 +310,4 @@ sol_docs
]
```

</CodeOutputBlock>
</CodeOutputBlock>
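The same `from_language` entry point works for any `Language` member; a small self-contained sketch on Python source (the chunk sizes here are illustrative):

```python
from langchain.text_splitter import Language, RecursiveCharacterTextSplitter

python_code = "def hello():\n    print('hello')\n\nhello()\n"
# from_language presets language-aware separators for the splitter.
splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=50, chunk_overlap=0
)
print(splitter.create_documents([python_code]))
```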
@@ -12,7 +12,6 @@ from langchain.agents.agent_toolkits.jira.toolkit import JiraToolkit
from langchain.agents.agent_toolkits.json.base import create_json_agent
from langchain.agents.agent_toolkits.json.toolkit import JsonToolkit
from langchain.agents.agent_toolkits.nla.toolkit import NLAToolkit
from langchain.agents.agent_toolkits.office365.toolkit import O365Toolkit
from langchain.agents.agent_toolkits.openapi.base import create_openapi_agent
from langchain.agents.agent_toolkits.openapi.toolkit import OpenAPIToolkit
from langchain.agents.agent_toolkits.pandas.base import create_pandas_dataframe_agent
@@ -65,5 +64,4 @@ __all__ = [
"FileManagementToolkit",
"PlayWrightBrowserToolkit",
"AzureCognitiveServicesToolkit",
"O365Toolkit",
]

@@ -1,5 +1,5 @@
"""Toolkits for agents."""
from abc import ABC, abstractmethod
from abc import abstractmethod
from typing import List

from pydantic import BaseModel
@@ -7,8 +7,8 @@ from pydantic import BaseModel
from langchain.tools import BaseTool


class BaseToolkit(BaseModel, ABC):
"""Class representing a collection of related tools."""
class BaseToolkit(BaseModel):
"""Class responsible for defining a collection of related tools."""

@abstractmethod
def get_tools(self) -> List[BaseTool]:

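Either variant of `BaseToolkit` leaves the contract the same: subclasses implement `get_tools`. A hypothetical subclass, sketched only to show the shape (`EchoToolkit` and its echo tool are illustrative stand-ins, not real LangChain classes):

```python
from typing import List

from langchain.agents.agent_toolkits.base import BaseToolkit
from langchain.tools import BaseTool, Tool


class EchoToolkit(BaseToolkit):
    """Toolkit bundling a single illustrative tool."""

    def get_tools(self) -> List[BaseTool]:
        # Each toolkit just returns the tools it groups together.
        return [
            Tool(name="echo", func=lambda s: s, description="Echo the input back.")
        ]
```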
@@ -1,4 +1,4 @@
"""Agent for working with csv files."""
"""Agent for working with csvs."""
from typing import Any, List, Optional, Union

from langchain.agents.agent import AgentExecutor

@@ -1 +1 @@
"""Office365 toolkit."""
"""Gmail toolkit."""

@@ -30,9 +30,9 @@ class O365Toolkit(BaseToolkit):
def get_tools(self) -> List[BaseTool]:
"""Get the tools in the toolkit."""
return [
O365SearchEvents(),
O365CreateDraftMessage(),
O365SearchEmails(),
O365SendEvent(),
O365SendMessage(),
O365SearchEvents(account=self.account),
O365CreateDraftMessage(account=self.account),
O365SearchEmails(account=self.account),
O365SendEvent(account=self.account),
O365SendMessage(account=self.account),
]

@@ -39,7 +39,7 @@ class RequestsToolkit(BaseToolkit):


class OpenAPIToolkit(BaseToolkit):
"""Toolkit for interacting with an OpenAPI API."""
"""Toolkit for interacting with a OpenAPI api."""

json_agent: AgentExecutor
requests_wrapper: TextRequestsWrapper

@@ -30,7 +30,6 @@ def _get_multi_prompt(
suffix: Optional[str] = None,
input_variables: Optional[List[str]] = None,
include_df_in_prompt: Optional[bool] = True,
number_of_head_rows: int = 5,
) -> Tuple[BasePromptTemplate, List[PythonAstREPLTool]]:
num_dfs = len(dfs)
if suffix is not None:
@@ -61,7 +60,7 @@ def _get_multi_prompt(

partial_prompt = prompt.partial()
if "dfs_head" in input_variables:
dfs_head = "\n\n".join([d.head(number_of_head_rows).to_markdown() for d in dfs])
dfs_head = "\n\n".join([d.head().to_markdown() for d in dfs])
partial_prompt = partial_prompt.partial(num_dfs=str(num_dfs), dfs_head=dfs_head)
if "num_dfs" in input_variables:
partial_prompt = partial_prompt.partial(num_dfs=str(num_dfs))
@@ -74,7 +73,6 @@ def _get_single_prompt(
suffix: Optional[str] = None,
input_variables: Optional[List[str]] = None,
include_df_in_prompt: Optional[bool] = True,
number_of_head_rows: int = 5,
) -> Tuple[BasePromptTemplate, List[PythonAstREPLTool]]:
if suffix is not None:
suffix_to_use = suffix
@@ -102,9 +100,7 @@ def _get_single_prompt(

partial_prompt = prompt.partial()
if "df_head" in input_variables:
partial_prompt = partial_prompt.partial(
df_head=str(df.head(number_of_head_rows).to_markdown())
)
partial_prompt = partial_prompt.partial(df_head=str(df.head().to_markdown()))
return partial_prompt, tools


@@ -114,7 +110,6 @@ def _get_prompt_and_tools(
suffix: Optional[str] = None,
input_variables: Optional[List[str]] = None,
include_df_in_prompt: Optional[bool] = True,
number_of_head_rows: int = 5,
) -> Tuple[BasePromptTemplate, List[PythonAstREPLTool]]:
try:
import pandas as pd
@@ -136,7 +131,6 @@ def _get_prompt_and_tools(
suffix=suffix,
input_variables=input_variables,
include_df_in_prompt=include_df_in_prompt,
number_of_head_rows=number_of_head_rows,
)
else:
if not isinstance(df, pd.DataFrame):
@@ -147,7 +141,6 @@ def _get_prompt_and_tools(
suffix=suffix,
input_variables=input_variables,
include_df_in_prompt=include_df_in_prompt,
number_of_head_rows=number_of_head_rows,
)


@@ -156,18 +149,13 @@ def _get_functions_single_prompt(
prefix: Optional[str] = None,
suffix: Optional[str] = None,
include_df_in_prompt: Optional[bool] = True,
number_of_head_rows: int = 5,
) -> Tuple[BasePromptTemplate, List[PythonAstREPLTool]]:
if suffix is not None:
suffix_to_use = suffix
if include_df_in_prompt:
suffix_to_use = suffix_to_use.format(
df_head=str(df.head(number_of_head_rows).to_markdown())
)
suffix_to_use = suffix_to_use.format(df_head=str(df.head().to_markdown()))
elif include_df_in_prompt:
suffix_to_use = FUNCTIONS_WITH_DF.format(
df_head=str(df.head(number_of_head_rows).to_markdown())
)
suffix_to_use = FUNCTIONS_WITH_DF.format(df_head=str(df.head().to_markdown()))
else:
suffix_to_use = ""

@@ -185,19 +173,16 @@ def _get_functions_multi_prompt(
prefix: Optional[str] = None,
suffix: Optional[str] = None,
include_df_in_prompt: Optional[bool] = True,
number_of_head_rows: int = 5,
) -> Tuple[BasePromptTemplate, List[PythonAstREPLTool]]:
if suffix is not None:
suffix_to_use = suffix
if include_df_in_prompt:
dfs_head = "\n\n".join(
[d.head(number_of_head_rows).to_markdown() for d in dfs]
)
dfs_head = "\n\n".join([d.head().to_markdown() for d in dfs])
suffix_to_use = suffix_to_use.format(
dfs_head=dfs_head,
)
elif include_df_in_prompt:
dfs_head = "\n\n".join([d.head(number_of_head_rows).to_markdown() for d in dfs])
dfs_head = "\n\n".join([d.head().to_markdown() for d in dfs])
suffix_to_use = FUNCTIONS_WITH_MULTI_DF.format(
dfs_head=dfs_head,
)
@@ -223,7 +208,6 @@ def _get_functions_prompt_and_tools(
suffix: Optional[str] = None,
input_variables: Optional[List[str]] = None,
include_df_in_prompt: Optional[bool] = True,
number_of_head_rows: int = 5,
) -> Tuple[BasePromptTemplate, List[PythonAstREPLTool]]:
try:
import pandas as pd
@@ -246,7 +230,6 @@ def _get_functions_prompt_and_tools(
prefix=prefix,
suffix=suffix,
include_df_in_prompt=include_df_in_prompt,
number_of_head_rows=number_of_head_rows,
)
else:
if not isinstance(df, pd.DataFrame):
@@ -256,7 +239,6 @@ def _get_functions_prompt_and_tools(
prefix=prefix,
suffix=suffix,
include_df_in_prompt=include_df_in_prompt,
number_of_head_rows=number_of_head_rows,
)


@@ -275,7 +257,6 @@ def create_pandas_dataframe_agent(
early_stopping_method: str = "force",
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
include_df_in_prompt: Optional[bool] = True,
number_of_head_rows: int = 5,
**kwargs: Dict[str, Any],
) -> AgentExecutor:
"""Construct a pandas agent from an LLM and dataframe."""
@@ -287,7 +268,6 @@ def create_pandas_dataframe_agent(
suffix=suffix,
input_variables=input_variables,
include_df_in_prompt=include_df_in_prompt,
number_of_head_rows=number_of_head_rows,
)
llm_chain = LLMChain(
llm=llm,
@@ -308,7 +288,6 @@ def create_pandas_dataframe_agent(
suffix=suffix,
input_variables=input_variables,
include_df_in_prompt=include_df_in_prompt,
number_of_head_rows=number_of_head_rows,
)
agent = OpenAIFunctionsAgent(
llm=llm,

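A hedged usage sketch for the `number_of_head_rows` parameter these hunks touch, which controls how many rows of `df.head()` are rendered into the agent prompt; it assumes an OpenAI key is configured and that pandas and tabulate (for `to_markdown`) are installed.

```python
import pandas as pd

from langchain.agents import create_pandas_dataframe_agent
from langchain.llms import OpenAI

df = pd.DataFrame({"country": ["fr", "de"], "population": [67_000_000, 83_000_000]})
agent = create_pandas_dataframe_agent(
    OpenAI(temperature=0),
    df,
    include_df_in_prompt=True,
    number_of_head_rows=2,  # rows of df.head() embedded in the prompt
)
agent.run("Which country has the larger population?")
```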
@@ -17,7 +17,7 @@ from langchain.utilities.powerbi import PowerBIDataset

def create_pbi_agent(
llm: BaseLanguageModel,
toolkit: Optional[PowerBIToolkit] = None,
toolkit: Optional[PowerBIToolkit],
powerbi: Optional[PowerBIDataset] = None,
callback_manager: Optional[BaseCallbackManager] = None,
prefix: str = POWERBI_PREFIX,
@@ -36,13 +36,13 @@ def create_pbi_agent(
raise ValueError("Must provide either a toolkit or powerbi dataset")
toolkit = PowerBIToolkit(powerbi=powerbi, llm=llm, examples=examples)
tools = toolkit.get_tools()
tables = powerbi.table_names if powerbi else toolkit.powerbi.table_names

agent = ZeroShotAgent(
llm_chain=LLMChain(
llm=llm,
prompt=ZeroShotAgent.create_prompt(
tools,
prefix=prefix.format(top_k=top_k).format(tables=tables),
prefix=prefix.format(top_k=top_k),
suffix=suffix,
format_instructions=format_instructions,
input_variables=input_variables,

@@ -18,7 +18,7 @@ from langchain.utilities.powerbi import PowerBIDataset

def create_pbi_chat_agent(
llm: BaseChatModel,
toolkit: Optional[PowerBIToolkit] = None,
toolkit: Optional[PowerBIToolkit],
powerbi: Optional[PowerBIDataset] = None,
callback_manager: Optional[BaseCallbackManager] = None,
output_parser: Optional[AgentOutputParser] = None,
@@ -32,20 +32,19 @@ def create_pbi_chat_agent(
agent_executor_kwargs: Optional[Dict[str, Any]] = None,
**kwargs: Dict[str, Any],
) -> AgentExecutor:
"""Construct a Power BI agent from a Chat LLM and tools.
"""Construct a pbi agent from an Chat LLM and tools.

If you supply only a toolkit and no Power BI dataset, the same LLM is used for both.
If you supply only a toolkit and no powerbi dataset, the same LLM is used for both.
"""
if toolkit is None:
if powerbi is None:
raise ValueError("Must provide either a toolkit or powerbi dataset")
toolkit = PowerBIToolkit(powerbi=powerbi, llm=llm, examples=examples)
tools = toolkit.get_tools()
tables = powerbi.table_names if powerbi else toolkit.powerbi.table_names
agent = ConversationalChatAgent.from_llm_and_tools(
llm=llm,
tools=tools,
system_message=prefix.format(top_k=top_k).format(tables=tables),
system_message=prefix.format(top_k=top_k),
human_message=suffix,
input_variables=input_variables,
callback_manager=callback_manager,

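A sketch of the call pattern the toolkit-or-dataset signature enables: pass either a prebuilt `PowerBIToolkit` or just a `PowerBIDataset` and let the helper build the toolkit. The dataset id, table name, and token below are placeholders, and the `PowerBIDataset` field names are an assumption taken from this revision.

```python
from langchain.agents.agent_toolkits import PowerBIToolkit, create_pbi_agent
from langchain.llms import OpenAI
from langchain.utilities.powerbi import PowerBIDataset

llm = OpenAI(temperature=0)
dataset = PowerBIDataset(
    dataset_id="<dataset-id>",      # placeholder
    table_names=["Sales"],          # placeholder
    token="<aad-token>",            # placeholder AAD token
)

# Either hand over a prebuilt toolkit...
agent = create_pbi_agent(llm=llm, toolkit=PowerBIToolkit(powerbi=dataset, llm=llm))
# ...or pass the dataset directly and have the toolkit built for you.
agent = create_pbi_agent(llm=llm, toolkit=None, powerbi=dataset)
```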
@@ -4,7 +4,7 @@

POWERBI_PREFIX = """You are an agent designed to help users interact with a PowerBI Dataset.

Agent has access to a tool that can write a query based on the question and then run those against PowerBI, Microsofts business intelligence tool. The questions from the users should be interpreted as related to the dataset that is available and not general questions about the world. If the question does not seem related to the dataset, return "This does not appear to be part of this dataset." as the answer.
Agent has access to a tool that can write a query based on the question and then run those against PowerBI, Microsofts business intelligence tool. The questions from the users should be interpreted as related to the dataset that is available and not general questions about the world. If the question does not seem related to the dataset, just return "This does not appear to be part of this dataset." as the answer.

Given an input question, ask to run the questions against the dataset, then look at the results and return the answer, the answer should be a complete sentence that answers the question, if multiple rows are asked find a way to write that in a easily readable format for a human, also make sure to represent numbers in readable ways, like 1M instead of 1000000. Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results.
"""
@@ -17,9 +17,9 @@ Thought: I can first ask which tables I have, then how each table is defined and

POWERBI_CHAT_PREFIX = """Assistant is a large language model built to help users interact with a PowerBI Dataset.

Assistant should try to create a correct and complete answer to the question from the user. If the user asks a question not related to the dataset it should return "This does not appear to be part of this dataset." as the answer. The user might make a mistake with the spelling of certain values, if you think that is the case, ask the user to confirm the spelling of the value and then run the query again. Unless the user specifies a specific number of examples they wish to obtain, and the results are too large, limit your query to at most {top_k} results, but make it clear when answering which field was used for the filtering. The user has access to these tables: {{tables}}.
Assistant has access to a tool that can write a query based on the question and then run those against PowerBI, Microsofts business intelligence tool. The questions from the users should be interpreted as related to the dataset that is available and not general questions about the world. If the question does not seem related to the dataset, just return "This does not appear to be part of this dataset." as the answer.

The answer should be a complete sentence that answers the question, if multiple rows are asked find a way to write that in a easily readable format for a human, also make sure to represent numbers in readable ways, like 1M instead of 1000000.
Given an input question, ask to run the questions against the dataset, then look at the results and return the answer, the answer should be a complete sentence that answers the question, if multiple rows are asked find a way to write that in a easily readable format for a human, also make sure to represent numbers in readable ways, like 1M instead of 1000000. Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results.
"""

POWERBI_CHAT_SUFFIX = """TOOLS

@@ -1,5 +1,5 @@
"""Toolkit for interacting with a Power BI dataset."""
from typing import List, Optional, Union
from typing import List, Optional

from pydantic import Field

@@ -7,19 +7,9 @@ from langchain.agents.agent_toolkits.base import BaseToolkit
from langchain.base_language import BaseLanguageModel
from langchain.callbacks.base import BaseCallbackManager
from langchain.chains.llm import LLMChain
from langchain.chat_models.base import BaseChatModel
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import (
ChatPromptTemplate,
HumanMessagePromptTemplate,
SystemMessagePromptTemplate,
)
from langchain.tools import BaseTool
from langchain.tools.powerbi.prompt import (
QUESTION_TO_QUERY_BASE,
SINGLE_QUESTION_TO_QUERY,
USER_INPUT,
)
from langchain.tools.powerbi.prompt import QUESTION_TO_QUERY
from langchain.tools.powerbi.tool import (
InfoPowerBITool,
ListPowerBITool,
@@ -32,12 +22,10 @@ class PowerBIToolkit(BaseToolkit):
"""Toolkit for interacting with PowerBI dataset."""

powerbi: PowerBIDataset = Field(exclude=True)
llm: Union[BaseLanguageModel, BaseChatModel] = Field(exclude=True)
llm: BaseLanguageModel = Field(exclude=True)
examples: Optional[str] = None
max_iterations: int = 5
callback_manager: Optional[BaseCallbackManager] = None
output_token_limit: Optional[int] = None
tiktoken_model_name: Optional[str] = None

class Config:
"""Configuration for this pydantic object."""
@@ -46,47 +34,30 @@ class PowerBIToolkit(BaseToolkit):

def get_tools(self) -> List[BaseTool]:
"""Get the tools in the toolkit."""
if self.callback_manager:
chain = LLMChain(
llm=self.llm,
callback_manager=self.callback_manager,
prompt=PromptTemplate(
template=QUESTION_TO_QUERY,
input_variables=["tool_input", "tables", "schemas", "examples"],
),
)
else:
chain = LLMChain(
llm=self.llm,
prompt=PromptTemplate(
template=QUESTION_TO_QUERY,
input_variables=["tool_input", "tables", "schemas", "examples"],
),
)
return [
QueryPowerBITool(
llm_chain=self._get_chain(),
llm_chain=chain,
powerbi=self.powerbi,
examples=self.examples,
max_iterations=self.max_iterations,
output_token_limit=self.output_token_limit,
tiktoken_model_name=self.tiktoken_model_name,
),
InfoPowerBITool(powerbi=self.powerbi),
ListPowerBITool(powerbi=self.powerbi),
]

def _get_chain(self) -> LLMChain:
"""Construct the chain based on the callback manager and model type."""
if isinstance(self.llm, BaseLanguageModel):
return LLMChain(
llm=self.llm,
callback_manager=self.callback_manager
if self.callback_manager
else None,
prompt=PromptTemplate(
template=SINGLE_QUESTION_TO_QUERY,
input_variables=["tool_input", "tables", "schemas", "examples"],
),
)

system_prompt = SystemMessagePromptTemplate(
prompt=PromptTemplate(
template=QUESTION_TO_QUERY_BASE,
input_variables=["tables", "schemas", "examples"],
)
)
human_prompt = HumanMessagePromptTemplate(
prompt=PromptTemplate(
template=USER_INPUT,
input_variables=["tool_input"],
)
)
return LLMChain(
llm=self.llm,
callback_manager=self.callback_manager if self.callback_manager else None,
prompt=ChatPromptTemplate.from_messages([system_prompt, human_prompt]),
)

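A standalone sketch of the chat-prompt construction used in `_get_chain` above; the template strings are illustrative stand-ins for `QUESTION_TO_QUERY_BASE` and `USER_INPUT`.

```python
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)

# System turn carries the schema context; human turn carries the user question.
system_prompt = SystemMessagePromptTemplate(
    prompt=PromptTemplate(
        template="Write a DAX query for tables {tables} with schemas {schemas}.",
        input_variables=["tables", "schemas"],
    )
)
human_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(template="{tool_input}", input_variables=["tool_input"])
)
chat_prompt = ChatPromptTemplate.from_messages([system_prompt, human_prompt])
```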
@@ -30,8 +30,9 @@ class ChatOutputParser(AgentOutputParser):
except Exception:
if not includes_answer:
raise OutputParserException(f"Could not parse LLM output: {text}")
output = text.split(FINAL_ANSWER_ACTION)[-1].strip()
return AgentFinish({"output": output}, text)
return AgentFinish(
{"output": text.split(FINAL_ANSWER_ACTION)[-1].strip()}, text
)

@property
def _type(self) -> str:

@@ -38,8 +38,6 @@ from langchain.tools.sleep.tool import SleepTool
from langchain.tools.wikipedia.tool import WikipediaQueryRun
from langchain.tools.wolfram_alpha.tool import WolframAlphaQueryRun
from langchain.tools.openweathermap.tool import OpenWeatherMapQueryRun
from langchain.tools.dataforseo_api_search import DataForSeoAPISearchRun
from langchain.tools.dataforseo_api_search import DataForSeoAPISearchResults
from langchain.utilities import ArxivAPIWrapper
from langchain.utilities import PubMedAPIWrapper
from langchain.utilities.bing_search import BingSearchAPIWrapper
@@ -55,7 +53,6 @@ from langchain.utilities.twilio import TwilioAPIWrapper
from langchain.utilities.wikipedia import WikipediaAPIWrapper
from langchain.utilities.wolfram_alpha import WolframAlphaAPIWrapper
from langchain.utilities.openweathermap import OpenWeatherMapAPIWrapper
from langchain.utilities.dataforseo_api_search import DataForSeoAPIWrapper


def _get_python_repl() -> BaseTool:
@@ -281,14 +278,6 @@ def _get_openweathermap(**kwargs: Any) -> BaseTool:
return OpenWeatherMapQueryRun(api_wrapper=OpenWeatherMapAPIWrapper(**kwargs))


def _get_dataforseo_api_search(**kwargs: Any) -> BaseTool:
return DataForSeoAPISearchRun(api_wrapper=DataForSeoAPIWrapper(**kwargs))


def _get_dataforseo_api_search_json(**kwargs: Any) -> BaseTool:
return DataForSeoAPISearchResults(api_wrapper=DataForSeoAPIWrapper(**kwargs))


_EXTRA_LLM_TOOLS: Dict[
str,
Tuple[Callable[[Arg(BaseLanguageModel, "llm"), KwArg(Any)], BaseTool], List[str]],
@@ -337,14 +326,6 @@ _EXTRA_OPTIONAL_TOOLS: Dict[str, Tuple[Callable[[KwArg(Any)], BaseTool], List[st
"sceneXplain": (_get_scenexplain, []),
"graphql": (_get_graphql_tool, ["graphql_endpoint"]),
"openweathermap-api": (_get_openweathermap, ["openweathermap_api_key"]),
"dataforseo-api-search": (
_get_dataforseo_api_search,
["api_login", "api_password", "aiosession"],
),
"dataforseo-api-search-json": (
_get_dataforseo_api_search_json,
["api_login", "api_password", "aiosession"],
),
}


@@ -108,6 +108,7 @@ def _parse_ai_message(message: BaseMessage) -> Union[AgentAction, AgentFinish]:
function_call = message.additional_kwargs.get("function_call", {})

if function_call:
function_call = message.additional_kwargs["function_call"]
function_name = function_call["name"]
try:
_tool_input = json.loads(function_call["arguments"])

@@ -107,6 +107,7 @@ def _parse_ai_message(message: BaseMessage) -> Union[List[AgentAction], AgentFin
function_call = message.additional_kwargs.get("function_call", {})

if function_call:
function_call = message.additional_kwargs["function_call"]
try:
tools = json.loads(function_call["arguments"])["actions"]
except JSONDecodeError:

@@ -1,4 +1,4 @@
"""Chain that does self-ask with search."""
"""Chain that does self ask with search."""
from typing import Any, Sequence, Union

from pydantic import Field
@@ -59,7 +59,7 @@ class SelfAskWithSearchAgent(Agent):


class SelfAskWithSearchChain(AgentExecutor):
"""Chain that does self-ask with search.
"""Chain that does self ask with search.

Example:
.. code-block:: python

@@ -1,3 +1,4 @@
"""Base class for all language models."""
from __future__ import annotations

from abc import ABC, abstractmethod
@@ -29,8 +30,6 @@ def _get_token_ids_default_method(text: str) -> List[int]:


class BaseLanguageModel(Serializable, ABC):
"""Base class for all language models."""

@abstractmethod
def generate_prompt(
self,

@@ -147,7 +147,6 @@ class CallbackManagerMixin:
run_id: UUID,
parent_run_id: Optional[UUID] = None,
tags: Optional[List[str]] = None,
metadata: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> Any:
"""Run when LLM starts running."""
@@ -160,7 +159,6 @@ class CallbackManagerMixin:
run_id: UUID,
parent_run_id: Optional[UUID] = None,
tags: Optional[List[str]] = None,
metadata: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> Any:
"""Run when a chat model starts running."""
@@ -170,13 +168,10 @@ class CallbackManagerMixin:

def on_retriever_start(
self,
serialized: Dict[str, Any],
query: str,
*,
run_id: UUID,
parent_run_id: Optional[UUID] = None,
tags: Optional[List[str]] = None,
metadata: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> Any:
"""Run when Retriever starts running."""
@@ -189,7 +184,6 @@ class CallbackManagerMixin:
run_id: UUID,
parent_run_id: Optional[UUID] = None,
tags: Optional[List[str]] = None,
metadata: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> Any:
"""Run when chain starts running."""
@@ -202,7 +196,6 @@ class CallbackManagerMixin:
run_id: UUID,
parent_run_id: Optional[UUID] = None,
tags: Optional[List[str]] = None,
metadata: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> Any:
"""Run when tool starts running."""
@@ -273,7 +266,6 @@ class AsyncCallbackHandler(BaseCallbackHandler):
run_id: UUID,
parent_run_id: Optional[UUID] = None,
tags: Optional[List[str]] = None,
metadata: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> None:
"""Run when LLM starts running."""
@@ -286,7 +278,6 @@ class AsyncCallbackHandler(BaseCallbackHandler):
run_id: UUID,
parent_run_id: Optional[UUID] = None,
tags: Optional[List[str]] = None,
metadata: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> Any:
"""Run when a chat model starts running."""
@@ -335,7 +326,6 @@ class AsyncCallbackHandler(BaseCallbackHandler):
run_id: UUID,
parent_run_id: Optional[UUID] = None,
tags: Optional[List[str]] = None,
metadata: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> None:
"""Run when chain starts running."""
@@ -370,7 +360,6 @@ class AsyncCallbackHandler(BaseCallbackHandler):
run_id: UUID,
parent_run_id: Optional[UUID] = None,
tags: Optional[List[str]] = None,
metadata: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> None:
"""Run when tool starts running."""
@@ -432,13 +421,11 @@ class AsyncCallbackHandler(BaseCallbackHandler):

async def on_retriever_start(
self,
serialized: Dict[str, Any],
query: str,
*,
run_id: UUID,
parent_run_id: Optional[UUID] = None,
tags: Optional[List[str]] = None,
metadata: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> None:
"""Run on retriever start."""
@@ -477,8 +464,6 @@ class BaseCallbackManager(CallbackManagerMixin):
*,
tags: Optional[List[str]] = None,
inheritable_tags: Optional[List[str]] = None,
metadata: Optional[Dict[str, Any]] = None,
inheritable_metadata: Optional[Dict[str, Any]] = None,
) -> None:
"""Initialize callback manager."""
self.handlers: List[BaseCallbackHandler] = handlers
@@ -488,8 +473,6 @@ class BaseCallbackManager(CallbackManagerMixin):
self.parent_run_id: Optional[UUID] = parent_run_id
self.tags = tags or []
self.inheritable_tags = inheritable_tags or []
self.metadata = metadata or {}
self.inheritable_metadata = inheritable_metadata or {}

@property
def is_async(self) -> bool:
@@ -532,13 +515,3 @@ class BaseCallbackManager(CallbackManagerMixin):
for tag in tags:
self.tags.remove(tag)
self.inheritable_tags.remove(tag)

def add_metadata(self, metadata: Dict[str, Any], inherit: bool = True) -> None:
self.metadata.update(metadata)
if inherit:
self.inheritable_metadata.update(metadata)

def remove_metadata(self, keys: List[str]) -> None:
for key in keys:
self.metadata.pop(key)
self.inheritable_metadata.pop(key)

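A hedged sketch of the metadata plumbing that these hunks touch: metadata attached to a callback manager is tracked in two dicts, and only the inheritable dict propagates to child run managers, mirroring how tags behave.

```python
from langchain.callbacks.manager import CallbackManager

manager = CallbackManager(handlers=[])
manager.add_metadata({"experiment": "embeddings-v2"})       # inheritable by child runs
manager.add_metadata({"debug_only": True}, inherit=False)   # this manager only
print(manager.metadata)              # {'experiment': 'embeddings-v2', 'debug_only': True}
print(manager.inheritable_metadata)  # {'experiment': 'embeddings-v2'}
manager.remove_metadata(["experiment"])  # removed from both dicts
```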
@@ -18,7 +18,6 @@ LANGCHAIN_MODEL_NAME = "langchain-model"


def import_comet_ml() -> Any:
"""Import comet_ml and raise an error if it is not installed."""
try:
import comet_ml  # noqa: F401
except ImportError:

@@ -39,7 +39,7 @@ class FileCallbackHandler(BaseCallbackHandler):
self, action: AgentAction, color: Optional[str] = None, **kwargs: Any
) -> Any:
"""Run on agent action."""
print_text(action.log, color=color or self.color, file=self.file)
print_text(action.log, color=color if color else self.color, file=self.file)

def on_tool_end(
self,
@@ -52,18 +52,24 @@ class FileCallbackHandler(BaseCallbackHandler):
"""If not the final action, print out observation."""
if observation_prefix is not None:
print_text(f"\n{observation_prefix}", file=self.file)
print_text(output, color=color or self.color, file=self.file)
print_text(output, color=color if color else self.color, file=self.file)
if llm_prefix is not None:
print_text(f"\n{llm_prefix}", file=self.file)

def on_text(
self, text: str, color: Optional[str] = None, end: str = "", **kwargs: Any
self,
text: str,
color: Optional[str] = None,
end: str = "",
**kwargs: Any,
) -> None:
"""Run when agent ends."""
print_text(text, color=color or self.color, end=end, file=self.file)
print_text(text, color=color if color else self.color, end=end, file=self.file)

def on_agent_finish(
self, finish: AgentFinish, color: Optional[str] = None, **kwargs: Any
) -> None:
"""Run on agent end."""
print_text(finish.log, color=color or self.color, end="\n", file=self.file)
print_text(
finish.log, color=color if self.color else color, end="\n", file=self.file
)

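For context, a small usage sketch wiring the `FileCallbackHandler` above into a chain so run logs land in a file; `output.log` is an arbitrary path and an OpenAI key is assumed.

```python
from langchain.callbacks import FileCallbackHandler
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

handler = FileCallbackHandler("output.log")  # all print_text output goes here
chain = LLMChain(
    llm=OpenAI(),
    prompt=PromptTemplate.from_template("1 + {number} = "),
    callbacks=[handler],
)
chain.run(number=2)
```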
@@ -23,7 +23,6 @@ logger = logging.getLogger(__name__)


def import_flytekit() -> Tuple[flytekit, renderer]:
"""Import flytekit and flytekitplugins-deck-standard."""
try:
import flytekit  # noqa: F401
from flytekitplugins.deck import renderer  # noqa: F401
@@ -40,7 +39,6 @@ def import_flytekit() -> Tuple[flytekit, renderer]:
def analyze_text(
text: str,
nlp: Any = None,
textstat: Any = None,
) -> dict:
"""Analyze text using textstat and spacy.

@@ -53,26 +51,26 @@ def analyze_text(
files serialized to HTML string.
"""
resp: Dict[str, Any] = {}
if textstat is not None:
text_complexity_metrics = {
"flesch_reading_ease": textstat.flesch_reading_ease(text),
"flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),
"smog_index": textstat.smog_index(text),
"coleman_liau_index": textstat.coleman_liau_index(text),
"automated_readability_index": textstat.automated_readability_index(text),
"dale_chall_readability_score": textstat.dale_chall_readability_score(text),
"difficult_words": textstat.difficult_words(text),
"linsear_write_formula": textstat.linsear_write_formula(text),
"gunning_fog": textstat.gunning_fog(text),
"fernandez_huerta": textstat.fernandez_huerta(text),
"szigriszt_pazos": textstat.szigriszt_pazos(text),
"gutierrez_polini": textstat.gutierrez_polini(text),
"crawford": textstat.crawford(text),
"gulpease_index": textstat.gulpease_index(text),
"osman": textstat.osman(text),
}
resp.update({"text_complexity_metrics": text_complexity_metrics})
resp.update(text_complexity_metrics)
textstat = import_textstat()
text_complexity_metrics = {
"flesch_reading_ease": textstat.flesch_reading_ease(text),
"flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),
"smog_index": textstat.smog_index(text),
"coleman_liau_index": textstat.coleman_liau_index(text),
"automated_readability_index": textstat.automated_readability_index(text),
"dale_chall_readability_score": textstat.dale_chall_readability_score(text),
"difficult_words": textstat.difficult_words(text),
"linsear_write_formula": textstat.linsear_write_formula(text),
"gunning_fog": textstat.gunning_fog(text),
"fernandez_huerta": textstat.fernandez_huerta(text),
"szigriszt_pazos": textstat.szigriszt_pazos(text),
"gutierrez_polini": textstat.gutierrez_polini(text),
"crawford": textstat.crawford(text),
"gulpease_index": textstat.gulpease_index(text),
"osman": textstat.osman(text),
}
resp.update({"text_complexity_metrics": text_complexity_metrics})
resp.update(text_complexity_metrics)

if nlp is not None:
spacy = import_spacy()
@@ -80,13 +78,16 @@ def analyze_text(
dep_out = spacy.displacy.render(  # type: ignore
doc, style="dep", jupyter=False, page=True
)

ent_out = spacy.displacy.render(  # type: ignore
doc, style="ent", jupyter=False, page=True
)

text_visualizations = {
"dependency_tree": dep_out,
"entities": ent_out,
}

resp.update(text_visualizations)

return resp
@@ -97,19 +98,10 @@ class FlyteCallbackHandler(BaseMetadataCallbackHandler, BaseCallbackHandler):

def __init__(self) -> None:
"""Initialize callback handler."""
import_textstat()  # Raise error since it is required
flytekit, renderer = import_flytekit()
self.pandas = import_pandas()

self.textstat = None
try:
self.textstat = import_textstat()
except ImportError:
logger.warning(
"Textstat library is not installed. \
It may result in the inability to log \
certain metrics that can be captured with Textstat."
)

spacy = None
try:
spacy = import_spacy()
@@ -131,7 +123,7 @@ class FlyteCallbackHandler(BaseMetadataCallbackHandler, BaseCallbackHandler):
"FlyteCallbackHandler uses spacy's en_core_web_sm model"
" for certain metrics. To download,"
" run the following command in your terminal:"
" `python -m spacy download en_core_web_sm`"
" `python -m spacy download en_core_web_sm` command."
)

self.table_renderer = renderer.TableRenderer
@@ -188,10 +180,11 @@ class FlyteCallbackHandler(BaseMetadataCallbackHandler, BaseCallbackHandler):
for generation in generations:
generation_resp = deepcopy(resp)
generation_resp.update(flatten_dict(generation.dict()))
if self.nlp or self.textstat:
if self.nlp:
generation_resp.update(
analyze_text(
generation.text, nlp=self.nlp, textstat=self.textstat
generation.text,
nlp=self.nlp,
)
)


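An illustrative call of the `analyze_text` helper above, assuming the variant of the signature shown in this diff that accepts an injected `textstat` module (the returned keys depend on which side of the hunk is in effect, so the sketch just prints them):

```python
import textstat

from langchain.callbacks.flyte_callback import analyze_text

resp = analyze_text(
    "LangChain makes building LLM applications composable.",
    textstat=textstat,
)
print(sorted(resp))  # readability metric names; visualizations appear when nlp is passed
```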
@@ -6,7 +6,6 @@ from langchain.schema import AgentAction, AgentFinish, LLMResult


def import_infino() -> Any:
"""Import the infino client."""
try:
from infinopy import InfinoClient
except ImportError:

@@ -144,7 +144,6 @@ def tracing_v2_enabled(
project_name: Optional[str] = None,
*,
example_id: Optional[Union[str, UUID]] = None,
tags: Optional[List[str]] = None,
) -> Generator[None, None, None]:
"""Instruct LangChain to log all runs in context to LangSmith.

@@ -153,8 +152,6 @@ def tracing_v2_enabled(
Defaults to "default".
example_id (str or UUID, optional): The ID of the example.
Defaults to None.
tags (List[str], optional): The tags to add to the run.
Defaults to None.

Returns:
None
@@ -173,7 +170,6 @@ def tracing_v2_enabled(
cb = LangChainTracer(
example_id=example_id,
project_name=project_name,
tags=tags,
)
tracing_v2_callback_var.set(cb)
yield
@@ -387,8 +383,6 @@ class BaseRunManager(RunManagerMixin):
parent_run_id: Optional[UUID] = None,
tags: Optional[List[str]] = None,
inheritable_tags: Optional[List[str]] = None,
metadata: Optional[Dict[str, Any]] = None,
inheritable_metadata: Optional[Dict[str, Any]] = None,
) -> None:
"""Initialize the run manager.

@@ -401,8 +395,6 @@ class BaseRunManager(RunManagerMixin):
Defaults to None.
tags (Optional[List[str]]): The list of tags.
inheritable_tags (Optional[List[str]]): The list of inheritable tags.
metadata (Optional[Dict[str, Any]]): The metadata.
inheritable_metadata (Optional[Dict[str, Any]]): The inheritable metadata.
"""
self.run_id = run_id
self.handlers = handlers
@@ -410,8 +402,6 @@ class BaseRunManager(RunManagerMixin):
self.parent_run_id = parent_run_id
self.tags = tags or []
self.inheritable_tags = inheritable_tags or []
self.metadata = metadata or {}
self.inheritable_metadata = inheritable_metadata or {}

@classmethod
def get_noop_manager(cls: Type[BRM]) -> BRM:
@@ -426,8 +416,6 @@ class BaseRunManager(RunManagerMixin):
inheritable_handlers=[],
tags=[],
inheritable_tags=[],
metadata={},
inheritable_metadata={},
)


@@ -459,28 +447,6 @@ class RunManager(BaseRunManager):
)


class ParentRunManager(RunManager):
"""Sync Parent Run Manager."""

def get_child(self, tag: Optional[str] = None) -> CallbackManager:
"""Get a child callback manager.

Args:
tag (str, optional): The tag for the child callback manager.
Defaults to None.

Returns:
CallbackManager: The child callback manager.
"""
manager = CallbackManager(handlers=[], parent_run_id=self.run_id)
manager.set_handlers(self.inheritable_handlers)
manager.add_tags(self.inheritable_tags)
manager.add_metadata(self.inheritable_metadata)
if tag is not None:
manager.add_tags([tag], False)
return manager


class AsyncRunManager(BaseRunManager):
"""Async Run Manager."""

@@ -509,28 +475,6 @@ class AsyncRunManager(BaseRunManager):
)


class AsyncParentRunManager(AsyncRunManager):
"""Async Parent Run Manager."""

def get_child(self, tag: Optional[str] = None) -> AsyncCallbackManager:
"""Get a child callback manager.

Args:
tag (str, optional): The tag for the child callback manager.
Defaults to None.

Returns:
AsyncCallbackManager: The child callback manager.
"""
manager = AsyncCallbackManager(handlers=[], parent_run_id=self.run_id)
manager.set_handlers(self.inheritable_handlers)
manager.add_tags(self.inheritable_tags)
manager.add_metadata(self.inheritable_metadata)
if tag is not None:
manager.add_tags([tag], False)
return manager


class CallbackManagerForLLMRun(RunManager, LLMManagerMixin):
"""Callback manager for LLM run."""

@@ -657,9 +601,26 @@ class AsyncCallbackManagerForLLMRun(AsyncRunManager, LLMManagerMixin):
)


class CallbackManagerForChainRun(ParentRunManager, ChainManagerMixin):
class CallbackManagerForChainRun(RunManager, ChainManagerMixin):
"""Callback manager for chain run."""

def get_child(self, tag: Optional[str] = None) -> CallbackManager:
"""Get a child callback manager.

Args:
tag (str, optional): The tag for the child callback manager.
Defaults to None.

Returns:
CallbackManager: The child callback manager.
"""
manager = CallbackManager(handlers=[], parent_run_id=self.run_id)
manager.set_handlers(self.inheritable_handlers)
manager.add_tags(self.inheritable_tags)
if tag is not None:
manager.add_tags([tag], False)
return manager

def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:
"""Run when chain ends running.

@@ -739,9 +700,26 @@ class CallbackManagerForChainRun(ParentRunManager, ChainManagerMixin):
)


class AsyncCallbackManagerForChainRun(AsyncParentRunManager, ChainManagerMixin):
class AsyncCallbackManagerForChainRun(AsyncRunManager, ChainManagerMixin):
"""Async callback manager for chain run."""

def get_child(self, tag: Optional[str] = None) -> AsyncCallbackManager:
"""Get a child callback manager.

Args:
tag (str, optional): The tag for the child callback manager.
Defaults to None.

Returns:
AsyncCallbackManager: The child callback manager.
"""
manager = AsyncCallbackManager(handlers=[], parent_run_id=self.run_id)
manager.set_handlers(self.inheritable_handlers)
manager.add_tags(self.inheritable_tags)
if tag is not None:
manager.add_tags([tag], False)
return manager

async def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:
"""Run when chain ends running.

@@ -821,9 +799,26 @@ class AsyncCallbackManagerForChainRun(AsyncParentRunManager, ChainManagerMixin):
)


class CallbackManagerForToolRun(ParentRunManager, ToolManagerMixin):
class CallbackManagerForToolRun(RunManager, ToolManagerMixin):
"""Callback manager for tool run."""

def get_child(self, tag: Optional[str] = None) -> CallbackManager:
"""Get a child callback manager.

Args:
tag (str, optional): The tag for the child callback manager.
Defaults to None.

Returns:
CallbackManager: The child callback manager.
"""
manager = CallbackManager(handlers=[], parent_run_id=self.run_id)
manager.set_handlers(self.inheritable_handlers)
manager.add_tags(self.inheritable_tags)
if tag is not None:
manager.add_tags([tag], False)
return manager

def on_tool_end(
self,
output: str,
@@ -867,9 +862,26 @@ class CallbackManagerForToolRun(ParentRunManager, ToolManagerMixin):
)


class AsyncCallbackManagerForToolRun(AsyncParentRunManager, ToolManagerMixin):
class AsyncCallbackManagerForToolRun(AsyncRunManager, ToolManagerMixin):
"""Async callback manager for tool run."""

def get_child(self, tag: Optional[str] = None) -> AsyncCallbackManager:
"""Get a child callback manager.

Args:
tag (str, optional): The tag to add to the child
callback manager. Defaults to None.

Returns:
AsyncCallbackManager: The child callback manager.
"""
manager = AsyncCallbackManager(handlers=[], parent_run_id=self.run_id)
manager.set_handlers(self.inheritable_handlers)
manager.add_tags(self.inheritable_tags)
if tag is not None:
manager.add_tags([tag], False)
return manager

async def on_tool_end(self, output: str, **kwargs: Any) -> None:
"""Run when tool ends running.

@@ -909,9 +921,18 @@ class AsyncCallbackManagerForToolRun(AsyncParentRunManager, ToolManagerMixin):
)


class CallbackManagerForRetrieverRun(ParentRunManager, RetrieverManagerMixin):
class CallbackManagerForRetrieverRun(RunManager, RetrieverManagerMixin):
"""Callback manager for retriever run."""

def get_child(self, tag: Optional[str] = None) -> CallbackManager:
"""Get a child callback manager."""
manager = CallbackManager([], parent_run_id=self.run_id)
manager.set_handlers(self.inheritable_handlers)
manager.add_tags(self.inheritable_tags)
if tag is not None:
manager.add_tags([tag], False)
return manager

def on_retriever_end(
self,
documents: Sequence[Document],
@@ -948,11 +969,20 @@ class CallbackManagerForRetrieverRun(ParentRunManager, RetrieverManagerMixin):


class AsyncCallbackManagerForRetrieverRun(
AsyncParentRunManager,
AsyncRunManager,
RetrieverManagerMixin,
):
"""Async callback manager for retriever run."""

def get_child(self, tag: Optional[str] = None) -> AsyncCallbackManager:
"""Get a child callback manager."""
manager = AsyncCallbackManager([], parent_run_id=self.run_id)
manager.set_handlers(self.inheritable_handlers)
manager.add_tags(self.inheritable_tags)
if tag is not None:
manager.add_tags([tag], False)
return manager

async def on_retriever_end(
self, documents: Sequence[Document], **kwargs: Any
) -> None:

@@ -1018,7 +1048,6 @@ class CallbackManager(BaseCallbackManager):
|
||||
run_id=run_id_,
|
||||
parent_run_id=self.parent_run_id,
|
||||
tags=self.tags,
|
||||
metadata=self.metadata,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
@@ -1030,8 +1059,6 @@ class CallbackManager(BaseCallbackManager):
|
||||
parent_run_id=self.parent_run_id,
|
||||
tags=self.tags,
|
||||
inheritable_tags=self.inheritable_tags,
|
||||
metadata=self.metadata,
|
||||
inheritable_metadata=self.inheritable_metadata,
|
||||
)
|
||||
)
|
||||
|
||||
@@ -1067,7 +1094,6 @@ class CallbackManager(BaseCallbackManager):
|
||||
run_id=run_id_,
|
||||
parent_run_id=self.parent_run_id,
|
||||
tags=self.tags,
|
||||
metadata=self.metadata,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
@@ -1079,8 +1105,6 @@ class CallbackManager(BaseCallbackManager):
|
||||
parent_run_id=self.parent_run_id,
|
||||
tags=self.tags,
|
||||
inheritable_tags=self.inheritable_tags,
|
||||
metadata=self.metadata,
|
||||
inheritable_metadata=self.inheritable_metadata,
|
||||
)
|
||||
)
|
||||
|
||||
@@ -1115,7 +1139,6 @@ class CallbackManager(BaseCallbackManager):
|
||||
run_id=run_id,
|
||||
parent_run_id=self.parent_run_id,
|
||||
tags=self.tags,
|
||||
metadata=self.metadata,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
@@ -1126,8 +1149,6 @@ class CallbackManager(BaseCallbackManager):
|
||||
parent_run_id=self.parent_run_id,
|
||||
tags=self.tags,
|
||||
inheritable_tags=self.inheritable_tags,
|
||||
metadata=self.metadata,
|
||||
inheritable_metadata=self.inheritable_metadata,
|
||||
)
|
||||
|
||||
def on_tool_start(
|
||||
@@ -1161,7 +1182,6 @@ class CallbackManager(BaseCallbackManager):
|
||||
run_id=run_id,
|
||||
parent_run_id=self.parent_run_id,
|
||||
tags=self.tags,
|
||||
metadata=self.metadata,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
@@ -1172,13 +1192,10 @@ class CallbackManager(BaseCallbackManager):
|
||||
parent_run_id=self.parent_run_id,
|
||||
tags=self.tags,
|
||||
inheritable_tags=self.inheritable_tags,
|
||||
metadata=self.metadata,
|
||||
inheritable_metadata=self.inheritable_metadata,
|
||||
)
|
||||
|
||||
def on_retriever_start(
|
||||
self,
|
||||
serialized: Dict[str, Any],
|
||||
query: str,
|
||||
run_id: Optional[UUID] = None,
|
||||
parent_run_id: Optional[UUID] = None,
|
||||
@@ -1192,12 +1209,10 @@ class CallbackManager(BaseCallbackManager):
|
||||
self.handlers,
|
||||
"on_retriever_start",
|
||||
"ignore_retriever",
|
||||
serialized,
|
||||
query,
|
||||
run_id=run_id,
|
||||
parent_run_id=self.parent_run_id,
|
||||
tags=self.tags,
|
||||
metadata=self.metadata,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
@@ -1208,8 +1223,6 @@ class CallbackManager(BaseCallbackManager):
|
||||
parent_run_id=self.parent_run_id,
|
||||
tags=self.tags,
|
||||
inheritable_tags=self.inheritable_tags,
|
||||
metadata=self.metadata,
|
||||
inheritable_metadata=self.inheritable_metadata,
|
||||
)
|
||||
|
||||
@classmethod
|
||||
@@ -1220,8 +1233,6 @@ class CallbackManager(BaseCallbackManager):
|
||||
verbose: bool = False,
|
||||
inheritable_tags: Optional[List[str]] = None,
|
||||
local_tags: Optional[List[str]] = None,
|
||||
inheritable_metadata: Optional[Dict[str, Any]] = None,
|
||||
local_metadata: Optional[Dict[str, Any]] = None,
|
||||
) -> CallbackManager:
|
||||
"""Configure the callback manager.
|
||||
|
||||
@@ -1235,10 +1246,6 @@ class CallbackManager(BaseCallbackManager):
|
||||
Defaults to None.
|
||||
local_tags (Optional[List[str]], optional): The local tags.
|
||||
Defaults to None.
|
||||
inheritable_metadata (Optional[Dict[str, Any]], optional): The inheritable
|
||||
metadata. Defaults to None.
|
||||
local_metadata (Optional[Dict[str, Any]], optional): The local metadata.
|
||||
Defaults to None.
|
||||
|
||||
Returns:
|
||||
CallbackManager: The configured callback manager.
|
||||
@@ -1250,8 +1257,6 @@ class CallbackManager(BaseCallbackManager):
|
||||
verbose,
|
||||
inheritable_tags,
|
||||
local_tags,
|
||||
inheritable_metadata,
|
||||
local_metadata,
|
||||
)
|
||||
|
||||
|
||||
@@ -1298,7 +1303,6 @@ class AsyncCallbackManager(BaseCallbackManager):
|
||||
run_id=run_id_,
|
||||
parent_run_id=self.parent_run_id,
|
||||
tags=self.tags,
|
||||
metadata=self.metadata,
|
||||
**kwargs,
|
||||
)
|
||||
)
|
||||
@@ -1311,8 +1315,6 @@ class AsyncCallbackManager(BaseCallbackManager):
|
||||
parent_run_id=self.parent_run_id,
|
||||
tags=self.tags,
|
||||
inheritable_tags=self.inheritable_tags,
|
||||
metadata=self.metadata,
|
||||
inheritable_metadata=self.inheritable_metadata,
|
||||
)
|
||||
)
|
||||
|
||||
@@ -1354,7 +1356,6 @@ class AsyncCallbackManager(BaseCallbackManager):
|
||||
run_id=run_id_,
|
||||
parent_run_id=self.parent_run_id,
|
||||
tags=self.tags,
|
||||
metadata=self.metadata,
|
||||
**kwargs,
|
||||
)
|
||||
)
|
||||
@@ -1367,8 +1368,6 @@ class AsyncCallbackManager(BaseCallbackManager):
|
||||
parent_run_id=self.parent_run_id,
|
||||
tags=self.tags,
|
||||
inheritable_tags=self.inheritable_tags,
|
||||
metadata=self.metadata,
|
||||
inheritable_metadata=self.inheritable_metadata,
|
||||
)
|
||||
)
|
||||
|
||||
@@ -1405,7 +1404,6 @@ class AsyncCallbackManager(BaseCallbackManager):
|
||||
run_id=run_id,
|
||||
parent_run_id=self.parent_run_id,
|
||||
tags=self.tags,
|
||||
metadata=self.metadata,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
@@ -1416,8 +1414,6 @@ class AsyncCallbackManager(BaseCallbackManager):
|
||||
parent_run_id=self.parent_run_id,
|
||||
tags=self.tags,
|
||||
inheritable_tags=self.inheritable_tags,
|
||||
metadata=self.metadata,
|
||||
inheritable_metadata=self.inheritable_metadata,
|
||||
)
|
||||
|
||||
async def on_tool_start(
|
||||
@@ -1453,7 +1449,6 @@ class AsyncCallbackManager(BaseCallbackManager):
|
||||
run_id=run_id,
|
||||
parent_run_id=self.parent_run_id,
|
||||
tags=self.tags,
|
||||
metadata=self.metadata,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
@@ -1464,13 +1459,10 @@ class AsyncCallbackManager(BaseCallbackManager):
|
||||
parent_run_id=self.parent_run_id,
|
||||
tags=self.tags,
|
||||
inheritable_tags=self.inheritable_tags,
|
||||
metadata=self.metadata,
|
||||
inheritable_metadata=self.inheritable_metadata,
|
||||
)
|
||||
|
    async def on_retriever_start(
        self,
        serialized: Dict[str, Any],
        query: str,
        run_id: Optional[UUID] = None,
        parent_run_id: Optional[UUID] = None,
@@ -1484,12 +1476,10 @@ class AsyncCallbackManager(BaseCallbackManager):
            self.handlers,
            "on_retriever_start",
            "ignore_retriever",
            serialized,
            query,
            run_id=run_id,
            parent_run_id=self.parent_run_id,
            tags=self.tags,
            metadata=self.metadata,
            **kwargs,
        )

@@ -1500,8 +1490,6 @@ class AsyncCallbackManager(BaseCallbackManager):
            parent_run_id=self.parent_run_id,
            tags=self.tags,
            inheritable_tags=self.inheritable_tags,
            metadata=self.metadata,
            inheritable_metadata=self.inheritable_metadata,
        )

    @classmethod
@@ -1512,8 +1500,6 @@ class AsyncCallbackManager(BaseCallbackManager):
        verbose: bool = False,
        inheritable_tags: Optional[List[str]] = None,
        local_tags: Optional[List[str]] = None,
        inheritable_metadata: Optional[Dict[str, Any]] = None,
        local_metadata: Optional[Dict[str, Any]] = None,
    ) -> AsyncCallbackManager:
        """Configure the async callback manager.

@@ -1527,10 +1513,6 @@ class AsyncCallbackManager(BaseCallbackManager):
                Defaults to None.
            local_tags (Optional[List[str]], optional): The local tags.
                Defaults to None.
            inheritable_metadata (Optional[Dict[str, Any]], optional): The inheritable
                metadata. Defaults to None.
            local_metadata (Optional[Dict[str, Any]], optional): The local metadata.
                Defaults to None.

        Returns:
            AsyncCallbackManager: The configured async callback manager.
@@ -1542,8 +1524,6 @@ class AsyncCallbackManager(BaseCallbackManager):
            verbose,
            inheritable_tags,
            local_tags,
            inheritable_metadata,
            local_metadata,
        )


@@ -1574,8 +1554,6 @@ def _configure(
    verbose: bool = False,
    inheritable_tags: Optional[List[str]] = None,
    local_tags: Optional[List[str]] = None,
    inheritable_metadata: Optional[Dict[str, Any]] = None,
    local_metadata: Optional[Dict[str, Any]] = None,
) -> T:
    """Configure the callback manager.

@@ -1589,10 +1567,6 @@ def _configure(
        inheritable_tags (Optional[List[str]], optional): The inheritable tags.
            Defaults to None.
        local_tags (Optional[List[str]], optional): The local tags. Defaults to None.
        inheritable_metadata (Optional[Dict[str, Any]], optional): The inheritable
            metadata. Defaults to None.
        local_metadata (Optional[Dict[str, Any]], optional): The local metadata.
            Defaults to None.

    Returns:
        T: The configured callback manager.
@@ -1612,8 +1586,6 @@ def _configure(
                parent_run_id=inheritable_callbacks.parent_run_id,
                tags=inheritable_callbacks.tags,
                inheritable_tags=inheritable_callbacks.inheritable_tags,
                metadata=inheritable_callbacks.metadata,
                inheritable_metadata=inheritable_callbacks.inheritable_metadata,
            )
        local_handlers_ = (
            local_callbacks
@@ -1625,9 +1597,6 @@ def _configure(
    if inheritable_tags or local_tags:
        callback_manager.add_tags(inheritable_tags or [])
        callback_manager.add_tags(local_tags or [], False)
    if inheritable_metadata or local_metadata:
        callback_manager.add_metadata(inheritable_metadata or {})
        callback_manager.add_metadata(local_metadata or {}, False)

    tracer = tracing_callback_var.get()
    wandb_tracer = wandb_tracing_callback_var.get()
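The hunks above show `configure` and the module-level `_configure` gaining first-class support for inheritable vs. local tags and metadata. A minimal caller-side sketch, assuming the `configure` signature visible in the diff; the tag and metadata values are illustrative only:

```python
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.stdout import StdOutCallbackHandler

# Inheritable values propagate to child runs; local values stay on this
# manager only (mirroring the add_tags/add_metadata calls above).
manager = CallbackManager.configure(
    inheritable_callbacks=[StdOutCallbackHandler()],
    verbose=True,
    inheritable_tags=["session-123"],
    local_tags=["this-call-only"],
    inheritable_metadata={"user_id": "demo"},  # illustrative metadata
)
```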
@@ -63,7 +63,7 @@ class StdOutCallbackHandler(BaseCallbackHandler):
        self, action: AgentAction, color: Optional[str] = None, **kwargs: Any
    ) -> Any:
        """Run on agent action."""
        print_text(action.log, color=color or self.color)
        print_text(action.log, color=color if color else self.color)

    def on_tool_end(
        self,
@@ -76,7 +76,7 @@ class StdOutCallbackHandler(BaseCallbackHandler):
        """If not the final action, print out observation."""
        if observation_prefix is not None:
            print_text(f"\n{observation_prefix}")
        print_text(output, color=color or self.color)
        print_text(output, color=color if color else self.color)
        if llm_prefix is not None:
            print_text(f"\n{llm_prefix}")

@@ -94,10 +94,10 @@ class StdOutCallbackHandler(BaseCallbackHandler):
        **kwargs: Any,
    ) -> None:
        """Run when agent ends."""
        print_text(text, color=color or self.color, end=end)
        print_text(text, color=color if color else self.color, end=end)

    def on_agent_finish(
        self, finish: AgentFinish, color: Optional[str] = None, **kwargs: Any
    ) -> None:
        """Run on agent end."""
        print_text(finish.log, color=color or self.color, end="\n")
        print_text(finish.log, color=color if color else self.color, end="\n")
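The hunks above only swap `color if color else self.color` for the terser `color or self.color`; the two forms behave identically here, since an empty-string color is falsy either way. A hedged usage sketch (the chain wiring is illustrative, not part of the diff):

```python
from langchain.callbacks.stdout import StdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

# Per-call `color` arguments win; the handler's default color is the fallback.
handler = StdOutCallbackHandler(color="green")
chain = LLMChain(
    llm=OpenAI(),
    prompt=PromptTemplate.from_template("Answer briefly: {question}"),
    callbacks=[handler],
)
chain.run("What does a callback handler do?")
```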
@@ -31,8 +31,7 @@ class AsyncIteratorCallbackHandler(AsyncCallbackHandler):
        self.done.clear()

    async def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        if token is not None and token != "":
            self.queue.put_nowait(token)
        self.queue.put_nowait(token)

    async def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        self.done.set()
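With the guard above, only non-empty tokens reach the queue, and `on_llm_end` sets the `done` event. A sketch of the consuming side, assuming only the `queue` and `done` fields visible in the diff; the drain loop itself is illustrative:

```python
import asyncio

async def print_tokens(handler) -> None:
    # Drain queued tokens until the producer signals completion and the
    # queue is empty.
    while not (handler.done.is_set() and handler.queue.empty()):
        try:
            token = await asyncio.wait_for(handler.queue.get(), timeout=0.1)
        except asyncio.TimeoutError:
            continue
        print(token, end="", flush=True)
```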
@@ -37,7 +37,7 @@ class FinalStreamingStdOutCallbackHandler(StreamingStdOutCallbackHandler):
    """Instantiate FinalStreamingStdOutCallbackHandler.

    Args:
        answer_prefix_tokens: Token sequence that prefixes the answer.
            Default is ["Final", "Answer", ":"]
        strip_tokens: Ignore white spaces and new lines when comparing
            answer_prefix_tokens to last tokens? (to determine if answer has been
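A hedged instantiation matching the Args documented above; both keyword arguments appear in this docstring, and the values shown are just the documented defaults:

```python
from langchain.callbacks.streaming_stdout_final_only import (
    FinalStreamingStdOutCallbackHandler,
)

handler = FinalStreamingStdOutCallbackHandler(
    answer_prefix_tokens=["Final", "Answer", ":"],  # documented default
    strip_tokens=True,  # ignore whitespace/newlines when matching the prefix
)
```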
@@ -9,15 +9,11 @@ if TYPE_CHECKING:


class ChildType(Enum):
    """The enumerator of the child type."""

    MARKDOWN = "MARKDOWN"
    EXCEPTION = "EXCEPTION"


class ChildRecord(NamedTuple):
    """The child record as a NamedTuple."""

    type: ChildType
    kwargs: Dict[str, Any]
    dg: DeltaGenerator

@@ -27,8 +27,6 @@ EXCEPTION_EMOJI = "⚠️"


class LLMThoughtState(Enum):
    """Enumerator of the LLMThought state."""

    # The LLM is thinking about what to do next. We don't know which tool we'll run.
    THINKING = "THINKING"
    # The LLM has decided to run a tool. We don't have results from the tool yet.
@@ -38,8 +36,6 @@ class LLMThoughtState(Enum):


class ToolRecord(NamedTuple):
    """The tool record as a NamedTuple."""

    name: str
    input_str: str

@@ -104,8 +100,6 @@ class LLMThoughtLabeler:


class LLMThought:
    """A thought in the LLM's thought stream."""

    def __init__(
        self,
        parent_container: DeltaGenerator,
@@ -113,14 +107,6 @@ class LLMThought:
        expanded: bool,
        collapse_on_complete: bool,
    ):
        """Initialize the LLMThought.

        Args:
            parent_container: The container we're writing into.
            labeler: The labeler to use for this thought.
            expanded: Whether the thought should be expanded by default.
            collapse_on_complete: Whether the thought should be collapsed.
        """
        self._container = MutableExpander(
            parent_container=parent_container,
            label=labeler.get_initial_label(),
@@ -227,8 +213,6 @@ class LLMThought:


class StreamlitCallbackHandler(BaseCallbackHandler):
    """A callback handler that writes to a Streamlit app."""

    def __init__(
        self,
        parent_container: DeltaGenerator,
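A sketch of the intended wiring, assuming this module's import path and a pre-built agent (both hypothetical here); only `StreamlitCallbackHandler` and its `parent_container` argument come from the diff:

```python
import streamlit as st

from langchain.callbacks.streamlit import StreamlitCallbackHandler

# Each LLM "thought" is rendered as an expander inside this container.
st_callback = StreamlitCallbackHandler(parent_container=st.container())
# agent.run(user_input, callbacks=[st_callback])  # hypothetical agent
```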
@@ -4,14 +4,12 @@ from __future__ import annotations

import logging
from abc import ABC, abstractmethod
from datetime import datetime
from typing import Any, Dict, List, Optional, Sequence, Union, cast
from typing import Any, Dict, List, Optional, Sequence, Union
from uuid import UUID

from langchain.callbacks.base import BaseCallbackHandler
from langchain.callbacks.tracers.schemas import Run, RunTypeEnum
from langchain.load.dump import dumpd
from langchain.schema.document import Document
from langchain.schema.output import ChatGeneration, LLMResult
from langchain.schema import Document, LLMResult

logger = logging.getLogger(__name__)

@@ -89,15 +87,12 @@ class BaseTracer(BaseCallbackHandler, ABC):
        run_id: UUID,
        tags: Optional[List[str]] = None,
        parent_run_id: Optional[UUID] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> None:
        """Start a trace for an LLM run."""
        parent_run_id_ = str(parent_run_id) if parent_run_id else None
        execution_order = self._get_execution_order(parent_run_id_)
        start_time = datetime.utcnow()
        if metadata:
            kwargs.update({"metadata": metadata})
        llm_run = Run(
            id=run_id,
            parent_run_id=parent_run_id,
@@ -148,13 +143,6 @@ class BaseTracer(BaseCallbackHandler, ABC):
        if llm_run is None or llm_run.run_type != RunTypeEnum.llm:
            raise TracerException("No LLM Run found to be traced")
        llm_run.outputs = response.dict()
        for i, generations in enumerate(response.generations):
            for j, generation in enumerate(generations):
                output_generation = llm_run.outputs["generations"][i][j]
                if "message" in output_generation:
                    output_generation["message"] = dumpd(
                        cast(ChatGeneration, generation).message
                    )
        llm_run.end_time = datetime.utcnow()
        llm_run.events.append({"name": "end", "time": llm_run.end_time})
        self._end_trace(llm_run)
@@ -189,15 +177,12 @@ class BaseTracer(BaseCallbackHandler, ABC):
        run_id: UUID,
        tags: Optional[List[str]] = None,
        parent_run_id: Optional[UUID] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> None:
        """Start a trace for a chain run."""
        parent_run_id_ = str(parent_run_id) if parent_run_id else None
        execution_order = self._get_execution_order(parent_run_id_)
        start_time = datetime.utcnow()
        if metadata:
            kwargs.update({"metadata": metadata})
        chain_run = Run(
            id=run_id,
            parent_run_id=parent_run_id,
@@ -259,15 +244,12 @@ class BaseTracer(BaseCallbackHandler, ABC):
        run_id: UUID,
        tags: Optional[List[str]] = None,
        parent_run_id: Optional[UUID] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> None:
        """Start a trace for a tool run."""
        parent_run_id_ = str(parent_run_id) if parent_run_id else None
        execution_order = self._get_execution_order(parent_run_id_)
        start_time = datetime.utcnow()
        if metadata:
            kwargs.update({"metadata": metadata})
        tool_run = Run(
            id=run_id,
            parent_run_id=parent_run_id,
@@ -321,25 +303,20 @@ class BaseTracer(BaseCallbackHandler, ABC):

    def on_retriever_start(
        self,
        serialized: Dict[str, Any],
        query: str,
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> None:
        """Run when Retriever starts running."""
        parent_run_id_ = str(parent_run_id) if parent_run_id else None
        execution_order = self._get_execution_order(parent_run_id_)
        start_time = datetime.utcnow()
        if metadata:
            kwargs.update({"metadata": metadata})
        retrieval_run = Run(
            id=run_id,
            name="Retriever",
            parent_run_id=parent_run_id,
            serialized=serialized,
            inputs={"query": query},
            extra=kwargs,
            events=[{"name": "start", "time": start_time}],
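Every `on_*_start` method above follows the same bookkeeping: resolve the parent run, compute the execution order, stamp a start event, and fold `metadata` into the Run's extras. A concrete tracer then only decides where finished runs go. A minimal sketch, assuming `_persist_run` is the abstract hook subclasses implement:

```python
from langchain.callbacks.tracers.base import BaseTracer
from langchain.callbacks.tracers.schemas import Run


class PrintTracer(BaseTracer):
    """Illustrative tracer: dump each completed run tree to stdout."""

    def _persist_run(self, run: Run) -> None:
        # Run is a pydantic model, so it serializes directly.
        print(run.json(indent=2))
```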
@@ -6,7 +6,6 @@ from uuid import UUID

from langchainplus_sdk import LangChainPlusClient, RunEvaluator

from langchain.callbacks.manager import tracing_v2_enabled
from langchain.callbacks.tracers.base import BaseTracer
from langchain.callbacks.tracers.schemas import Run

@@ -28,8 +27,6 @@ class EvaluatorCallbackHandler(BaseTracer):
        If not specified, a new instance will be created.
    example_id : Union[UUID, str], optional
        The example ID to be associated with the runs.
    project_name : str, optional
        The LangSmith project name to organize eval chain runs under.

    Attributes
    ----------
@@ -43,11 +40,6 @@ class EvaluatorCallbackHandler(BaseTracer):
        The thread pool executor used for running the evaluators.
    futures : Set[Future]
        The set of futures representing the running evaluators.
    skip_unfinished : bool
        Whether to skip runs that are not finished or raised
        an error.
    project_name : Optional[str]
        The LangSmith project name to organize eval chain runs under.
    """

    name = "evaluator_callback_handler"
@@ -58,8 +50,6 @@ class EvaluatorCallbackHandler(BaseTracer):
        max_workers: Optional[int] = None,
        client: Optional[LangChainPlusClient] = None,
        example_id: Optional[Union[UUID, str]] = None,
        skip_unfinished: bool = True,
        project_name: Optional[str] = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(**kwargs)
@@ -72,25 +62,10 @@ class EvaluatorCallbackHandler(BaseTracer):
            max_workers=max(max_workers or len(evaluators), 1)
        )
        self.futures: Set[Future] = set()
        self.skip_unfinished = skip_unfinished
        self.project_name = project_name

    def _evaluate_in_project(self, run: Run, evaluator: RunEvaluator) -> None:
        """Evaluate the run in the project.

        Parameters
        ----------
        run : Run
            The run to be evaluated.
        evaluator : RunEvaluator
            The evaluator to use for evaluating the run.

        """
    def _evaluate_run(self, run: Run, evaluator: RunEvaluator) -> None:
        try:
            if self.project_name is None:
                self.client.evaluate_run(run, evaluator)
            with tracing_v2_enabled(project_name=self.project_name, tags=["eval"]):
                self.client.evaluate_run(run, evaluator)
            self.client.evaluate_run(run, evaluator)
        except Exception as e:
            logger.error(
                f"Error evaluating run {run.id} with "
@@ -108,15 +83,10 @@ class EvaluatorCallbackHandler(BaseTracer):
            The run to be evaluated.

        """
        if self.skip_unfinished and not run.outputs:
            logger.debug(f"Skipping unfinished run {run.id}")
            return
        run_ = run.copy()
        run_.reference_example_id = self.example_id
        for evaluator in self.evaluators:
            self.futures.add(
                self.executor.submit(self._evaluate_in_project, run_, evaluator)
            )
            self.futures.add(self.executor.submit(self._evaluate_run, run_, evaluator))

    def wait_for_futures(self) -> None:
        """Wait for all futures to complete."""
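A hedged sketch of how this handler is driven, based only on the constructor arguments and `wait_for_futures` shown above; `MyEvaluator` is a hypothetical `RunEvaluator` implementation:

```python
handler = EvaluatorCallbackHandler(
    evaluators=[MyEvaluator()],   # hypothetical RunEvaluator subclass
    project_name="eval-demo",     # optional LangSmith project for eval runs
    skip_unfinished=True,         # skip runs with no outputs, as above
)
# ... pass `handler` as a callback while running chains, then block until
# all submitted evaluations finish:
handler.wait_for_futures()
```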
@@ -11,10 +11,13 @@ from uuid import UUID
from langchainplus_sdk import LangChainPlusClient

from langchain.callbacks.tracers.base import BaseTracer
from langchain.callbacks.tracers.schemas import Run, RunTypeEnum, TracerSession
from langchain.callbacks.tracers.schemas import (
    Run,
    RunTypeEnum,
    TracerSession,
)
from langchain.env import get_runtime_environment
from langchain.load.dump import dumpd
from langchain.schema.messages import BaseMessage
from langchain.schema.messages import BaseMessage, messages_to_dict

logger = logging.getLogger(__name__)
_LOGGED = set()
@@ -31,7 +34,6 @@ def log_error_once(method: str, exception: Exception) -> None:


def wait_for_all_tracers() -> None:
    """Wait for all tracers to finish."""
    global _TRACERS
    for tracer in _TRACERS:
        tracer.wait_for_futures()
@@ -45,7 +47,6 @@ class LangChainTracer(BaseTracer):
        example_id: Optional[Union[UUID, str]] = None,
        project_name: Optional[str] = None,
        client: Optional[LangChainPlusClient] = None,
        tags: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> None:
        """Initialize the LangChain tracer."""
@@ -61,7 +62,6 @@ class LangChainTracer(BaseTracer):
        self.executor = ThreadPoolExecutor(max_workers=1)
        self.client = client or LangChainPlusClient()
        self._futures: Set[Future] = set()
        self.tags = tags or []
        global _TRACERS
        _TRACERS.append(self)

@@ -73,20 +73,17 @@ class LangChainTracer(BaseTracer):
        run_id: UUID,
        tags: Optional[List[str]] = None,
        parent_run_id: Optional[UUID] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> None:
        """Start a trace for an LLM run."""
        parent_run_id_ = str(parent_run_id) if parent_run_id else None
        execution_order = self._get_execution_order(parent_run_id_)
        start_time = datetime.utcnow()
        if metadata:
            kwargs.update({"metadata": metadata})
        chat_model_run = Run(
            id=run_id,
            parent_run_id=parent_run_id,
            serialized=serialized,
            inputs={"messages": [[dumpd(msg) for msg in batch] for batch in messages]},
            inputs={"messages": [messages_to_dict(batch) for batch in messages]},
            extra=kwargs,
            events=[{"name": "start", "time": start_time}],
            start_time=start_time,
@@ -101,18 +98,11 @@ class LangChainTracer(BaseTracer):
    def _persist_run(self, run: Run) -> None:
        """The Langchain Tracer uses Post/Patch rather than persist."""

    def _get_tags(self, run: Run) -> List[str]:
        """Get combined tags for a run."""
        tags = set(run.tags or [])
        tags.update(self.tags or [])
        return list(tags)

    def _persist_run_single(self, run: Run) -> None:
        """Persist a run."""
        if run.parent_run_id is None:
            run.reference_example_id = self.example_id
        run_dict = run.dict(exclude={"child_runs"})
        run_dict["tags"] = self._get_tags(run)
        extra = run_dict.get("extra", {})
        extra["runtime"] = get_runtime_environment()
        run_dict["extra"] = extra
@@ -126,9 +116,7 @@ class LangChainTracer(BaseTracer):
    def _update_run_single(self, run: Run) -> None:
        """Update a run."""
        try:
            run_dict = run.dict()
            run_dict["tags"] = self._get_tags(run)
            self.client.update_run(run.id, **run_dict)
            self.client.update_run(run.id, **run.dict())
        except Exception as e:
            # Errors are swallowed by the thread executor so we need to log them here
            log_error_once("patch", e)
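`_get_tags` above is a plain set union of per-run tags and tracer-level tags, so duplicates collapse and ordering is unspecified. A quick standalone illustration of the merge semantics:

```python
run_tags = ["eval", "chain"]
tracer_tags = ["eval", "session-1"]
merged = list(set(run_tags) | set(tracer_tags))
# -> e.g. ['chain', 'eval', 'session-1']: 'eval' appears only once
```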
@@ -48,48 +48,7 @@ def import_langkit(


class WhyLabsCallbackHandler(BaseCallbackHandler):
    """
    Callback Handler for logging to WhyLabs. This callback handler utilizes
    `langkit` to extract features from the prompts & responses when interacting with
    an LLM. These features can be used to guardrail, evaluate, and observe interactions
    over time to detect issues relating to hallucinations, prompt engineering,
    or output validation. LangKit is an LLM monitoring toolkit developed by WhyLabs.

    Here are some examples of what can be monitored with LangKit:
    * Text Quality
      - readability score
      - complexity and grade scores
    * Text Relevance
      - Similarity scores between prompt/responses
      - Similarity scores against user-defined themes
      - Topic classification
    * Security and Privacy
      - patterns - count of strings matching a user-defined regex pattern group
      - jailbreaks - similarity scores with respect to known jailbreak attempts
      - prompt injection - similarity scores with respect to known prompt attacks
      - refusals - similarity scores with respect to known LLM refusal responses
    * Sentiment and Toxicity
      - sentiment analysis
      - toxicity analysis

    For more information, see https://docs.whylabs.ai/docs/language-model-monitoring
    or check out the LangKit repo here: https://github.com/whylabs/langkit

    ---
    Args:
        api_key (Optional[str]): WhyLabs API key. Optional because the preferred
            way to specify the API key is with environment variable
            WHYLABS_API_KEY.
        org_id (Optional[str]): WhyLabs organization id to write profiles to.
            Optional because the preferred way to specify the organization id is
            with environment variable WHYLABS_DEFAULT_ORG_ID.
        dataset_id (Optional[str]): WhyLabs dataset id to write profiles to.
            Optional because the preferred way to specify the dataset id is
            with environment variable WHYLABS_DEFAULT_DATASET_ID.
        sentiment (bool): Whether to enable sentiment analysis. Defaults to False.
        toxicity (bool): Whether to enable toxicity analysis. Defaults to False.
        themes (bool): Whether to enable theme analysis. Defaults to False.
    """
    """WhyLabs CallbackHandler."""

    def __init__(self, logger: Logger):
        """Initiate the rolling logger"""
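A hedged setup sketch following the Args section above, which prefers environment variables over constructor arguments. The `from_params` factory is an assumption here (the `__init__` in the diff only takes a rolling `logger`); the credential values are placeholders:

```python
import os

os.environ["WHYLABS_API_KEY"] = "..."              # placeholder
os.environ["WHYLABS_DEFAULT_ORG_ID"] = "org-..."   # placeholder
os.environ["WHYLABS_DEFAULT_DATASET_ID"] = "model-1"

from langchain.callbacks import WhyLabsCallbackHandler

# Assumed factory method; enables the optional langkit analyses named above.
whylabs = WhyLabsCallbackHandler.from_params(sentiment=True, toxicity=True)
```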
@@ -4,7 +4,6 @@ from langchain.chains.api.openapi.chain import OpenAPIEndpointChain
from langchain.chains.combine_documents.base import AnalyzeDocumentChain
from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain
from langchain.chains.combine_documents.map_rerank import MapRerankDocumentsChain
from langchain.chains.combine_documents.reduce import ReduceDocumentsChain
from langchain.chains.combine_documents.refine import RefineDocumentsChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.constitutional_ai.base import ConstitutionalChain
@@ -16,10 +15,8 @@ from langchain.chains.conversational_retrieval.base import (
from langchain.chains.flare.base import FlareChain
from langchain.chains.graph_qa.base import GraphQAChain
from langchain.chains.graph_qa.cypher import GraphCypherQAChain
from langchain.chains.graph_qa.hugegraph import HugeGraphQAChain
from langchain.chains.graph_qa.kuzu import KuzuQAChain
from langchain.chains.graph_qa.nebulagraph import NebulaGraphQAChain
from langchain.chains.graph_qa.sparql import GraphSparqlQAChain
from langchain.chains.hyde.base import HypotheticalDocumentEmbedder
from langchain.chains.llm import LLMChain
from langchain.chains.llm_bash.base import LLMBashChain
@@ -70,10 +67,8 @@ __all__ = [
    "FlareChain",
    "GraphCypherQAChain",
    "GraphQAChain",
    "GraphSparqlQAChain",
    "HypotheticalDocumentEmbedder",
    "KuzuQAChain",
    "HugeGraphQAChain",
    "LLMBashChain",
    "LLMChain",
    "LLMCheckerChain",
@@ -114,5 +109,4 @@ __all__ = [
    "MapRerankDocumentsChain",
    "MapReduceDocumentsChain",
    "RefineDocumentsChain",
    "ReduceDocumentsChain",
]
@@ -1,7 +1,6 @@
"""Base interface that all chains should implement."""
import inspect
import json
import logging
import warnings
from abc import ABC, abstractmethod
from pathlib import Path
@@ -23,35 +22,13 @@ from langchain.load.dump import dumpd
from langchain.load.serializable import Serializable
from langchain.schema import RUN_KEY, BaseMemory, RunInfo

logger = logging.getLogger(__name__)


def _get_verbosity() -> bool:
    return langchain.verbose


class Chain(Serializable, ABC):
    """Abstract base class for creating structured sequences of calls to components.

    Chains should be used to encode a sequence of calls to components like
    models, document retrievers, other chains, etc., and provide a simple interface
    to this sequence.

    The Chain interface makes it easy to create apps that are:
    - Stateful: add Memory to any Chain to give it state,
    - Observable: pass Callbacks to a Chain to execute additional functionality,
        like logging, outside the main sequence of component calls,
    - Composable: the Chain API is flexible enough that it is easy to combine
        Chains with other components, including other Chains.

    The main methods exposed by chains are:
    - `__call__`: Chains are callable. The `__call__` method is the primary way to
        execute a Chain. This takes inputs as a dictionary and returns a
        dictionary output.
    - `run`: A convenience method that takes inputs as args/kwargs and returns the
        output as a string. This method can only be used for a subset of chains and
        cannot return as rich of an output as `__call__`.
    """
    """Base interface that all chains should implement."""

    memory: Optional[BaseMemory] = None
    """Optional memory object. Defaults to None.
@@ -77,12 +54,6 @@ class Chain(Serializable, ABC):
    and passed as arguments to the handlers defined in `callbacks`.
    You can use these to eg identify a specific instance of a chain with its use case.
    """
    metadata: Optional[Dict[str, Any]] = None
    """Optional metadata associated with the chain. Defaults to None
    This metadata will be associated with each call to this chain,
    and passed as arguments to the handlers defined in `callbacks`.
    You can use these to eg identify a specific instance of a chain with its use case.
    """

    class Config:
        """Configuration for this pydantic object."""
@@ -94,7 +65,7 @@ class Chain(Serializable, ABC):
        raise NotImplementedError("Saving not supported for this chain type.")

    @root_validator()
    def raise_callback_manager_deprecation(cls, values: Dict) -> Dict:
    def raise_deprecation(cls, values: Dict) -> Dict:
        """Raise deprecation warning if callback_manager is used."""
        if values.get("callback_manager") is not None:
            warnings.warn(
@@ -106,9 +77,9 @@ class Chain(Serializable, ABC):

    @validator("verbose", pre=True, always=True)
    def set_verbose(cls, verbose: Optional[bool]) -> bool:
        """Set the chain verbosity.
        """If verbose is None, set it.

        Defaults to the global setting if not specified by the user.
        This allows users to pass in None as verbose to access the global setting.
        """
        if verbose is None:
            return _get_verbosity()
@@ -118,12 +89,12 @@ class Chain(Serializable, ABC):
    @property
    @abstractmethod
    def input_keys(self) -> List[str]:
        """Return the keys expected to be in the chain input."""
        """Input keys this chain expects."""

    @property
    @abstractmethod
    def output_keys(self) -> List[str]:
        """Return the keys expected to be in the chain output."""
        """Output keys this chain expects."""

    def _validate_inputs(self, inputs: Dict[str, Any]) -> None:
        """Check that all inputs are present."""
@@ -142,44 +113,14 @@ class Chain(Serializable, ABC):
        inputs: Dict[str, Any],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, Any]:
        """Execute the chain.

        This is a private method that is not user-facing. It is only called within
        `Chain.__call__`, which is the user-facing wrapper method that handles
        callbacks configuration and some input/output processing.

        Args:
            inputs: A dict of named inputs to the chain. Assumed to contain all inputs
                specified in `Chain.input_keys`, including any inputs added by memory.
            run_manager: The callbacks manager that contains the callback handlers for
                this run of the chain.

        Returns:
            A dict of named outputs. Should contain all outputs specified in
                `Chain.output_keys`.
        """
        """Run the logic of this chain and return the output."""

    async def _acall(
        self,
        inputs: Dict[str, Any],
        run_manager: Optional[AsyncCallbackManagerForChainRun] = None,
    ) -> Dict[str, Any]:
        """Asynchronously execute the chain.

        This is a private method that is not user-facing. It is only called within
        `Chain.acall`, which is the user-facing wrapper method that handles
        callbacks configuration and some input/output processing.

        Args:
            inputs: A dict of named inputs to the chain. Assumed to contain all inputs
                specified in `Chain.input_keys`, including any inputs added by memory.
            run_manager: The callbacks manager that contains the callback handlers for
                this run of the chain.

        Returns:
            A dict of named outputs. Should contain all outputs specified in
                `Chain.output_keys`.
        """
        """Run the logic of this chain and return the output."""
        raise NotImplementedError("Async call not supported for this chain type.")
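The contract spelled out above, public `__call__`/`acall` wrappers around a private `_call`/`_acall`, means a subclass only declares its keys and implements the private method. A minimal sketch:

```python
from typing import Any, Dict, List, Optional

from langchain.callbacks.manager import CallbackManagerForChainRun
from langchain.chains.base import Chain


class ShoutChain(Chain):
    """Illustrative chain that upper-cases its single input."""

    @property
    def input_keys(self) -> List[str]:
        return ["text"]

    @property
    def output_keys(self) -> List[str]:
        return ["shout"]

    def _call(
        self,
        inputs: Dict[str, Any],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, Any]:
        return {"shout": inputs["text"].upper()}


ShoutChain()({"text": "hello"})  # -> {'text': 'hello', 'shout': 'HELLO'}
```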
    def __call__(
@@ -189,43 +130,25 @@ class Chain(Serializable, ABC):
        callbacks: Callbacks = None,
        *,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        include_run_info: bool = False,
    ) -> Dict[str, Any]:
        """Execute the chain.
        """Run the logic of this chain and add to output if desired.

        Args:
            inputs: Dictionary of inputs, or single input if chain expects
                only one param. Should contain all inputs specified in
                `Chain.input_keys` except for inputs that will be set by the chain's
                memory.
            return_only_outputs: Whether to return only outputs in the
                only one param.
            return_only_outputs: boolean for whether to return only outputs in the
                response. If True, only new keys generated by this chain will be
                returned. If False, both input keys and new keys generated by this
                chain will be returned. Defaults to False.
            callbacks: Callbacks to use for this chain run. These will be called in
                addition to callbacks passed to the chain during construction, but only
                these runtime callbacks will propagate to calls to other objects.
            tags: List of string tags to pass to all callbacks. These will be passed in
                addition to tags passed to the chain during construction, but only
                these runtime tags will propagate to calls to other objects.
            metadata: Optional metadata associated with the chain. Defaults to None
            callbacks: Callbacks to use for this chain run. If not provided, will
                use the callbacks provided to the chain.
            include_run_info: Whether to include run info in the response. Defaults
                to False.

        Returns:
            A dict of named outputs. Should contain all outputs specified in
                `Chain.output_keys`.
        """
        inputs = self.prep_inputs(inputs)
        callback_manager = CallbackManager.configure(
            callbacks,
            self.callbacks,
            self.verbose,
            tags,
            self.tags,
            metadata,
            self.metadata,
            callbacks, self.callbacks, self.verbose, tags, self.tags
        )
        new_arg_supported = inspect.signature(self._call).parameters.get("run_manager")
        run_manager = callback_manager.on_chain_start(
@@ -256,43 +179,25 @@ class Chain(Serializable, ABC):
        callbacks: Callbacks = None,
        *,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        include_run_info: bool = False,
    ) -> Dict[str, Any]:
        """Asynchronously execute the chain.
        """Run the logic of this chain and add to output if desired.

        Args:
            inputs: Dictionary of inputs, or single input if chain expects
                only one param. Should contain all inputs specified in
                `Chain.input_keys` except for inputs that will be set by the chain's
                memory.
            return_only_outputs: Whether to return only outputs in the
                only one param.
            return_only_outputs: boolean for whether to return only outputs in the
                response. If True, only new keys generated by this chain will be
                returned. If False, both input keys and new keys generated by this
                chain will be returned. Defaults to False.
            callbacks: Callbacks to use for this chain run. These will be called in
                addition to callbacks passed to the chain during construction, but only
                these runtime callbacks will propagate to calls to other objects.
            tags: List of string tags to pass to all callbacks. These will be passed in
                addition to tags passed to the chain during construction, but only
                these runtime tags will propagate to calls to other objects.
            metadata: Optional metadata associated with the chain. Defaults to None
            callbacks: Callbacks to use for this chain run. If not provided, will
                use the callbacks provided to the chain.
            include_run_info: Whether to include run info in the response. Defaults
                to False.

        Returns:
            A dict of named outputs. Should contain all outputs specified in
                `Chain.output_keys`.
        """
        inputs = self.prep_inputs(inputs)
        callback_manager = AsyncCallbackManager.configure(
            callbacks,
            self.callbacks,
            self.verbose,
            tags,
            self.tags,
            metadata,
            self.metadata,
            callbacks, self.callbacks, self.verbose, tags, self.tags
        )
        new_arg_supported = inspect.signature(self._acall).parameters.get("run_manager")
        run_manager = await callback_manager.on_chain_start(
@@ -322,18 +227,7 @@ class Chain(Serializable, ABC):
        outputs: Dict[str, str],
        return_only_outputs: bool = False,
    ) -> Dict[str, str]:
        """Validate and prepare chain outputs, and save info about this run to memory.

        Args:
            inputs: Dictionary of chain inputs, including any inputs added by chain
                memory.
            outputs: Dictionary of initial chain outputs.
            return_only_outputs: Whether to only return the chain outputs. If False,
                inputs are also added to the final outputs.

        Returns:
            A dict of the final chain outputs.
        """
        """Validate and prep outputs."""
        self._validate_outputs(outputs)
        if self.memory is not None:
            self.memory.save_context(inputs, outputs)
@@ -343,17 +237,7 @@ class Chain(Serializable, ABC):
        return {**inputs, **outputs}

    def prep_inputs(self, inputs: Union[Dict[str, Any], Any]) -> Dict[str, str]:
        """Validate and prepare chain inputs, including adding inputs from memory.

        Args:
            inputs: Dictionary of raw inputs, or single input if chain expects
                only one param. Should contain all inputs specified in
                `Chain.input_keys` except for inputs that will be set by the chain's
                memory.

        Returns:
            A dictionary of all inputs, including those added by the chain's memory.
        """
        """Validate and prep inputs."""
        if not isinstance(inputs, dict):
            _input_keys = set(self.input_keys)
            if self.memory is not None:
@@ -374,6 +258,12 @@ class Chain(Serializable, ABC):
        self._validate_inputs(inputs)
        return inputs

    def apply(
        self, input_list: List[Dict[str, Any]], callbacks: Callbacks = None
    ) -> List[Dict[str, str]]:
        """Call the chain on all inputs in the list."""
        return [self(inputs, callbacks=callbacks) for inputs in input_list]

    @property
    def _run_output_key(self) -> str:
        if len(self.output_keys) != 1:
@@ -388,143 +278,56 @@ class Chain(Serializable, ABC):
        *args: Any,
        callbacks: Callbacks = None,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> str:
        """Convenience method for executing chain when there's a single string output.

        The main difference between this method and `Chain.__call__` is that this method
        can only be used for chains that return a single string output. If a Chain
        has more outputs, a non-string output, or you want to return the inputs/run
        info along with the outputs, use `Chain.__call__`.

        The other difference is that this method expects inputs to be passed directly in
        as positional arguments or keyword arguments, whereas `Chain.__call__` expects
        a single input dictionary with all the inputs.

        Args:
            *args: If the chain expects a single input, it can be passed in as the
                sole positional argument.
            callbacks: Callbacks to use for this chain run. These will be called in
                addition to callbacks passed to the chain during construction, but only
                these runtime callbacks will propagate to calls to other objects.
            tags: List of string tags to pass to all callbacks. These will be passed in
                addition to tags passed to the chain during construction, but only
                these runtime tags will propagate to calls to other objects.
            **kwargs: If the chain expects multiple inputs, they can be passed in
                directly as keyword arguments.

        Returns:
            The chain output as a string.

        Example:
            .. code-block:: python

                # Suppose we have a single-input chain that takes a 'question' string:
                chain.run("What's the temperature in Boise, Idaho?")
                # -> "The temperature in Boise is..."

                # Suppose we have a multi-input chain that takes a 'question' string
                # and 'context' string:
                question = "What's the temperature in Boise, Idaho?"
                context = "Weather report for Boise, Idaho on 07/03/23..."
                chain.run(question=question, context=context)
                # -> "The temperature in Boise is..."
        """
        """Run the chain as text in, text out or multiple variables, text out."""
        # Run at start to make sure this is possible/defined
        _output_key = self._run_output_key

        if args and not kwargs:
            if len(args) != 1:
                raise ValueError("`run` supports only one positional argument.")
            return self(args[0], callbacks=callbacks, tags=tags, metadata=metadata)[
                _output_key
            ]
            return self(args[0], callbacks=callbacks, tags=tags)[_output_key]

        if kwargs and not args:
            return self(kwargs, callbacks=callbacks, tags=tags, metadata=metadata)[
                _output_key
            ]
            return self(kwargs, callbacks=callbacks, tags=tags)[_output_key]

        if not kwargs and not args:
            raise ValueError(
                "`run` supported with either positional arguments or keyword arguments,"
                " but none were provided."
            )
        else:
            raise ValueError(
                f"`run` supported with either positional arguments or keyword arguments"
                f" but not both. Got args: {args} and kwargs: {kwargs}."
            )

        raise ValueError(
            f"`run` supported with either positional arguments or keyword arguments"
            f" but not both. Got args: {args} and kwargs: {kwargs}."
        )

    async def arun(
        self,
        *args: Any,
        callbacks: Callbacks = None,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> str:
        """Convenience method for executing chain when there's a single string output.

        The main difference between this method and `Chain.__call__` is that this method
        can only be used for chains that return a single string output. If a Chain
        has more outputs, a non-string output, or you want to return the inputs/run
        info along with the outputs, use `Chain.__call__`.

        The other difference is that this method expects inputs to be passed directly in
        as positional arguments or keyword arguments, whereas `Chain.__call__` expects
        a single input dictionary with all the inputs.

        Args:
            *args: If the chain expects a single input, it can be passed in as the
                sole positional argument.
            callbacks: Callbacks to use for this chain run. These will be called in
                addition to callbacks passed to the chain during construction, but only
                these runtime callbacks will propagate to calls to other objects.
            tags: List of string tags to pass to all callbacks. These will be passed in
                addition to tags passed to the chain during construction, but only
                these runtime tags will propagate to calls to other objects.
            **kwargs: If the chain expects multiple inputs, they can be passed in
                directly as keyword arguments.

        Returns:
            The chain output as a string.

        Example:
            .. code-block:: python

                # Suppose we have a single-input chain that takes a 'question' string:
                await chain.arun("What's the temperature in Boise, Idaho?")
                # -> "The temperature in Boise is..."

                # Suppose we have a multi-input chain that takes a 'question' string
                # and 'context' string:
                question = "What's the temperature in Boise, Idaho?"
                context = "Weather report for Boise, Idaho on 07/03/23..."
                await chain.arun(question=question, context=context)
                # -> "The temperature in Boise is..."
        """
        """Run the chain as text in, text out or multiple variables, text out."""
        if len(self.output_keys) != 1:
            raise ValueError(
                f"`run` not supported when there is not exactly "
                f"one output key. Got {self.output_keys}."
            )
        elif args and not kwargs:

        if args and not kwargs:
            if len(args) != 1:
                raise ValueError("`run` supports only one positional argument.")
            return (
                await self.acall(
                    args[0], callbacks=callbacks, tags=tags, metadata=metadata
                )
            )[self.output_keys[0]]
            return (await self.acall(args[0], callbacks=callbacks, tags=tags))[
                self.output_keys[0]
            ]

        if kwargs and not args:
            return (
                await self.acall(
                    kwargs, callbacks=callbacks, tags=tags, metadata=metadata
                )
            )[self.output_keys[0]]
            return (await self.acall(kwargs, callbacks=callbacks, tags=tags))[
                self.output_keys[0]
            ]

        raise ValueError(
            f"`run` supported with either positional arguments or keyword arguments"
@@ -532,43 +335,23 @@ class Chain(Serializable, ABC):
        )
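A sketch of the runtime arguments documented in `__call__` and `run` above, reusing the illustrative ShoutChain defined earlier: runtime callbacks, tags and metadata are merged with the construction-time values, but only the runtime ones propagate to nested calls.

```python
chain = ShoutChain(tags=["always-on"], metadata={"owner": "docs"})
result = chain(
    {"text": "hello"},
    tags=["web-request"],             # runtime-only tags
    metadata={"request_id": "r-42"},  # illustrative identifier
    include_run_info=True,            # adds a RunInfo entry under RUN_KEY
)
```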
    def dict(self, **kwargs: Any) -> Dict:
        """Return dictionary representation of chain.

        Expects `Chain._chain_type` property to be implemented and for memory to be
        null.

        Args:
            **kwargs: Keyword arguments passed to default `pydantic.BaseModel.dict`
                method.

        Returns:
            A dictionary representation of the chain.

        Example:
            .. code-block:: python

                chain.dict(exclude_unset=True)
                # -> {"_type": "foo", "verbose": False, ...}
        """
        """Return dictionary representation of chain."""
        if self.memory is not None:
            raise ValueError("Saving of memory is not yet supported.")
        _dict = super().dict(**kwargs)
        _dict = super().dict()
        _dict["_type"] = self._chain_type
        return _dict

    def save(self, file_path: Union[Path, str]) -> None:
        """Save the chain.

        Expects `Chain._chain_type` property to be implemented and for memory to be
        null.

        Args:
            file_path: Path to file to save the chain to.

        Example:
            .. code-block:: python

                chain.save(file_path="path/chain.yaml")
        """
        # Convert file to Path object.
        if isinstance(file_path, str):
@@ -590,9 +373,3 @@ class Chain(Serializable, ABC):
            yaml.dump(chain_dict, f, default_flow_style=False)
        else:
            raise ValueError(f"{save_path} must be json or yaml")

    def apply(
        self, input_list: List[Dict[str, Any]], callbacks: Callbacks = None
    ) -> List[Dict[str, str]]:
        """Call the chain on all inputs in the list."""
        return [self(inputs, callbacks=callbacks) for inputs in input_list]
@@ -11,20 +11,30 @@ from langchain.callbacks.manager import (
)
from langchain.chains.base import Chain
from langchain.docstore.document import Document
from langchain.schema import BasePromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter, TextSplitter


class BaseCombineDocumentsChain(Chain, ABC):
    """Base interface for chains combining documents.
def format_document(doc: Document, prompt: BasePromptTemplate) -> str:
    """Format a document into a string based on a prompt template."""
    base_info = {"page_content": doc.page_content}
    base_info.update(doc.metadata)
    missing_metadata = set(prompt.input_variables).difference(base_info)
    if len(missing_metadata) > 0:
        required_metadata = [
            iv for iv in prompt.input_variables if iv != "page_content"
        ]
        raise ValueError(
            f"Document prompt requires documents to have metadata variables: "
            f"{required_metadata}. Received document with missing metadata: "
            f"{list(missing_metadata)}."
        )
    document_info = {k: base_info[k] for k in prompt.input_variables}
    return prompt.format(**document_info)

    Subclasses of this chain deal with combining documents in a variety of
    ways. This base class exists to add some uniformity in the interface these types
    of chains should expose. Namely, they expect an input key related to the documents
    to use (default `input_documents`), and then also expose a method to calculate
    the length of a prompt from documents (useful for outside callers to use to
    determine whether it's safe to pass a list of documents into this chain or whether
    that will be longer than the context length).
    """

class BaseCombineDocumentsChain(Chain, ABC):
    """Base interface for chains combining documents."""

    input_key: str = "input_documents"  #: :meta private:
    output_key: str = "output_text"  #: :meta private:
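Usage of the `format_document` helper added above: the prompt may reference `page_content` plus any metadata keys the document carries, and a missing key raises the ValueError shown.

```python
from langchain.docstore.document import Document
from langchain.prompts import PromptTemplate

doc = Document(page_content="LangChain ships many chains.", metadata={"source": "docs"})
prompt = PromptTemplate.from_template("[{source}] {page_content}")
print(format_document(doc, prompt))  # -> "[docs] LangChain ships many chains."
```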
@@ -48,57 +58,25 @@ class BaseCombineDocumentsChain(Chain, ABC):
    def prompt_length(self, docs: List[Document], **kwargs: Any) -> Optional[int]:
        """Return the prompt length given the documents passed in.

        This can be used by a caller to determine whether passing in a list
        of documents would exceed a certain prompt length. This is useful when
        trying to ensure that the size of a prompt remains below a certain
        context limit.

        Args:
            docs: List[Document], a list of documents to use to calculate the
                total prompt length.

        Returns:
            Returns None if the method does not depend on the prompt length,
            otherwise the length of the prompt in tokens.
            Returns None if the method does not depend on the prompt length.
        """
        return None

    @abstractmethod
    def combine_docs(self, docs: List[Document], **kwargs: Any) -> Tuple[str, dict]:
        """Combine documents into a single string.

        Args:
            docs: List[Document], the documents to combine
            **kwargs: Other parameters to use in combining documents, often
                other inputs to the prompt.

        Returns:
            The first element returned is the single string output. The second
            element returned is a dictionary of other keys to return.
        """
        """Combine documents into a single string."""

    @abstractmethod
    async def acombine_docs(
        self, docs: List[Document], **kwargs: Any
    ) -> Tuple[str, dict]:
        """Combine documents into a single string.

        Args:
            docs: List[Document], the documents to combine
            **kwargs: Other parameters to use in combining documents, often
                other inputs to the prompt.

        Returns:
            The first element returned is the single string output. The second
            element returned is a dictionary of other keys to return.
        """
        """Combine documents into a single string asynchronously."""

    def _call(
        self,
        inputs: Dict[str, List[Document]],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, str]:
        """Prepare inputs, call combine docs, prepare outputs."""
        _run_manager = run_manager or CallbackManagerForChainRun.get_noop_manager()
        docs = inputs[self.input_key]
        # Other keys are assumed to be needed for LLM prediction
@@ -114,7 +92,6 @@ class BaseCombineDocumentsChain(Chain, ABC):
        inputs: Dict[str, List[Document]],
        run_manager: Optional[AsyncCallbackManagerForChainRun] = None,
    ) -> Dict[str, str]:
        """Prepare inputs, call combine docs, prepare outputs."""
        _run_manager = run_manager or AsyncCallbackManagerForChainRun.get_noop_manager()
        docs = inputs[self.input_key]
        # Other keys are assumed to be needed for LLM prediction
@@ -127,12 +104,7 @@ class BaseCombineDocumentsChain(Chain, ABC):


class AnalyzeDocumentChain(Chain):
    """Chain that splits documents, then analyzes them in pieces.

    This chain is parameterized by a TextSplitter and a CombineDocumentsChain.
    This chain takes a single document as input, and then splits it up into chunks
    and then passes those chunks to the CombineDocumentsChain.
    """
    """Chain that splits documents, then analyzes them in pieces."""

    input_key: str = "input_document"  #: :meta private:
    text_splitter: TextSplitter = Field(default_factory=RecursiveCharacterTextSplitter)
@@ -159,7 +131,6 @@ class AnalyzeDocumentChain(Chain):
        inputs: Dict[str, str],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, str]:
        """Split document into chunks and pass to CombineDocumentsChain."""
        _run_manager = run_manager or CallbackManagerForChainRun.get_noop_manager()
        document = inputs[self.input_key]
        docs = self.text_splitter.create_documents([document])
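A hedged sketch of `AnalyzeDocumentChain` in use: one long string goes in, the `text_splitter` chunks it, and a combine-documents chain consumes the chunks. The summarize chain is illustrative; only `input_document` and the splitter come from the code above.

```python
from langchain.chains import AnalyzeDocumentChain
from langchain.chains.summarize import load_summarize_chain
from langchain.llms import OpenAI

summarize = load_summarize_chain(OpenAI(), chain_type="map_reduce")
analyze = AnalyzeDocumentChain(combine_docs_chain=summarize)
analyze.run(input_document="<one very long document as a single string>")
```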
@@ -2,97 +2,74 @@

from __future__ import annotations

from typing import Any, Dict, List, Optional, Tuple
from typing import Any, Callable, Dict, List, Optional, Protocol, Tuple

from pydantic import Extra, root_validator

from langchain.callbacks.manager import Callbacks
from langchain.chains.combine_documents.base import BaseCombineDocumentsChain
from langchain.chains.combine_documents.reduce import ReduceDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.docstore.document import Document


class CombineDocsProtocol(Protocol):
    """Interface for the combine_docs method."""

    def __call__(self, docs: List[Document], **kwargs: Any) -> str:
        """Interface for the combine_docs method."""


def _split_list_of_docs(
    docs: List[Document], length_func: Callable, token_max: int, **kwargs: Any
) -> List[List[Document]]:
    new_result_doc_list = []
    _sub_result_docs = []
    for doc in docs:
        _sub_result_docs.append(doc)
        _num_tokens = length_func(_sub_result_docs, **kwargs)
        if _num_tokens > token_max:
            if len(_sub_result_docs) == 1:
                raise ValueError(
                    "A single document was longer than the context length,"
                    " we cannot handle this."
                )
            if len(_sub_result_docs) == 2:
                raise ValueError(
                    "A single document was so long it could not be combined "
                    "with another document, we cannot handle this."
                )
            new_result_doc_list.append(_sub_result_docs[:-1])
            _sub_result_docs = _sub_result_docs[-1:]
    new_result_doc_list.append(_sub_result_docs)
    return new_result_doc_list


def _collapse_docs(
    docs: List[Document],
    combine_document_func: CombineDocsProtocol,
    **kwargs: Any,
) -> Document:
    result = combine_document_func(docs, **kwargs)
    combined_metadata = {k: str(v) for k, v in docs[0].metadata.items()}
    for doc in docs[1:]:
        for k, v in doc.metadata.items():
            if k in combined_metadata:
                combined_metadata[k] += f", {v}"
            else:
                combined_metadata[k] = str(v)
    return Document(page_content=result, metadata=combined_metadata)

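An illustration of the greedy packing `_split_list_of_docs` performs above, with a toy word count standing in for a real token counter:

```python
from langchain.docstore.document import Document

def word_len(docs, **kwargs):
    return sum(len(d.page_content.split()) for d in docs)

docs = [
    Document(page_content="alpha beta"),
    Document(page_content="gamma"),
    Document(page_content="delta epsilon zeta"),
]
groups = _split_list_of_docs(docs, word_len, token_max=3)
# -> two sublists: [alpha beta, gamma] and [delta epsilon zeta]
```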
class MapReduceDocumentsChain(BaseCombineDocumentsChain):
    """Combining documents by mapping a chain over them, then combining results.

    We first call `llm_chain` on each document individually, passing in the
    `page_content` and any other kwargs. This is the `map` step.

    We then process the results of that `map` step in a `reduce` step. This should
    likely be a ReduceDocumentsChain.

    Example:
        .. code-block:: python

            from langchain.chains import (
                StuffDocumentsChain,
                LLMChain,
                ReduceDocumentsChain,
                MapReduceDocumentsChain,
            )
            from langchain.prompts import PromptTemplate
            from langchain.llms import OpenAI

            # This controls how each document will be formatted. Specifically,
            # it will be passed to `format_document` - see that function for more
            # details.
            document_prompt = PromptTemplate(
                input_variables=["page_content"],
                template="{page_content}"
            )
            document_variable_name = "context"
            llm = OpenAI()
            # The prompt here should take as an input variable the
            # `document_variable_name`
            prompt = PromptTemplate.from_template(
                "Summarize this content: {context}"
            )
            llm_chain = LLMChain(llm=llm, prompt=prompt)
            # We now define how to combine these summaries
            reduce_prompt = PromptTemplate.from_template(
                "Combine these summaries: {context}"
            )
            reduce_llm_chain = LLMChain(llm=llm, prompt=reduce_prompt)
            combine_documents_chain = StuffDocumentsChain(
                llm_chain=reduce_llm_chain,
                document_prompt=document_prompt,
                document_variable_name=document_variable_name
            )
            reduce_documents_chain = ReduceDocumentsChain(
                combine_documents_chain=combine_documents_chain,
            )
            chain = MapReduceDocumentsChain(
                llm_chain=llm_chain,
                reduce_documents_chain=reduce_documents_chain,
            )
            # If we wanted to, we could also pass in collapse_documents_chain
            # which is specifically aimed at collapsing documents BEFORE
            # the final call.
            prompt = PromptTemplate.from_template(
                "Collapse this content: {context}"
            )
            llm_chain = LLMChain(llm=llm, prompt=prompt)
            collapse_documents_chain = StuffDocumentsChain(
                llm_chain=llm_chain,
                document_prompt=document_prompt,
                document_variable_name=document_variable_name
            )
            reduce_documents_chain = ReduceDocumentsChain(
                combine_documents_chain=combine_documents_chain,
                collapse_documents_chain=collapse_documents_chain,
            )
            chain = MapReduceDocumentsChain(
                llm_chain=llm_chain,
                reduce_documents_chain=reduce_documents_chain,
            )
    """
    """Combining documents by mapping a chain over them, then combining results."""

    llm_chain: LLMChain
    """Chain to apply to each document individually."""
    reduce_documents_chain: BaseCombineDocumentsChain
    """Chain to use to reduce the results of applying `llm_chain` to each doc.
    This is typically either a ReduceDocumentsChain or a StuffDocumentsChain."""
    combine_document_chain: BaseCombineDocumentsChain
    """Chain to use to combine results of applying llm_chain to documents."""
    collapse_document_chain: Optional[BaseCombineDocumentsChain] = None
    """Chain to use to collapse intermediary results if needed.
    If None, will use the combine_document_chain."""
    document_variable_name: str
    """The variable name in the llm_chain to put the documents in.
    If only one variable in the llm_chain, this need not be provided."""
@@ -116,29 +93,6 @@ class MapReduceDocumentsChain(BaseCombineDocumentsChain):
        extra = Extra.forbid
        arbitrary_types_allowed = True

    @root_validator(pre=True)
    def get_reduce_chain(cls, values: Dict) -> Dict:
        """For backwards compatibility."""
        if "combine_document_chain" in values:
            if "reduce_documents_chain" in values:
                raise ValueError(
                    "Both `reduce_documents_chain` and `combine_document_chain` "
                    "cannot be provided at the same time. `combine_document_chain` "
                    "is deprecated, please only provide `reduce_documents_chain`"
                )
            combine_chain = values["combine_document_chain"]
            collapse_chain = values.get("collapse_document_chain")
            reduce_chain = ReduceDocumentsChain(
                combine_documents_chain=combine_chain,
                collapse_documents_chain=collapse_chain,
            )
            values["reduce_documents_chain"] = reduce_chain
            del values["combine_document_chain"]
            if "collapse_document_chain" in values:
                del values["collapse_document_chain"]

        return values

    @root_validator(pre=True)
    def get_return_intermediate_steps(cls, values: Dict) -> Dict:
        """For backwards compatibility."""
@@ -169,36 +123,16 @@ class MapReduceDocumentsChain(BaseCombineDocumentsChain):
        return values

    @property
    def collapse_document_chain(self) -> BaseCombineDocumentsChain:
        """Kept for backward compatibility."""
        if isinstance(self.reduce_documents_chain, ReduceDocumentsChain):
            if self.reduce_documents_chain.collapse_documents_chain:
                return self.reduce_documents_chain.collapse_documents_chain
            else:
                return self.reduce_documents_chain.combine_documents_chain
    def _collapse_chain(self) -> BaseCombineDocumentsChain:
        if self.collapse_document_chain is not None:
            return self.collapse_document_chain
        else:
            raise ValueError(
                f"`reduce_documents_chain` is of type "
                f"{type(self.reduce_documents_chain)} so it does not have "
                f"this attribute."
            )

    @property
    def combine_document_chain(self) -> BaseCombineDocumentsChain:
        """Kept for backward compatibility."""
        if isinstance(self.reduce_documents_chain, ReduceDocumentsChain):
            return self.reduce_documents_chain.combine_documents_chain
        else:
            raise ValueError(
                f"`reduce_documents_chain` is of type "
                f"{type(self.reduce_documents_chain)} so it does not have "
                f"this attribute."
            )
            return self.combine_document_chain

    def combine_docs(
        self,
        docs: List[Document],
        token_max: Optional[int] = None,
        token_max: int = 3000,
        callbacks: Callbacks = None,
        **kwargs: Any,
    ) -> Tuple[str, dict]:
@@ -207,55 +141,100 @@ class MapReduceDocumentsChain(BaseCombineDocumentsChain):
        Combine by mapping first chain over all documents, then reducing the results.
        This reducing can be done recursively if needed (if there are many documents).
        """
        map_results = self.llm_chain.apply(
        results = self.llm_chain.apply(
            # FYI - this is parallelized and so it is fast.
            [{self.document_variable_name: d.page_content, **kwargs} for d in docs],
            callbacks=callbacks,
        )
        question_result_key = self.llm_chain.output_key
        result_docs = [
            Document(page_content=r[question_result_key], metadata=docs[i].metadata)
            # This uses metadata from the docs, and the textual results from `results`
            for i, r in enumerate(map_results)
        ]
        result, extra_return_dict = self.reduce_documents_chain.combine_docs(
            result_docs, token_max=token_max, callbacks=callbacks, **kwargs
        return self._process_results(
            results, docs, token_max, callbacks=callbacks, **kwargs
        )
        if self.return_intermediate_steps:
            intermediate_steps = [r[question_result_key] for r in map_results]
            extra_return_dict["intermediate_steps"] = intermediate_steps
        return result, extra_return_dict

    async def acombine_docs(
        self,
        docs: List[Document],
        token_max: Optional[int] = None,
        callbacks: Callbacks = None,
        **kwargs: Any,
        self, docs: List[Document], callbacks: Callbacks = None, **kwargs: Any
    ) -> Tuple[str, dict]:
        """Combine documents in a map reduce manner.

        Combine by mapping first chain over all documents, then reducing the results.
        This reducing can be done recursively if needed (if there are many documents).
        """
        map_results = await self.llm_chain.aapply(
        results = await self.llm_chain.aapply(
            # FYI - this is parallelized and so it is fast.
            [{**{self.document_variable_name: d.page_content}, **kwargs} for d in docs],
            callbacks=callbacks,
        )
        return await self._aprocess_results(
            results, docs, callbacks=callbacks, **kwargs
        )

    def _process_results_common(
        self,
        results: List[Dict],
        docs: List[Document],
        token_max: int = 3000,
        callbacks: Callbacks = None,
        **kwargs: Any,
    ) -> Tuple[List[Document], dict]:
        question_result_key = self.llm_chain.output_key
        result_docs = [
            Document(page_content=r[question_result_key], metadata=docs[i].metadata)
            # This uses metadata from the docs, and the textual results from `results`
            for i, r in enumerate(map_results)
            for i, r in enumerate(results)
        ]
        result, extra_return_dict = await self.reduce_documents_chain.acombine_docs(
            result_docs, token_max=token_max, callbacks=callbacks, **kwargs
        )
        length_func = self.combine_document_chain.prompt_length
        num_tokens = length_func(result_docs, **kwargs)

        def _collapse_docs_func(docs: List[Document], **kwargs: Any) -> str:
            return self._collapse_chain.run(
                input_documents=docs, callbacks=callbacks, **kwargs
            )

        while num_tokens is not None and num_tokens > token_max:
            new_result_doc_list = _split_list_of_docs(
                result_docs, length_func, token_max, **kwargs
            )
            result_docs = []
            for docs in new_result_doc_list:
                new_doc = _collapse_docs(docs, _collapse_docs_func, **kwargs)
                result_docs.append(new_doc)
            num_tokens = length_func(result_docs, **kwargs)
|
||||
if self.return_intermediate_steps:
|
||||
intermediate_steps = [r[question_result_key] for r in map_results]
|
||||
extra_return_dict["intermediate_steps"] = intermediate_steps
|
||||
return result, extra_return_dict
|
||||
_results = [r[self.llm_chain.output_key] for r in results]
|
||||
extra_return_dict = {"intermediate_steps": _results}
|
||||
else:
|
||||
extra_return_dict = {}
|
||||
return result_docs, extra_return_dict
|
||||
|
||||
def _process_results(
|
||||
self,
|
||||
results: List[Dict],
|
||||
docs: List[Document],
|
||||
token_max: int = 3000,
|
||||
callbacks: Callbacks = None,
|
||||
**kwargs: Any,
|
||||
) -> Tuple[str, dict]:
|
||||
result_docs, extra_return_dict = self._process_results_common(
|
||||
results, docs, token_max, callbacks=callbacks, **kwargs
|
||||
)
|
||||
output = self.combine_document_chain.run(
|
||||
input_documents=result_docs, callbacks=callbacks, **kwargs
|
||||
)
|
||||
return output, extra_return_dict
|
||||
|
||||
async def _aprocess_results(
|
||||
self,
|
||||
results: List[Dict],
|
||||
docs: List[Document],
|
||||
callbacks: Callbacks = None,
|
||||
**kwargs: Any,
|
||||
) -> Tuple[str, dict]:
|
||||
result_docs, extra_return_dict = self._process_results_common(
|
||||
results, docs, callbacks=callbacks, **kwargs
|
||||
)
|
||||
output = await self.combine_document_chain.arun(
|
||||
input_documents=result_docs, callbacks=callbacks, **kwargs
|
||||
)
|
||||
return output, extra_return_dict
|
||||
|
||||
@property
|
||||
def _chain_type(self) -> str:
|
||||
|
||||
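For orientation, a minimal sketch of wiring up the `reduce_documents_chain`-based interface used on one side of this diff; the prompts and variable names below are illustrative assumptions, not taken from the diff itself:

.. code-block:: python

    from langchain.chains import (
        LLMChain,
        MapReduceDocumentsChain,
        ReduceDocumentsChain,
        StuffDocumentsChain,
    )
    from langchain.llms import OpenAI
    from langchain.prompts import PromptTemplate

    llm = OpenAI()
    # Map step: run over each document independently.
    map_chain = LLMChain(
        llm=llm, prompt=PromptTemplate.from_template("Summarize this content: {context}")
    )
    # Reduce step: stuff all per-document results into a single final prompt.
    reduce_llm_chain = LLMChain(
        llm=llm, prompt=PromptTemplate.from_template("Combine these summaries: {context}")
    )
    reduce_documents_chain = ReduceDocumentsChain(
        combine_documents_chain=StuffDocumentsChain(
            llm_chain=reduce_llm_chain, document_variable_name="context"
        )
    )
    chain = MapReduceDocumentsChain(
        llm_chain=map_chain,
        reduce_documents_chain=reduce_documents_chain,
        document_variable_name="context",
    )
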
@@ -14,48 +14,7 @@ from langchain.output_parsers.regex import RegexParser


class MapRerankDocumentsChain(BaseCombineDocumentsChain):
    """Combining documents by mapping a chain over them, then reranking results.

    This algorithm calls an LLMChain on each input document. The LLMChain is expected
    to have an OutputParser that parses the result into both an answer (`answer_key`)
    and a score (`rank_key`). The answer with the highest score is then returned.

    Example:
        .. code-block:: python

            from langchain.chains import StuffDocumentsChain, LLMChain
            from langchain.prompts import PromptTemplate
            from langchain.llms import OpenAI
            from langchain.output_parsers.regex import RegexParser

            document_variable_name = "context"
            llm = OpenAI()
            # The prompt here should take as an input variable the
            # `document_variable_name`
            # The actual prompt will need to be a lot more complex, this is just
            # an example.
            prompt_template = (
                "Use the following context to tell me the chemical formula "
                "for water. Output both your answer and a score of how confident "
                "you are. Context: {context}"
            )
            output_parser = RegexParser(
                regex=r"(.*?)\nScore: (.*)",
                output_keys=["answer", "score"],
            )
            prompt = PromptTemplate(
                template=prompt_template,
                input_variables=["context"],
                output_parser=output_parser,
            )
            llm_chain = LLMChain(llm=llm, prompt=prompt)
            chain = MapRerankDocumentsChain(
                llm_chain=llm_chain,
                document_variable_name=document_variable_name,
                rank_key="score",
                answer_key="answer",
            )
    """
    """Combining documents by mapping a chain over them, then reranking results."""

    llm_chain: LLMChain
    """Chain to apply to each document individually."""
@@ -67,10 +26,7 @@ class MapRerankDocumentsChain(BaseCombineDocumentsChain):
    answer_key: str
    """Key in output of llm_chain to return as answer."""
    metadata_keys: Optional[List[str]] = None
    """Additional metadata from the chosen document to return."""
    return_intermediate_steps: bool = False
    """Return intermediate steps.
    Intermediate steps include the results of calling llm_chain on each document."""

    class Config:
        """Configuration for this pydantic object."""
@@ -140,16 +96,6 @@ class MapRerankDocumentsChain(BaseCombineDocumentsChain):
        """Combine documents in a map rerank manner.

        Combine by mapping first chain over all documents, then reranking the results.

        Args:
            docs: List of documents to combine
            callbacks: Callbacks to be passed through
            **kwargs: additional parameters to be passed to LLM calls (like other
                input variables besides the documents)

        Returns:
            The first element returned is the single string output. The second
            element returned is a dictionary of other keys to return.
        """
        results = self.llm_chain.apply_and_parse(
            # FYI - this is parallelized and so it is fast.
@@ -164,16 +110,6 @@ class MapRerankDocumentsChain(BaseCombineDocumentsChain):
        """Combine documents in a map rerank manner.

        Combine by mapping first chain over all documents, then reranking the results.

        Args:
            docs: List of documents to combine
            callbacks: Callbacks to be passed through
            **kwargs: additional parameters to be passed to LLM calls (like other
                input variables besides the documents)

        Returns:
            The first element returned is the single string output. The second
            element returned is a dictionary of other keys to return.
        """
        results = await self.llm_chain.aapply_and_parse(
            # FYI - this is parallelized and so it is fast.

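To make the rank/answer mechanics above concrete, here is a toy run of the RegexParser from the docstring example; the sample LLM output is invented for illustration:

.. code-block:: python

    from langchain.output_parsers.regex import RegexParser

    parser = RegexParser(regex=r"(.*?)\nScore: (.*)", output_keys=["answer", "score"])
    parsed = parser.parse("H2O\nScore: 95")
    # parsed == {"answer": "H2O", "score": "95"}
    # MapRerankDocumentsChain then sorts the parsed results by int(result[rank_key])
    # in descending order and returns the answer of the highest-scoring document.
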
@@ -1,289 +0,0 @@
"""Combine many documents together by recursively reducing them."""

from __future__ import annotations

from typing import Any, Callable, List, Optional, Protocol, Tuple

from pydantic import Extra

from langchain.callbacks.manager import Callbacks
from langchain.chains.combine_documents.base import BaseCombineDocumentsChain
from langchain.docstore.document import Document


class CombineDocsProtocol(Protocol):
    """Interface for the combine_docs method."""

    def __call__(self, docs: List[Document], **kwargs: Any) -> str:
        """Interface for the combine_docs method."""


class AsyncCombineDocsProtocol(Protocol):
    """Interface for the combine_docs method."""

    async def __call__(self, docs: List[Document], **kwargs: Any) -> str:
"""Async nterface for the combine_docs method."""
|
||||
|
||||
|
||||
def _split_list_of_docs(
    docs: List[Document], length_func: Callable, token_max: int, **kwargs: Any
) -> List[List[Document]]:
    new_result_doc_list = []
    _sub_result_docs = []
    for doc in docs:
        _sub_result_docs.append(doc)
        _num_tokens = length_func(_sub_result_docs, **kwargs)
        if _num_tokens > token_max:
            if len(_sub_result_docs) == 1:
                raise ValueError(
                    "A single document was longer than the context length,"
                    " we cannot handle this."
                )
            new_result_doc_list.append(_sub_result_docs[:-1])
            _sub_result_docs = _sub_result_docs[-1:]
    new_result_doc_list.append(_sub_result_docs)
    return new_result_doc_list


def _collapse_docs(
    docs: List[Document],
    combine_document_func: CombineDocsProtocol,
    **kwargs: Any,
) -> Document:
    result = combine_document_func(docs, **kwargs)
    combined_metadata = {k: str(v) for k, v in docs[0].metadata.items()}
    for doc in docs[1:]:
        for k, v in doc.metadata.items():
            if k in combined_metadata:
                combined_metadata[k] += f", {v}"
            else:
                combined_metadata[k] = str(v)
    return Document(page_content=result, metadata=combined_metadata)


async def _acollapse_docs(
    docs: List[Document],
    combine_document_func: AsyncCombineDocsProtocol,
    **kwargs: Any,
) -> Document:
    result = await combine_document_func(docs, **kwargs)
    combined_metadata = {k: str(v) for k, v in docs[0].metadata.items()}
    for doc in docs[1:]:
        for k, v in doc.metadata.items():
            if k in combined_metadata:
                combined_metadata[k] += f", {v}"
            else:
                combined_metadata[k] = str(v)
    return Document(page_content=result, metadata=combined_metadata)


class ReduceDocumentsChain(BaseCombineDocumentsChain):
    """Combining documents by recursively reducing them.

    This involves

    - combine_documents_chain

    - collapse_documents_chain

    `combine_documents_chain` is ALWAYS provided. This is the final chain that
    is called. We pass all previous results to this chain, and the output of
    this chain is returned as a final result.

    `collapse_documents_chain` is used if the documents passed in are too many
    to all be passed to `combine_documents_chain` in one go. In this case,
    `collapse_documents_chain` is called recursively on groups of documents as
    large as are allowed.

    Example:
        .. code-block:: python

            from langchain.chains import (
                StuffDocumentsChain, LLMChain, ReduceDocumentsChain
            )
            from langchain.prompts import PromptTemplate
            from langchain.llms import OpenAI

            # This controls how each document will be formatted. Specifically,
            # it will be passed to `format_document` - see that function for more
            # details.
            document_prompt = PromptTemplate(
                input_variables=["page_content"],
                template="{page_content}"
            )
            document_variable_name = "context"
            llm = OpenAI()
            # The prompt here should take as an input variable the
            # `document_variable_name`
            prompt = PromptTemplate.from_template(
                "Summarize this content: {context}"
            )
            llm_chain = LLMChain(llm=llm, prompt=prompt)
            combine_documents_chain = StuffDocumentsChain(
                llm_chain=llm_chain,
                document_prompt=document_prompt,
                document_variable_name=document_variable_name
            )
            chain = ReduceDocumentsChain(
                combine_documents_chain=combine_documents_chain,
            )
            # If we wanted to, we could also pass in collapse_documents_chain
            # which is specifically aimed at collapsing documents BEFORE
            # the final call.
            prompt = PromptTemplate.from_template(
                "Collapse this content: {context}"
            )
            llm_chain = LLMChain(llm=llm, prompt=prompt)
            collapse_documents_chain = StuffDocumentsChain(
                llm_chain=llm_chain,
                document_prompt=document_prompt,
                document_variable_name=document_variable_name
            )
            chain = ReduceDocumentsChain(
                combine_documents_chain=combine_documents_chain,
                collapse_documents_chain=collapse_documents_chain,
            )
    """

    combine_documents_chain: BaseCombineDocumentsChain
    """Final chain to call to combine documents.
    This is typically a StuffDocumentsChain."""
    collapse_documents_chain: Optional[BaseCombineDocumentsChain] = None
    """Chain to use to collapse documents if needed until they can all fit.
    If None, will use the combine_documents_chain.
    This is typically a StuffDocumentsChain."""
    token_max: int = 3000
    """The maximum number of tokens to group documents into. For example, if
    set to 3000 then documents will be grouped into chunks of no greater than
    3000 tokens before trying to combine them into a smaller chunk."""

    class Config:
        """Configuration for this pydantic object."""

        extra = Extra.forbid
        arbitrary_types_allowed = True

    @property
    def _collapse_chain(self) -> BaseCombineDocumentsChain:
        if self.collapse_documents_chain is not None:
            return self.collapse_documents_chain
        else:
            return self.combine_documents_chain

    def combine_docs(
        self,
        docs: List[Document],
        token_max: Optional[int] = None,
        callbacks: Callbacks = None,
        **kwargs: Any,
    ) -> Tuple[str, dict]:
        """Combine multiple documents recursively.

        Args:
            docs: List of documents to combine, assumed that each one is less than
                `token_max`.
            token_max: Recursively creates groups of documents less than this number
                of tokens.
            callbacks: Callbacks to be passed through
            **kwargs: additional parameters to be passed to LLM calls (like other
                input variables besides the documents)

        Returns:
            The first element returned is the single string output. The second
            element returned is a dictionary of other keys to return.
        """
        result_docs, extra_return_dict = self._collapse(
            docs, token_max=token_max, callbacks=callbacks, **kwargs
        )
        return self.combine_documents_chain.combine_docs(
            docs=result_docs, callbacks=callbacks, **kwargs
        )

    async def acombine_docs(
        self,
        docs: List[Document],
        token_max: Optional[int] = None,
        callbacks: Callbacks = None,
        **kwargs: Any,
    ) -> Tuple[str, dict]:
        """Combine multiple documents recursively.

        Args:
            docs: List of documents to combine, assumed that each one is less than
                `token_max`.
            token_max: Recursively creates groups of documents less than this number
                of tokens.
            callbacks: Callbacks to be passed through
            **kwargs: additional parameters to be passed to LLM calls (like other
                input variables besides the documents)

        Returns:
            The first element returned is the single string output. The second
            element returned is a dictionary of other keys to return.
        """
        result_docs, extra_return_dict = await self._acollapse(
            docs, token_max=token_max, callbacks=callbacks, **kwargs
        )
        return await self.combine_documents_chain.acombine_docs(
            docs=result_docs, callbacks=callbacks, **kwargs
        )

    def _collapse(
        self,
        docs: List[Document],
        token_max: Optional[int] = None,
        callbacks: Callbacks = None,
        **kwargs: Any,
    ) -> Tuple[List[Document], dict]:
        result_docs = docs
        length_func = self.combine_documents_chain.prompt_length
        num_tokens = length_func(result_docs, **kwargs)

        def _collapse_docs_func(docs: List[Document], **kwargs: Any) -> str:
            return self._collapse_chain.run(
                input_documents=docs, callbacks=callbacks, **kwargs
            )

        _token_max = token_max or self.token_max
        while num_tokens is not None and num_tokens > _token_max:
            new_result_doc_list = _split_list_of_docs(
                result_docs, length_func, _token_max, **kwargs
            )
            result_docs = []
            for docs in new_result_doc_list:
                new_doc = _collapse_docs(docs, _collapse_docs_func, **kwargs)
                result_docs.append(new_doc)
            num_tokens = length_func(result_docs, **kwargs)
        return result_docs, {}

    async def _acollapse(
        self,
        docs: List[Document],
        token_max: Optional[int] = None,
        callbacks: Callbacks = None,
        **kwargs: Any,
    ) -> Tuple[List[Document], dict]:
        result_docs = docs
        length_func = self.combine_documents_chain.prompt_length
        num_tokens = length_func(result_docs, **kwargs)

        async def _collapse_docs_func(docs: List[Document], **kwargs: Any) -> str:
            return await self._collapse_chain.arun(
                input_documents=docs, callbacks=callbacks, **kwargs
            )

        _token_max = token_max or self.token_max
        while num_tokens is not None and num_tokens > _token_max:
            new_result_doc_list = _split_list_of_docs(
                result_docs, length_func, _token_max, **kwargs
            )
            result_docs = []
            for docs in new_result_doc_list:
                new_doc = await _acollapse_docs(docs, _collapse_docs_func, **kwargs)
                result_docs.append(new_doc)
            num_tokens = length_func(result_docs, **kwargs)
        return result_docs, {}

    @property
    def _chain_type(self) -> str:
        return "reduce_documents_chain"
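A toy illustration of the `_split_list_of_docs` helper above, substituting a word count for a real token count; documents are grouped greedily so that no group exceeds `token_max`:

.. code-block:: python

    from langchain.docstore.document import Document

    def word_count(docs, **kwargs):
        # Stand-in for StuffDocumentsChain.prompt_length, which counts tokens.
        return sum(len(d.page_content.split()) for d in docs)

    docs = [
        Document(page_content="one two three"),
        Document(page_content="four five"),
    ]
    groups = _split_list_of_docs(docs, word_count, token_max=4)
    # Two groups result, since 3 + 2 words would exceed the limit of 4:
    # [[Document(page_content="one two three")], [Document(page_content="four five")]]
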
@@ -9,11 +9,12 @@ from pydantic import Extra, Field, root_validator
from langchain.callbacks.manager import Callbacks
from langchain.chains.combine_documents.base import (
    BaseCombineDocumentsChain,
    format_document,
)
from langchain.chains.llm import LLMChain
from langchain.docstore.document import Document
from langchain.prompts.prompt import PromptTemplate
from langchain.schema import BasePromptTemplate, format_document
from langchain.schema import BasePromptTemplate


def _get_default_document_prompt() -> PromptTemplate:
@@ -21,55 +22,7 @@ def _get_default_document_prompt() -> PromptTemplate:


class RefineDocumentsChain(BaseCombineDocumentsChain):
    """Combine documents by doing a first pass and then refining on more documents.

    This algorithm first calls `initial_llm_chain` on the first document, passing
    that first document in with the variable name `document_variable_name`, and
    produces a new variable with the variable name `initial_response_name`.

    Then, it loops over every remaining document. This is called the "refine" step.
    It calls `refine_llm_chain`,
    passing in that document with the variable name `document_variable_name`
    as well as the previous response with the variable name `initial_response_name`.

    Example:
        .. code-block:: python

            from langchain.chains import RefineDocumentsChain, LLMChain
            from langchain.prompts import PromptTemplate
            from langchain.llms import OpenAI

            # This controls how each document will be formatted. Specifically,
            # it will be passed to `format_document` - see that function for more
            # details.
            document_prompt = PromptTemplate(
                input_variables=["page_content"],
                template="{page_content}"
            )
            document_variable_name = "context"
            llm = OpenAI()
            # The prompt here should take as an input variable the
            # `document_variable_name`
            prompt = PromptTemplate.from_template(
                "Summarize this content: {context}"
            )
            initial_llm_chain = LLMChain(llm=llm, prompt=prompt)
            initial_response_name = "prev_response"
            # The prompt here should take as an input variable the
            # `document_variable_name` as well as `initial_response_name`
            prompt_refine = PromptTemplate.from_template(
                "Here's your first summary: {prev_response}. "
                "Now add to it based on the following context: {context}"
            )
            refine_llm_chain = LLMChain(llm=llm, prompt=prompt_refine)
            chain = RefineDocumentsChain(
                initial_llm_chain=initial_llm_chain,
                refine_llm_chain=refine_llm_chain,
                document_prompt=document_prompt,
                document_variable_name=document_variable_name,
                initial_response_name=initial_response_name,
            )
    """
    """Combine documents by doing a first pass and then refining on more documents."""

    initial_llm_chain: LLMChain
    """LLM chain to use on initial document."""
@@ -83,7 +36,7 @@ class RefineDocumentsChain(BaseCombineDocumentsChain):
    document_prompt: BasePromptTemplate = Field(
        default_factory=_get_default_document_prompt
    )
    """Prompt to use to format each document, gets passed to `format_document`."""
    """Prompt to use to format each document."""
    return_intermediate_steps: bool = False
    """Return the results of the refine steps in the output."""

@@ -136,18 +89,7 @@ class RefineDocumentsChain(BaseCombineDocumentsChain):
    def combine_docs(
        self, docs: List[Document], callbacks: Callbacks = None, **kwargs: Any
    ) -> Tuple[str, dict]:
        """Combine by mapping first chain over all, then stuffing into final chain.

        Args:
            docs: List of documents to combine
            callbacks: Callbacks to be passed through
            **kwargs: additional parameters to be passed to LLM calls (like other
                input variables besides the documents)

        Returns:
            The first element returned is the single string output. The second
            element returned is a dictionary of other keys to return.
        """
        """Combine by mapping first chain over all, then stuffing into final chain."""
        inputs = self._construct_initial_inputs(docs, **kwargs)
        res = self.initial_llm_chain.predict(callbacks=callbacks, **inputs)
        refine_steps = [res]
@@ -161,18 +103,7 @@ class RefineDocumentsChain(BaseCombineDocumentsChain):
    async def acombine_docs(
        self, docs: List[Document], callbacks: Callbacks = None, **kwargs: Any
    ) -> Tuple[str, dict]:
        """Combine by mapping first chain over all, then stuffing into final chain.

        Args:
            docs: List of documents to combine
            callbacks: Callbacks to be passed through
            **kwargs: additional parameters to be passed to LLM calls (like other
                input variables besides the documents)

        Returns:
            The first element returned is the single string output. The second
            element returned is a dictionary of other keys to return.
        """
        """Combine by mapping first chain over all, then stuffing into final chain."""
        inputs = self._construct_initial_inputs(docs, **kwargs)
        res = await self.initial_llm_chain.apredict(callbacks=callbacks, **inputs)
        refine_steps = [res]

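Schematically, the refine loop this class implements boils down to the following sketch, using the illustrative prompt variables from the docstring example; the real code builds inputs via `_construct_initial_inputs` and its refine counterpart:

.. code-block:: python

    # First pass seeds the answer from document 0.
    res = initial_llm_chain.predict(context=format_document(docs[0], document_prompt))
    # Every later document is folded into the running answer.
    for doc in docs[1:]:
        res = refine_llm_chain.predict(
            context=format_document(doc, document_prompt),
            prev_response=res,
        )
    # `res` is the final refined answer; refine_steps collects each intermediate res.
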
@@ -7,11 +7,12 @@ from pydantic import Extra, Field, root_validator
from langchain.callbacks.manager import Callbacks
from langchain.chains.combine_documents.base import (
    BaseCombineDocumentsChain,
    format_document,
)
from langchain.chains.llm import LLMChain
from langchain.docstore.document import Document
from langchain.prompts.prompt import PromptTemplate
from langchain.schema import BasePromptTemplate, format_document
from langchain.schema import BasePromptTemplate


def _get_default_document_prompt() -> PromptTemplate:
@@ -19,50 +20,14 @@ def _get_default_document_prompt() -> PromptTemplate:


class StuffDocumentsChain(BaseCombineDocumentsChain):
    """Chain that combines documents by stuffing into context.

    This chain takes a list of documents and first combines them into a single string.
    It does this by formatting each document into a string with the `document_prompt`
    and then joining them together with `document_separator`. It then adds that new
    string to the inputs with the variable name set by `document_variable_name`.
    Those inputs are then passed to the `llm_chain`.

    Example:
        .. code-block:: python

            from langchain.chains import StuffDocumentsChain, LLMChain
            from langchain.prompts import PromptTemplate
            from langchain.llms import OpenAI

            # This controls how each document will be formatted. Specifically,
            # it will be passed to `format_document` - see that function for more
            # details.
            document_prompt = PromptTemplate(
                input_variables=["page_content"],
                template="{page_content}"
            )
            document_variable_name = "context"
            llm = OpenAI()
            # The prompt here should take as an input variable the
            # `document_variable_name`
            prompt = PromptTemplate.from_template(
                "Summarize this content: {context}"
            )
            llm_chain = LLMChain(llm=llm, prompt=prompt)
            chain = StuffDocumentsChain(
                llm_chain=llm_chain,
                document_prompt=document_prompt,
                document_variable_name=document_variable_name
            )
    """
    """Chain that combines documents by stuffing into context."""

    llm_chain: LLMChain
    """LLM chain which is called with the formatted document string,
    along with any other inputs."""
    """LLM wrapper to use after formatting documents."""
    document_prompt: BasePromptTemplate = Field(
        default_factory=_get_default_document_prompt
    )
    """Prompt to use to format each document, gets passed to `format_document`."""
    """Prompt to use to format each document."""
    document_variable_name: str
    """The variable name in the llm_chain to put the documents in.
    If only one variable in the llm_chain, this need not be provided."""
@@ -77,12 +42,7 @@ class StuffDocumentsChain(BaseCombineDocumentsChain):

    @root_validator(pre=True)
    def get_default_document_variable_name(cls, values: Dict) -> Dict:
        """Get default document variable name, if not provided.

        If only one variable is present in the llm_chain.prompt,
        we can infer that the formatted documents should be passed in
        with this variable name.
        """
        """Get default document variable name, if not provided."""
        llm_chain_variables = values["llm_chain"].prompt.input_variables
        if "document_variable_name" not in values:
            if len(llm_chain_variables) == 1:
@@ -101,20 +61,6 @@ class StuffDocumentsChain(BaseCombineDocumentsChain):
        return values

    def _get_inputs(self, docs: List[Document], **kwargs: Any) -> dict:
        """Construct inputs from kwargs and docs.

        Format and then join all the documents together into one input with name
        `self.document_variable_name`. Then pluck any additional variables
        from **kwargs.

        Args:
            docs: List of documents to format and then join into single input
            **kwargs: additional inputs to chain, will pluck any other required
                arguments from here.

        Returns:
            dictionary of inputs to LLMChain
        """
        # Format each document according to the prompt
        doc_strings = [format_document(doc, self.document_prompt) for doc in docs]
        # Join the documents together to put them in the prompt.
@@ -127,21 +73,7 @@ class StuffDocumentsChain(BaseCombineDocumentsChain):
        return inputs

    def prompt_length(self, docs: List[Document], **kwargs: Any) -> Optional[int]:
        """Return the prompt length given the documents passed in.

        This can be used by a caller to determine whether passing in a list
        of documents would exceed a certain prompt length. This is useful when
        trying to ensure that the size of a prompt remains below a certain
        context limit.

        Args:
            docs: List[Document], a list of documents to use to calculate the
                total prompt length.

        Returns:
            Returns None if the method does not depend on the prompt length,
            otherwise the length of the prompt in tokens.
        """
        """Get the prompt length by formatting the prompt."""
        inputs = self._get_inputs(docs, **kwargs)
        prompt = self.llm_chain.prompt.format(**inputs)
        return self.llm_chain.llm.get_num_tokens(prompt)
@@ -149,17 +81,7 @@ class StuffDocumentsChain(BaseCombineDocumentsChain):
    def combine_docs(
        self, docs: List[Document], callbacks: Callbacks = None, **kwargs: Any
    ) -> Tuple[str, dict]:
        """Stuff all documents into one prompt and pass to LLM.

        Args:
            docs: List of documents to join together into one variable
            callbacks: Optional callbacks to pass along
            **kwargs: additional parameters to use to get inputs to LLMChain.

        Returns:
            The first element returned is the single string output. The second
            element returned is a dictionary of other keys to return.
        """
        """Stuff all documents into one prompt and pass to LLM."""
        inputs = self._get_inputs(docs, **kwargs)
        # Call predict on the LLM.
        return self.llm_chain.predict(callbacks=callbacks, **inputs), {}
@@ -167,17 +89,7 @@ class StuffDocumentsChain(BaseCombineDocumentsChain):
    async def acombine_docs(
        self, docs: List[Document], callbacks: Callbacks = None, **kwargs: Any
    ) -> Tuple[str, dict]:
        """Stuff all documents into one prompt and pass to LLM.

        Args:
            docs: List of documents to join together into one variable
            callbacks: Optional callbacks to pass along
            **kwargs: additional parameters to use to get inputs to LLMChain.

        Returns:
            The first element returned is the single string output. The second
            element returned is a dictionary of other keys to return.
        """
        """Stuff all documents into one prompt and pass to LLM."""
        inputs = self._get_inputs(docs, **kwargs)
        # Call predict on the LLM.
        return await self.llm_chain.apredict(callbacks=callbacks, **inputs), {}

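One practical use of `prompt_length`: checking whether a document set still fits before stuffing. In this sketch, `chain` is assumed to be the StuffDocumentsChain from the docstring example, and the 3000 threshold mirrors the `token_max` default seen earlier:

.. code-block:: python

    from langchain.docstore.document import Document

    docs = [Document(page_content="LangChain combines LLMs with external data.")]
    n_tokens = chain.prompt_length(docs)
    if n_tokens is not None and n_tokens > 3000:
        # Too large to stuff into one prompt; fall back to a map-reduce chain.
        ...
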
@@ -55,26 +55,12 @@ class BaseConversationalRetrievalChain(Chain):
    """Chain for chatting with an index."""

    combine_docs_chain: BaseCombineDocumentsChain
    """The chain used to combine any retrieved documents."""
    question_generator: LLMChain
    """The chain used to generate a new question for the sake of retrieval.
    This chain will take in the current question (with variable `question`)
    and any chat history (with variable `chat_history`) and will produce
    a new standalone question to be used later on."""
    output_key: str = "answer"
    """The output key to return the final answer of this chain in."""
    rephrase_question: bool = True
    """Whether or not to pass the new generated question to the combine_docs_chain.
    If True, will pass the new generated question along.
    If False, will only use the new generated question for retrieval and pass the
    original question along to the combine_docs_chain."""
    return_source_documents: bool = False
    """Return the retrieved source documents as part of the final result."""
    return_generated_question: bool = False
    """Return the generated question as part of the final result."""
    get_chat_history: Optional[Callable[[CHAT_TURN_TYPE], str]] = None
    """An optional function to get a string of the chat history.
    If None is provided, will use a default."""
    """Return the source documents."""

    class Config:
        """Configuration for this pydantic object."""
@@ -136,8 +122,7 @@ class BaseConversationalRetrievalChain(Chain):
        else:
            docs = self._get_docs(new_question, inputs)  # type: ignore[call-arg]
        new_inputs = inputs.copy()
        if self.rephrase_question:
            new_inputs["question"] = new_question
        new_inputs["question"] = new_question
        new_inputs["chat_history"] = chat_history_str
        answer = self.combine_docs_chain.run(
            input_documents=docs, callbacks=_run_manager.get_child(), **new_inputs
@@ -184,8 +169,7 @@ class BaseConversationalRetrievalChain(Chain):
        docs = await self._aget_docs(new_question, inputs)  # type: ignore[call-arg]

        new_inputs = inputs.copy()
        if self.rephrase_question:
            new_inputs["question"] = new_question
        new_inputs["question"] = new_question
        new_inputs["chat_history"] = chat_history_str
        answer = await self.combine_docs_chain.arun(
            input_documents=docs, callbacks=_run_manager.get_child(), **new_inputs
@@ -204,60 +188,13 @@ class BaseConversationalRetrievalChain(Chain):


class ConversationalRetrievalChain(BaseConversationalRetrievalChain):
    """Chain for having a conversation based on retrieved documents.

    This chain takes in chat history (a list of messages) and new questions,
    and then returns an answer to that question.
    The algorithm for this chain consists of three parts:

    1. Use the chat history and the new question to create a "standalone question".
    This is done so that this question can be passed into the retrieval step to fetch
    relevant documents. If only the new question was passed in, then relevant context
    may be lacking. If the whole conversation was passed into retrieval, there may
    be unnecessary information there that would distract from retrieval.

    2. This new question is passed to the retriever and relevant documents are
    returned.

    3. The retrieved documents are passed to an LLM along with either the new question
    (default behavior) or the original question and chat history to generate a final
    response.

    Example:
        .. code-block:: python

            from langchain.chains import (
                StuffDocumentsChain, LLMChain, ConversationalRetrievalChain
            )
            from langchain.prompts import PromptTemplate
            from langchain.llms import OpenAI

            combine_docs_chain = StuffDocumentsChain(...)
            vectorstore = ...
            retriever = vectorstore.as_retriever()

            # This controls how the standalone question is generated.
            # Should take `chat_history` and `question` as input variables.
            template = (
                "Combine the chat history and follow up question into "
                "a standalone question. Chat History: {chat_history}"
                "Follow up question: {question}"
            )
            prompt = PromptTemplate.from_template(template)
            llm = OpenAI()
            question_generator = LLMChain(llm=llm, prompt=prompt)
            chain = ConversationalRetrievalChain(
                combine_docs_chain=combine_docs_chain,
                retriever=retriever,
                question_generator=question_generator,
            )
"""
|
||||
"""Chain for chatting with an index."""
|
||||
|
||||
retriever: BaseRetriever
|
||||
"""Retriever to use to fetch documents."""
|
||||
"""Index to connect to."""
|
||||
max_tokens_limit: Optional[int] = None
|
||||
"""If set, enforces that the documents returned are less than this limit.
|
||||
This is only enforced if `combine_docs_chain` is of type StuffDocumentsChain."""
|
||||
"""If set, restricts the docs to return from store based on tokens, enforced only
|
||||
for StuffDocumentChain"""
|
||||
|
||||
def _reduce_tokens_below_limit(self, docs: List[Document]) -> List[Document]:
|
||||
num_docs = len(docs)
|
||||
@@ -315,29 +252,7 @@ class ConversationalRetrievalChain(BaseConversationalRetrievalChain):
|
||||
callbacks: Callbacks = None,
|
||||
**kwargs: Any,
|
||||
) -> BaseConversationalRetrievalChain:
|
||||
"""Convenience method to load chain from LLM and retriever.
|
||||
|
||||
This provides some logic to create the `question_generator` chain
|
||||
as well as the combine_docs_chain.
|
||||
|
||||
Args:
|
||||
llm: The default language model to use at every part of this chain
|
||||
(eg in both the question generation and the answering)
|
||||
retriever: The retriever to use to fetch relevant documents from.
|
||||
condense_question_prompt: The prompt to use to condense the chat history
|
||||
and new question into a standalone question.
|
||||
chain_type: The chain type to use to create the combine_docs_chain, will
|
||||
be sent to `load_qa_chain`.
|
||||
verbose: Verbosity flag for logging to stdout.
|
||||
condense_question_llm: The language model to use for condensing the chat
|
||||
history and new question into a standalone question. If none is
|
||||
provided, will default to `llm`.
|
||||
combine_docs_chain_kwargs: Parameters to pass as kwargs to `load_qa_chain`
|
||||
when constructing the combine_docs_chain.
|
||||
callbacks: Callbacks to pass to all subchains.
|
||||
**kwargs: Additional parameters to pass when initializing
|
||||
ConversationalRetrievalChain
|
||||
"""
|
||||
"""Load chain from LLM."""
|
||||
combine_docs_chain_kwargs = combine_docs_chain_kwargs or {}
|
||||
doc_chain = load_qa_chain(
|
||||
llm,
|
||||
|
||||
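A minimal sketch of the `from_llm` convenience path documented above; `vectorstore` is assumed to exist already:

.. code-block:: python

    from langchain.chains import ConversationalRetrievalChain
    from langchain.llms import OpenAI

    qa = ConversationalRetrievalChain.from_llm(
        llm=OpenAI(temperature=0),
        retriever=vectorstore.as_retriever(),
    )
    result = qa({"question": "What does the document say about X?", "chat_history": []})
    answer = result["answer"]
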
@@ -8,7 +8,7 @@ from pydantic import Field
from langchain.base_language import BaseLanguageModel
from langchain.callbacks.manager import CallbackManagerForChainRun
from langchain.chains.base import Chain
from langchain.chains.graph_qa.prompts import ENTITY_EXTRACTION_PROMPT, GRAPH_QA_PROMPT
from langchain.chains.graph_qa.prompts import ENTITY_EXTRACTION_PROMPT, PROMPT
from langchain.chains.llm import LLMChain
from langchain.graphs.networkx_graph import NetworkxEntityGraph, get_entities
from langchain.schema import BasePromptTemplate
@@ -44,7 +44,7 @@ class GraphQAChain(Chain):
    def from_llm(
        cls,
        llm: BaseLanguageModel,
        qa_prompt: BasePromptTemplate = GRAPH_QA_PROMPT,
        qa_prompt: BasePromptTemplate = PROMPT,
        entity_prompt: BasePromptTemplate = ENTITY_EXTRACTION_PROMPT,
        **kwargs: Any,
    ) -> GraphQAChain:

@@ -1,94 +0,0 @@
"""Question answering over a graph."""
from __future__ import annotations

from typing import Any, Dict, List, Optional

from pydantic import Field

from langchain.base_language import BaseLanguageModel
from langchain.callbacks.manager import CallbackManagerForChainRun
from langchain.chains.base import Chain
from langchain.chains.graph_qa.prompts import (
    CYPHER_QA_PROMPT,
    GREMLIN_GENERATION_PROMPT,
)
from langchain.chains.llm import LLMChain
from langchain.graphs.hugegraph import HugeGraph
from langchain.schema import BasePromptTemplate


class HugeGraphQAChain(Chain):
    """Chain for question-answering against a graph by generating Gremlin statements."""

    graph: HugeGraph = Field(exclude=True)
    gremlin_generation_chain: LLMChain
    qa_chain: LLMChain
    input_key: str = "query"  #: :meta private:
    output_key: str = "result"  #: :meta private:

    @property
    def input_keys(self) -> List[str]:
        """Return the input keys.

        :meta private:
        """
        return [self.input_key]

    @property
    def output_keys(self) -> List[str]:
        """Return the output keys.

        :meta private:
        """
        _output_keys = [self.output_key]
        return _output_keys

    @classmethod
    def from_llm(
        cls,
        llm: BaseLanguageModel,
        *,
        qa_prompt: BasePromptTemplate = CYPHER_QA_PROMPT,
        gremlin_prompt: BasePromptTemplate = GREMLIN_GENERATION_PROMPT,
        **kwargs: Any,
    ) -> HugeGraphQAChain:
        """Initialize from LLM."""
        qa_chain = LLMChain(llm=llm, prompt=qa_prompt)
        gremlin_generation_chain = LLMChain(llm=llm, prompt=gremlin_prompt)

        return cls(
            qa_chain=qa_chain,
            gremlin_generation_chain=gremlin_generation_chain,
            **kwargs,
        )

    def _call(
        self,
        inputs: Dict[str, Any],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, str]:
        """Generate Gremlin statement, use it to look up in db and answer question."""
        _run_manager = run_manager or CallbackManagerForChainRun.get_noop_manager()
        callbacks = _run_manager.get_child()
        question = inputs[self.input_key]

        generated_gremlin = self.gremlin_generation_chain.run(
            {"question": question, "schema": self.graph.get_schema}, callbacks=callbacks
        )

        _run_manager.on_text("Generated gremlin:", end="\n", verbose=self.verbose)
        _run_manager.on_text(
            generated_gremlin, color="green", end="\n", verbose=self.verbose
        )
        context = self.graph.query(generated_gremlin)

        _run_manager.on_text("Full Context:", end="\n", verbose=self.verbose)
        _run_manager.on_text(
            str(context), color="green", end="\n", verbose=self.verbose
        )

        result = self.qa_chain(
            {"question": question, "context": context},
            callbacks=callbacks,
        )
        return {self.output_key: result[self.qa_chain.output_key]}
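A hedged usage sketch for the chain above; the connection parameters are placeholders to adjust for your own HugeGraph server:

.. code-block:: python

    from langchain.chains import HugeGraphQAChain
    from langchain.chat_models import ChatOpenAI
    from langchain.graphs import HugeGraph

    graph = HugeGraph(
        username="admin", password="xxx", address="localhost", port=8080, graph="hugegraph"
    )
    chain = HugeGraphQAChain.from_llm(ChatOpenAI(temperature=0), graph=graph, verbose=True)
    chain.run("Who played in The Godfather?")
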
@@ -23,14 +23,14 @@ ENTITY_EXTRACTION_PROMPT = PromptTemplate(
    input_variables=["input"], template=_DEFAULT_ENTITY_EXTRACTION_TEMPLATE
)

_DEFAULT_GRAPH_QA_TEMPLATE = """Use the following knowledge triplets to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
prompt_template = """Use the following knowledge triplets to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Helpful Answer:"""
GRAPH_QA_PROMPT = PromptTemplate(
    template=_DEFAULT_GRAPH_QA_TEMPLATE, input_variables=["context", "question"]
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

CYPHER_GENERATION_TEMPLATE = """Task:Generate Cypher statement to query a graph database.
@@ -90,12 +90,6 @@ KUZU_GENERATION_PROMPT = PromptTemplate(
    input_variables=["schema", "question"], template=KUZU_GENERATION_TEMPLATE
)

GREMLIN_GENERATION_TEMPLATE = CYPHER_GENERATION_TEMPLATE.replace("Cypher", "Gremlin")

GREMLIN_GENERATION_PROMPT = PromptTemplate(
    input_variables=["schema", "question"], template=GREMLIN_GENERATION_TEMPLATE
)

CYPHER_QA_TEMPLATE = """You are an assistant that helps to form nice and human understandable answers.
The information part contains the provided information that you must use to construct an answer.
The provided information is authorative, you must never doubt it or try to use your internal knowledge to correct it.
@@ -109,90 +103,3 @@ Helpful Answer:"""
CYPHER_QA_PROMPT = PromptTemplate(
    input_variables=["context", "question"], template=CYPHER_QA_TEMPLATE
)

SPARQL_INTENT_TEMPLATE = """Task: Identify the intent of a prompt and return the appropriate SPARQL query type.
You are an assistant that distinguishes different types of prompts and returns the corresponding SPARQL query types.
Consider only the following query types:
* SELECT: this query type corresponds to questions
* UPDATE: this query type corresponds to all requests for deleting, inserting, or changing triples
Note: Be as concise as possible.
Do not include any explanations or apologies in your responses.
Do not respond to any questions that ask for anything else than for you to identify a SPARQL query type.
Do not include any unnecessary whitespaces or any text except the query type, i.e., either return 'SELECT' or 'UPDATE'.

The prompt is:
{prompt}
Helpful Answer:"""
SPARQL_INTENT_PROMPT = PromptTemplate(
    input_variables=["prompt"], template=SPARQL_INTENT_TEMPLATE
)

SPARQL_GENERATION_SELECT_TEMPLATE = """Task: Generate a SPARQL SELECT statement for querying a graph database.
For instance, to find all email addresses of John Doe, the following query in backticks would be suitable:
```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?email
WHERE {{
    ?person foaf:name "John Doe" .
    ?person foaf:mbox ?email .
}}
```
Instructions:
Use only the node types and properties provided in the schema.
Do not use any node types and properties that are not explicitly provided.
Include all necessary prefixes.
Schema:
{schema}
Note: Be as concise as possible.
Do not include any explanations or apologies in your responses.
Do not respond to any questions that ask for anything else than for you to construct a SPARQL query.
Do not include any text except the SPARQL query generated.

The question is:
{prompt}"""
SPARQL_GENERATION_SELECT_PROMPT = PromptTemplate(
    input_variables=["schema", "prompt"], template=SPARQL_GENERATION_SELECT_TEMPLATE
)

SPARQL_GENERATION_UPDATE_TEMPLATE = """Task: Generate a SPARQL UPDATE statement for updating a graph database.
For instance, to add 'jane.doe@foo.bar' as a new email address for Jane Doe, the following query in backticks would be suitable:
```
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
INSERT {{
    ?person foaf:mbox <mailto:jane.doe@foo.bar> .
}}
WHERE {{
    ?person foaf:name "Jane Doe" .
}}
```
Instructions:
Make the query as short as possible and avoid adding unnecessary triples.
Use only the node types and properties provided in the schema.
Do not use any node types and properties that are not explicitly provided.
Include all necessary prefixes.
Schema:
{schema}
Note: Be as concise as possible.
Do not include any explanations or apologies in your responses.
Do not respond to any questions that ask for anything else than for you to construct a SPARQL query.
Return only the generated SPARQL query, nothing else.

The information to be inserted is:
{prompt}"""
SPARQL_GENERATION_UPDATE_PROMPT = PromptTemplate(
    input_variables=["schema", "prompt"], template=SPARQL_GENERATION_UPDATE_TEMPLATE
)

SPARQL_QA_TEMPLATE = """Task: Generate a natural language response from the results of a SPARQL query.
You are an assistant that creates well-written and human understandable answers.
The information part contains the information provided, which you can use to construct an answer.
The information provided is authoritative, you must never doubt it or try to use your internal knowledge to correct it.
Make your response sound like the information is coming from an AI assistant, but don't add any information.
Information:
{context}

Question: {prompt}
Helpful Answer:"""
SPARQL_QA_PROMPT = PromptTemplate(
    input_variables=["context", "prompt"], template=SPARQL_QA_TEMPLATE
)

@@ -1,127 +0,0 @@
"""
Question answering over an RDF or OWL graph using SPARQL.
"""
from __future__ import annotations

from typing import Any, Dict, List, Optional

from pydantic import Field

from langchain.base_language import BaseLanguageModel
from langchain.callbacks.manager import CallbackManagerForChainRun
from langchain.chains.base import Chain
from langchain.chains.graph_qa.prompts import (
    SPARQL_GENERATION_SELECT_PROMPT,
    SPARQL_GENERATION_UPDATE_PROMPT,
    SPARQL_INTENT_PROMPT,
    SPARQL_QA_PROMPT,
)
from langchain.chains.llm import LLMChain
from langchain.graphs.rdf_graph import RdfGraph
from langchain.prompts.base import BasePromptTemplate


class GraphSparqlQAChain(Chain):
    """
    Chain for question-answering against an RDF or OWL graph by generating
    SPARQL statements.
    """

    graph: RdfGraph = Field(exclude=True)
    sparql_generation_select_chain: LLMChain
    sparql_generation_update_chain: LLMChain
    sparql_intent_chain: LLMChain
    qa_chain: LLMChain
    input_key: str = "query"  #: :meta private:
    output_key: str = "result"  #: :meta private:

    @property
    def input_keys(self) -> List[str]:
        return [self.input_key]

    @property
    def output_keys(self) -> List[str]:
        _output_keys = [self.output_key]
        return _output_keys

    @classmethod
    def from_llm(
        cls,
        llm: BaseLanguageModel,
        *,
        qa_prompt: BasePromptTemplate = SPARQL_QA_PROMPT,
        sparql_select_prompt: BasePromptTemplate = SPARQL_GENERATION_SELECT_PROMPT,
        sparql_update_prompt: BasePromptTemplate = SPARQL_GENERATION_UPDATE_PROMPT,
        sparql_intent_prompt: BasePromptTemplate = SPARQL_INTENT_PROMPT,
        **kwargs: Any,
    ) -> GraphSparqlQAChain:
        """Initialize from LLM."""
        qa_chain = LLMChain(llm=llm, prompt=qa_prompt)
        sparql_generation_select_chain = LLMChain(llm=llm, prompt=sparql_select_prompt)
        sparql_generation_update_chain = LLMChain(llm=llm, prompt=sparql_update_prompt)
        sparql_intent_chain = LLMChain(llm=llm, prompt=sparql_intent_prompt)

        return cls(
            qa_chain=qa_chain,
            sparql_generation_select_chain=sparql_generation_select_chain,
            sparql_generation_update_chain=sparql_generation_update_chain,
            sparql_intent_chain=sparql_intent_chain,
            **kwargs,
        )

    def _call(
        self,
        inputs: Dict[str, Any],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, str]:
        """
        Generate SPARQL query, use it to retrieve a response from the graph
        database, and answer the question.
        """
        _run_manager = run_manager or CallbackManagerForChainRun.get_noop_manager()
        callbacks = _run_manager.get_child()
        prompt = inputs[self.input_key]

        _intent = self.sparql_intent_chain.run({"prompt": prompt}, callbacks=callbacks)
        intent = _intent.strip()

        if intent == "SELECT":
            sparql_generation_chain = self.sparql_generation_select_chain
        elif intent == "UPDATE":
            sparql_generation_chain = self.sparql_generation_update_chain
        else:
            raise ValueError(
                "I am sorry, but this prompt seems to fit none of the currently "
                "supported SPARQL query types, i.e., SELECT and UPDATE."
            )

        _run_manager.on_text("Identified intent:", end="\n", verbose=self.verbose)
        _run_manager.on_text(intent, color="green", end="\n", verbose=self.verbose)

        generated_sparql = sparql_generation_chain.run(
            {"prompt": prompt, "schema": self.graph.get_schema}, callbacks=callbacks
        )

        _run_manager.on_text("Generated SPARQL:", end="\n", verbose=self.verbose)
        _run_manager.on_text(
            generated_sparql, color="green", end="\n", verbose=self.verbose
        )

        if intent == "SELECT":
            context = self.graph.query(generated_sparql)

            _run_manager.on_text("Full Context:", end="\n", verbose=self.verbose)
            _run_manager.on_text(
                str(context), color="green", end="\n", verbose=self.verbose
            )
            result = self.qa_chain(
                {"prompt": prompt, "context": context},
                callbacks=callbacks,
            )
            res = result[self.qa_chain.output_key]
        elif intent == "UPDATE":
            self.graph.update(generated_sparql)
            res = "Successfully inserted triples into the graph."
        else:
            raise ValueError("Unsupported SPARQL query type.")
        return {self.output_key: res}
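A short usage sketch for the chain above; the source file is an arbitrary public RDF document and `local_copy` is where any updates would be written:

.. code-block:: python

    from langchain.chains import GraphSparqlQAChain
    from langchain.chat_models import ChatOpenAI
    from langchain.graphs import RdfGraph

    graph = RdfGraph(
        source_file="http://www.w3.org/People/Berners-Lee/card",
        standard="rdf",
        local_copy="test.ttl",
    )
    chain = GraphSparqlQAChain.from_llm(ChatOpenAI(temperature=0), graph=graph, verbose=True)
    chain.run("What is Tim Berners-Lee's work homepage?")
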
@@ -5,7 +5,6 @@ from typing import Any, Union

import yaml

from langchain.chains import ReduceDocumentsChain
from langchain.chains.api.base import APIChain
from langchain.chains.base import Chain
from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain
@@ -118,9 +117,9 @@ def _load_map_reduce_documents_chain(

    if "combine_document_chain" in config:
        combine_document_chain_config = config.pop("combine_document_chain")
        combine_documents_chain = load_chain_from_config(combine_document_chain_config)
        combine_document_chain = load_chain_from_config(combine_document_chain_config)
    elif "combine_document_chain_path" in config:
        combine_documents_chain = load_chain(config.pop("combine_document_chain_path"))
        combine_document_chain = load_chain(config.pop("combine_document_chain_path"))
    else:
        raise ValueError(
            "One of `combine_document_chain` or "
@@ -129,24 +128,17 @@ def _load_map_reduce_documents_chain(
    if "collapse_document_chain" in config:
        collapse_document_chain_config = config.pop("collapse_document_chain")
        if collapse_document_chain_config is None:
            collapse_documents_chain = None
            collapse_document_chain = None
        else:
            collapse_documents_chain = load_chain_from_config(
            collapse_document_chain = load_chain_from_config(
                collapse_document_chain_config
            )
    elif "collapse_document_chain_path" in config:
        collapse_documents_chain = load_chain(
            config.pop("collapse_document_chain_path")
        )
    else:
        collapse_documents_chain = None
    reduce_documents_chain = ReduceDocumentsChain(
        combine_documents_chain=combine_documents_chain,
        collapse_documents_chain=collapse_documents_chain,
    )
        collapse_document_chain = load_chain(config.pop("collapse_document_chain_path"))
    return MapReduceDocumentsChain(
        llm_chain=llm_chain,
        reduce_documents_chain=reduce_documents_chain,
        combine_document_chain=combine_document_chain,
        collapse_document_chain=collapse_document_chain,
        **config,
    )


@@ -11,7 +11,6 @@ from pydantic import Extra
 
 from langchain.base_language import BaseLanguageModel
 from langchain.callbacks.manager import CallbackManagerForChainRun, Callbacks
-from langchain.chains import ReduceDocumentsChain
 from langchain.chains.base import Chain
 from langchain.chains.combine_documents.base import BaseCombineDocumentsChain
 from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain
@@ -45,17 +44,14 @@ class MapReduceChain(Chain):
     ) -> MapReduceChain:
         """Construct a map-reduce chain that uses the chain for map and reduce."""
         llm_chain = LLMChain(llm=llm, prompt=prompt, callbacks=callbacks)
-        stuff_chain = StuffDocumentsChain(
+        reduce_chain = StuffDocumentsChain(
             llm_chain=llm_chain,
             callbacks=callbacks,
             **(reduce_chain_kwargs if reduce_chain_kwargs else {}),
         )
-        reduce_documents_chain = ReduceDocumentsChain(
-            combine_documents_chain=stuff_chain
-        )
         combine_documents_chain = MapReduceDocumentsChain(
             llm_chain=llm_chain,
-            reduce_documents_chain=reduce_documents_chain,
+            combine_document_chain=reduce_chain,
             callbacks=callbacks,
             **(combine_chain_kwargs if combine_chain_kwargs else {}),
        )
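Every map-reduce hunk in this compare is the same rollback: the newer side builds an explicit `ReduceDocumentsChain` and hands it to `MapReduceDocumentsChain` as `reduce_documents_chain`, while the older side passes the combine (and collapse) chains in directly. A schematic sketch of the two wirings (hedged: prompts and variable names are illustrative, and only one form is valid on any given langchain version):

```python
from langchain.chains import LLMChain, ReduceDocumentsChain
from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

llm = ChatOpenAI(temperature=0)
map_chain = LLMChain(llm=llm, prompt=PromptTemplate.from_template("Summarize: {text}"))
reduce_llm_chain = LLMChain(llm=llm, prompt=PromptTemplate.from_template("Combine: {text}"))
stuff_chain = StuffDocumentsChain(llm_chain=reduce_llm_chain, document_variable_name="text")

# Newer wiring (the "-" side of these hunks): reduction is a separate object,
# which also owns collapse behavior and token_max.
newer = MapReduceDocumentsChain(
    llm_chain=map_chain,
    reduce_documents_chain=ReduceDocumentsChain(combine_documents_chain=stuff_chain),
    document_variable_name="text",
)

# Older wiring (the "+" side): the combine chain is passed in directly.
older = MapReduceDocumentsChain(
    llm_chain=map_chain,
    combine_document_chain=stuff_chain,
    document_variable_name="text",
)
```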
@@ -1,7 +1,3 @@
-from langchain.chains.openai_functions.base import (
-    create_openai_fn_chain,
-    create_structured_output_chain,
-)
 from langchain.chains.openai_functions.citation_fuzzy_match import (
     create_citation_fuzzy_match_chain,
 )
@@ -26,6 +22,4 @@ __all__ = [
     "create_citation_fuzzy_match_chain",
     "create_qa_with_structure_chain",
     "create_qa_with_sources_chain",
-    "create_structured_output_chain",
-    "create_openai_fn_chain",
 ]
@@ -1,315 +0,0 @@
-"""Methods for creating chains that use OpenAI function-calling APIs."""
-import inspect
-import re
-from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple, Type, Union
-
-from pydantic import BaseModel
-
-from langchain.base_language import BaseLanguageModel
-from langchain.chains import LLMChain
-from langchain.chat_models import ChatOpenAI
-from langchain.output_parsers.openai_functions import (
-    JsonOutputFunctionsParser,
-    PydanticOutputFunctionsParser,
-)
-from langchain.prompts import BasePromptTemplate, ChatPromptTemplate
-from langchain.schema import BaseLLMOutputParser
-
-PYTHON_TO_JSON_TYPES = {
-    "str": "string",
-    "int": "number",
-    "float": "number",
-    "bool": "boolean",
-}
-
-
-def _get_python_function_name(function: Callable) -> str:
-    """Get the name of a Python function."""
-    source = inspect.getsource(function)
-    return re.search(r"^def (.*)\(", source).groups()[0]  # type: ignore
-
-
-def _parse_python_function_docstring(function: Callable) -> Tuple[str, dict]:
-    """Parse the function and argument descriptions from the docstring of a function.
-
-    Assumes the function docstring follows Google Python style guide.
-    """
-    docstring = inspect.getdoc(function)
-    if docstring:
-        docstring_blocks = docstring.split("\n\n")
-        descriptors = []
-        args_block = None
-        past_descriptors = False
-        for block in docstring_blocks:
-            if block.startswith("Args:"):
-                args_block = block
-                break
-            elif block.startswith("Returns:") or block.startswith("Example:"):
-                # Don't break in case Args come after
-                past_descriptors = True
-            elif not past_descriptors:
-                descriptors.append(block)
-            else:
-                continue
-        description = " ".join(descriptors)
-    else:
-        description = ""
-        args_block = None
-    arg_descriptions = {}
-    if args_block:
-        arg = None
-        for line in args_block.split("\n")[1:]:
-            if ":" in line:
-                arg, desc = line.split(":")
-                arg_descriptions[arg.strip()] = desc.strip()
-            elif arg:
-                arg_descriptions[arg.strip()] += " " + line.strip()
-    return description, arg_descriptions
-
-
-def _get_python_function_arguments(function: Callable, arg_descriptions: dict) -> dict:
-    """Get JsonSchema describing a Python functions arguments.
-
-    Assumes all function arguments are of primitive types (int, float, str, bool) or
-    are subclasses of pydantic.BaseModel.
-    """
-    properties = {}
-    annotations = inspect.getfullargspec(function).annotations
-    for arg, arg_type in annotations.items():
-        if arg == "return":
-            continue
-        if isinstance(arg_type, type) and issubclass(arg_type, BaseModel):
-            properties[arg] = arg_type.schema()
-        elif arg_type.__name__ in PYTHON_TO_JSON_TYPES:
-            properties[arg] = {"type": PYTHON_TO_JSON_TYPES[arg_type.__name__]}
-        if arg in arg_descriptions:
-            if arg not in properties:
-                properties[arg] = {}
-            properties[arg]["description"] = arg_descriptions[arg]
-    return properties
-
-
-def _get_python_function_required_args(function: Callable) -> List[str]:
-    """Get the required arguments for a Python function."""
-    spec = inspect.getfullargspec(function)
-    required = spec.args[: -len(spec.defaults)] if spec.defaults else spec.args
-    required += [k for k in spec.kwonlyargs if k not in (spec.kwonlydefaults or {})]
-    return required
-
-
-def convert_python_function_to_openai_function(function: Callable) -> Dict[str, Any]:
-    """Convert a Python function to an OpenAI function-calling API compatible dict.
-
-    Assumes the Python function has type hints and a docstring with a description. If
-    the docstring has Google Python style argument descriptions, these will be
-    included as well.
-    """
-    description, arg_descriptions = _parse_python_function_docstring(function)
-    return {
-        "name": _get_python_function_name(function),
-        "description": description,
-        "parameters": {
-            "type": "object",
-            "properties": _get_python_function_arguments(function, arg_descriptions),
-            "required": _get_python_function_required_args(function),
-        },
-    }
-
-
-def convert_to_openai_function(
-    function: Union[Dict[str, Any], BaseModel, Callable]
-) -> Dict[str, Any]:
-    """Convert a raw function/class to an OpenAI function.
-
-    Args:
-        function: Either a dictionary, a pydantic.BaseModel, or a Python function. If
-            a dictionary is passed in, it is assumed to already be a valid OpenAI
-            function.
-
-    Returns:
-        A dict version of the passed in function which is compatible with the
-            OpenAI function-calling API.
-    """
-    if isinstance(function, dict):
-        return function
-    elif isinstance(function, type) and issubclass(function, BaseModel):
-        schema = function.schema()
-        return {
-            "name": schema["title"],
-            "description": schema["description"],
-            "parameters": schema,
-        }
-    elif callable(function):
-        return convert_python_function_to_openai_function(function)
-    else:
-        raise ValueError(
-            f"Unsupported function type {type(function)}. Functions must be passed in"
-            f" as Dict, pydantic.BaseModel, or Callable."
-        )
-
-
-def _get_openai_output_parser(
-    functions: Sequence[Union[Dict[str, Any], BaseModel, Callable]],
-    function_names: Sequence[str],
-) -> BaseLLMOutputParser:
-    """Get the appropriate function output parser given the user functions."""
-    if isinstance(functions[0], type) and issubclass(functions[0], BaseModel):
-        if len(functions) > 1:
-            pydantic_schema: Union[Dict, Type[BaseModel]] = {
-                name: fn for name, fn in zip(function_names, functions)
-            }
-        else:
-            pydantic_schema = functions[0]
-        output_parser: BaseLLMOutputParser = PydanticOutputFunctionsParser(
-            pydantic_schema=pydantic_schema
-        )
-    else:
-        output_parser = JsonOutputFunctionsParser(args_only=len(functions) <= 1)
-    return output_parser
-
-
-def create_openai_fn_chain(
-    functions: Sequence[Union[Dict[str, Any], BaseModel, Callable]],
-    llm: Optional[BaseLanguageModel] = None,
-    prompt: Optional[BasePromptTemplate] = None,
-    output_parser: Optional[BaseLLMOutputParser] = None,
-    **kwargs: Any,
-) -> LLMChain:
-    """Create an LLM chain that uses OpenAI functions.
-
-    Args:
-        functions: A sequence of either dictionaries, pydantic.BaseModels, or
-            Python functions. If dictionaries are passed in, they are assumed to
-            already be valid OpenAI functions. If only a single
-            function is passed in, then it will be enforced that the model use that
-            function. pydantic.BaseModels and Python functions should have docstrings
-            describing what the function does. For best results, pydantic.BaseModels
-            should have descriptions of the parameters and Python functions should have
-            Google Python style args descriptions in the docstring. Additionally,
-            Python functions should only use primitive types (str, int, float, bool) or
-            pydantic.BaseModels for arguments.
-        llm: Language model to use, assumed to support the OpenAI function-calling API.
-            Defaults to ChatOpenAI using model gpt-3.5-turbo-0613.
-        prompt: BasePromptTemplate to pass to the model. Defaults to a prompt that just
-            passes user input directly to the model.
-        output_parser: BaseLLMOutputParser to use for parsing model outputs. By default
-            will be inferred from the function types. If pydantic.BaseModels are passed
-            in, then the OutputParser will try to parse outputs using those. Otherwise
-            model outputs will simply be parsed as JSON. If multiple functions are
-            passed in and they are not pydantic.BaseModels, the chain output will
-            include both the name of the function that was returned and the arguments
-            to pass to the function.
-
-    Returns:
-        An LLMChain that will pass in the given functions to the model when run.
-
-    Example:
-        .. code-block:: python
-
-                from langchain.chains.openai_functions import create_openai_fn_chain
-
-                from pydantic import BaseModel, Field
-
-
-                class RecordPerson(BaseModel):
-                    \"\"\"Record some identifying information about a person.\"\"\"
-
-                    name: str = Field(..., description="The person's name")
-                    age: int = Field(..., description="The person's age")
-                    fav_food: Optional[str] = Field(None, description="The person's favorite food")
-
-
-                class RecordDog(BaseModel):
-                    \"\"\"Record some identifying information about a dog.\"\"\"
-
-                    name: str = Field(..., description="The dog's name")
-                    color: str = Field(..., description="The dog's color")
-                    fav_food: Optional[str] = Field(None, description="The dog's favorite food")
-
-
-                chain = create_openai_fn_chain([RecordPerson, RecordDog])
-                chain.run("Harry was a chubby brown beagle who loved chicken")
-                # -> RecordDog(name="Harry", color="brown", fav_food="chicken")
-    """ # noqa: E501
-    if not functions:
-        raise ValueError("Need to pass in at least one function. Received zero.")
-    openai_functions = [convert_to_openai_function(f) for f in functions]
-    llm = llm or ChatOpenAI(model="gpt-3.5-turbo-0613", temperature=0)
-    prompt = prompt or ChatPromptTemplate.from_template("{input}")
-    fn_names = [oai_fn["name"] for oai_fn in openai_functions]
-    output_parser = output_parser or _get_openai_output_parser(functions, fn_names)
-    llm_kwargs: Dict[str, Any] = {
-        "functions": openai_functions,
-    }
-    if len(openai_functions) == 1:
-        llm_kwargs["function_call"] = {"name": openai_functions[0]["name"]}
-    llm_chain = LLMChain(
-        llm=llm,
-        prompt=prompt,
-        output_parser=output_parser,
-        llm_kwargs=llm_kwargs,
-        output_key="function",
-        **kwargs,
-    )
-    return llm_chain
-
-
-def create_structured_output_chain(
-    output_schema: Union[Dict[str, Any], BaseModel],
-    llm: Optional[BaseLanguageModel] = None,
-    prompt: Optional[BasePromptTemplate] = None,
-    output_parser: Optional[BaseLLMOutputParser] = None,
-    **kwargs: Any,
-) -> LLMChain:
-    """Create an LLMChain that uses an OpenAI function to get a structured output.
-
-    Args:
-        output_schema: Either a dictionary or pydantic.BaseModel. If a dictionary is
-            passed in, it's assumed to already be a valid JsonSchema.
-            For best results, pydantic.BaseModels should have docstrings describing what
-            the schema represents and descriptions for the parameters.
-        llm: Language model to use, assumed to support the OpenAI function-calling API.
-            Defaults to ChatOpenAI using model gpt-3.5-turbo-0613.
-        prompt: BasePromptTemplate to pass to the model. Defaults to a prompt that just
-            passes user input directly to the model.
-        output_parser: BaseLLMOutputParser to use for parsing model outputs. By default
-            will be inferred from the function types. If pydantic.BaseModels are passed
-            in, then the OutputParser will try to parse outputs using those. Otherwise
-            model outputs will simply be parsed as JSON.
-
-    Returns:
-        An LLMChain that will pass the given function to the model.
-
-    Example:
-        .. code-block:: python
-
-                from langchain.chains.openai_functions import create_structured_output_chain
-
-                from pydantic import BaseModel, Field
-
-
-                class Dog(BaseModel):
-                    \"\"\"Identifying information about a dog.\"\"\"
-
-                    name: str = Field(..., description="The dog's name")
-                    color: str = Field(..., description="The dog's color")
-                    fav_food: Optional[str] = Field(None, description="The dog's favorite food")
-
-                chain = create_structured_output_chain(Dog)
chain.run("Harry was a chubby brown beagle who loved chicken")
|
||||
# -> Dog(name="Harry", color="brown", fav_food="chicken")
|
||||
""" # noqa: E501
|
||||
function: Dict = {
|
||||
"name": "output_formatter",
|
||||
"description": (
|
||||
"Output formatter. Should always be used to format your response to the"
|
||||
" user."
|
||||
),
|
||||
}
|
||||
parameters = (
|
||||
output_schema if isinstance(output_schema, dict) else output_schema.schema()
|
||||
)
|
||||
function["parameters"] = parameters
|
||||
return create_openai_fn_chain(
|
||||
[function], llm=llm, prompt=prompt, output_parser=output_parser, **kwargs
|
||||
)
|
||||
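The deleted module above defines the docstring-driven converters without a worked example; here is a sketch of what `convert_python_function_to_openai_function` would produce for a plain annotated function (hedged: `get_weather` and its docstring are invented for illustration, and the output is traced by hand from the code above):

```python
def get_weather(city: str, units: str = "metric") -> str:
    """Look up the current weather for a city.

    Args:
        city: Name of the city to look up.
        units: Unit system to report in, either metric or imperial.
    """
    raise NotImplementedError

# convert_python_function_to_openai_function(get_weather) should yield roughly:
# {
#     "name": "get_weather",
#     "description": "Look up the current weather for a city.",
#     "parameters": {
#         "type": "object",
#         "properties": {
#             "city": {"type": "string", "description": "Name of the city to look up."},
#             "units": {"type": "string", "description": "Unit system to report in, either metric or imperial."},
#         },
#         "required": ["city"],  # "units" has a default, so it is not required
#     },
# }
```

Note the parsing constraints that follow from the implementation: argument lines in the `Args:` block are split on `:` exactly once, so descriptions must not themselves contain colons, and only primitive-typed or pydantic-typed parameters are captured.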
@@ -14,7 +14,6 @@ from langchain.callbacks.manager import (
     AsyncCallbackManagerForChainRun,
     CallbackManagerForChainRun,
 )
-from langchain.chains import ReduceDocumentsChain
 from langchain.chains.base import Chain
 from langchain.chains.combine_documents.base import BaseCombineDocumentsChain
 from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain
@@ -59,16 +58,13 @@ class BaseQAWithSourcesChain(Chain, ABC):
             document_prompt=document_prompt,
             document_variable_name="summaries",
         )
-        reduce_documents_chain = ReduceDocumentsChain(
-            combine_documents_chain=combine_results_chain
-        )
-        combine_documents_chain = MapReduceDocumentsChain(
+        combine_document_chain = MapReduceDocumentsChain(
             llm_chain=llm_question_chain,
-            reduce_documents_chain=reduce_documents_chain,
+            combine_document_chain=combine_results_chain,
             document_variable_name="context",
         )
         return cls(
-            combine_documents_chain=combine_documents_chain,
+            combine_documents_chain=combine_document_chain,
             **kwargs,
         )
@@ -82,10 +78,10 @@ class BaseQAWithSourcesChain(Chain, ABC):
     ) -> BaseQAWithSourcesChain:
         """Load chain from chain type."""
         _chain_kwargs = chain_type_kwargs or {}
-        combine_documents_chain = load_qa_with_sources_chain(
+        combine_document_chain = load_qa_with_sources_chain(
             llm, chain_type=chain_type, **_chain_kwargs
         )
-        return cls(combine_documents_chain=combine_documents_chain, **kwargs)
+        return cls(combine_documents_chain=combine_document_chain, **kwargs)
 
     class Config:
         """Configuration for this pydantic object."""
@@ -114,7 +110,7 @@ class BaseQAWithSourcesChain(Chain, ABC):
 
     @root_validator(pre=True)
     def validate_naming(cls, values: Dict) -> Dict:
-        """Fix backwards compatibility in naming."""
+        """Fix backwards compatability in naming."""
         if "combine_document_chain" in values:
             values["combine_documents_chain"] = values.pop("combine_document_chain")
         return values
@@ -4,7 +4,6 @@ from __future__ import annotations
 from typing import Any, Mapping, Optional, Protocol
 
 from langchain.base_language import BaseLanguageModel
-from langchain.chains import ReduceDocumentsChain
 from langchain.chains.combine_documents.base import BaseCombineDocumentsChain
 from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain
 from langchain.chains.combine_documents.map_rerank import MapRerankDocumentsChain
@@ -79,13 +78,12 @@ def _load_map_reduce_chain(
     reduce_llm: Optional[BaseLanguageModel] = None,
     collapse_llm: Optional[BaseLanguageModel] = None,
     verbose: Optional[bool] = None,
-    token_max: int = 3000,
     **kwargs: Any,
 ) -> MapReduceDocumentsChain:
     map_chain = LLMChain(llm=llm, prompt=question_prompt, verbose=verbose)
     _reduce_llm = reduce_llm or llm
     reduce_chain = LLMChain(llm=_reduce_llm, prompt=combine_prompt, verbose=verbose)
-    combine_documents_chain = StuffDocumentsChain(
+    combine_document_chain = StuffDocumentsChain(
         llm_chain=reduce_chain,
         document_variable_name=combine_document_variable_name,
         document_prompt=document_prompt,
@@ -109,16 +107,11 @@ def _load_map_reduce_chain(
         document_variable_name=combine_document_variable_name,
         document_prompt=document_prompt,
     )
-    reduce_documents_chain = ReduceDocumentsChain(
-        combine_documents_chain=combine_documents_chain,
-        collapse_documents_chain=collapse_chain,
-        token_max=token_max,
-        verbose=verbose,
-    )
     return MapReduceDocumentsChain(
         llm_chain=map_chain,
-        reduce_documents_chain=reduce_documents_chain,
+        combine_document_chain=combine_document_chain,
         document_variable_name=map_reduce_document_variable_name,
+        collapse_document_chain=collapse_chain,
         verbose=verbose,
         **kwargs,
     )
@@ -123,8 +123,8 @@ def load_query_constructor_chain(
     enable_limit: bool = False,
     **kwargs: Any,
 ) -> LLMChain:
-    """Load a query constructor chain.
-
+    """
+    Load a query constructor chain.
     Args:
         llm: BaseLanguageModel to use for the chain.
         document_contents: The contents of the document to be queried.
@@ -1,10 +1,14 @@
 import datetime
 from typing import Any, Optional, Sequence, Union
 
-from langchain.utils import check_package_version
-
 try:
-    check_package_version("lark", gte_version="1.1.5")
+    import lark
+    from packaging import version
+
+    if version.parse(lark.__version__) < version.parse("1.1.5"):
+        raise ValueError(
+            f"Lark should be at least version 1.1.5, got {lark.__version__}"
+        )
     from lark import Lark, Transformer, v_args
 except ImportError:
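The "-" side of this hunk swaps the hand-rolled `packaging`-based guard for `langchain.utils.check_package_version`. A minimal sketch of that guard idiom (hedged: a generic reimplementation for illustration, not the library's exact code):

```python
from importlib import metadata

from packaging import version


def check_package_version(package: str, gte_version: str) -> None:
    """Raise if `package` is installed at a version below `gte_version`."""
    installed = version.parse(metadata.version(package))
    if installed < version.parse(gte_version):
        raise ValueError(f"Expected {package}>={gte_version}, got {installed}")


check_package_version("lark", gte_version="1.1.5")
```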
@@ -4,7 +4,6 @@ from typing import Any, Mapping, Optional, Protocol
 from langchain.base_language import BaseLanguageModel
 from langchain.callbacks.base import BaseCallbackManager
 from langchain.callbacks.manager import Callbacks
-from langchain.chains import ReduceDocumentsChain
 from langchain.chains.combine_documents.base import BaseCombineDocumentsChain
 from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain
 from langchain.chains.combine_documents.map_rerank import MapRerankDocumentsChain
@@ -99,7 +98,6 @@ def _load_map_reduce_chain(
     verbose: Optional[bool] = None,
     callback_manager: Optional[BaseCallbackManager] = None,
     callbacks: Callbacks = None,
-    token_max: int = 3000,
     **kwargs: Any,
 ) -> MapReduceDocumentsChain:
     _question_prompt = (
@@ -124,7 +122,7 @@ def _load_map_reduce_chain(
         callbacks=callbacks,
     )
     # TODO: document prompt
-    combine_documents_chain = StuffDocumentsChain(
+    combine_document_chain = StuffDocumentsChain(
         llm_chain=reduce_chain,
         document_variable_name=combine_document_variable_name,
         verbose=verbose,
@@ -152,16 +150,11 @@ def _load_map_reduce_chain(
         verbose=verbose,
         callback_manager=callback_manager,
     )
-    reduce_documents_chain = ReduceDocumentsChain(
-        combine_documents_chain=combine_documents_chain,
-        collapse_documents_chain=collapse_chain,
-        token_max=token_max,
-        verbose=verbose,
-    )
     return MapReduceDocumentsChain(
         llm_chain=map_chain,
+        combine_document_chain=combine_document_chain,
         document_variable_name=map_reduce_document_variable_name,
-        reduce_documents_chain=reduce_documents_chain,
+        collapse_document_chain=collapse_chain,
         verbose=verbose,
         callback_manager=callback_manager,
         callbacks=callbacks,
@@ -2,7 +2,6 @@
 from typing import Any, Mapping, Optional, Protocol
 
 from langchain.base_language import BaseLanguageModel
-from langchain.chains import ReduceDocumentsChain
 from langchain.chains.combine_documents.base import BaseCombineDocumentsChain
 from langchain.chains.combine_documents.map_reduce import MapReduceDocumentsChain
 from langchain.chains.combine_documents.refine import RefineDocumentsChain
@@ -48,14 +47,13 @@ def _load_map_reduce_chain(
     reduce_llm: Optional[BaseLanguageModel] = None,
     collapse_llm: Optional[BaseLanguageModel] = None,
     verbose: Optional[bool] = None,
-    token_max: int = 3000,
     **kwargs: Any,
 ) -> MapReduceDocumentsChain:
     map_chain = LLMChain(llm=llm, prompt=map_prompt, verbose=verbose)
     _reduce_llm = reduce_llm or llm
     reduce_chain = LLMChain(llm=_reduce_llm, prompt=combine_prompt, verbose=verbose)
     # TODO: document prompt
-    combine_documents_chain = StuffDocumentsChain(
+    combine_document_chain = StuffDocumentsChain(
         llm_chain=reduce_chain,
         document_variable_name=combine_document_variable_name,
         verbose=verbose,
@@ -77,16 +75,11 @@ def _load_map_reduce_chain(
         ),
         document_variable_name=combine_document_variable_name,
     )
-    reduce_documents_chain = ReduceDocumentsChain(
-        combine_documents_chain=combine_documents_chain,
-        collapse_documents_chain=collapse_chain,
-        token_max=token_max,
-        verbose=verbose,
-    )
     return MapReduceDocumentsChain(
         llm_chain=map_chain,
-        reduce_documents_chain=reduce_documents_chain,
+        combine_document_chain=combine_document_chain,
         document_variable_name=map_reduce_document_variable_name,
+        collapse_document_chain=collapse_chain,
         verbose=verbose,
         **kwargs,
     )
@@ -2,7 +2,6 @@ from langchain.chat_models.anthropic import ChatAnthropic
 from langchain.chat_models.azure_openai import AzureChatOpenAI
 from langchain.chat_models.fake import FakeListChatModel
 from langchain.chat_models.google_palm import ChatGooglePalm
-from langchain.chat_models.human import HumanInputChatModel
 from langchain.chat_models.openai import ChatOpenAI
 from langchain.chat_models.promptlayer_openai import PromptLayerChatOpenAI
 from langchain.chat_models.vertexai import ChatVertexAI
@@ -15,5 +14,4 @@ __all__ = [
     "ChatAnthropic",
     "ChatGooglePalm",
     "ChatVertexAI",
-    "HumanInputChatModel",
 ]
@@ -34,10 +34,6 @@ class ChatAnthropic(BaseChatModel, _AnthropicCommon):
             model = ChatAnthropic(model="<model_name>", anthropic_api_key="my-api-key")
     """
 
-    @property
-    def lc_secrets(self) -> Dict[str, str]:
-        return {"anthropic_api_key": "ANTHROPIC_API_KEY"}
-
     @property
     def _llm_type(self) -> str:
         """Return type of chat model."""
@@ -108,17 +104,17 @@ class ChatAnthropic(BaseChatModel, _AnthropicCommon):
 
         if self.streaming:
             completion = ""
-            stream_resp = self.client.completions.create(**params, stream=True)
+            stream_resp = self.client.completion_stream(**params)
             for data in stream_resp:
-                delta = data.completion
-                completion += delta
+                delta = data["completion"][len(completion) :]
+                completion = data["completion"]
                 if run_manager:
                     run_manager.on_llm_new_token(
                         delta,
                     )
         else:
-            response = self.client.completions.create(**params)
-            completion = response.completion
+            response = self.client.completion(**params)
+            completion = response["completion"]
         message = AIMessage(content=completion)
         return ChatResult(generations=[ChatGeneration(message=message)])
@@ -136,19 +132,17 @@ class ChatAnthropic(BaseChatModel, _AnthropicCommon):
 
         if self.streaming:
             completion = ""
-            stream_resp = await self.async_client.completions.create(
-                **params, stream=True
-            )
+            stream_resp = await self.client.acompletion_stream(**params)
            async for data in stream_resp:
-                delta = data.completion
-                completion += delta
+                delta = data["completion"][len(completion) :]
+                completion = data["completion"]
                 if run_manager:
                     await run_manager.on_llm_new_token(
                         delta,
                     )
         else:
-            response = await self.async_client.completions.create(**params)
-            completion = response.completion
+            response = await self.client.acompletion(**params)
+            completion = response["completion"]
         message = AIMessage(content=completion)
         return ChatResult(generations=[ChatGeneration(message=message)])
@@ -121,13 +121,12 @@ class AzureChatOpenAI(ChatOpenAI):
         return {**self._default_params}
 
     @property
-    def _client_params(self) -> Dict[str, Any]:
-        """Get the config params used for the openai client."""
+    def _invocation_params(self) -> Mapping[str, Any]:
         openai_creds = {
             "api_type": self.openai_api_type,
             "api_version": self.openai_api_version,
         }
-        return {**super()._client_params, **openai_creds}
+        return {**openai_creds, **super()._invocation_params}
 
     @property
     def _llm_type(self) -> str:
@@ -40,8 +40,6 @@ class BaseChatModel(BaseLanguageModel, ABC):
     callback_manager: Optional[BaseCallbackManager] = Field(default=None, exclude=True)
     tags: Optional[List[str]] = Field(default=None, exclude=True)
     """Tags to add to the run trace."""
-    metadata: Optional[Dict[str, Any]] = Field(default=None, exclude=True)
-    """Metadata to add to the run trace."""
 
     @root_validator()
     def raise_deprecation(cls, values: Dict) -> Dict:
@@ -65,11 +63,10 @@ class BaseChatModel(BaseLanguageModel, ABC):
     def _get_invocation_params(
         self,
         stop: Optional[List[str]] = None,
-        **kwargs: Any,
     ) -> dict:
         params = self.dict()
         params["stop"] = stop
-        return {**params, **kwargs}
+        return params
 
     def _get_llm_string(self, stop: Optional[List[str]] = None, **kwargs: Any) -> str:
         if self.lc_serializable:
@@ -78,7 +75,7 @@ class BaseChatModel(BaseLanguageModel, ABC):
             llm_string = dumps(self)
             return llm_string + "---" + param_string
         else:
-            params = self._get_invocation_params(stop=stop, **kwargs)
+            params = self._get_invocation_params(stop=stop)
+            params = {**params, **kwargs}
             return str(sorted([(k, v) for k, v in params.items()]))
@@ -89,11 +86,10 @@ class BaseChatModel(BaseLanguageModel, ABC):
         callbacks: Callbacks = None,
         *,
         tags: Optional[List[str]] = None,
-        metadata: Optional[Dict[str, Any]] = None,
         **kwargs: Any,
     ) -> LLMResult:
         """Top Level call"""
-        params = self._get_invocation_params(stop=stop, **kwargs)
+        params = self._get_invocation_params(stop=stop)
         options = {"stop": stop}
 
         callback_manager = CallbackManager.configure(
@@ -102,8 +98,6 @@ class BaseChatModel(BaseLanguageModel, ABC):
             self.verbose,
             tags,
             self.tags,
-            metadata,
-            self.metadata,
         )
         run_managers = callback_manager.on_chat_model_start(
             dumpd(self), messages, invocation_params=params, options=options
@@ -145,11 +139,10 @@ class BaseChatModel(BaseLanguageModel, ABC):
         callbacks: Callbacks = None,
         *,
         tags: Optional[List[str]] = None,
-        metadata: Optional[Dict[str, Any]] = None,
         **kwargs: Any,
     ) -> LLMResult:
         """Top Level call"""
-        params = self._get_invocation_params(stop=stop, **kwargs)
+        params = self._get_invocation_params(stop=stop)
         options = {"stop": stop}
 
         callback_manager = AsyncCallbackManager.configure(
@@ -158,8 +151,6 @@ class BaseChatModel(BaseLanguageModel, ABC):
             self.verbose,
             tags,
             self.tags,
-            metadata,
-            self.metadata,
         )
 
         run_managers = await callback_manager.on_chat_model_start(
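All of the BaseChatModel hunks turn on one question: whether per-call `**kwargs` are folded into the invocation parameters that feed the run callbacks and the cache key. A toy, standalone illustration of the difference (hedged: not langchain code, just the two behaviors side by side):

```python
from typing import Any, Dict, List, Optional


class ToyModel:
    def dict(self) -> Dict[str, Any]:
        return {"model": "toy", "temperature": 0.0}

    # "-" side: call-time kwargs (e.g. functions=[...]) are recorded, so two calls
    # that differ only in kwargs get distinct params and distinct cache keys.
    def params_with_kwargs(self, stop: Optional[List[str]] = None, **kwargs: Any) -> dict:
        params = self.dict()
        params["stop"] = stop
        return {**params, **kwargs}

    # "+" side: kwargs are dropped here, so such calls look identical downstream.
    def params_without_kwargs(self, stop: Optional[List[str]] = None) -> dict:
        params = self.dict()
        params["stop"] = stop
        return params


m = ToyModel()
assert m.params_with_kwargs(functions=[{"name": "f"}]) != m.params_without_kwargs()
assert m.params_with_kwargs() == m.params_without_kwargs()
```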
Some files were not shown because too many files have changed in this diff.