mirror of https://github.com/hwchase17/langchain.git
synced 2026-02-19 05:18:22 +00:00

Compare commits: harrison/m...harrison/n (2 commits)

| Author | SHA1 | Date |
|---|---|---|
| | 10a73de9dc | |
| | 62d2edd165 | |
.github/ISSUE_TEMPLATE/bug-report.yml (vendored, 2 changed lines)
@@ -46,7 +46,7 @@ body:
- @agola11
Tools / Toolkits
- ...
- @vowelparrot
placeholder: "@Username ..."
.github/PULL_REQUEST_TEMPLATE.md (vendored, 2 changed lines)
@@ -48,7 +48,7 @@ Tag maintainers/contributors who might be interested:
- @agola11
Agents / Tools / Toolkits
- @hwchase17
- @vowelparrot
VectorStores / Retrievers / Memory
- @dev2049
@@ -2,7 +2,6 @@

⚡ Building applications with LLMs through composability ⚡

[](https://github.com/hwchase17/langchain/releases)
[](https://github.com/hwchase17/langchain/actions/workflows/lint.yml)
[](https://github.com/hwchase17/langchain/actions/workflows/test.yml)
[](https://github.com/hwchase17/langchain/actions/workflows/linkcheck.yml)

@@ -13,8 +12,6 @@

[](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/hwchase17/langchain)
[](https://codespaces.new/hwchase17/langchain)
[](https://star-history.com/#hwchase17/langchain)
[](https://libraries.io/github/hwchase17/langchain)
[](https://github.com/hwchase17/langchain/issues)

Looking for the JS/TS version? Check out [LangChain.js](https://github.com/hwchase17/langchainjs).
docs/_static/js/mendablesearch.js (vendored, 1 changed line)
@@ -37,7 +37,6 @@ document.addEventListener('DOMContentLoaded', () => {
    style: { darkMode: false, accentColor: '#010810' },
    floatingButtonStyle: { color: '#ffffff', backgroundColor: '#010810' },
    anon_key: '82842b36-3ea6-49b2-9fb8-52cfc4bde6bf', // Mendable Search Public ANON key, ok to be public
    cmdShortcutKey: 'j',
    messageSettings: {
      openSourcesInNewTab: false,
      prettySources: true // Prettify the sources displayed now
@@ -1,137 +0,0 @@

============================
Deploying LLMs in Production
============================

In today's fast-paced technological landscape, the use of Large Language Models (LLMs) is rapidly expanding. As a result, it's crucial for developers to understand how to effectively deploy these models in production environments. LLM interfaces typically fall into two categories:

- **Case 1: Utilizing External LLM Providers (OpenAI, Anthropic, etc.)**
  In this scenario, most of the computational burden is handled by the LLM providers, while LangChain simplifies the implementation of business logic around these services. This approach includes features such as prompt templating, chat message generation, caching, vector embedding database creation, preprocessing, etc.

- **Case 2: Self-hosted Open-Source Models**
  Alternatively, developers can opt to use smaller, yet comparably capable, self-hosted open-source LLM models. This approach can significantly decrease costs, latency, and privacy concerns associated with transferring data to external LLM providers.

Regardless of the framework that forms the backbone of your product, deploying LLM applications comes with its own set of challenges. It's vital to understand the trade-offs and key considerations when evaluating serving frameworks.

Outline
=======

This guide aims to provide a comprehensive overview of the requirements for deploying LLMs in a production setting, focusing on:

- `Designing a Robust LLM Application Service <#robust>`_
- `Maintaining Cost-Efficiency <#cost>`_
- `Ensuring Rapid Iteration <#iteration>`_

Understanding these components is crucial when assessing serving systems. LangChain integrates with several open-source projects designed to tackle these issues, providing a robust framework for productionizing your LLM applications. Some notable frameworks include:

- `Ray Serve <../integrations/ray_serve.html>`_
- `BentoML <https://github.com/ssheng/BentoChain>`_
- `Modal <../integrations/modal.html>`_

These links will provide further information on each ecosystem, assisting you in finding the best fit for your LLM deployment needs.

Designing a Robust LLM Application Service
===========================================

.. _robust:

When deploying an LLM service in production, it's imperative to provide a seamless user experience free from outages. Achieving 24/7 service availability involves creating and maintaining several sub-systems surrounding your application.

Monitoring
----------

Monitoring forms an integral part of any system running in a production environment. In the context of LLMs, it is essential to monitor both performance and quality metrics.

**Performance Metrics:** These metrics provide insights into the efficiency and capacity of your model. Here are some key examples:

- Queries per second (QPS): This measures the number of queries your model processes in a second, offering insights into its utilization.
- Latency: This metric quantifies the delay from when your client sends a request to when they receive a response.
- Tokens per second (TPS): This represents the number of tokens your model can generate in a second.

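The sketch below shows one way to track these three performance metrics in-process. It is a minimal illustration only (all names are ours); in practice you would export such counters to a monitoring system such as Prometheus.

.. code-block:: python

    import time

    class ServingMetrics:
        """Toy in-process tracker for QPS, TPS, and latency."""

        def __init__(self):
            self.requests = 0
            self.tokens = 0
            self.latencies = []
            self.start = time.monotonic()

        def record(self, latency_s, n_tokens):
            self.requests += 1
            self.tokens += n_tokens
            self.latencies.append(latency_s)

        def snapshot(self):
            elapsed = time.monotonic() - self.start
            p50 = sorted(self.latencies)[len(self.latencies) // 2] if self.latencies else None
            return {
                "qps": self.requests / elapsed,   # queries per second
                "tps": self.tokens / elapsed,     # tokens per second
                "p50_latency_s": p50,             # median request latency
            }
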
**Quality Metrics:** These metrics are typically customized according to the business use-case. For instance, how does the output of your system compare to a baseline, such as a previous version? Although these metrics can be calculated offline, you need to log the necessary data to use them later.

Fault tolerance
---------------

Your application may encounter errors such as exceptions in your model inference or business logic code, causing failures and disrupting traffic. Other potential issues could arise from the machine running your application, such as unexpected hardware breakdowns or loss of spot instances during high-demand periods. One way to mitigate these risks is by increasing redundancy through replica scaling and implementing recovery mechanisms for failed replicas. However, model replicas aren't the only potential points of failure. It's essential to build resilience against various failures that could occur at any point in your stack.

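As a small illustration of request-level resilience, the following sketch retries a flaky model call with exponential backoff before giving up and letting the caller fail over to another replica. The function names are illustrative, not part of any particular framework.

.. code-block:: python

    import random
    import time

    def call_with_retries(fn, *args, attempts=3, base_delay=0.5):
        """Retry a flaky model call with exponential backoff and jitter."""
        for attempt in range(attempts):
            try:
                return fn(*args)
            except Exception:
                if attempt == attempts - 1:
                    raise  # out of retries; let the caller fail over to another replica
                # back off 0.5s, 1s, 2s, ... plus jitter to avoid thundering herds
                time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
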
Zero-downtime upgrades
----------------------

System upgrades are often necessary but can result in service disruptions if not handled correctly. One way to prevent downtime during upgrades is by implementing a smooth transition process from the old version to the new one. Ideally, the new version of your LLM service is deployed, and traffic gradually shifts from the old to the new version, maintaining a constant QPS throughout the process.

Load balancing
--------------

Load balancing, in simple terms, is a technique to distribute work evenly across multiple computers, servers, or other resources to optimize the utilization of the system, maximize throughput, minimize response time, and avoid overload of any single resource. Think of it as a traffic officer directing cars (requests) to different roads (servers) so that no single road becomes too congested.

There are several strategies for load balancing. For example, one common method is the *Round Robin* strategy, where each request is sent to the next server in line, cycling back to the first when all servers have received a request. This works well when all servers are equally capable. However, if some servers are more powerful than others, you might use a *Weighted Round Robin* or *Least Connections* strategy, where more requests are sent to the more powerful servers, or to those currently handling the fewest active requests. Let's imagine you're running an LLM chain. If your application becomes popular, you could have hundreds or even thousands of users asking questions at the same time. If one server gets too busy (high load), the load balancer would direct new requests to another server that is less busy. This way, all your users get a timely response and the system remains stable.

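To make the two strategies concrete, here is a minimal sketch of both. Real load balancers (nginx, Envoy, cloud load balancers) implement these and many more refinements.

.. code-block:: python

    import itertools

    class RoundRobin:
        """Send each request to the next server in line."""
        def __init__(self, servers):
            self._cycle = itertools.cycle(servers)

        def pick(self):
            return next(self._cycle)

    class LeastConnections:
        """Send each request to the server with the fewest active requests."""
        def __init__(self, servers):
            self.active = {server: 0 for server in servers}

        def pick(self):
            server = min(self.active, key=self.active.get)
            self.active[server] += 1
            return server

        def release(self, server):
            self.active[server] -= 1  # call when the request completes
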
Maintaining Cost-Efficiency and Scalability
============================================

.. _cost:

Deploying LLM services can be costly, especially when you're handling a large volume of user interactions. Charges by LLM providers are usually based on tokens used, which can make inference for a chat system built on these models expensive. However, several strategies can help manage these costs without compromising the quality of the service.

Self-hosting models
-------------------

Several smaller, open-source LLMs are emerging to tackle the issue of reliance on LLM providers. Self-hosting allows you to maintain similar quality to LLM provider models while managing costs. The challenge lies in building a reliable, high-performing LLM serving system on your own machines.

Resource Management and Auto-Scaling
------------------------------------

Computational logic within your application requires precise resource allocation. For instance, if part of your traffic is served by an OpenAI endpoint and another part by a self-hosted model, it's crucial to allocate suitable resources for each. Auto-scaling, that is, adjusting resource allocation based on traffic, can significantly impact the cost of running your application. This strategy requires a balance between cost and responsiveness, ensuring neither resource over-provisioning nor compromised application responsiveness.

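The core of most auto-scalers is a simple target-utilization rule. A toy version of the decision logic might look like the following, where the per-replica QPS and the replica bounds are placeholders you would tune for your workload.

.. code-block:: python

    import math

    def desired_replicas(current_qps, qps_per_replica, min_replicas=1, max_replicas=10):
        """Scale replicas so each one stays near its sustainable QPS."""
        target = math.ceil(current_qps / qps_per_replica)
        return max(min_replicas, min(max_replicas, target))

    # e.g. 45 QPS at ~10 QPS per replica -> 5 replicas
    assert desired_replicas(45, 10) == 5
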
Utilizing Spot Instances
------------------------

On platforms like AWS, spot instances offer substantial cost savings, typically priced at about a third of on-demand instances. The trade-off is a higher crash rate, necessitating a robust fault-tolerance mechanism for effective use.

Independent Scaling
-------------------

When self-hosting your models, you should consider independent scaling. For example, if you have two translation models, one fine-tuned for French and another for Spanish, incoming requests might necessitate different scaling requirements for each.

Batching requests
-----------------

In the context of Large Language Models, batching requests can enhance efficiency by better utilizing your GPU resources. GPUs are inherently parallel processors, designed to handle multiple tasks simultaneously. If you send individual requests to the model, the GPU might not be fully utilized as it's only working on a single task at a time. On the other hand, by batching requests together, you're allowing the GPU to work on multiple tasks at once, maximizing its utilization and improving inference speed. This not only leads to cost savings but can also improve the overall latency of your LLM service.

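A sketch of dynamic batching: requests queue up, and a worker collects a batch (bounded by size and a small time window) before running a single forward pass. This is only a minimal illustration of the idea; serving frameworks such as Ray Serve provide production-grade versions.

.. code-block:: python

    import queue

    def batching_worker(requests, model, max_batch=8, window_s=0.05):
        """Collect up to `max_batch` requests within `window_s`, then run one pass.

        Each item on `requests` is a (prompt, reply_queue) pair; `model` is assumed
        to accept a list of prompts and return a list of completions.
        """
        while True:
            batch = [requests.get()]  # block until the first request arrives
            try:
                while len(batch) < max_batch:
                    batch.append(requests.get(timeout=window_s))
            except queue.Empty:
                pass  # the window closed; serve what we have
            outputs = model([prompt for prompt, _ in batch])  # one batched GPU pass
            for (_, reply), output in zip(batch, outputs):
                reply.put(output)
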
In summary, managing costs while scaling your LLM services requires a strategic approach. Utilizing self-hosted models, managing resources effectively, employing auto-scaling, using spot instances, independently scaling models, and batching requests are key strategies to consider. Open-source libraries such as Ray Serve and BentoML are designed to deal with these complexities.

Ensuring Rapid Iteration
========================

.. _iteration:

The LLM landscape is evolving at an unprecedented pace, with new libraries and model architectures being introduced constantly. Consequently, it's crucial to avoid tying yourself to a solution specific to one particular framework. This is especially relevant in serving, where changes to your infrastructure can be time-consuming, expensive, and risky. Strive for infrastructure that is not locked into any specific machine learning library or framework, but instead offers a general-purpose, scalable serving layer. Here are some aspects where flexibility plays a key role:

Model composition
-----------------

Deploying systems like LangChain demands the ability to piece together different models and connect them via logic. Take the example of building a natural language input SQL query engine. Querying an LLM and obtaining the SQL command is only part of the system. You need to extract metadata from the connected database, construct a prompt for the LLM, run the SQL query on an engine, feed the query results back to the LLM, and present the final answer to the user. This demonstrates the need to seamlessly integrate various complex components built in Python into a dynamic chain of logical blocks that can be served together.

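A self-contained sketch of that flow, with a stub standing in for the LLM so the chain's shape is visible end to end (swap in a real LLM client in practice):

.. code-block:: python

    import sqlite3

    def fake_llm(prompt):
        """Stand-in for a real LLM call; replace with your provider's client."""
        return "SELECT name FROM users" if "Write SQL" in prompt else "The user is Alice."

    def answer_question(question, conn):
        # 1. extract metadata from the connected database
        schema = conn.execute("SELECT sql FROM sqlite_master WHERE type='table'").fetchall()
        # 2. construct a prompt and ask the LLM for a SQL command
        sql = fake_llm(f"Schema: {schema}\nWrite SQL for: {question}")
        # 3. run the SQL query on an engine
        rows = conn.execute(sql).fetchall()
        # 4. feed the results back to the LLM and present the answer
        return fake_llm(f"Question: {question}\nRows: {rows}\nAnswer concisely.")

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.execute("INSERT INTO users VALUES ('Alice')")
    print(answer_question("Who is in the users table?", conn))
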
Cloud providers
---------------

Many hosted solutions are restricted to a single cloud provider, which can limit your options in today's multi-cloud world. Depending on where your other infrastructure components are built, you might prefer to stick with your chosen cloud provider.

Infrastructure as Code (IaC)
----------------------------

Rapid iteration also involves the ability to recreate your infrastructure quickly and reliably. This is where Infrastructure as Code (IaC) tools like Terraform, CloudFormation, or Kubernetes YAML files come into play. They allow you to define your infrastructure in code files, which can be version controlled and quickly deployed, enabling faster and more reliable iterations.

CI/CD
-----

In a fast-paced environment, implementing CI/CD pipelines can significantly speed up the iteration process. They help automate the testing and deployment of your LLM applications, reducing the risk of errors and enabling faster feedback and iteration.
@@ -2,230 +2,191 @@

Dependents stats for `hwchase17/langchain`

[](https://github.com/hwchase17/langchain/network/dependents)
[&message=212&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[&message=7272&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[&message=19095&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[](https://github.com/hwchase17/langchain/network/dependents)
[&message=172&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[&message=4980&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)
[&message=17239&color=informational&logo=slickpic)](https://github.com/hwchase17/langchain/network/dependents)

[update: 2023-06-05; only dependent repositories with Stars > 100]
[update: 2023-05-17; only dependent repositories with Stars > 100]

| Repository | Stars |
| :-------- | -----: |
|[openai/openai-cookbook](https://github.com/openai/openai-cookbook) | 38024 |
|[LAION-AI/Open-Assistant](https://github.com/LAION-AI/Open-Assistant) | 33609 |
|[microsoft/TaskMatrix](https://github.com/microsoft/TaskMatrix) | 33136 |
|[hpcaitech/ColossalAI](https://github.com/hpcaitech/ColossalAI) | 30032 |
|[imartinez/privateGPT](https://github.com/imartinez/privateGPT) | 28094 |
|[reworkd/AgentGPT](https://github.com/reworkd/AgentGPT) | 23430 |
|[openai/chatgpt-retrieval-plugin](https://github.com/openai/chatgpt-retrieval-plugin) | 17942 |
|[jerryjliu/llama_index](https://github.com/jerryjliu/llama_index) | 16697 |
|[mindsdb/mindsdb](https://github.com/mindsdb/mindsdb) | 16410 |
|[mlflow/mlflow](https://github.com/mlflow/mlflow) | 14517 |
|[GaiZhenbiao/ChuanhuChatGPT](https://github.com/GaiZhenbiao/ChuanhuChatGPT) | 10793 |
|[databrickslabs/dolly](https://github.com/databrickslabs/dolly) | 10155 |
|[openai/evals](https://github.com/openai/evals) | 10076 |
|[AIGC-Audio/AudioGPT](https://github.com/AIGC-Audio/AudioGPT) | 8619 |
|[logspace-ai/langflow](https://github.com/logspace-ai/langflow) | 8211 |
|[imClumsyPanda/langchain-ChatGLM](https://github.com/imClumsyPanda/langchain-ChatGLM) | 8154 |
|[PromtEngineer/localGPT](https://github.com/PromtEngineer/localGPT) | 6853 |
|[StanGirard/quivr](https://github.com/StanGirard/quivr) | 6830 |
|[PipedreamHQ/pipedream](https://github.com/PipedreamHQ/pipedream) | 6520 |
|[go-skynet/LocalAI](https://github.com/go-skynet/LocalAI) | 6018 |
|[arc53/DocsGPT](https://github.com/arc53/DocsGPT) | 5643 |
|[e2b-dev/e2b](https://github.com/e2b-dev/e2b) | 5075 |
|[langgenius/dify](https://github.com/langgenius/dify) | 4281 |
|[nsarrazin/serge](https://github.com/nsarrazin/serge) | 4228 |
|[zauberzeug/nicegui](https://github.com/zauberzeug/nicegui) | 4084 |
|[madawei2699/myGPTReader](https://github.com/madawei2699/myGPTReader) | 4039 |
|[wenda-LLM/wenda](https://github.com/wenda-LLM/wenda) | 3871 |
|[GreyDGL/PentestGPT](https://github.com/GreyDGL/PentestGPT) | 3837 |
|[zilliztech/GPTCache](https://github.com/zilliztech/GPTCache) | 3625 |
|[csunny/DB-GPT](https://github.com/csunny/DB-GPT) | 3545 |
|[gkamradt/langchain-tutorials](https://github.com/gkamradt/langchain-tutorials) | 3404 |
|[mmabrouk/chatgpt-wrapper](https://github.com/mmabrouk/chatgpt-wrapper) | 3303 |
|[postgresml/postgresml](https://github.com/postgresml/postgresml) | 3052 |
|[marqo-ai/marqo](https://github.com/marqo-ai/marqo) | 3014 |
|[MineDojo/Voyager](https://github.com/MineDojo/Voyager) | 2945 |
|[PrefectHQ/marvin](https://github.com/PrefectHQ/marvin) | 2761 |
|[project-baize/baize-chatbot](https://github.com/project-baize/baize-chatbot) | 2673 |
|[hwchase17/chat-langchain](https://github.com/hwchase17/chat-langchain) | 2589 |
|[whitead/paper-qa](https://github.com/whitead/paper-qa) | 2572 |
|[Azure-Samples/azure-search-openai-demo](https://github.com/Azure-Samples/azure-search-openai-demo) | 2366 |
|[GerevAI/gerev](https://github.com/GerevAI/gerev) | 2330 |
|[OpenGVLab/InternGPT](https://github.com/OpenGVLab/InternGPT) | 2289 |
|[ParisNeo/gpt4all-ui](https://github.com/ParisNeo/gpt4all-ui) | 2159 |
|[OpenBMB/BMTools](https://github.com/OpenBMB/BMTools) | 2158 |
|[guangzhengli/ChatFiles](https://github.com/guangzhengli/ChatFiles) | 2005 |
|[h2oai/h2ogpt](https://github.com/h2oai/h2ogpt) | 1939 |
|[Farama-Foundation/PettingZoo](https://github.com/Farama-Foundation/PettingZoo) | 1845 |
|[OpenGVLab/Ask-Anything](https://github.com/OpenGVLab/Ask-Anything) | 1749 |
|[IntelligenzaArtificiale/Free-Auto-GPT](https://github.com/IntelligenzaArtificiale/Free-Auto-GPT) | 1740 |
|[Unstructured-IO/unstructured](https://github.com/Unstructured-IO/unstructured) | 1628 |
|[hwchase17/notion-qa](https://github.com/hwchase17/notion-qa) | 1607 |
|[NVIDIA/NeMo-Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) | 1544 |
|[SamurAIGPT/privateGPT](https://github.com/SamurAIGPT/privateGPT) | 1543 |
|[paulpierre/RasaGPT](https://github.com/paulpierre/RasaGPT) | 1526 |
|[yanqiangmiffy/Chinese-LangChain](https://github.com/yanqiangmiffy/Chinese-LangChain) | 1485 |
|[Kav-K/GPTDiscord](https://github.com/Kav-K/GPTDiscord) | 1402 |
|[vocodedev/vocode-python](https://github.com/vocodedev/vocode-python) | 1387 |
|[Chainlit/chainlit](https://github.com/Chainlit/chainlit) | 1336 |
|[lunasec-io/lunasec](https://github.com/lunasec-io/lunasec) | 1323 |
|[psychic-api/psychic](https://github.com/psychic-api/psychic) | 1248 |
|[agiresearch/OpenAGI](https://github.com/agiresearch/OpenAGI) | 1208 |
|[jina-ai/thinkgpt](https://github.com/jina-ai/thinkgpt) | 1193 |
|[thomas-yanxin/LangChain-ChatGLM-Webui](https://github.com/thomas-yanxin/LangChain-ChatGLM-Webui) | 1182 |
|[ttengwang/Caption-Anything](https://github.com/ttengwang/Caption-Anything) | 1137 |
|[jina-ai/dev-gpt](https://github.com/jina-ai/dev-gpt) | 1135 |
|[greshake/llm-security](https://github.com/greshake/llm-security) | 1086 |
|[keephq/keep](https://github.com/keephq/keep) | 1063 |
|[juncongmoo/chatllama](https://github.com/juncongmoo/chatllama) | 1037 |
|[richardyc/Chrome-GPT](https://github.com/richardyc/Chrome-GPT) | 1035 |
|[visual-openllm/visual-openllm](https://github.com/visual-openllm/visual-openllm) | 997 |
|[mmz-001/knowledge_gpt](https://github.com/mmz-001/knowledge_gpt) | 995 |
|[jina-ai/langchain-serve](https://github.com/jina-ai/langchain-serve) | 949 |
|[irgolic/AutoPR](https://github.com/irgolic/AutoPR) | 936 |
|[microsoft/X-Decoder](https://github.com/microsoft/X-Decoder) | 908 |
|[poe-platform/api-bot-tutorial](https://github.com/poe-platform/api-bot-tutorial) | 902 |
|[peterw/Chat-with-Github-Repo](https://github.com/peterw/Chat-with-Github-Repo) | 875 |
|[cirediatpl/FigmaChain](https://github.com/cirediatpl/FigmaChain) | 822 |
|[homanp/superagent](https://github.com/homanp/superagent) | 806 |
|[seanpixel/Teenage-AGI](https://github.com/seanpixel/Teenage-AGI) | 800 |
|[chatarena/chatarena](https://github.com/chatarena/chatarena) | 796 |
|[hashintel/hash](https://github.com/hashintel/hash) | 795 |
|[SamurAIGPT/Camel-AutoGPT](https://github.com/SamurAIGPT/Camel-AutoGPT) | 786 |
|[rlancemartin/auto-evaluator](https://github.com/rlancemartin/auto-evaluator) | 770 |
|[corca-ai/EVAL](https://github.com/corca-ai/EVAL) | 769 |
|[101dotxyz/GPTeam](https://github.com/101dotxyz/GPTeam) | 755 |
|[noahshinn024/reflexion](https://github.com/noahshinn024/reflexion) | 706 |
|[eyurtsev/kor](https://github.com/eyurtsev/kor) | 695 |
|[cheshire-cat-ai/core](https://github.com/cheshire-cat-ai/core) | 681 |
|[e-johnstonn/BriefGPT](https://github.com/e-johnstonn/BriefGPT) | 656 |
|[run-llama/llama-lab](https://github.com/run-llama/llama-lab) | 635 |
|[griptape-ai/griptape](https://github.com/griptape-ai/griptape) | 583 |
|[namuan/dr-doc-search](https://github.com/namuan/dr-doc-search) | 555 |
|[getmetal/motorhead](https://github.com/getmetal/motorhead) | 550 |
|[kreneskyp/ix](https://github.com/kreneskyp/ix) | 543 |
|[hwchase17/chat-your-data](https://github.com/hwchase17/chat-your-data) | 510 |
|[Anil-matcha/ChatPDF](https://github.com/Anil-matcha/ChatPDF) | 501 |
|[whyiyhw/chatgpt-wechat](https://github.com/whyiyhw/chatgpt-wechat) | 497 |
|[SamurAIGPT/ChatGPT-Developer-Plugins](https://github.com/SamurAIGPT/ChatGPT-Developer-Plugins) | 496 |
|[microsoft/PodcastCopilot](https://github.com/microsoft/PodcastCopilot) | 492 |
|[debanjum/khoj](https://github.com/debanjum/khoj) | 485 |
|[akshata29/chatpdf](https://github.com/akshata29/chatpdf) | 485 |
|[langchain-ai/langchain-aiplugin](https://github.com/langchain-ai/langchain-aiplugin) | 462 |
|[jina-ai/agentchain](https://github.com/jina-ai/agentchain) | 460 |
|[alexanderatallah/window.ai](https://github.com/alexanderatallah/window.ai) | 457 |
|[yeagerai/yeagerai-agent](https://github.com/yeagerai/yeagerai-agent) | 451 |
|[mckaywrigley/repo-chat](https://github.com/mckaywrigley/repo-chat) | 446 |
|[michaelthwan/searchGPT](https://github.com/michaelthwan/searchGPT) | 446 |
|[mpaepper/content-chatbot](https://github.com/mpaepper/content-chatbot) | 441 |
|[freddyaboulton/gradio-tools](https://github.com/freddyaboulton/gradio-tools) | 439 |
|[ruoccofabrizio/azure-open-ai-embeddings-qna](https://github.com/ruoccofabrizio/azure-open-ai-embeddings-qna) | 429 |
|[StevenGrove/GPT4Tools](https://github.com/StevenGrove/GPT4Tools) | 422 |
|[jonra1993/fastapi-alembic-sqlmodel-async](https://github.com/jonra1993/fastapi-alembic-sqlmodel-async) | 407 |
|[msoedov/langcorn](https://github.com/msoedov/langcorn) | 405 |
|[amosjyng/langchain-visualizer](https://github.com/amosjyng/langchain-visualizer) | 395 |
|[ajndkr/lanarky](https://github.com/ajndkr/lanarky) | 384 |
|[mtenenholtz/chat-twitter](https://github.com/mtenenholtz/chat-twitter) | 376 |
|[steamship-core/steamship-langchain](https://github.com/steamship-core/steamship-langchain) | 371 |
|[langchain-ai/auto-evaluator](https://github.com/langchain-ai/auto-evaluator) | 365 |
|[xuwenhao/geektime-ai-course](https://github.com/xuwenhao/geektime-ai-course) | 358 |
|[continuum-llms/chatgpt-memory](https://github.com/continuum-llms/chatgpt-memory) | 357 |
|[opentensor/bittensor](https://github.com/opentensor/bittensor) | 347 |
|[showlab/VLog](https://github.com/showlab/VLog) | 345 |
|[daodao97/chatdoc](https://github.com/daodao97/chatdoc) | 345 |
|[logan-markewich/llama_index_starter_pack](https://github.com/logan-markewich/llama_index_starter_pack) | 332 |
|[poe-platform/poe-protocol](https://github.com/poe-platform/poe-protocol) | 320 |
|[explosion/spacy-llm](https://github.com/explosion/spacy-llm) | 312 |
|[andylokandy/gpt-4-search](https://github.com/andylokandy/gpt-4-search) | 311 |
|[alejandro-ao/langchain-ask-pdf](https://github.com/alejandro-ao/langchain-ask-pdf) | 310 |
|[jupyterlab/jupyter-ai](https://github.com/jupyterlab/jupyter-ai) | 294 |
|[BlackHC/llm-strategy](https://github.com/BlackHC/llm-strategy) | 283 |
|[itamargol/openai](https://github.com/itamargol/openai) | 281 |
|[momegas/megabots](https://github.com/momegas/megabots) | 279 |
|[personoids/personoids-lite](https://github.com/personoids/personoids-lite) | 277 |
|[yvann-hub/Robby-chatbot](https://github.com/yvann-hub/Robby-chatbot) | 267 |
|[Anil-matcha/Website-to-Chatbot](https://github.com/Anil-matcha/Website-to-Chatbot) | 266 |
|[Cheems-Seminar/grounded-segment-any-parts](https://github.com/Cheems-Seminar/grounded-segment-any-parts) | 260 |
|[sullivan-sean/chat-langchainjs](https://github.com/sullivan-sean/chat-langchainjs) | 248 |
|[bborn/howdoi.ai](https://github.com/bborn/howdoi.ai) | 245 |
|[daveebbelaar/langchain-experiments](https://github.com/daveebbelaar/langchain-experiments) | 240 |
|[MagnivOrg/prompt-layer-library](https://github.com/MagnivOrg/prompt-layer-library) | 237 |
|[ur-whitelab/exmol](https://github.com/ur-whitelab/exmol) | 234 |
|[conceptofmind/toolformer](https://github.com/conceptofmind/toolformer) | 234 |
|[recalign/RecAlign](https://github.com/recalign/RecAlign) | 226 |
|[OpenBMB/AgentVerse](https://github.com/OpenBMB/AgentVerse) | 220 |
|[alvarosevilla95/autolang](https://github.com/alvarosevilla95/autolang) | 219 |
|[JohnSnowLabs/nlptest](https://github.com/JohnSnowLabs/nlptest) | 216 |
|[kaleido-lab/dolphin](https://github.com/kaleido-lab/dolphin) | 215 |
|[truera/trulens](https://github.com/truera/trulens) | 208 |
|[NimbleBoxAI/ChainFury](https://github.com/NimbleBoxAI/ChainFury) | 208 |
|[airobotlab/KoChatGPT](https://github.com/airobotlab/KoChatGPT) | 207 |
|[monarch-initiative/ontogpt](https://github.com/monarch-initiative/ontogpt) | 200 |
|[paolorechia/learn-langchain](https://github.com/paolorechia/learn-langchain) | 195 |
|[shaman-ai/agent-actors](https://github.com/shaman-ai/agent-actors) | 185 |
|[Haste171/langchain-chatbot](https://github.com/Haste171/langchain-chatbot) | 184 |
|[plchld/InsightFlow](https://github.com/plchld/InsightFlow) | 182 |
|[su77ungr/CASALIOY](https://github.com/su77ungr/CASALIOY) | 180 |
|[jbrukh/gpt-jargon](https://github.com/jbrukh/gpt-jargon) | 177 |
|[benthecoder/ClassGPT](https://github.com/benthecoder/ClassGPT) | 174 |
|[billxbf/ReWOO](https://github.com/billxbf/ReWOO) | 170 |
|[filip-michalsky/SalesGPT](https://github.com/filip-michalsky/SalesGPT) | 168 |
|[hwchase17/langchain-streamlit-template](https://github.com/hwchase17/langchain-streamlit-template) | 168 |
|[radi-cho/datasetGPT](https://github.com/radi-cho/datasetGPT) | 164 |
|[hardbyte/qabot](https://github.com/hardbyte/qabot) | 164 |
|[gia-guar/JARVIS-ChatGPT](https://github.com/gia-guar/JARVIS-ChatGPT) | 158 |
|[plastic-labs/tutor-gpt](https://github.com/plastic-labs/tutor-gpt) | 154 |
|[yasyf/compress-gpt](https://github.com/yasyf/compress-gpt) | 154 |
|[fengyuli-dev/multimedia-gpt](https://github.com/fengyuli-dev/multimedia-gpt) | 154 |
|[ethanyanjiali/minChatGPT](https://github.com/ethanyanjiali/minChatGPT) | 153 |
|[hwchase17/chroma-langchain](https://github.com/hwchase17/chroma-langchain) | 153 |
|[edreisMD/plugnplai](https://github.com/edreisMD/plugnplai) | 148 |
|[chakkaradeep/pyCodeAGI](https://github.com/chakkaradeep/pyCodeAGI) | 145 |
|[ccurme/yolopandas](https://github.com/ccurme/yolopandas) | 145 |
|[shamspias/customizable-gpt-chatbot](https://github.com/shamspias/customizable-gpt-chatbot) | 144 |
|[realminchoi/babyagi-ui](https://github.com/realminchoi/babyagi-ui) | 143 |
|[PradipNichite/Youtube-Tutorials](https://github.com/PradipNichite/Youtube-Tutorials) | 140 |
|[gustavz/DataChad](https://github.com/gustavz/DataChad) | 140 |
|[Klingefjord/chatgpt-telegram](https://github.com/Klingefjord/chatgpt-telegram) | 140 |
|[Jaseci-Labs/jaseci](https://github.com/Jaseci-Labs/jaseci) | 139 |
|[handrew/browserpilot](https://github.com/handrew/browserpilot) | 137 |
|[jmpaz/promptlib](https://github.com/jmpaz/promptlib) | 137 |
|[SamPink/dev-gpt](https://github.com/SamPink/dev-gpt) | 135 |
|[menloparklab/langchain-cohere-qdrant-doc-retrieval](https://github.com/menloparklab/langchain-cohere-qdrant-doc-retrieval) | 135 |
|[openai/openai-cookbook](https://github.com/openai/openai-cookbook) | 35401 |
|[LAION-AI/Open-Assistant](https://github.com/LAION-AI/Open-Assistant) | 32861 |
|[microsoft/TaskMatrix](https://github.com/microsoft/TaskMatrix) | 32766 |
|[hpcaitech/ColossalAI](https://github.com/hpcaitech/ColossalAI) | 29560 |
|[reworkd/AgentGPT](https://github.com/reworkd/AgentGPT) | 22315 |
|[imartinez/privateGPT](https://github.com/imartinez/privateGPT) | 17474 |
|[openai/chatgpt-retrieval-plugin](https://github.com/openai/chatgpt-retrieval-plugin) | 16923 |
|[mindsdb/mindsdb](https://github.com/mindsdb/mindsdb) | 16112 |
|[jerryjliu/llama_index](https://github.com/jerryjliu/llama_index) | 15407 |
|[mlflow/mlflow](https://github.com/mlflow/mlflow) | 14345 |
|[GaiZhenbiao/ChuanhuChatGPT](https://github.com/GaiZhenbiao/ChuanhuChatGPT) | 10372 |
|[databrickslabs/dolly](https://github.com/databrickslabs/dolly) | 9919 |
|[AIGC-Audio/AudioGPT](https://github.com/AIGC-Audio/AudioGPT) | 8177 |
|[logspace-ai/langflow](https://github.com/logspace-ai/langflow) | 6807 |
|[imClumsyPanda/langchain-ChatGLM](https://github.com/imClumsyPanda/langchain-ChatGLM) | 6087 |
|[arc53/DocsGPT](https://github.com/arc53/DocsGPT) | 5292 |
|[e2b-dev/e2b](https://github.com/e2b-dev/e2b) | 4622 |
|[nsarrazin/serge](https://github.com/nsarrazin/serge) | 4076 |
|[madawei2699/myGPTReader](https://github.com/madawei2699/myGPTReader) | 3952 |
|[zauberzeug/nicegui](https://github.com/zauberzeug/nicegui) | 3952 |
|[go-skynet/LocalAI](https://github.com/go-skynet/LocalAI) | 3762 |
|[GreyDGL/PentestGPT](https://github.com/GreyDGL/PentestGPT) | 3388 |
|[mmabrouk/chatgpt-wrapper](https://github.com/mmabrouk/chatgpt-wrapper) | 3243 |
|[zilliztech/GPTCache](https://github.com/zilliztech/GPTCache) | 3189 |
|[wenda-LLM/wenda](https://github.com/wenda-LLM/wenda) | 3050 |
|[marqo-ai/marqo](https://github.com/marqo-ai/marqo) | 2930 |
|[gkamradt/langchain-tutorials](https://github.com/gkamradt/langchain-tutorials) | 2710 |
|[PrefectHQ/marvin](https://github.com/PrefectHQ/marvin) | 2545 |
|[project-baize/baize-chatbot](https://github.com/project-baize/baize-chatbot) | 2479 |
|[whitead/paper-qa](https://github.com/whitead/paper-qa) | 2399 |
|[langgenius/dify](https://github.com/langgenius/dify) | 2344 |
|[GerevAI/gerev](https://github.com/GerevAI/gerev) | 2283 |
|[hwchase17/chat-langchain](https://github.com/hwchase17/chat-langchain) | 2266 |
|[guangzhengli/ChatFiles](https://github.com/guangzhengli/ChatFiles) | 1903 |
|[Azure-Samples/azure-search-openai-demo](https://github.com/Azure-Samples/azure-search-openai-demo) | 1884 |
|[OpenBMB/BMTools](https://github.com/OpenBMB/BMTools) | 1860 |
|[Farama-Foundation/PettingZoo](https://github.com/Farama-Foundation/PettingZoo) | 1813 |
|[OpenGVLab/Ask-Anything](https://github.com/OpenGVLab/Ask-Anything) | 1571 |
|[IntelligenzaArtificiale/Free-Auto-GPT](https://github.com/IntelligenzaArtificiale/Free-Auto-GPT) | 1480 |
|[hwchase17/notion-qa](https://github.com/hwchase17/notion-qa) | 1464 |
|[NVIDIA/NeMo-Guardrails](https://github.com/NVIDIA/NeMo-Guardrails) | 1419 |
|[Unstructured-IO/unstructured](https://github.com/Unstructured-IO/unstructured) | 1410 |
|[Kav-K/GPTDiscord](https://github.com/Kav-K/GPTDiscord) | 1363 |
|[paulpierre/RasaGPT](https://github.com/paulpierre/RasaGPT) | 1344 |
|[StanGirard/quivr](https://github.com/StanGirard/quivr) | 1330 |
|[lunasec-io/lunasec](https://github.com/lunasec-io/lunasec) | 1318 |
|[vocodedev/vocode-python](https://github.com/vocodedev/vocode-python) | 1286 |
|[agiresearch/OpenAGI](https://github.com/agiresearch/OpenAGI) | 1156 |
|[h2oai/h2ogpt](https://github.com/h2oai/h2ogpt) | 1141 |
|[jina-ai/thinkgpt](https://github.com/jina-ai/thinkgpt) | 1106 |
|[yanqiangmiffy/Chinese-LangChain](https://github.com/yanqiangmiffy/Chinese-LangChain) | 1072 |
|[ttengwang/Caption-Anything](https://github.com/ttengwang/Caption-Anything) | 1064 |
|[jina-ai/dev-gpt](https://github.com/jina-ai/dev-gpt) | 1057 |
|[juncongmoo/chatllama](https://github.com/juncongmoo/chatllama) | 1003 |
|[greshake/llm-security](https://github.com/greshake/llm-security) | 1002 |
|[visual-openllm/visual-openllm](https://github.com/visual-openllm/visual-openllm) | 957 |
|[richardyc/Chrome-GPT](https://github.com/richardyc/Chrome-GPT) | 918 |
|[irgolic/AutoPR](https://github.com/irgolic/AutoPR) | 886 |
|[mmz-001/knowledge_gpt](https://github.com/mmz-001/knowledge_gpt) | 867 |
|[thomas-yanxin/LangChain-ChatGLM-Webui](https://github.com/thomas-yanxin/LangChain-ChatGLM-Webui) | 850 |
|[microsoft/X-Decoder](https://github.com/microsoft/X-Decoder) | 837 |
|[peterw/Chat-with-Github-Repo](https://github.com/peterw/Chat-with-Github-Repo) | 826 |
|[cirediatpl/FigmaChain](https://github.com/cirediatpl/FigmaChain) | 782 |
|[hashintel/hash](https://github.com/hashintel/hash) | 778 |
|[seanpixel/Teenage-AGI](https://github.com/seanpixel/Teenage-AGI) | 773 |
|[jina-ai/langchain-serve](https://github.com/jina-ai/langchain-serve) | 738 |
|[corca-ai/EVAL](https://github.com/corca-ai/EVAL) | 737 |
|[ai-sidekick/sidekick](https://github.com/ai-sidekick/sidekick) | 717 |
|[rlancemartin/auto-evaluator](https://github.com/rlancemartin/auto-evaluator) | 703 |
|[poe-platform/api-bot-tutorial](https://github.com/poe-platform/api-bot-tutorial) | 689 |
|[SamurAIGPT/Camel-AutoGPT](https://github.com/SamurAIGPT/Camel-AutoGPT) | 666 |
|[eyurtsev/kor](https://github.com/eyurtsev/kor) | 608 |
|[run-llama/llama-lab](https://github.com/run-llama/llama-lab) | 559 |
|[namuan/dr-doc-search](https://github.com/namuan/dr-doc-search) | 544 |
|[pieroit/cheshire-cat](https://github.com/pieroit/cheshire-cat) | 520 |
|[griptape-ai/griptape](https://github.com/griptape-ai/griptape) | 514 |
|[getmetal/motorhead](https://github.com/getmetal/motorhead) | 481 |
|[hwchase17/chat-your-data](https://github.com/hwchase17/chat-your-data) | 462 |
|[langchain-ai/langchain-aiplugin](https://github.com/langchain-ai/langchain-aiplugin) | 452 |
|[jina-ai/agentchain](https://github.com/jina-ai/agentchain) | 439 |
|[SamurAIGPT/ChatGPT-Developer-Plugins](https://github.com/SamurAIGPT/ChatGPT-Developer-Plugins) | 437 |
|[alexanderatallah/window.ai](https://github.com/alexanderatallah/window.ai) | 433 |
|[michaelthwan/searchGPT](https://github.com/michaelthwan/searchGPT) | 427 |
|[mpaepper/content-chatbot](https://github.com/mpaepper/content-chatbot) | 425 |
|[mckaywrigley/repo-chat](https://github.com/mckaywrigley/repo-chat) | 422 |
|[whyiyhw/chatgpt-wechat](https://github.com/whyiyhw/chatgpt-wechat) | 421 |
|[freddyaboulton/gradio-tools](https://github.com/freddyaboulton/gradio-tools) | 407 |
|[jonra1993/fastapi-alembic-sqlmodel-async](https://github.com/jonra1993/fastapi-alembic-sqlmodel-async) | 395 |
|[yeagerai/yeagerai-agent](https://github.com/yeagerai/yeagerai-agent) | 383 |
|[akshata29/chatpdf](https://github.com/akshata29/chatpdf) | 374 |
|[OpenGVLab/InternGPT](https://github.com/OpenGVLab/InternGPT) | 368 |
|[ruoccofabrizio/azure-open-ai-embeddings-qna](https://github.com/ruoccofabrizio/azure-open-ai-embeddings-qna) | 358 |
|[101dotxyz/GPTeam](https://github.com/101dotxyz/GPTeam) | 357 |
|[mtenenholtz/chat-twitter](https://github.com/mtenenholtz/chat-twitter) | 354 |
|[amosjyng/langchain-visualizer](https://github.com/amosjyng/langchain-visualizer) | 343 |
|[msoedov/langcorn](https://github.com/msoedov/langcorn) | 334 |
|[showlab/VLog](https://github.com/showlab/VLog) | 330 |
|[continuum-llms/chatgpt-memory](https://github.com/continuum-llms/chatgpt-memory) | 324 |
|[steamship-core/steamship-langchain](https://github.com/steamship-core/steamship-langchain) | 323 |
|[daodao97/chatdoc](https://github.com/daodao97/chatdoc) | 320 |
|[xuwenhao/geektime-ai-course](https://github.com/xuwenhao/geektime-ai-course) | 308 |
|[StevenGrove/GPT4Tools](https://github.com/StevenGrove/GPT4Tools) | 301 |
|[logan-markewich/llama_index_starter_pack](https://github.com/logan-markewich/llama_index_starter_pack) | 300 |
|[andylokandy/gpt-4-search](https://github.com/andylokandy/gpt-4-search) | 299 |
|[Anil-matcha/ChatPDF](https://github.com/Anil-matcha/ChatPDF) | 287 |
|[itamargol/openai](https://github.com/itamargol/openai) | 273 |
|[BlackHC/llm-strategy](https://github.com/BlackHC/llm-strategy) | 267 |
|[momegas/megabots](https://github.com/momegas/megabots) | 259 |
|[bborn/howdoi.ai](https://github.com/bborn/howdoi.ai) | 238 |
|[Cheems-Seminar/grounded-segment-any-parts](https://github.com/Cheems-Seminar/grounded-segment-any-parts) | 232 |
|[ur-whitelab/exmol](https://github.com/ur-whitelab/exmol) | 227 |
|[sullivan-sean/chat-langchainjs](https://github.com/sullivan-sean/chat-langchainjs) | 227 |
|[explosion/spacy-llm](https://github.com/explosion/spacy-llm) | 226 |
|[recalign/RecAlign](https://github.com/recalign/RecAlign) | 218 |
|[jupyterlab/jupyter-ai](https://github.com/jupyterlab/jupyter-ai) | 218 |
|[alvarosevilla95/autolang](https://github.com/alvarosevilla95/autolang) | 215 |
|[conceptofmind/toolformer](https://github.com/conceptofmind/toolformer) | 213 |
|[MagnivOrg/prompt-layer-library](https://github.com/MagnivOrg/prompt-layer-library) | 209 |
|[JohnSnowLabs/nlptest](https://github.com/JohnSnowLabs/nlptest) | 208 |
|[airobotlab/KoChatGPT](https://github.com/airobotlab/KoChatGPT) | 197 |
|[langchain-ai/auto-evaluator](https://github.com/langchain-ai/auto-evaluator) | 195 |
|[yvann-hub/Robby-chatbot](https://github.com/yvann-hub/Robby-chatbot) | 195 |
|[alejandro-ao/langchain-ask-pdf](https://github.com/alejandro-ao/langchain-ask-pdf) | 192 |
|[daveebbelaar/langchain-experiments](https://github.com/daveebbelaar/langchain-experiments) | 189 |
|[NimbleBoxAI/ChainFury](https://github.com/NimbleBoxAI/ChainFury) | 187 |
|[kaleido-lab/dolphin](https://github.com/kaleido-lab/dolphin) | 184 |
|[Anil-matcha/Website-to-Chatbot](https://github.com/Anil-matcha/Website-to-Chatbot) | 183 |
|[plchld/InsightFlow](https://github.com/plchld/InsightFlow) | 180 |
|[OpenBMB/AgentVerse](https://github.com/OpenBMB/AgentVerse) | 166 |
|[benthecoder/ClassGPT](https://github.com/benthecoder/ClassGPT) | 166 |
|[jbrukh/gpt-jargon](https://github.com/jbrukh/gpt-jargon) | 161 |
|[hardbyte/qabot](https://github.com/hardbyte/qabot) | 160 |
|[shaman-ai/agent-actors](https://github.com/shaman-ai/agent-actors) | 153 |
|[radi-cho/datasetGPT](https://github.com/radi-cho/datasetGPT) | 153 |
|[poe-platform/poe-protocol](https://github.com/poe-platform/poe-protocol) | 152 |
|[paolorechia/learn-langchain](https://github.com/paolorechia/learn-langchain) | 149 |
|[ajndkr/lanarky](https://github.com/ajndkr/lanarky) | 149 |
|[fengyuli-dev/multimedia-gpt](https://github.com/fengyuli-dev/multimedia-gpt) | 147 |
|[yasyf/compress-gpt](https://github.com/yasyf/compress-gpt) | 144 |
|[homanp/superagent](https://github.com/homanp/superagent) | 143 |
|[realminchoi/babyagi-ui](https://github.com/realminchoi/babyagi-ui) | 141 |
|[ethanyanjiali/minChatGPT](https://github.com/ethanyanjiali/minChatGPT) | 141 |
|[ccurme/yolopandas](https://github.com/ccurme/yolopandas) | 139 |
|[hwchase17/langchain-streamlit-template](https://github.com/hwchase17/langchain-streamlit-template) | 138 |
|[Jaseci-Labs/jaseci](https://github.com/Jaseci-Labs/jaseci) | 136 |
|[hirokidaichi/wanna](https://github.com/hirokidaichi/wanna) | 135 |
|[steamship-core/vercel-examples](https://github.com/steamship-core/vercel-examples) | 134 |
|[pablomarin/GPT-Azure-Search-Engine](https://github.com/pablomarin/GPT-Azure-Search-Engine) | 133 |
|[ibiscp/LLM-IMDB](https://github.com/ibiscp/LLM-IMDB) | 133 |
|[shauryr/S2QA](https://github.com/shauryr/S2QA) | 133 |
|[jerlendds/osintbuddy](https://github.com/jerlendds/osintbuddy) | 132 |
|[yuanjie-ai/ChatLLM](https://github.com/yuanjie-ai/ChatLLM) | 132 |
|[yasyf/summ](https://github.com/yasyf/summ) | 132 |
|[WongSaang/chatgpt-ui-server](https://github.com/WongSaang/chatgpt-ui-server) | 130 |
|[peterw/StoryStorm](https://github.com/peterw/StoryStorm) | 127 |
|[Teahouse-Studios/akari-bot](https://github.com/Teahouse-Studios/akari-bot) | 126 |
|[vaibkumr/prompt-optimizer](https://github.com/vaibkumr/prompt-optimizer) | 125 |
|[preset-io/promptimize](https://github.com/preset-io/promptimize) | 124 |
|[homanp/vercel-langchain](https://github.com/homanp/vercel-langchain) | 124 |
|[petehunt/langchain-github-bot](https://github.com/petehunt/langchain-github-bot) | 123 |
|[eunomia-bpf/GPTtrace](https://github.com/eunomia-bpf/GPTtrace) | 118 |
|[nicknochnack/LangchainDocuments](https://github.com/nicknochnack/LangchainDocuments) | 116 |
|[jiran214/GPT-vup](https://github.com/jiran214/GPT-vup) | 112 |
|[rsaryev/talk-codebase](https://github.com/rsaryev/talk-codebase) | 112 |
|[Haste171/langchain-chatbot](https://github.com/Haste171/langchain-chatbot) | 134 |
|[jmpaz/promptlib](https://github.com/jmpaz/promptlib) | 130 |
|[Klingefjord/chatgpt-telegram](https://github.com/Klingefjord/chatgpt-telegram) | 130 |
|[filip-michalsky/SalesGPT](https://github.com/filip-michalsky/SalesGPT) | 128 |
|[handrew/browserpilot](https://github.com/handrew/browserpilot) | 128 |
|[shauryr/S2QA](https://github.com/shauryr/S2QA) | 127 |
|[steamship-core/vercel-examples](https://github.com/steamship-core/vercel-examples) | 127 |
|[yasyf/summ](https://github.com/yasyf/summ) | 127 |
|[gia-guar/JARVIS-ChatGPT](https://github.com/gia-guar/JARVIS-ChatGPT) | 126 |
|[jerlendds/osintbuddy](https://github.com/jerlendds/osintbuddy) | 125 |
|[ibiscp/LLM-IMDB](https://github.com/ibiscp/LLM-IMDB) | 124 |
|[Teahouse-Studios/akari-bot](https://github.com/Teahouse-Studios/akari-bot) | 124 |
|[hwchase17/chroma-langchain](https://github.com/hwchase17/chroma-langchain) | 124 |
|[menloparklab/langchain-cohere-qdrant-doc-retrieval](https://github.com/menloparklab/langchain-cohere-qdrant-doc-retrieval) | 123 |
|[peterw/StoryStorm](https://github.com/peterw/StoryStorm) | 123 |
|[chakkaradeep/pyCodeAGI](https://github.com/chakkaradeep/pyCodeAGI) | 123 |
|[petehunt/langchain-github-bot](https://github.com/petehunt/langchain-github-bot) | 115 |
|[su77ungr/CASALIOY](https://github.com/su77ungr/CASALIOY) | 113 |
|[eunomia-bpf/GPTtrace](https://github.com/eunomia-bpf/GPTtrace) | 113 |
|[zenml-io/zenml-projects](https://github.com/zenml-io/zenml-projects) | 112 |
|[microsoft/azure-openai-in-a-day-workshop](https://github.com/microsoft/azure-openai-in-a-day-workshop) | 112 |
|[davila7/file-gpt](https://github.com/davila7/file-gpt) | 112 |
|[prof-frink-lab/slangchain](https://github.com/prof-frink-lab/slangchain) | 111 |
|[aurelio-labs/arxiv-bot](https://github.com/aurelio-labs/arxiv-bot) | 110 |
|[fixie-ai/fixie-examples](https://github.com/fixie-ai/fixie-examples) | 108 |
|[miaoshouai/miaoshouai-assistant](https://github.com/miaoshouai/miaoshouai-assistant) | 105 |
|[flurb18/AgentOoba](https://github.com/flurb18/AgentOoba) | 103 |
|[solana-labs/chatgpt-plugin](https://github.com/solana-labs/chatgpt-plugin) | 102 |
|[Significant-Gravitas/Auto-GPT-Benchmarks](https://github.com/Significant-Gravitas/Auto-GPT-Benchmarks) | 102 |
|[kaarthik108/snowChat](https://github.com/kaarthik108/snowChat) | 100 |
|[pablomarin/GPT-Azure-Search-Engine](https://github.com/pablomarin/GPT-Azure-Search-Engine) | 111 |
|[shamspias/customizable-gpt-chatbot](https://github.com/shamspias/customizable-gpt-chatbot) | 109 |
|[WongSaang/chatgpt-ui-server](https://github.com/WongSaang/chatgpt-ui-server) | 108 |
|[davila7/file-gpt](https://github.com/davila7/file-gpt) | 104 |
|[enhancedocs/enhancedocs](https://github.com/enhancedocs/enhancedocs) | 102 |
|[aurelio-labs/arxiv-bot](https://github.com/aurelio-labs/arxiv-bot) | 101 |

_Generated by [github-dependents-info](https://github.com/nvuillam/github-dependents-info)_

`github-dependents-info --repo hwchase17/langchain --markdownfile dependents.md --minstars 100 --sort stars`
[github-dependents-info --repo hwchase17/langchain --markdownfile dependents.md --minstars 100 --sort stars]
@@ -1,25 +0,0 @@
# Baseten

Learn how to use LangChain with models deployed on Baseten.

## Installation and setup

- Create a [Baseten](https://baseten.co) account and [API key](https://docs.baseten.co/settings/api-keys).
- Install the Baseten Python client with `pip install baseten`
- Use your API key to authenticate with `baseten login`

## Invoking a model

Baseten integrates with LangChain through the LLM module, which provides a standardized and interoperable interface for models that are deployed on your Baseten workspace.

You can deploy foundation models like WizardLM and Alpaca with one click from the [Baseten model library](https://app.baseten.co/explore/) or, if you have your own model, [deploy it with this tutorial](https://docs.baseten.co/deploying-models/deploy).

In this example, we'll work with WizardLM. [Deploy WizardLM here](https://app.baseten.co/explore/wizardlm) and follow along with the deployed [model's version ID](https://docs.baseten.co/managing-models/manage).

```python
from langchain.llms import Baseten

wizardlm = Baseten(model="MODEL_VERSION_ID", verbose=True)

wizardlm("What is the difference between a Wizard and a Sorcerer?")
```
@@ -6,11 +6,6 @@ This section covers several options for that. Note that these options are meant

What follows is a list of template GitHub repositories designed to be easily forked and modified to use your chain. This list is far from exhaustive, and we are EXTREMELY open to contributions here.

## [Anyscale](https://www.anyscale.com/model-serving)

Anyscale is a unified compute platform that makes it easy to develop, deploy, and manage scalable LLM applications in production using Ray.
With Anyscale you can scale the most challenging LLM-based workloads, and both develop and deploy LLM-based apps on a single compute platform.

## [Streamlit](https://github.com/hwchase17/langchain-streamlit-template)

This repo serves as a template for how to deploy a LangChain app with Streamlit.
@@ -10,8 +10,7 @@

### Tutorials
[LangChain Tutorials](https://www.youtube.com/watch?v=FuqdVNB_8c0&list=PL9V0lbeJ69brU-ojMpU1Y7Ic58Tap0Cw6) by [Edrick](https://www.youtube.com/@edrickdch):
- ⛓ [LangChain, Chroma DB, OpenAI Beginner Guide | ChatGPT with your PDF](https://youtu.be/FuqdVNB_8c0)
- ⛓ [LangChain 101: The Complete Beginner's Guide](https://youtu.be/P3MAbZ2eMUI)
- ⛓ [LangChain, Chroma DB, OpenAI Beginner Guide | ChatGPT with your PDF](https://youtu.be/FuqdVNB_8c0)

[LangChain Crash Course: Build an AutoGPT app in 25 minutes](https://youtu.be/MlK6SIjcjE8) by [Nicholas Renotte](https://www.youtube.com/@NicholasRenotte)
@@ -176,8 +176,6 @@ Additional Resources

- `Gallery <https://github.com/kyrolabs/awesome-langchain>`_: A collection of great projects that use LangChain, compiled by the folks at `Kyrolabs <https://kyrolabs.com>`_. Useful for finding inspiration and example implementations.
- `Deploying LLMs in Production <./additional_resources/deploy_llms.html>`_: A collection of best practices and tutorials for deploying LLMs in production.
- `Tracing <./additional_resources/tracing.html>`_: A guide on using tracing in LangChain to visualize the execution of chains and agents.
- `Model Laboratory <./additional_resources/model_laboratory.html>`_: Experimenting with different prompts, models, and chains is a big part of developing the best possible application. The ModelLaboratory makes it easy to do so.

@@ -196,8 +194,6 @@ Additional Resources
   :hidden:

   LangChainHub <https://github.com/hwchase17/langchain-hub>
   ./additional_resources/deployments.md
   ./additional_resources/deploy_llms.rst
   Gallery <https://github.com/kyrolabs/awesome-langchain>
   ./additional_resources/tracing.md
   ./additional_resources/model_laboratory.ipynb
@@ -1,26 +0,0 @@
# Anthropic

>[Anthropic](https://en.wikipedia.org/wiki/Anthropic) is an American artificial intelligence (AI) startup and
> public-benefit corporation, founded by former members of OpenAI. `Anthropic` specializes in developing general AI
> systems and language models, with a company ethos of responsible AI usage.
> `Anthropic` develops a chatbot, named `Claude`. Similar to `ChatGPT`, `Claude` uses a messaging
> interface where users can submit questions or requests and receive highly detailed and relevant responses.

## Installation and Setup

```bash
pip install anthropic
```

See the [setup documentation](https://console.anthropic.com/docs/access).

## Chat Models

See a [usage example](../modules/models/chat/integrations/anthropic.ipynb).

```python
from langchain.chat_models import ChatAnthropic
```
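As a quick sketch of what usage typically looks like (assuming the `ANTHROPIC_API_KEY` environment variable is set; the prompt is just an example):

```python
from langchain.chat_models import ChatAnthropic
from langchain.schema import HumanMessage

chat = ChatAnthropic()  # reads ANTHROPIC_API_KEY from the environment
chat([HumanMessage(content="What should I name my pet polar bear?")])
```
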
@@ -1,21 +0,0 @@
# AwaDB

>[AwaDB](https://github.com/awa-ai/awadb) is an AI-native database for the search and storage of embedding vectors used by LLM applications.

## Installation and Setup

```bash
pip install awadb
```

## VectorStore

There exists a wrapper around AwaDB vector databases, allowing you to use it as a vectorstore,
whether for semantic search or example selection.

```python
from langchain.vectorstores import AwaDB
```
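A minimal usage sketch, assuming AwaDB follows the standard LangChain `VectorStore` interface (the embedding model and example texts here are arbitrary choices):

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import AwaDB

vectorstore = AwaDB.from_texts(
    ["AwaDB stores embedding vectors", "LangChain composes LLM applications"],
    embedding=HuggingFaceEmbeddings(),
)
vectorstore.similarity_search("What does AwaDB store?", k=1)
```
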
For a more detailed walkthrough of the AwaDB wrapper, see [this notebook](../modules/indexes/vectorstores/examples/awadb.ipynb).
@@ -1,8 +1,7 @@
# Beam

>[Beam](https://docs.beam.cloud/introduction) makes it easy to run code on GPUs, deploy scalable web APIs,
> schedule cron jobs, and run massively parallel workloads — without managing any infrastructure.

This page covers how to use Beam within LangChain.
It is broken into two parts: installation and setup, and then references to specific Beam wrappers.

## Installation and Setup

@@ -10,19 +9,19 @@
- Install the Beam CLI with `curl https://raw.githubusercontent.com/slai-labs/get-beam/main/get-beam.sh -sSfL | sh`
- Register API keys with `beam configure`
- Set environment variables (`BEAM_CLIENT_ID`) and (`BEAM_CLIENT_SECRET`)
- Install the Beam SDK:
```bash
pip install beam-sdk
```
- Install the Beam SDK `pip install beam-sdk`

## LLM
## Wrappers

### LLM

There exists a Beam LLM wrapper, which you can access with

```python
from langchain.llms.beam import Beam
```

### Example of the Beam app
## Define your Beam app.

This is the environment you’ll be developing against once you start the app.
It's also used to define the maximum response length from the model.

@@ -45,7 +44,7 @@ llm = Beam(model_name="gpt2",
           verbose=False)
```

### Deploy the Beam app
## Deploy your Beam app

Once defined, you can deploy your Beam app by calling your model's `_deploy()` method.

@@ -53,9 +52,9 @@ Once defined, you can deploy your Beam app by calling your model's `_deploy()` m

```python
llm._deploy()
```

### Call the Beam app
## Call your Beam app

Once a beam model is deployed, it can be called by calling your model's `_call()` method.
Once a beam model is deployed, it can be called by callying your model's `_call()` method.
This returns the GPT2 text response to your prompt.

```python
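# hypothetical completion (the original snippet is cut off in this diff);
# per the text above, this calls the deployed GPT2 model with a prompt
response = llm._call("Running machine learning on a remote GPU")
```
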
@@ -18,7 +18,7 @@ from langchain import Bedrock

## Text Embedding Models

See a [usage example](../modules/models/text_embedding/examples/amazon_bedrock.ipynb).
See a [usage example](../modules/models/text_embedding/examples/bedrock.ipynb).

```python
from langchain.embeddings import BedrockEmbeddings
```
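A short sketch of typical usage, assuming AWS credentials and region are configured through the usual boto3 mechanisms (environment variables, `~/.aws/credentials`, etc.); the strings are just examples:

```python
from langchain.embeddings import BedrockEmbeddings

embeddings = BedrockEmbeddings()
query_vector = embeddings.embed_query("What is Amazon Bedrock?")
doc_vectors = embeddings.embed_documents(["Bedrock hosts foundation models."])
```
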
@@ -1,52 +0,0 @@
# ClickHouse

This page covers how to use ClickHouse Vector Search within LangChain.

[ClickHouse](https://clickhouse.com) is an open-source real-time OLAP database with full SQL support and a wide range of functions to assist users in writing analytical queries. Some of these functions and data structures perform distance operations between vectors, enabling ClickHouse to be used as a vector database.

Due to its fully parallelized query pipeline, ClickHouse can process vector search operations very quickly, especially when performing exact matching through a linear scan over all rows, delivering processing speed comparable to dedicated vector databases.

High compression levels, tunable through custom compression codecs, enable very large datasets to be stored and queried. ClickHouse is not memory-bound, allowing multi-TB datasets containing embeddings to be queried.

The capabilities for computing the distance between two vectors are just another SQL function and can be effectively combined with more traditional SQL filtering and aggregation capabilities. This allows vectors to be stored and queried alongside metadata, and even rich text, enabling a broad array of use cases and applications.

Finally, experimental ClickHouse capabilities like [Approximate Nearest Neighbour (ANN) indices](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/annindexes) support faster approximate matching of vectors and are a promising development aimed at further enhancing the vector matching capabilities of ClickHouse.

## Installation
- Install the ClickHouse server by [binary](https://clickhouse.com/docs/en/install) or [docker image](https://hub.docker.com/r/clickhouse/clickhouse-server/)
- Install the Python SDK with `pip install clickhouse-connect`

### Configure the ClickHouse vector index

Customize the `ClickhouseSettings` object with parameters:

```python
from langchain.vectorstores import Clickhouse, ClickhouseSettings
config = ClickhouseSettings(host="<clickhouse-server-host>", port=8123, ...)
index = Clickhouse(embedding_function, config)
index.add_documents(...)
```

## Wrappers

Supported functions:
- `add_texts`
- `add_documents`
- `from_texts`
- `from_documents`
- `similarity_search`
- `asimilarity_search`
- `similarity_search_by_vector`
- `asimilarity_search_by_vector`
- `similarity_search_with_relevance_scores`

### VectorStore

There exists a wrapper around the open-source ClickHouse database, allowing you to use it as a vectorstore,
whether for semantic search or similar example retrieval.

To import this vectorstore:
```python
from langchain.vectorstores import Clickhouse
```
For a more detailed walkthrough of the MyScale wrapper, see [this notebook](../modules/indexes/vectorstores/examples/clickhouse.ipynb)
|
||||
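A minimal end-to-end sketch under stated assumptions (a ClickHouse server reachable on localhost:8123, OpenAI embeddings, and a made-up sample text):

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Clickhouse, ClickhouseSettings

# Assumes a ClickHouse server is reachable at localhost:8123.
config = ClickhouseSettings(host="localhost", port=8123)

docsearch = Clickhouse.from_texts(
    ["ClickHouse can double as a vector database"],  # made-up sample text
    OpenAIEmbeddings(),
    config=config,
)
print(docsearch.similarity_search("vector database", k=1))
```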
@@ -58,7 +58,7 @@

"### Optional Parameters\n",
"The following parameters are optional. When executing the method in a Databricks notebook, you don't need to provide them in most of the cases.\n",
"* `host`: The Databricks workspace hostname, excluding the 'https://' part. Defaults to the 'DATABRICKS_HOST' environment variable or the current workspace if in a Databricks notebook.\n",
"* `api_token`: The Databricks personal access token for accessing the Databricks SQL warehouse or the cluster. Defaults to the 'DATABRICKS_API_TOKEN' environment variable, or a temporary one is generated if in a Databricks notebook.\n",
"* `warehouse_id`: The warehouse ID in the Databricks SQL.\n",
"* `cluster_id`: The cluster ID in the Databricks Runtime. If running in a Databricks notebook and both 'warehouse_id' and 'cluster_id' are None, it uses the ID of the cluster the notebook is attached to.\n",
"* `engine_args`: The arguments to be used when connecting to Databricks.\n",
@@ -1,36 +0,0 @@

Databricks
==========

The [Databricks](https://www.databricks.com/) Lakehouse Platform unifies data, analytics, and AI on one platform.

Databricks embraces the LangChain ecosystem in various ways:

1. Databricks connector for the SQLDatabase Chain: SQLDatabase.from_databricks() provides an easy way to query your data on Databricks through LangChain
2. Databricks-managed MLflow integrates with LangChain: tracking and serving LangChain applications with fewer steps
3. Databricks as an LLM provider: deploy your fine-tuned LLMs on Databricks via serving endpoints or cluster driver proxy apps, and query them as langchain.llms.Databricks
4. Databricks Dolly: Databricks open-sourced Dolly, which allows for commercial use and can be accessed through the Hugging Face Hub

Databricks connector for the SQLDatabase Chain
----------------------------------------------
You can connect to [Databricks runtimes](https://docs.databricks.com/runtime/index.html) and [Databricks SQL](https://www.databricks.com/product/databricks-sql) using the SQLDatabase wrapper of LangChain. A minimal sketch follows this paragraph; see the notebook [Connect to Databricks](./databricks/databricks.html) for details.
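The sketch below is illustrative only: the `samples.nyctaxi` catalog/schema is a hypothetical example, and the host and token are assumed to come from the Databricks environment (or the `DATABRICKS_HOST` / `DATABRICKS_API_TOKEN` variables):

```python
from langchain import OpenAI, SQLDatabase, SQLDatabaseChain

# Hypothetical catalog and schema; host/token are read from the
# Databricks environment (or DATABRICKS_HOST / DATABRICKS_API_TOKEN).
db = SQLDatabase.from_databricks(catalog="samples", schema="nyctaxi")

chain = SQLDatabaseChain.from_llm(OpenAI(temperature=0), db, verbose=True)
chain.run("How many rows are in the trips table?")
```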
Databricks-managed MLflow integrates with LangChain
---------------------------------------------------

MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. See the notebook [MLflow Callback Handler](./mlflow_tracking.ipynb) for details about MLflow's integration with LangChain.

Databricks provides a fully managed and hosted version of MLflow integrated with enterprise security features, high availability, and other Databricks workspace features such as experiment and run management and notebook revision capture. MLflow on Databricks offers an integrated experience for tracking and securing machine learning model training runs and running machine learning projects. See the [MLflow guide](https://docs.databricks.com/mlflow/index.html) for more details.

Databricks-managed MLflow makes it more convenient to develop LangChain applications on Databricks. For MLflow tracking, you don't need to set the tracking URI. For MLflow Model Serving, you can save LangChain Chains in the MLflow langchain flavor, and then register and serve the Chain with a few clicks on Databricks, with credentials securely managed by MLflow Model Serving.

Databricks as an LLM provider
-----------------------------

The notebook [Wrap Databricks endpoints as LLMs](../modules/models/llms/integrations/databricks.html) illustrates the method to wrap Databricks endpoints as LLMs in LangChain. It supports two types of endpoints: the serving endpoint, which is recommended for both production and development, and the cluster driver proxy app, which is recommended for interactive development.

Databricks endpoints support Dolly, but are also great for hosting models like MPT-7B or any other models from the Hugging Face ecosystem. Databricks endpoints can also be used with proprietary models like OpenAI to provide a governance layer for enterprises.

Databricks Dolly
----------------

Databricks' Dolly is an instruction-following large language model trained on the Databricks machine learning platform that is licensed for commercial use. The model is available on Hugging Face Hub as databricks/dolly-v2-12b. See the notebook [Hugging Face Hub](../modules/models/llms/integrations/huggingface_hub.html) for instructions to access it through the Hugging Face Hub integration with LangChain.
@@ -1,24 +0,0 @@

# Google Vertex AI

>[Vertex AI](https://cloud.google.com/vertex-ai/docs/start/introduction-unified-platform) is a machine learning (ML)
> platform that lets you train and deploy ML models and AI applications.
> `Vertex AI` combines data engineering, data science, and ML engineering workflows, enabling your teams to
> collaborate using a common toolset.

## Installation and Setup

```bash
pip install google-cloud-aiplatform
```

See the [setup instructions](../modules/models/chat/integrations/google_vertex_ai_palm.ipynb)

## Chat Models

See a [usage example](../modules/models/chat/integrations/google_vertex_ai_palm.ipynb)

```python
from langchain.chat_models import ChatVertexAI
```
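A minimal sketch of invoking the chat model (assumes Google Cloud application-default credentials are already configured; the message is illustrative):

```python
from langchain.chat_models import ChatVertexAI
from langchain.schema import HumanMessage

# Assumes `gcloud auth application-default login` (or equivalent) has been run.
chat = ChatVertexAI()
response = chat([HumanMessage(content="Say hello in French.")])
print(response.content)
```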
@@ -47,7 +47,7 @@ To use the wrapper for a model hosted on the Hugging Face Hub:

```python
from langchain.embeddings import HuggingFaceHubEmbeddings
```
For a more detailed walkthrough of this, see [this notebook](../modules/models/text_embedding/examples/huggingfacehub.ipynb)
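A short hedged sketch of the embeddings class in use (assumes a `HUGGINGFACEHUB_API_TOKEN` environment variable is set; the default model choice is left to the class):

```python
from langchain.embeddings import HuggingFaceHubEmbeddings

# Assumes HUGGINGFACEHUB_API_TOKEN is set in the environment.
embeddings = HuggingFaceHubEmbeddings()
vector = embeddings.embed_query("Hello, Hugging Face!")
print(len(vector))
```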
### Tokenizer
@@ -1,368 +0,0 @@

# LangChain Decorators ✨

LangChain Decorators is a layer on top of LangChain that provides syntactic sugar 🍭 for writing custom langchain prompts and chains

For feedback, issues, or contributions - please raise an issue here:
[ju-bezdek/langchain-decorators](https://github.com/ju-bezdek/langchain-decorators)

Main principles and benefits:

- more `pythonic` way of writing code
- write multiline prompts that won't break your code flow with indentation
- making use of IDE in-built support for **hinting**, **type checking** and **popup with docs** to quickly peek in the function to see the prompt, parameters it consumes etc.
- leverage all the power of 🦜🔗 LangChain ecosystem
- adding support for **optional parameters**
- easily share parameters between the prompts by binding them to one class

Here is a simple example of code written with **LangChain Decorators ✨**

``` python

@llm_prompt
def write_me_short_post(topic:str, platform:str="twitter", audience:str = "developers")->str:
    """
    Write me a short header for my post about {topic} for {platform} platform.
    It should be for {audience} audience.
    (Max 15 words)
    """
    return

# run it naturally
write_me_short_post(topic="starwars")
# or
write_me_short_post(topic="starwars", platform="reddit")
```

# Quick start
## Installation
```bash
pip install langchain_decorators
```

## Examples

A good way to start is to review the examples here:
- [jupyter notebook](https://github.com/ju-bezdek/langchain-decorators/blob/main/example_notebook.ipynb)
- [colab notebook](https://colab.research.google.com/drive/1no-8WfeP6JaLD9yUtkPgym6x0G9ZYZOG#scrollTo=N4cf__D0E2Yk)

# Defining other parameters
Here we are just marking a function as a prompt with the `llm_prompt` decorator, effectively turning it into an LLMChain, instead of running it by hand.

A standard LLMChain takes many more init parameters than just input_variables and a prompt... this implementation detail is hidden by the decorator.
Here is how it works:

1. Using **Global settings**:

``` python
# define global settings for all prompts (if not set - ChatGPT is the current default)
from langchain_decorators import GlobalSettings

GlobalSettings.define_settings(
    default_llm=ChatOpenAI(temperature=0.0),  # default value; can be changed here globally
    default_streaming_llm=ChatOpenAI(temperature=0.0, streaming=True),  # default value; will be used for streaming
)
```

2. Using predefined **prompt types**

``` python
# You can change the default prompt types
from langchain_decorators import PromptTypes, PromptTypeSettings

PromptTypes.AGENT_REASONING.llm = ChatOpenAI()

# Or you can just define your own ones:
class MyCustomPromptTypes(PromptTypes):
    GPT4=PromptTypeSettings(llm=ChatOpenAI(model="gpt-4"))

@llm_prompt(prompt_type=MyCustomPromptTypes.GPT4)
def write_a_complicated_code(app_idea:str)->str:
    ...

```

3. Define the settings **directly in the decorator**

``` python
from langchain.llms import OpenAI

@llm_prompt(
    llm=OpenAI(temperature=0.7),
    stop_tokens=["\nObservation"],
    ...
    )
def creative_writer(book_title:str)->str:
    ...
```

## Passing a memory and/or callbacks:

To pass any of these, just declare them in the function (or use kwargs to pass anything)

```python

@llm_prompt()
async def write_me_short_post(topic:str, platform:str="twitter", audience:str="developers", memory:SimpleMemory = None):
    """
    {history_key}
    Write me a short header for my post about {topic} for {platform} platform.
    It should be for {audience} audience.
    (Max 15 words)
    """
    pass

await write_me_short_post(topic="old movies")

```

# Simplified streaming

If we want to leverage streaming:
- we need to define the prompt as an async function
- turn on streaming on the decorator, or we can define a PromptType with streaming on
- capture the stream using StreamingContext

This way we just mark which prompts should be streamed, without needing to tinker with which LLM to use or passing around a streaming handler into a particular part of our chain... just turn streaming on/off on the prompt/prompt type...

The streaming will happen only if we call it in a streaming context ... there we can define a simple function to handle the stream

``` python
# this code example is complete and should run as it is

from langchain_decorators import StreamingContext, llm_prompt

# this will mark the prompt for streaming (useful if we want to stream just some prompts in our app... but don't want to pass around the callback handlers)
# note that only async functions can be streamed (you will get an error if it's not)
@llm_prompt(capture_stream=True)
async def write_me_short_post(topic:str, platform:str="twitter", audience:str = "developers"):
    """
    Write me a short header for my post about {topic} for {platform} platform.
    It should be for {audience} audience.
    (Max 15 words)
    """
    pass


# just an arbitrary function to demonstrate the streaming... will be some websockets code in the real world
tokens=[]
def capture_stream_func(new_token:str):
    tokens.append(new_token)

# if we want to capture the stream, we need to wrap the execution into StreamingContext...
# this will allow us to capture the stream even if the prompt call is hidden inside a higher level method
# only the prompts marked with capture_stream will be captured here
with StreamingContext(stream_to_stdout=True, callback=capture_stream_func):
    result = await write_me_short_post(topic="old movies")
    print("Stream finished ... we can distinguish tokens thanks to alternating colors")

print("\nWe've captured",len(tokens),"tokens🎉\n")
print("Here is the result:")
print(result)
```

# Prompt declarations
By default, the prompt is the whole function docstring, unless you mark your prompt

## Documenting your prompt

We can specify which part of our docs is the prompt definition, by specifying a code block with the **<prompt>** language tag

``` python
@llm_prompt
def write_me_short_post(topic:str, platform:str="twitter", audience:str = "developers"):
    """
    Here is a good way to write a prompt as part of a function docstring, with additional documentation for devs.

    It needs to be a code block, marked as a `<prompt>` language
    ```<prompt>
    Write me a short header for my post about {topic} for {platform} platform.
    It should be for {audience} audience.
    (Max 15 words)
    ```

    Now only the code block above will be used as the prompt, and the rest of the docstring will be used as a description for developers.
    (It also has the nice benefit that an IDE (like VS Code) will display the prompt properly, instead of trying to parse it as markdown and mangling the newlines)
    """
    return
```

## Chat messages prompt

For chat models it is very useful to define the prompt as a set of message templates... here is how to do it:

``` python
@llm_prompt
def simulate_conversation(human_input:str, agent_role:str="a pirate"):
    """
    ## System message
    - note the `:system` suffix inside the <prompt:_role_> tag

    ```<prompt:system>
    You are a {agent_role} hacker. You must act like one.
    You reply always in code, using python or javascript code block...
    for example:

    ... do not reply with anything else.. just with code - respecting your role.
    ```

    # human message
    (we are using the real roles that are enforced by the LLM - GPT supports system, assistant, user)
    ``` <prompt:user>
    Hello, who are you?
    ```
    a reply:

    ``` <prompt:assistant>
    \``` python <<- escaping the inner code block with \ that should be part of the prompt
    def hello():
        print("Argh... hello you pesky pirate")
    \```
    ```

    we can also add some history using a placeholder
    ```<prompt:placeholder>
    {history}
    ```
    ```<prompt:user>
    {human_input}
    ```

    Now only the code blocks above will be used as the prompt, and the rest of the docstring will be used as a description for developers.
    (It also has the nice benefit that an IDE (like VS Code) will display the prompt properly, instead of trying to parse it as markdown and mangling the newlines)
    """
    pass

```

the roles here are model-native roles (assistant, user, system for ChatGPT)

# Optional sections
- you can define whole sections of your prompt that should be optional
- if any input in the section is missing, the whole section won't be rendered

the syntax for this is as follows:

``` python
@llm_prompt
def prompt_with_optional_partials():
    """
    this text will be rendered always, but

    {? anything inside this block will be rendered only if all the {value}s parameters are not empty (None | "") ?}

    you can also place it in between the words
    this too will be rendered{? , but
    this block will be rendered only if {this_value} and {this_value}
    is not empty?} !
    """
```

# Output parsers

- the llm_prompt decorator natively tries to detect the best output parser based on the output type (if not set, it returns the raw string)
- list, dict and pydantic outputs are also supported natively (automatically)

``` python
# this code example is complete and should run as it is

from langchain_decorators import llm_prompt

@llm_prompt
def write_name_suggestions(company_business:str, count:int)->list:
    """ Write me {count} good name suggestions for a company that {company_business}
    """
    pass

write_name_suggestions(company_business="sells cookies", count=5)
```

## More complex structures

for dict / pydantic you need to specify the formatting instructions...
this can be tedious, that's why you can let the output parser generate the instructions for you based on the model (pydantic)

``` python
from langchain_decorators import llm_prompt
from pydantic import BaseModel, Field


class TheOutputStructureWeExpect(BaseModel):
    name:str = Field(description="The name of the company")
    headline:str = Field(description="The description of the company (for landing page)")
    employees:list[str] = Field(description="5-8 fake employee names with their positions")

@llm_prompt()
def fake_company_generator(company_business:str)->TheOutputStructureWeExpect:
    """ Generate a fake company that {company_business}
    {FORMAT_INSTRUCTIONS}
    """
    return

company = fake_company_generator(company_business="sells cookies")

# print the result nicely formatted
print("Company name: ",company.name)
print("company headline: ",company.headline)
print("company employees: ",company.employees)

```

# Binding the prompt to an object

``` python
from pydantic import BaseModel
from langchain_decorators import llm_prompt

class AssistantPersonality(BaseModel):
    assistant_name:str
    assistant_role:str
    field:str

    @property
    def a_property(self):
        return "whatever"

    def hello_world(self, function_kwarg:str=None):
        """
        We can reference any {field} or {a_property} inside our prompt... and combine it with {function_kwarg} in the method
        """

    @llm_prompt
    def introduce_your_self(self)->str:
        """
        ``` <prompt:system>
        You are an assistant named {assistant_name}.
        Your role is to act as {assistant_role}
        ```
        ```<prompt:user>
        Introduce your self (in less than 20 words)
        ```
        """

personality = AssistantPersonality(assistant_name="John", assistant_role="a pirate")

print(personality.introduce_your_self(personality))
```

# More examples:

- these and a few more examples are also available in the [colab notebook here](https://colab.research.google.com/drive/1no-8WfeP6JaLD9yUtkPgym6x0G9ZYZOG#scrollTo=N4cf__D0E2Yk)
- including the [ReAct Agent re-implementation](https://colab.research.google.com/drive/1no-8WfeP6JaLD9yUtkPgym6x0G9ZYZOG#scrollTo=3bID5fryE2Yp) using purely langchain decorators
@@ -35,6 +35,7 @@ from langchain.llms import AzureOpenAI

For a more detailed walkthrough of the `Azure` wrapper, see [this notebook](../modules/models/llms/integrations/azure_openai_example.ipynb)

## Text Embedding Model

```python
from langchain.embeddings import OpenAIEmbeddings
```

@@ -43,14 +44,6 @@ from langchain.embeddings import OpenAIEmbeddings
For a more detailed walkthrough of this, see [this notebook](../modules/models/text_embedding/examples/openai.ipynb)

## Chat Model

```python
from langchain.chat_models import ChatOpenAI
```
For a more detailed walkthrough of this, see [this notebook](../modules/models/chat/integrations/openai.ipynb)

## Tokenizer

There are several places you can use the `tiktoken` tokenizer. By default, it is used to count tokens
for OpenAI LLMs.
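For token counting specifically, a small sketch through the LLM interface (the sample string is arbitrary, and `OPENAI_API_KEY` is assumed to be set):

```python
from langchain.llms import OpenAI

# Assumes OPENAI_API_KEY is set in the environment.
llm = OpenAI()
# get_num_tokens relies on the tiktoken tokenizer for OpenAI models.
print(llm.get_num_tokens("Counting tokens with tiktoken"))
```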
@@ -1,23 +1,19 @@

# Prediction Guard

>[Prediction Guard](https://docs.predictionguard.com/) gives quick and easy access to state-of-the-art open and closed access LLMs, without needing to spend days and weeks figuring out all of the implementation details, managing a bunch of different API specs, and setting up the infrastructure for model deployments.

This page covers how to use the Prediction Guard ecosystem within LangChain.
It is broken into two parts: installation and setup, and then references to specific Prediction Guard wrappers.

## Installation and Setup
- Install the Python SDK:
```bash
pip install predictionguard
```
- Get a Prediction Guard access token (as described [here](https://docs.predictionguard.com/)) and set it as an environment variable (`PREDICTIONGUARD_TOKEN`)

## LLM

There exists a Prediction Guard LLM wrapper, which you can access with
```python
from langchain.llms import PredictionGuard
```

### Example
You can provide the name of the Prediction Guard model as an argument when initializing the LLM:
```python
pgllm = PredictionGuard(model="MPT-7B-Instruct")
```

@@ -28,12 +24,14 @@ You can also provide your access token directly as an argument:
```python
pgllm = PredictionGuard(model="MPT-7B-Instruct", token="<your access token>")
```

Finally, you can provide an "output" argument that is used to structure/control the output of the LLM:
```python
pgllm = PredictionGuard(model="MPT-7B-Instruct", output={"type": "boolean"})
```

## Example usage

Basic usage of the controlled or guarded LLM wrapper:
```python
import os
...
```
@@ -74,7 +72,7 @@ pgllm = PredictionGuard(model="MPT-7B-Instruct",
```python
...
pgllm(prompt.format(query="What kind of post is this?"))
```

Basic LLM Chaining with the Prediction Guard wrapper:
```python
import os
...
```
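A compact hedged sketch of basic chaining with the wrapper (the prompt template and question are made up; a valid access token is assumed):

```python
import os
from langchain import PromptTemplate, LLMChain
from langchain.llms import PredictionGuard

# Assumes you have a valid Prediction Guard access token.
os.environ["PREDICTIONGUARD_TOKEN"] = "<your access token>"

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])

pgllm = PredictionGuard(model="MPT-7B-Instruct")
llm_chain = LLMChain(prompt=prompt, llm=pgllm)
llm_chain.predict(question="How does a telescope work?")
```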
@@ -1,35 +1,31 @@

# PromptLayer

>[PromptLayer](https://docs.promptlayer.com/what-is-promptlayer/wxpF9EZkUwvdkwvVE9XEvC/how-promptlayer-works/dvgGSxNe6nB1jj8mUVbG8r)
> is a devtool that allows you to track, manage, and share your GPT prompt engineering.
> It acts as a middleware between your code and OpenAI's python library, recording all your API requests
> and saving relevant metadata for easy exploration and search in the [PromptLayer](https://www.promptlayer.com) dashboard.
This page covers how to use [PromptLayer](https://www.promptlayer.com) within LangChain.
It is broken into two parts: installation and setup, and then references to specific PromptLayer wrappers.

## Installation and Setup

- Install the `promptlayer` python library
```bash
pip install promptlayer
```
- Create a PromptLayer account
- Create an API token and set it as an environment variable (`PROMPTLAYER_API_KEY`)

## LLM

There exists a PromptLayer OpenAI LLM wrapper, which you can access with
```python
from langchain.llms import PromptLayerOpenAI
```

### Example

To tag your requests, use the argument `pl_tags` when instantiating the LLM
```python
from langchain.llms import PromptLayerOpenAI
llm = PromptLayerOpenAI(pl_tags=["langchain-requests", "chatbot"])
```

To get the PromptLayer request id, use the argument `return_pl_id` when instantiating the LLM
```python
from langchain.llms import PromptLayerOpenAI
llm = PromptLayerOpenAI(return_pl_id=True)
```
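A hedged sketch of reading the request id back when `return_pl_id` is set (surfacing it on each generation's `generation_info` dict is how the wrapper is generally documented to behave; treat the exact key as an assumption):

```python
from langchain.llms import PromptLayerOpenAI

llm = PromptLayerOpenAI(return_pl_id=True)
llm_results = llm.generate(["Tell me a joke"])

for res in llm_results.generations:
    # Assumption: the PromptLayer request id is attached to generation_info.
    pl_request_id = res[0].generation_info["pl_request_id"]
    print(pl_request_id)
```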
@@ -46,14 +42,8 @@ You can use the PromptLayer request ID to add a prompt, score, or other metadata

This LLM is identical to the [OpenAI LLM](./openai.md), except that
- all your requests will be logged to your PromptLayer account
- you can add `pl_tags` when instantiating to tag your requests on PromptLayer
- you can add `return_pl_id` when instantiating to return a PromptLayer request id to use [while tracking requests](https://magniv.notion.site/Track-4deee1b1f7a34c1680d085f82567dab9).

## Chat Model

```python
from langchain.chat_models import PromptLayerChatOpenAI
```

See a [usage example](../modules/models/chat/integrations/promptlayer_chatopenai.ipynb).

PromptLayer also provides native wrappers for [`PromptLayerChatOpenAI`](../modules/models/chat/integrations/promptlayer_chatopenai.ipynb) and `PromptLayerOpenAIChat`
@@ -1,233 +0,0 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Ray Serve\n",
"\n",
"[Ray Serve](https://docs.ray.io/en/latest/serve/index.html) is a scalable model serving library for building online inference APIs. Serve is particularly well suited for system composition, enabling you to build a complex inference service consisting of multiple chains and business logic all in Python code. "
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Goal of this notebook\n",
"This notebook shows a simple example of how to deploy an OpenAI chain into production. You can extend it to deploy your own self-hosted models where you can easily define the amount of hardware resources (GPUs and CPUs) needed to run your model in production efficiently. Read more about available options including autoscaling in the Ray Serve [documentation](https://docs.ray.io/en/latest/serve/getting_started.html).\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup Ray Serve\n",
"Install ray with `pip install ray[serve]`. "
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## General Skeleton"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The general skeleton for deploying a service is the following:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 0: Import ray serve and request from starlette\n",
"from ray import serve\n",
"from starlette.requests import Request\n",
"\n",
"# 1: Define a Ray Serve deployment.\n",
"@serve.deployment\n",
"class LLMServe:\n",
"\n",
"    def __init__(self) -> None:\n",
"        # All the initialization code goes here\n",
"        pass\n",
"\n",
"    async def __call__(self, request: Request) -> str:\n",
"        # You can parse the request here\n",
"        # and return a response\n",
"        return \"Hello World\"\n",
"\n",
"# 2: Bind the model to deployment\n",
"deployment = LLMServe.bind()\n",
"\n",
"# 3: Run the deployment\n",
"serve.api.run(deployment)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Shutdown the deployment\n",
"serve.api.shutdown()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example of deploying an OpenAI chain with custom prompts"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Get an OpenAI API key from [here](https://platform.openai.com/account/api-keys). By running the following code, you will be asked to provide your API key."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain import PromptTemplate, LLMChain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from getpass import getpass\n",
"OPENAI_API_KEY = getpass()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"@serve.deployment\n",
"class DeployLLM:\n",
"\n",
"    def __init__(self):\n",
"        # We initialize the LLM, template and the chain here\n",
"        llm = OpenAI(openai_api_key=OPENAI_API_KEY)\n",
"        template = \"Question: {question}\\n\\nAnswer: Let's think step by step.\"\n",
"        prompt = PromptTemplate(template=template, input_variables=[\"question\"])\n",
"        self.chain = LLMChain(llm=llm, prompt=prompt)\n",
"\n",
"    def _run_chain(self, text: str):\n",
"        return self.chain(text)\n",
"\n",
"    async def __call__(self, request: Request):\n",
"        # 1. Parse the request\n",
"        text = request.query_params[\"text\"]\n",
"        # 2. Run the chain\n",
"        resp = self._run_chain(text)\n",
"        # 3. Return the response\n",
"        return resp[\"text\"]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can bind the deployment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Bind the model to deployment\n",
"deployment = DeployLLM.bind()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"We can assign the port number and host when we want to run the deployment. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Example port number\n",
"PORT_NUMBER = 8282\n",
"# Run the deployment\n",
"serve.api.run(deployment, port=PORT_NUMBER)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that the service is deployed on port `localhost:8282`, we can send a POST request to get the results back."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"text = \"What NFL team won the Super Bowl in the year Justin Bieber was born?\"\n",
"response = requests.post(f'http://localhost:{PORT_NUMBER}/?text={text}')\n",
"print(response.content.decode())"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "ray",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -1,43 +0,0 @@

# Shale Protocol

[Shale Protocol](https://shaleprotocol.com) provides production-ready inference APIs for open LLMs. It's a Plug & Play API as it's hosted on a highly scalable GPU cloud infrastructure.

Our free tier supports up to 1K daily requests per key as we want to eliminate the barrier for anyone to start building genAI apps with LLMs.

With Shale Protocol, developers/researchers can create apps and explore the capabilities of open LLMs at no cost.

This page covers how the Shale-Serve API can be incorporated with LangChain.

As of June 2023, the API supports Vicuna-13B by default. We are going to support more LLMs such as Falcon-40B in future releases.

## How to

### 1. Find the link to our Discord on https://shaleprotocol.com. Generate an API key through the "Shale Bot" on our Discord. No credit card is required and there are no free trials. It's a forever-free tier with a limit of 1K requests per day per API key.

### 2. Use https://shale.live/v1 as the OpenAI API drop-in replacement

For example
```python
from langchain.llms import OpenAI
from langchain import PromptTemplate, LLMChain

import os
os.environ['OPENAI_API_BASE'] = "https://shale.live/v1"
os.environ['OPENAI_API_KEY'] = "ENTER YOUR API KEY"

llm = OpenAI()

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])

llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"

llm_chain.run(question)

```
@@ -1,22 +0,0 @@

# Tensorflow Hub

>[TensorFlow Hub](https://www.tensorflow.org/hub) is a repository of trained machine learning models ready for fine-tuning and deployable anywhere.

>[TensorFlow Hub](https://tfhub.dev/) lets you search and discover hundreds of trained, ready-to-deploy machine learning models in one place.

## Installation and Setup

```bash
pip install tensorflow-hub
pip install tensorflow_text
```

## Text Embedding Models

See a [usage example](../modules/models/text_embedding/examples/tensorflowhub.ipynb)

```python
from langchain.embeddings import TensorflowHubEmbeddings
```
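A minimal hedged sketch (the class downloads its default TensorFlow Hub embedding model on first use; the query text is arbitrary):

```python
from langchain.embeddings import TensorflowHubEmbeddings

# The first call downloads the default TensorFlow Hub embedding model,
# which can take a while.
embeddings = TensorflowHubEmbeddings()
vector = embeddings.embed_query("TensorFlow Hub hosts reusable models")
print(len(vector))
```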
@@ -4,7 +4,7 @@

What is Vectara?

**Vectara Overview:**
- Vectara is a developer-first API platform for building conversational search applications
- To use Vectara - first [sign up](https://console.vectara.com/signup) and create an account. Then create a corpus and an API key for indexing and searching.
- You can use Vectara's [indexing API](https://docs.vectara.com/docs/indexing-apis/indexing) to add documents into Vectara's index
- You can use Vectara's [Search API](https://docs.vectara.com/docs/search-apis/search) to query Vectara's index (which also supports Hybrid search implicitly).
@@ -13,13 +13,6 @@ What is Vectara?

## Installation and Setup
To use Vectara with LangChain no special installation steps are required. You just have to provide your customer ID, corpus ID, and an API key created within the Vectara console to enable indexing and searching.

Alternatively, these can be provided as environment variables:
- export `VECTARA_CUSTOMER_ID`="your_customer_id"
- export `VECTARA_CORPUS_ID`="your_corpus_id"
- export `VECTARA_API_KEY`="your-vectara-api-key"

## Usage

### VectorStore

There exists a wrapper around the Vectara platform, allowing you to use it as a vectorstore, whether for semantic search or example selection.
@@ -39,21 +32,8 @@ vectara = Vectara(
The customer_id, corpus_id and api_key are optional, and if they are not supplied will be read from the environment variables `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`, respectively.

To query the vectorstore, you can use the `similarity_search` method (or `similarity_search_with_score`), which takes a query string and returns a list of results:
```python
results = vectara.similarity_search("what is LangChain?")
```

`similarity_search_with_score` also supports the following additional arguments:
- `k`: number of results to return (defaults to 5)
- `lambda_val`: the [lexical matching](https://docs.vectara.com/docs/api-reference/search-apis/lexical-matching) factor for hybrid search (defaults to 0.025)
- `filter`: a [filter](https://docs.vectara.com/docs/common-use-cases/filtering-by-metadata/filter-overview) to apply to the results (default None)
- `n_sentence_context`: number of sentences to include before/after the actual matching segment when returning results. This defaults to 0 so as to return the exact text segment that matches, but can be used with other values e.g. 2 or 3 to return adjacent text segments.

The results are returned as a list of relevant documents, and a relevance score of each document.
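Putting these arguments together, an illustrative sketch (the query is made up and `vectara` is the instance constructed above):

```python
results = vectara.similarity_search_with_score(
    "what is LangChain?",
    k=5,                   # number of results
    lambda_val=0.025,      # lexical matching factor for hybrid search
    filter=None,           # optional metadata filter
    n_sentence_context=2,  # sentences of context around each match
)
for doc, score in results:
    print(score, doc.page_content[:80])
```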
For a more detailed walkthrough of the Vectara wrapper, see one of the two example notebooks:
* [Chat Over Documents with Vectara](./vectara/vectara_chat.html)
* [Vectara Text Generation](./vectara/vectara_text_generation.html)
@@ -102,11 +102,21 @@
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'langchain.vectorstores.vectara.Vectara'>\n"
]
}
],
"source": [
"openai_api_key = os.environ['OPENAI_API_KEY']\n",
"llm = OpenAI(openai_api_key=openai_api_key, temperature=0)\n",
"retriever = VectaraRetriever(vectorstore, alpha=0.025, k=5, filter=None)\n",
"\n",
"print(type(vectorstore))\n",
"d = retriever.get_relevant_documents('What did the president say about Ketanji Brown Jackson')\n",
"\n",
"qa = ConversationalRetrievalChain.from_llm(llm, retriever, memory=memory)"
@@ -132,7 +142,7 @@
{
"data": {
"text/plain": [
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, and a former federal public defender.\""
]
},
"execution_count": 7,
@@ -164,7 +174,7 @@
{
"data": {
"text/plain": [
"' Justice Stephen Breyer.'"
]
},
"execution_count": 9,
@@ -231,7 +241,7 @@
{
"data": {
"text/plain": [
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, and a former federal public defender.\""
]
},
"execution_count": 12,
@@ -276,7 +286,7 @@
{
"data": {
"text/plain": [
"' Justice Stephen Breyer.'"
]
},
"execution_count": 14,
@@ -334,7 +344,7 @@
{
"data": {
"text/plain": [
"Document(page_content='Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. A former top litigator in private practice. A former federal public defender.', metadata={'source': '../../modules/state_of_the_union.txt'})"
]
},
"execution_count": 17,
@@ -382,24 +392,6 @@
"result = qa({\"question\": query, \"chat_history\": chat_history, \"vectordbkwargs\": vectordbkwargs})"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "24ebdaec",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" The president said that Ketanji Brown Jackson is one of the nation's top legal minds and that she will continue Justice Breyer's legacy of excellence.\n"
]
}
],
"source": [
"print(result['answer'])"
]
},
{
"cell_type": "markdown",
"id": "99b96dae",
@@ -467,7 +459,7 @@
{
"data": {
"text/plain": [
"' The president did not mention Ketanji Brown Jackson.'"
]
},
"execution_count": 23,
@@ -546,7 +538,7 @@
{
"data": {
"text/plain": [
"' The president did not mention Ketanji Brown Jackson.\\nSOURCES: ../../modules/state_of_the_union.txt'"
]
},
"execution_count": 27,
@@ -606,7 +598,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, and a former federal public defender."
]
}
],
@@ -628,7 +620,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
" Justice Stephen Breyer."
]
}
],
@@ -689,7 +681,7 @@
{
"data": {
"text/plain": [
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, and a former federal public defender.\""
]
},
"execution_count": 33,
@@ -6,7 +6,7 @@
|
||||
"source": [
|
||||
"# Vectara Text Generation\n",
|
||||
"\n",
|
||||
"This notebook is based on [text generation](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/vector_db_text_generation.ipynb) notebook and adapted to Vectara."
|
||||
"This notebook is based on [chat_vector_db](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/question_answering.ipynb) and adapted to Vectara."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -24,7 +24,6 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.docstore.document import Document\n",
|
||||
"import requests\n",
|
||||
@@ -160,7 +159,7 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[{'text': '\\n\\nEnvironment variables are a powerful tool for managing configuration settings in your applications. They allow you to store and access values from anywhere in your code, making it easier to keep your codebase organized and maintainable.\\n\\nHowever, there are times when you may want to use environment variables specifically for a single command. This is where shell variables come in. Shell variables are similar to environment variables, but they won\\'t be exported to spawned commands. They are defined with the following syntax:\\n\\n```sh\\nVAR_NAME=value\\n```\\n\\nFor example, if you wanted to use a shell variable instead of an environment variable in a command, you could do something like this:\\n\\n```sh\\nVAR=hello && echo $VAR && deno eval \"console.log(\\'Deno: \\' + Deno.env.get(\\'VAR\\'))\"\\n```\\n\\nThis would output the following:\\n\\n```\\nhello\\nDeno: undefined\\n```\\n\\nShell variables can be useful when you want to re-use a value, but don\\'t want it available in any spawned processes.\\n\\nAnother way to use environment variables is through pipelines. Pipelines provide a way to pipe the'}, {'text': '\\n\\nEnvironment variables are a great way to store and access sensitive information in your applications. They are also useful for configuring applications and managing different environments. In Deno, there are two ways to use environment variables: the built-in `Deno.env` and the `.env` file.\\n\\nThe `Deno.env` is a built-in feature of the Deno runtime that allows you to set and get environment variables. It has getter and setter methods that you can use to access and set environment variables. For example, you can set the `FIREBASE_API_KEY` and `FIREBASE_AUTH_DOMAIN` environment variables like this:\\n\\n```ts\\nDeno.env.set(\"FIREBASE_API_KEY\", \"examplekey123\");\\nDeno.env.set(\"FIREBASE_AUTH_DOMAIN\", \"firebasedomain.com\");\\n\\nconsole.log(Deno.env.get(\"FIREBASE_API_KEY\")); // examplekey123\\nconsole.log(Deno.env.get(\"FIREBASE_AUTH_DOMAIN\")); // firebasedomain'}, {'text': \"\\n\\nEnvironment variables are a powerful tool for managing configuration and settings in your applications. They allow you to store and access values that can be used in your code, and they can be set and changed without having to modify your code.\\n\\nIn Deno, environment variables are defined using the `export` command. For example, to set a variable called `VAR_NAME` to the value `value`, you would use the following command:\\n\\n```sh\\nexport VAR_NAME=value\\n```\\n\\nYou can then access the value of the environment variable in your code using the `Deno.env.get()` method. For example, if you wanted to log the value of the `VAR_NAME` variable, you could use the following code:\\n\\n```js\\nconsole.log(Deno.env.get('VAR_NAME'));\\n```\\n\\nYou can also set environment variables for a single command. To do this, you can list the environment variables before the command, like so:\\n\\n```\\nVAR=hello VAR2=bye deno run main.ts\\n```\\n\\nThis will set the environment variables `VAR` and `V\"}, {'text': \"\\n\\nEnvironment variables are a powerful tool for managing settings and configuration in your applications. They can be used to store information such as user preferences, application settings, and even passwords. In this blog post, we'll discuss how to make Deno scripts executable with a hashbang (shebang).\\n\\nA hashbang is a line of code that is placed at the beginning of a script. It tells the system which interpreter to use when running the script. 
In the case of Deno, the hashbang should be `#!/usr/bin/env -S deno run --allow-env`. This tells the system to use the Deno interpreter and to allow the script to access environment variables.\\n\\nOnce the hashbang is in place, you may need to give the script execution permissions. On Linux, this can be done with the command `sudo chmod +x hashbang.ts`. After that, you can execute the script by calling it like any other command: `./hashbang.ts`.\\n\\nIn the example program, we give the context permission to access the environment variables and print the Deno installation path. This is done by using the `Deno.env.get()` function, which returns the value of the specified environment\"}]\n"
|
||||
"[{'text': '\\n\\nEnvironment variables are an essential part of any development workflow. They provide a way to store and access information that is specific to the environment in which the code is running. This can be especially useful when working with different versions of a language or framework, or when running code on different machines.\\n\\nThe Deno CLI tasks extension provides a way to easily manage environment variables when running Deno commands. This extension provides a task definition for allowing you to create tasks that execute the `deno` CLI from within the editor. The template for the Deno CLI tasks has the following interface, which can be configured in a `tasks.json` within your workspace:\\n\\nThe task definition includes the `type` field, which should be set to `deno`, and the `command` field, which is the `deno` command to run (e.g. `run`, `test`, `cache`, etc.). Additionally, you can specify additional arguments to pass on the command line, the current working directory to execute the command, and any environment variables.\\n\\nUsing environment variables with the Deno CLI tasks extension is a great way to ensure that your code is running in the correct environment. For example, if you are running a test suite,'}, {'text': '\\n\\nEnvironment variables are an important part of any programming language, and they can be used to store and access data in a variety of ways. In this blog post, we\\'ll be taking a look at environment variables specifically for the shell.\\n\\nShell variables are similar to environment variables, but they won\\'t be exported to spawned commands. They are defined with the following syntax:\\n\\n```sh\\nVAR_NAME=value\\n```\\n\\nShell variables can be used to store and access data in a variety of ways. For example, you can use them to store values that you want to re-use, but don\\'t want to be available in any spawned processes.\\n\\nFor example, if you wanted to store a value and then use it in a command, you could do something like this:\\n\\n```sh\\nVAR=hello && echo $VAR && deno eval \"console.log(\\'Deno: \\' + Deno.env.get(\\'VAR\\'))\"\\n```\\n\\nThis would output the following:\\n\\n```\\nhello\\nDeno: undefined\\n```\\n\\nAs you can see, the value stored in the shell variable is not available in the spawned process.\\n\\n'}, {'text': '\\n\\nWhen it comes to developing applications, environment variables are an essential part of the process. Environment variables are used to store information that can be used by applications and scripts to customize their behavior. This is especially important when it comes to developing applications with Deno, as there are several environment variables that can impact the behavior of Deno.\\n\\nThe most important environment variable for Deno is `DENO_AUTH_TOKENS`. This environment variable is used to store authentication tokens that are used to access remote resources. This is especially important when it comes to accessing remote APIs or databases. Without the proper authentication tokens, Deno will not be able to access the remote resources.\\n\\nAnother important environment variable for Deno is `DENO_DIR`. This environment variable is used to store the directory where Deno will store its files. This includes the Deno executable, the Deno cache, and the Deno configuration files. By setting this environment variable, you can ensure that Deno will always be able to find the files it needs.\\n\\nFinally, there is the `DENO_PLUGINS` environment variable. 
This environment variable is used to store the list of plugins that Deno will use. This is important for customizing the'}, {'text': '\\n\\nEnvironment variables are a great way to store and access sensitive information in your Deno applications. Deno offers built-in support for environment variables with `Deno.env`, and you can also use a `.env` file to store and access environment variables. In this blog post, we\\'ll explore both of these options and how to use them in your Deno applications.\\n\\n## Built-in `Deno.env`\\n\\nThe Deno runtime offers built-in support for environment variables with [`Deno.env`](https://deno.land/api@v1.25.3?s=Deno.env). `Deno.env` has getter and setter methods. Here is example usage:\\n\\n```ts\\nDeno.env.set(\"FIREBASE_API_KEY\", \"examplekey123\");\\nDeno.env.set(\"FIREBASE_AUTH_DOMAIN\", \"firebasedomain.com\");\\n\\nconsole.log(Deno.env.get(\"FIREBASE_API_KEY\")); // examplekey123\\nconsole.log(Deno.env.get(\"FIREBASE_AUTH_'}]\n"
]
}
],
@@ -14,7 +14,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "9b22020a",
|
||||
"metadata": {},
|
||||
@@ -140,7 +139,6 @@
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "c0a6c031",
|
||||
"metadata": {},
|
||||
@@ -231,7 +229,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent.run(\"What did biden say about ketanji brown jackson in the state of the union address?\")"
|
||||
"agent.run(\"What did biden say about ketanji brown jackson is the state of the union address?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -273,7 +271,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "787a9b5e",
|
||||
"metadata": {},
|
||||
@@ -282,7 +279,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "9161ba91",
|
||||
"metadata": {},
|
||||
@@ -400,7 +396,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "49a0cbbe",
|
||||
"metadata": {},
@@ -1,150 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9502d5b0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# OpenAI Functions Agent\n",
|
||||
"\n",
|
||||
"This notebook showcases using an agent that uses the OpenAI functions ability"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "c0a83623",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain import LLMMathChain, OpenAI, SerpAPIWrapper, SQLDatabase, SQLDatabaseChain\n",
|
||||
"from langchain.agents import initialize_agent, Tool\n",
|
||||
"from langchain.agents import AgentType\n",
|
||||
"from langchain.chat_models import ChatOpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "6fefaba2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\")\n",
|
||||
"search = SerpAPIWrapper()\n",
|
||||
"llm_math_chain = LLMMathChain.from_llm(llm=llm, verbose=True)\n",
|
||||
"db = SQLDatabase.from_uri(\"sqlite:///../../../../../notebooks/Chinook.db\")\n",
|
||||
"db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)\n",
|
||||
"tools = [\n",
|
||||
" Tool(\n",
|
||||
" name = \"Search\",\n",
|
||||
" func=search.run,\n",
|
||||
" description=\"useful for when you need to answer questions about current events. You should ask targeted questions\"\n",
|
||||
" ),\n",
|
||||
" Tool(\n",
|
||||
" name=\"Calculator\",\n",
|
||||
" func=llm_math_chain.run,\n",
|
||||
" description=\"useful for when you need to answer questions about math\"\n",
|
||||
" ),\n",
|
||||
" Tool(\n",
|
||||
" name=\"FooBar-DB\",\n",
|
||||
" func=db_chain.run,\n",
|
||||
" description=\"useful for when you need to answer questions about FooBar. Input should be in the form of a question containing full context\"\n",
|
||||
" )\n",
|
||||
"]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "9ff6cee9",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"mrkl = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "ba8e4cbe",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `Search` with `{'query': 'Leo DiCaprio girlfriend'}`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\u001b[36;1m\u001b[1;3mAmidst his casual romance with Gigi, Leo allegedly entered a relationship with 19-year old model, Eden Polani, in February 2023.\u001b[0m\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `Calculator` with `{'expression': '19^0.43'}`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"19^0.43\u001b[32;1m\u001b[1;3m```text\n",
|
||||
"19**0.43\n",
|
||||
"```\n",
|
||||
"...numexpr.evaluate(\"19**0.43\")...\n",
|
||||
"\u001b[0m\n",
|
||||
"Answer: \u001b[33;1m\u001b[1;3m3.547023357958959\u001b[0m\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n",
|
||||
"\u001b[33;1m\u001b[1;3mAnswer: 3.547023357958959\u001b[0m\u001b[32;1m\u001b[1;3mLeo DiCaprio's girlfriend is reportedly Eden Polani. Her current age raised to the power of 0.43 is approximately 3.55.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\"Leo DiCaprio's girlfriend is reportedly Eden Polani. Her current age raised to the power of 0.43 is approximately 3.55.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"mrkl.run(\"Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9f5f6743",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
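Condensed from the deleted notebook above, the core pattern fits in a few lines. This is a minimal sketch, assuming `OPENAI_API_KEY` and `SERPAPI_API_KEY` are set in the environment; it uses only imports and arguments that appear in the cells shown in the diff.

```python
from langchain import SerpAPIWrapper
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chat_models import ChatOpenAI

# A function-calling-capable chat model, as used in the notebook
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")

search = SerpAPIWrapper()  # requires SERPAPI_API_KEY
tools = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for when you need to answer questions about current events",
    )
]

# AgentType.OPENAI_FUNCTIONS routes tool selection through the OpenAI functions API
mrkl = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=True)
mrkl.run("Who is Leo DiCaprio's girlfriend?")
```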
@@ -30,83 +30,38 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.llms import OpenAI\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.agents.agent_types import AgentType"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bd806175",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using ZERO_SHOT_REACT_DESCRIPTION\n",
|
||||
"\n",
|
||||
"This shows how to initialize the agent using the ZERO_SHOT_REACT_DESCRIPTION agent type. Note that this is an alternative to the above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "a1717204",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = create_csv_agent(\n",
|
||||
" OpenAI(temperature=0), \n",
|
||||
" 'titanic.csv', \n",
|
||||
" verbose=True, \n",
|
||||
" agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c31bb8a6",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using OpenAI Functions\n",
|
||||
"\n",
|
||||
"This shows how to initialize the agent using the OPENAI_FUNCTIONS agent type. Note that this is an alternative to the above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "16c4dc59",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = create_csv_agent(\n",
|
||||
" ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\"), \n",
|
||||
" 'titanic.csv', \n",
|
||||
" verbose=True, \n",
|
||||
" agent_type=AgentType.OPENAI_FUNCTIONS\n",
|
||||
")"
|
||||
"from langchain.llms import OpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "16c4dc59",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = create_csv_agent(OpenAI(temperature=0), 'titanic.csv', verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "46b9489d",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Error in on_chain_start callback: 'name'\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `python_repl_ast` with `df.shape[0]`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\u001b[36;1m\u001b[1;3m891\u001b[0m\u001b[32;1m\u001b[1;3mThere are 891 rows in the dataframe.\u001b[0m\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to count the number of rows\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: df.shape[0]\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m891\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: There are 891 rows.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
@@ -114,10 +69,10 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'There are 891 rows in the dataframe.'"
|
||||
"'There are 891 rows.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -128,28 +83,23 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 6,
|
||||
"id": "a96309be",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Error in on_chain_start callback: 'name'\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `python_repl_ast` with `df[df['SibSp'] > 3]['PassengerId'].count()`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\u001b[36;1m\u001b[1;3m30\u001b[0m\u001b[32;1m\u001b[1;3mThere are 30 people in the dataframe who have more than 3 siblings.\u001b[0m\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to count the number of people with more than 3 siblings\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: df[df['SibSp'] > 3].shape[0]\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m30\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: 30 people have more than 3 siblings.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
@@ -157,10 +107,10 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'There are 30 people in the dataframe who have more than 3 siblings.'"
|
||||
"'30 people have more than 3 siblings.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -171,39 +121,35 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 7,
|
||||
"id": "964a09f7",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Error in on_chain_start callback: 'name'\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `python_repl_ast` with `import pandas as pd\n",
|
||||
"import math\n",
|
||||
"\n",
|
||||
"# Create a dataframe\n",
|
||||
"data = {'Age': [22, 38, 26, 35, 35]}\n",
|
||||
"df = pd.DataFrame(data)\n",
|
||||
"\n",
|
||||
"# Calculate the average age\n",
|
||||
"average_age = df['Age'].mean()\n",
|
||||
"\n",
|
||||
"# Calculate the square root of the average age\n",
|
||||
"square_root = math.sqrt(average_age)\n",
|
||||
"\n",
|
||||
"square_root`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\u001b[36;1m\u001b[1;3m5.585696017507576\u001b[0m\u001b[32;1m\u001b[1;3mThe square root of the average age is approximately 5.59.\u001b[0m\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to calculate the average age first\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: df['Age'].mean()\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m29.69911764705882\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now need to calculate the square root of the average age\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: math.sqrt(df['Age'].mean())\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mNameError(\"name 'math' is not defined\")\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I need to import the math library\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: import math\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now need to calculate the square root of the average age\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: math.sqrt(df['Age'].mean())\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m5.449689683556195\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: 5.449689683556195\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
@@ -211,10 +157,10 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The square root of the average age is approximately 5.59.'"
|
||||
"'5.449689683556195'"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -235,26 +181,23 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": 3,
|
||||
"id": "15f11fbd",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Error in on_chain_start callback: 'name'\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `python_repl_ast` with `df1['Age'].nunique() - df2['Age'].nunique()`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\u001b[36;1m\u001b[1;3m-1\u001b[0m\u001b[32;1m\u001b[1;3mThere is 1 row in the age column that is different between the two dataframes.\u001b[0m\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to compare the age columns in both dataframes\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: len(df1[df1['Age'] != df2['Age']])\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m177\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: 177 rows in the age column are different.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
@@ -262,17 +205,17 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'There is 1 row in the age column that is different between the two dataframes.'"
|
||||
"'177 rows in the age column are different.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent = create_csv_agent(ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\"), ['titanic.csv', 'titanic_age_fillna.csv'], verbose=True, agent_type=AgentType.OPENAI_FUNCTIONS)\n",
|
||||
"agent.run(\"how many rows in the age column are different between the two dfs?\")"
|
||||
"agent = create_csv_agent(OpenAI(temperature=0), ['titanic.csv', 'titanic_age_fillna.csv'], verbose=True)\n",
|
||||
"agent.run(\"how many rows in the age column are different?\")"
|
||||
]
|
||||
},
|
||||
{
@@ -19,9 +19,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.agents import create_pandas_dataframe_agent\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.agents.agent_types import AgentType"
|
||||
"from langchain.agents import create_pandas_dataframe_agent"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -37,16 +35,6 @@
|
||||
"df = pd.read_csv('titanic.csv')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "a62858e2",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using ZERO_SHOT_REACT_DESCRIPTION\n",
|
||||
"\n",
|
||||
"This shows how to initialize the agent using the ZERO_SHOT_REACT_DESCRIPTION agent type. Note that this is an alternative to the above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
@@ -57,34 +45,9 @@
|
||||
"agent = create_pandas_dataframe_agent(OpenAI(temperature=0), df, verbose=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "7233ab56",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using OpenAI Functions\n",
|
||||
"\n",
|
||||
"This shows how to initialize the agent using the OPENAI_FUNCTIONS agent type. Note that this is an alternative to the above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "a8ea710e",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent = create_pandas_dataframe_agent(\n",
|
||||
" ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\"), \n",
|
||||
" df, \n",
|
||||
" verbose=True, \n",
|
||||
" agent_type=AgentType.OPENAI_FUNCTIONS\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "a9207a2e",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -94,12 +57,13 @@
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `python_repl_ast` with `df.shape[0]`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\u001b[36;1m\u001b[1;3m891\u001b[0m\u001b[32;1m\u001b[1;3mThere are 891 rows in the dataframe.\u001b[0m\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mThought: I need to count the number of rows\n",
|
||||
"Action: python_repl_ast\n",
|
||||
"Action Input: df.shape[0]\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m891\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: There are 891 rows.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
@@ -107,10 +71,10 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'There are 891 rows in the dataframe.'"
|
||||
"'There are 891 rows.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -208,6 +172,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "c4bc0584",
|
||||
"metadata": {},
|
||||
@@ -292,7 +257,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
@@ -12,7 +12,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"execution_count": 4,
|
||||
"id": "f98e9c90-5c37-4fb9-af3e-d09693af8543",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -22,24 +22,12 @@
|
||||
"from langchain.agents.agent_toolkits import create_python_agent\n",
|
||||
"from langchain.tools.python.tool import PythonREPLTool\n",
|
||||
"from langchain.python import PythonREPL\n",
|
||||
"from langchain.llms.openai import OpenAI\n",
|
||||
"from langchain.agents.agent_types import AgentType\n",
|
||||
"from langchain.chat_models import ChatOpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ca30d64c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using ZERO_SHOT_REACT_DESCRIPTION\n",
|
||||
"\n",
|
||||
"This shows how to initialize the agent using the ZERO_SHOT_REACT_DESCRIPTION agent type. Note that this is an alternative to the above."
|
||||
"from langchain.llms.openai import OpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 5,
|
||||
"id": "cc422f53-c51c-4694-a834-72ecd1e68363",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -49,34 +37,7 @@
|
||||
"agent_executor = create_python_agent(\n",
|
||||
" llm=OpenAI(temperature=0, max_tokens=1000),\n",
|
||||
" tool=PythonREPLTool(),\n",
|
||||
" verbose=True,\n",
|
||||
" agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bb487e8e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using OpenAI Functions\n",
|
||||
"\n",
|
||||
"This shows how to initialize the agent using the OPENAI_FUNCTIONS agent type. Note that this is an alternative to the above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "6e651822",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"agent_executor = create_python_agent(\n",
|
||||
" llm=ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\"),\n",
|
||||
" tool=PythonREPLTool(),\n",
|
||||
" verbose=True,\n",
|
||||
" agent_type=AgentType.OPENAI_FUNCTIONS,\n",
|
||||
" agent_executor_kwargs={\"handle_parsing_errors\": True},\n",
|
||||
" verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
@@ -91,7 +52,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 3,
|
||||
"id": "25cd4f92-ea9b-4fe6-9838-a4f85f81eebe",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -103,20 +64,23 @@
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `Python_REPL` with `def fibonacci(n):\n",
|
||||
" if n <= 0:\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to calculate the 10th fibonacci number\n",
|
||||
"Action: Python REPL\n",
|
||||
"Action Input: def fibonacci(n):\n",
|
||||
" if n == 0:\n",
|
||||
" return 0\n",
|
||||
" elif n == 1:\n",
|
||||
" return 1\n",
|
||||
" else:\n",
|
||||
" return fibonacci(n-1) + fibonacci(n-2)\n",
|
||||
"\n",
|
||||
"fibonacci(10)`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\u001b[36;1m\u001b[1;3m\u001b[0m\u001b[32;1m\u001b[1;3mThe 10th Fibonacci number is 55.\u001b[0m\n",
|
||||
" return fibonacci(n-1) + fibonacci(n-2)\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I need to call the function with 10 as the argument\n",
|
||||
"Action: Python REPL\n",
|
||||
"Action Input: fibonacci(10)\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3m\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: 55\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
@@ -124,10 +88,10 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The 10th Fibonacci number is 55.'"
|
||||
"'55'"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -147,11 +111,9 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 4,
|
||||
"id": "4b9f60e7-eb6a-4f14-8604-498d863d4482",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
@@ -159,70 +121,59 @@
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mCould not parse tool input: {'name': 'python', 'arguments': 'import torch\\nimport torch.nn as nn\\nimport torch.optim as optim\\n\\n# Define the neural network\\nclass SingleNeuron(nn.Module):\\n def __init__(self):\\n super(SingleNeuron, self).__init__()\\n self.linear = nn.Linear(1, 1)\\n \\n def forward(self, x):\\n return self.linear(x)\\n\\n# Create the synthetic data\\nx_train = torch.tensor([[1.0], [2.0], [3.0], [4.0]], dtype=torch.float32)\\ny_train = torch.tensor([[2.0], [4.0], [6.0], [8.0]], dtype=torch.float32)\\n\\n# Create the neural network\\nmodel = SingleNeuron()\\n\\n# Define the loss function and optimizer\\ncriterion = nn.MSELoss()\\noptimizer = optim.SGD(model.parameters(), lr=0.01)\\n\\n# Train the neural network\\nfor epoch in range(1, 1001):\\n # Forward pass\\n y_pred = model(x_train)\\n \\n # Compute loss\\n loss = criterion(y_pred, y_train)\\n \\n # Backward pass and optimization\\n optimizer.zero_grad()\\n loss.backward()\\n optimizer.step()\\n \\n # Print the loss every 100 epochs\\n if epoch % 100 == 0:\\n print(f\"Epoch {epoch}: Loss = {loss.item()}\")\\n\\n# Make a prediction for x = 5\\nx_test = torch.tensor([[5.0]], dtype=torch.float32)\\ny_pred = model(x_test)\\ny_pred.item()'} because the `arguments` is not valid JSON.\u001b[0mInvalid or incomplete response\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `Python_REPL` with `import torch\n",
|
||||
"import torch.nn as nn\n",
|
||||
"import torch.optim as optim\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m I need to write a neural network in PyTorch and train it on the given data.\n",
|
||||
"Action: Python REPL\n",
|
||||
"Action Input: \n",
|
||||
"import torch\n",
|
||||
"\n",
|
||||
"# Define the neural network\n",
|
||||
"class SingleNeuron(nn.Module):\n",
|
||||
" def __init__(self):\n",
|
||||
" super(SingleNeuron, self).__init__()\n",
|
||||
" self.linear = nn.Linear(1, 1)\n",
|
||||
" \n",
|
||||
" def forward(self, x):\n",
|
||||
" return self.linear(x)\n",
|
||||
"# Define the model\n",
|
||||
"model = torch.nn.Sequential(\n",
|
||||
" torch.nn.Linear(1, 1)\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Create the synthetic data\n",
|
||||
"x_train = torch.tensor([[1.0], [2.0], [3.0], [4.0]], dtype=torch.float32)\n",
|
||||
"y_train = torch.tensor([[2.0], [4.0], [6.0], [8.0]], dtype=torch.float32)\n",
|
||||
"# Define the loss\n",
|
||||
"loss_fn = torch.nn.MSELoss()\n",
|
||||
"\n",
|
||||
"# Create the neural network\n",
|
||||
"model = SingleNeuron()\n",
|
||||
"# Define the optimizer\n",
|
||||
"optimizer = torch.optim.SGD(model.parameters(), lr=0.01)\n",
|
||||
"\n",
|
||||
"# Define the loss function and optimizer\n",
|
||||
"criterion = nn.MSELoss()\n",
|
||||
"optimizer = optim.SGD(model.parameters(), lr=0.01)\n",
|
||||
"# Define the data\n",
|
||||
"x_data = torch.tensor([[1.0], [2.0], [3.0], [4.0]])\n",
|
||||
"y_data = torch.tensor([[2.0], [4.0], [6.0], [8.0]])\n",
|
||||
"\n",
|
||||
"# Train the neural network\n",
|
||||
"for epoch in range(1, 1001):\n",
|
||||
"# Train the model\n",
|
||||
"for epoch in range(1000):\n",
|
||||
" # Forward pass\n",
|
||||
" y_pred = model(x_train)\n",
|
||||
" \n",
|
||||
" # Compute loss\n",
|
||||
" loss = criterion(y_pred, y_train)\n",
|
||||
" \n",
|
||||
" # Backward pass and optimization\n",
|
||||
" y_pred = model(x_data)\n",
|
||||
"\n",
|
||||
" # Compute and print loss\n",
|
||||
" loss = loss_fn(y_pred, y_data)\n",
|
||||
" if (epoch+1) % 100 == 0:\n",
|
||||
" print(f'Epoch {epoch+1}: loss = {loss.item():.4f}')\n",
|
||||
"\n",
|
||||
" # Zero the gradients\n",
|
||||
" optimizer.zero_grad()\n",
|
||||
"\n",
|
||||
" # Backward pass\n",
|
||||
" loss.backward()\n",
|
||||
"\n",
|
||||
" # Update the weights\n",
|
||||
" optimizer.step()\n",
|
||||
" \n",
|
||||
" # Print the loss every 100 epochs\n",
|
||||
" if epoch % 100 == 0:\n",
|
||||
" print(f\"Epoch {epoch}: Loss = {loss.item()}\")\n",
|
||||
"\n",
|
||||
"# Make a prediction for x = 5\n",
|
||||
"x_test = torch.tensor([[5.0]], dtype=torch.float32)\n",
|
||||
"y_pred = model(x_test)\n",
|
||||
"y_pred.item()`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\u001b[36;1m\u001b[1;3mEpoch 100: Loss = 0.03825576975941658\n",
|
||||
"Epoch 200: Loss = 0.02100197970867157\n",
|
||||
"Epoch 300: Loss = 0.01152981910854578\n",
|
||||
"Epoch 400: Loss = 0.006329738534986973\n",
|
||||
"Epoch 500: Loss = 0.0034749575424939394\n",
|
||||
"Epoch 600: Loss = 0.0019077073084190488\n",
|
||||
"Epoch 700: Loss = 0.001047312980517745\n",
|
||||
"Epoch 800: Loss = 0.0005749554838985205\n",
|
||||
"Epoch 900: Loss = 0.0003156439634039998\n",
|
||||
"Epoch 1000: Loss = 0.00017328384274151176\n",
|
||||
"\u001b[0m\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `Python_REPL` with `x_test.item()`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\u001b[36;1m\u001b[1;3m\u001b[0m\u001b[32;1m\u001b[1;3mThe prediction for x = 5 is 10.000173568725586.\u001b[0m\n",
|
||||
"\u001b[0m\n",
|
||||
"Observation: \u001b[36;1m\u001b[1;3mEpoch 100: loss = 0.0013\n",
|
||||
"Epoch 200: loss = 0.0007\n",
|
||||
"Epoch 300: loss = 0.0004\n",
|
||||
"Epoch 400: loss = 0.0002\n",
|
||||
"Epoch 500: loss = 0.0001\n",
|
||||
"Epoch 600: loss = 0.0001\n",
|
||||
"Epoch 700: loss = 0.0000\n",
|
||||
"Epoch 800: loss = 0.0000\n",
|
||||
"Epoch 900: loss = 0.0000\n",
|
||||
"Epoch 1000: loss = 0.0000\n",
|
||||
"\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: The prediction for x = 5 is 10.0.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
@@ -230,10 +181,10 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The prediction for x = 5 is 10.000173568725586.'"
|
||||
"'The prediction for x = 5 is 10.0.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -255,9 +206,9 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"display_name": "LangChain",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
"name": "langchain"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
@@ -269,7 +220,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
@@ -37,30 +37,7 @@
|
||||
"from langchain.agents.agent_toolkits import SQLDatabaseToolkit\n",
|
||||
"from langchain.sql_database import SQLDatabase\n",
|
||||
"from langchain.llms.openai import OpenAI\n",
|
||||
"from langchain.agents import AgentExecutor\n",
|
||||
"from langchain.agents.agent_types import AgentType\n",
|
||||
"from langchain.chat_models import ChatOpenAI"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "65ec5bb3",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"db = SQLDatabase.from_uri(\"sqlite:///../../../../../notebooks/Chinook.db\")\n",
|
||||
"toolkit = SQLDatabaseToolkit(db=db, llm=OpenAI(temperature=0))\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f74d1792",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using ZERO_SHOT_REACT_DESCRIPTION\n",
|
||||
"\n",
|
||||
"This shows how to initialize the agent using the ZERO_SHOT_REACT_DESCRIPTION agent type. Note that this is an alternative to the above."
|
||||
"from langchain.agents import AgentExecutor"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -72,39 +49,16 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"db = SQLDatabase.from_uri(\"sqlite:///../../../../notebooks/Chinook.db\")\n",
|
||||
"toolkit = SQLDatabaseToolkit(db=db)\n",
|
||||
"\n",
|
||||
"agent_executor = create_sql_agent(\n",
|
||||
" llm=OpenAI(temperature=0),\n",
|
||||
" toolkit=toolkit,\n",
|
||||
" verbose=True,\n",
|
||||
" agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION\n",
|
||||
" verbose=True\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "971cc455",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using OpenAI Functions\n",
|
||||
"\n",
|
||||
"This shows how to initialize the agent using the OPENAI_FUNCTIONS agent type. Note that this is an alternative to the above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "6426a27d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# agent_executor = create_sql_agent(\n",
|
||||
"# llm=ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo-0613\"),\n",
|
||||
"# toolkit=toolkit,\n",
|
||||
"# verbose=True,\n",
|
||||
"# agent_type=AgentType.OPENAI_FUNCTIONS\n",
|
||||
"# )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "36ae48c7-cb08-4fef-977e-c7d4b96a464b",
|
||||
@@ -115,7 +69,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 3,
|
||||
"id": "ff70e83d-5ad0-4fc7-bb96-27d82ac166d7",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -127,16 +81,14 @@
|
||||
"text": [
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[1m> Entering new chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `list_tables_sql_db` with `{}`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\u001b[38;5;200m\u001b[1;3mAlbum, Artist, Track, PlaylistTrack, InvoiceLine, sales_table, Playlist, Genre, Employee, Customer, Invoice, MediaType\u001b[0m\u001b[32;1m\u001b[1;3m\n",
|
||||
"Invoking: `schema_sql_db` with `PlaylistTrack`\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\u001b[0m\u001b[33;1m\u001b[1;3m\n",
|
||||
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
|
||||
"\u001b[32;1m\u001b[1;3mAction: list_tables_sql_db\n",
|
||||
"Action Input: \"\"\u001b[0m\n",
|
||||
"Observation: \u001b[38;5;200m\u001b[1;3mArtist, Invoice, Playlist, Genre, Album, PlaylistTrack, Track, InvoiceLine, MediaType, Employee, Customer\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I should look at the schema of the playlisttrack table\n",
|
||||
"Action: schema_sql_db\n",
|
||||
"Action Input: \"PlaylistTrack\"\u001b[0m\n",
|
||||
"Observation: \u001b[33;1m\u001b[1;3m\n",
|
||||
"CREATE TABLE \"PlaylistTrack\" (\n",
|
||||
"\t\"PlaylistId\" INTEGER NOT NULL, \n",
|
||||
"\t\"TrackId\" INTEGER NOT NULL, \n",
|
||||
@@ -145,36 +97,13 @@
|
||||
"\tFOREIGN KEY(\"PlaylistId\") REFERENCES \"Playlist\" (\"PlaylistId\")\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"/*\n",
|
||||
"3 rows from PlaylistTrack table:\n",
|
||||
"PlaylistId\tTrackId\n",
|
||||
"1\t3402\n",
|
||||
"1\t3389\n",
|
||||
"1\t3390\n",
|
||||
"*/\u001b[0m\u001b[32;1m\u001b[1;3mThe `PlaylistTrack` table has two columns: `PlaylistId` and `TrackId`. It is a junction table that represents the relationship between playlists and tracks. \n",
|
||||
"\n",
|
||||
"Here is the schema of the `PlaylistTrack` table:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"CREATE TABLE \"PlaylistTrack\" (\n",
|
||||
"\t\"PlaylistId\" INTEGER NOT NULL, \n",
|
||||
"\t\"TrackId\" INTEGER NOT NULL, \n",
|
||||
"\tPRIMARY KEY (\"PlaylistId\", \"TrackId\"), \n",
|
||||
"\tFOREIGN KEY(\"TrackId\") REFERENCES \"Track\" (\"TrackId\"), \n",
|
||||
"\tFOREIGN KEY(\"PlaylistId\") REFERENCES \"Playlist\" (\"PlaylistId\")\n",
|
||||
")\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Here are three sample rows from the `PlaylistTrack` table:\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"PlaylistId TrackId\n",
|
||||
"1 3402\n",
|
||||
"1 3389\n",
|
||||
"1 3390\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Please let me know if there is anything else I can help you with.\u001b[0m\n",
|
||||
"SELECT * FROM 'PlaylistTrack' LIMIT 3;\n",
|
||||
"PlaylistId TrackId\n",
|
||||
"1 3402\n",
|
||||
"1 3389\n",
|
||||
"1 3390\u001b[0m\n",
|
||||
"Thought:\u001b[32;1m\u001b[1;3m I now know the final answer\n",
|
||||
"Final Answer: The PlaylistTrack table has two columns, PlaylistId and TrackId, and is linked to the Playlist and Track tables.\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished chain.\u001b[0m\n"
|
||||
]
|
||||
@@ -182,10 +111,10 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'The `PlaylistTrack` table has two columns: `PlaylistId` and `TrackId`. It is a junction table that represents the relationship between playlists and tracks. \\n\\nHere is the schema of the `PlaylistTrack` table:\\n\\n```\\nCREATE TABLE \"PlaylistTrack\" (\\n\\t\"PlaylistId\" INTEGER NOT NULL, \\n\\t\"TrackId\" INTEGER NOT NULL, \\n\\tPRIMARY KEY (\"PlaylistId\", \"TrackId\"), \\n\\tFOREIGN KEY(\"TrackId\") REFERENCES \"Track\" (\"TrackId\"), \\n\\tFOREIGN KEY(\"PlaylistId\") REFERENCES \"Playlist\" (\"PlaylistId\")\\n)\\n```\\n\\nHere are three sample rows from the `PlaylistTrack` table:\\n\\n```\\nPlaylistId TrackId\\n1 3402\\n1 3389\\n1 3390\\n```\\n\\nPlease let me know if there is anything else I can help you with.'"
|
||||
"'The PlaylistTrack table has two columns, PlaylistId and TrackId, and is linked to the Playlist and Track tables.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -590,7 +519,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
"version": "3.10.9"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
@@ -1,7 +1,6 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "18ada398-dce6-4049-9b56-fc0ede63da9c",
|
||||
"metadata": {},
|
||||
@@ -12,7 +11,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "eecb683b-3a46-4b9d-81a3-7caefbfec1a1",
|
||||
"metadata": {},
|
||||
@@ -90,7 +88,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "f4814175-964d-42f1-aa9d-22801ce1e912",
|
||||
"metadata": {},
|
||||
@@ -126,7 +123,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "8a38ad10",
|
||||
"metadata": {},
|
||||
@@ -169,7 +165,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent_executor.run(\"What did biden say about ketanji brown jackson in the state of the union address?\")"
|
||||
"agent_executor.run(\"What did biden say about ketanji brown jackson is the state of the union address?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -207,11 +203,10 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent_executor.run(\"What did biden say about ketanji brown jackson in the state of the union address? List the source.\")"
|
||||
"agent_executor.run(\"What did biden say about ketanji brown jackson is the state of the union address? List the source.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "7ca07707",
|
||||
"metadata": {},
|
||||
@@ -260,7 +255,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "71680984-edaf-4a63-90f5-94edbd263550",
|
||||
"metadata": {},
|
||||
@@ -305,7 +299,7 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"agent_executor.run(\"What did biden say about ketanji brown jackson in the state of the union address?\")"
|
||||
"agent_executor.run(\"What did biden say about ketanji brown jackson is the state of the union address?\")"
|
||||
]
|
||||
},
|
||||
{
@@ -25,15 +25,6 @@ Next, we have some examples of customizing and generically working with tools
./tools/custom_tools.ipynb
./tools/multi_input_tool.ipynb
./tools/tool_input_validation.ipynb
./tools/human_approval.ipynb
Tools are also usable outside of the LangChain ecosystem! Here are examples of doing so
.. toctree::
:maxdepth: 1
:glob:
./tools/tools_as_openai_functions.ipynb
In this documentation we cover generic tooling functionality (e.g. how to create your own)
@@ -160,9 +160,3 @@ Below is a list of all supported tools and relevant information:
- Notes: A connection to the OpenWeatherMap API (https://api.openweathermap.org), specifically the `/data/2.5/weather` endpoint.
- Requires LLM: No
- Extra Parameters: `openweathermap_api_key` (your API key to access this endpoint)
**sleep**
- Tool Name: Sleep
- Tool Description: Make agent sleep for some time.
- Requires LLM: No
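As a sketch of how the two entries above are loaded in practice, extra parameters such as `openweathermap_api_key` are passed straight through `load_tools`. The registry names `"openweathermap-api"` and `"sleep"` are assumptions inferred from the headings; check the tool registry for the exact keys.

```python
from langchain.agents import load_tools

# Neither tool requires an LLM; OpenWeatherMap needs its extra API-key parameter.
tools = load_tools(
    ["openweathermap-api", "sleep"],          # assumed registry names
    openweathermap_api_key="<your-api-key>",  # placeholder, not a real key
)
```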
@@ -1,138 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4111c9d4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Tools as OpenAI Functions\n",
|
||||
"\n",
|
||||
"This notebook goes over how to use LangChain tools as OpenAI functions."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "d65d8a60",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.schema import HumanMessage"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "abd8dc72",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model = ChatOpenAI(model=\"gpt-3.5-turbo-0613\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "dce2cdb7",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.tools import MoveFileTool, format_tool_to_openai_function"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "3b3dc766",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tools = [MoveFileTool()]\n",
|
||||
"functions = [format_tool_to_openai_function(t) for t in tools]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "230a7939",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"message = model.predict_messages([HumanMessage(content='move file foo to bar')], functions=functions)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "c118c940",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AIMessage(content='', additional_kwargs={'function_call': {'name': 'move_file', 'arguments': '{\\n \"source_path\": \"foo\",\\n \"destination_path\": \"bar\"\\n}'}}, example=False)"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"message"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "d618e3d8",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'name': 'move_file',\n",
|
||||
" 'arguments': '{\\n \"source_path\": \"foo\",\\n \"destination_path\": \"bar\"\\n}'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"message.additional_kwargs['function_call']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "751da79f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
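One step this notebook stops short of is executing the function call the model returns. A plausible continuation, assuming the single `MoveFileTool` defined above, is to decode the JSON `arguments` string and feed it back to the tool:

```python
import json

# Hypothetical follow-up to the cells above: run the tool the model selected.
function_call = message.additional_kwargs["function_call"]
tool_args = json.loads(function_call["arguments"])  # {'source_path': 'foo', 'destination_path': 'bar'}
result = tools[0].run(tool_args)  # BaseTool.run accepts a dict of tool arguments
```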
@@ -1,7 +1,6 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "23234b50-e6c6-4c87-9f97-259c15f36894",
|
||||
"metadata": {
|
||||
@@ -12,7 +11,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "29dd6333-307c-43df-b848-65001c01733b",
|
||||
"metadata": {},
|
||||
@@ -30,7 +28,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "2b6d7dba-cd20-472a-ae05-f68675cc9ea4",
|
||||
"metadata": {},
|
||||
@@ -39,7 +36,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "e4592215-6604-47e2-89ff-5db3af6d1e40",
|
||||
"metadata": {
|
||||
@@ -54,11 +50,6 @@
|
||||
" self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any\n",
|
||||
" ) -> Any:\n",
|
||||
" \"\"\"Run when LLM starts running.\"\"\"\n",
|
||||
" \n",
|
||||
" def on_chat_model_start(\n",
|
||||
" self, serialized: Dict[str, Any], messages: List[List[BaseMessage]], **kwargs: Any\n",
|
||||
" ) -> Any:\n",
|
||||
" \"\"\"Run when Chat Model starts running.\"\"\"\n",
|
||||
"\n",
|
||||
" def on_llm_new_token(self, token: str, **kwargs: Any) -> Any:\n",
|
||||
" \"\"\"Run on new LLM token. Only available when streaming is enabled.\"\"\"\n",
|
||||
@@ -109,7 +100,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "cbccd7d1",
|
||||
"metadata": {},
|
||||
@@ -118,8 +108,8 @@
|
||||
"\n",
|
||||
"The `callbacks` argument is available on most objects throughout the API (Chains, Models, Tools, Agents, etc.) in two different places:\n",
|
||||
"\n",
|
||||
"- **Constructor callbacks**: defined in the constructor, eg. `LLMChain(callbacks=[handler], tags=['a-tag'])`, which will be used for all calls made on that object, and will be scoped to that object only, eg. if you pass a handler to the `LLMChain` constructor, it will not be used by the Model attached to that chain.\n",
|
||||
"- **Request callbacks**: defined in the `call()`/`run()`/`apply()` methods used for issuing a request, eg. `chain.call(inputs, callbacks=[handler], tags=['a-tag'])`, which will be used for that specific request only, and all sub-requests that it contains (eg. a call to an LLMChain triggers a call to a Model, which uses the same handler passed in the `call()` method).\n",
|
||||
"- **Constructor callbacks**: defined in the constructor, eg. `LLMChain(callbacks=[handler])`, which will be used for all calls made on that object, and will be scoped to that object only, eg. if you pass a handler to the `LLMChain` constructor, it will not be used by the Model attached to that chain.\n",
|
||||
"- **Request callbacks**: defined in the `call()`/`run()`/`apply()` methods used for issuing a request, eg. `chain.call(inputs, callbacks=[handler])`, which will be used for that specific request only, and all sub-requests that it contains (eg. a call to an LLMChain triggers a call to a Model, which uses the same handler passed in the `call()` method).\n",
|
||||
"\n",
|
||||
"The `verbose` argument is available on most objects throughout the API (Chains, Models, Tools, Agents, etc.) as a constructor argument, eg. `LLMChain(verbose=True)`, and it is equivalent to passing a `ConsoleCallbackHandler` to the `callbacks` argument of that object and all child objects. This is useful for debugging, as it will log all events to the console.\n",
|
||||
"\n",
|
||||
@@ -130,18 +120,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "46128fb4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tags\n",
|
||||
"\n",
|
||||
"You can add tags to your callbacks by passing a `tags` argument to the `call()`/`run()`/`apply()` methods. This is useful for filtering your logs, eg. if you want to log all requests made to a specific LLMChain, you can add a tag, and then filter your logs by that tag. You can pass tags to both constructor and request callbacks, see the examples above for details. These tags are then passed to the `tags` argument of the \"start\" callback methods, ie. `on_llm_start`, `on_chat_model_start`, `on_chain_start`, `on_tool_start`."
|
||||
]
|
||||
},
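A minimal sketch of the two callback scopes described above (with the `tags` argument from the pre-change wording), assuming a configured OpenAI key and the built-in `StdOutCallbackHandler`:

```python
from langchain.callbacks import StdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

handler = StdOutCallbackHandler()
prompt = PromptTemplate.from_template("1 + {number} = ")

# Constructor callbacks: scoped to this chain object only,
# not to the model attached to it.
chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt, callbacks=[handler])

# Request callbacks: apply to this call and all of its sub-calls;
# tags are forwarded to the "start" callback methods.
chain.run(number=2, callbacks=[handler], tags=["a-tag"])
```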
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "d3bf3304-43fb-47ad-ae50-0637a17018a2",
|
||||
"metadata": {},
|
||||
@@ -223,7 +201,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "389c8448-5283-49e3-8c04-dbe1522e202c",
|
||||
"metadata": {},
|
||||
@@ -291,7 +268,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "bc9785fa-4f71-4797-91a3-4fe7e57d0429",
|
||||
"metadata": {
|
||||
@@ -387,7 +363,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "d26dbb34-fcc3-401c-a115-39c7620d2d65",
|
||||
"metadata": {},
|
||||
@@ -573,7 +548,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "32b29135-f852-4492-88ed-547275c72c53",
|
||||
"metadata": {},
|
||||
@@ -582,7 +556,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "fbb606d6-2863-46c5-8347-9f0bdb3805bb",
|
||||
"metadata": {},
|
||||
@@ -591,7 +564,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "f62cd10c-494c-47d6-aa98-6e926cb9c456",
|
||||
"metadata": {},
|
||||
@@ -600,7 +572,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "d5a74b3f-3769-4a4f-99c7-b6a3b20a94e2",
|
||||
"metadata": {},
|
||||
@@ -858,7 +829,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "254fef1b-6b6e-4352-9cf4-363fba895ac7",
|
||||
"metadata": {},
@@ -1,240 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6605e7f7",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Extraction\n",
|
||||
"\n",
|
||||
"The extraction chain uses the OpenAI `functions` parameter to specify a schema to extract entities from a document. This helps us make sure that the model outputs exactly the schema of entities and properties that we want, with their appropriate types.\n",
|
||||
"\n",
|
||||
"The extraction chain is to be used when we want to extract several entities with their properties from the same passage (i.e. what people were mentioned in this passage?)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "34f04daf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.chains import create_extraction_chain, create_extraction_chain_pydantic\n",
|
||||
"from langchain.prompts import ChatPromptTemplate"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "a2648974",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"llm = ChatOpenAI(temperature=0, \n",
|
||||
" model=\"gpt-3.5-turbo-0613\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5ef034ce",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Extracting entities"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "78ff9df9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To extract entities, we need to create a schema like the following, were we specify all the properties we want to find and the type we expect them to have. We can also specify which of these properties are required and which are optional."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "4ac43eba",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"schema = {\n",
|
||||
" \"properties\": {\n",
|
||||
" \"person_name\": {\"type\": \"string\"}, \n",
|
||||
" \"person_height\":{\"type\": \"integer\"},\n",
|
||||
" \"person_hair_color\": {\"type\": \"string\"},\n",
|
||||
" \"dog_name\": {\"type\": \"string\"},\n",
|
||||
" \"dog_breed\": {\"type\": \"string\"}\n",
|
||||
" },\n",
|
||||
" \"required\": [\"person_name\", \"person_height\"]\n",
|
||||
" }"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "640bd005",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"inp = \"\"\"\n",
|
||||
"Alex is 5 feet tall. Claudia is 4 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
|
||||
"Alex's dog Frosty is a labrador and likes to play hide and seek.\n",
|
||||
" \"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "64313214",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = create_extraction_chain(schema, llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "17c48adb",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As we can see, we extracted the required entities and their properties in the required format:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "cc5436ed",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'person_name': 'Alex',\n",
|
||||
" 'person_height': 5,\n",
|
||||
" 'person_hair_color': 'blonde',\n",
|
||||
" 'dog_name': 'Frosty',\n",
|
||||
" 'dog_breed': 'labrador'},\n",
|
||||
" {'person_name': 'Claudia',\n",
|
||||
" 'person_height': 9,\n",
|
||||
" 'person_hair_color': 'brunette',\n",
|
||||
" 'dog_name': '',\n",
|
||||
" 'dog_breed': ''}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"chain.run(inp)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "698b4c4d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Pydantic example"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6504a6d9",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can also use a Pydantic schema to choose the required properties and types and we will set as 'Optional' those that are not strictly required.\n",
|
||||
"\n",
|
||||
"By using the `create_extraction_chain_pydantic` function, we can send a Pydantic schema as input and the output will be an instantiated object that respects our desired schema. \n",
|
||||
"\n",
|
||||
"In this way, we can specify our schema in the same manner that we would a new class or function in Python - with purely Pythonic types."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "6792866b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from typing import Optional, List\n",
|
||||
"from pydantic import BaseModel, Field"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"id": "36a63761",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"class Properties(BaseModel):\n",
|
||||
" person_name: str\n",
|
||||
" person_height: int\n",
|
||||
" person_hair_color: str\n",
|
||||
" dog_breed: Optional[str]\n",
|
||||
" dog_name: Optional[str]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"id": "8ffd1e57",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"chain = create_extraction_chain_pydantic(pydantic_schema=Properties, llm=llm)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"id": "24baa954",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Properties(person_name='Alex', person_height=5, person_hair_color='blonde', dog_breed='labrador', dog_name='Frosty'),\n",
|
||||
" Properties(person_name='Claudia', person_height=9, person_hair_color='brunette', dog_breed=None, dog_name=None)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 21,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"inp = \"\"\"\n",
|
||||
"Alex is 5 feet tall. Claudia is 4 feet taller Alex and jumps higher than him. Claudia is a brunette and Alex is blonde.\n",
|
||||
"Alex's dog Frosty is a labrador and likes to play hide and seek.\n",
|
||||
" \"\"\"\n",
|
||||
"chain.run(inp)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "general",
|
||||
"language": "python",
|
||||
"name": "general"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
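
For readers skimming the diff, here is a minimal, self-contained sketch of the extraction flow shown in the notebook above. It assumes an OpenAI API key is set in the environment, and that `gpt-3.5-turbo-0613` (the model used elsewhere in this diff) is an acceptable choice; the schema is the same one the notebook defines.

```python
# A minimal sketch of schema-based extraction, assuming an OpenAI key is set.
from langchain.chat_models import ChatOpenAI
from langchain.chains import create_extraction_chain

schema = {
    "properties": {
        "person_name": {"type": "string"},
        "person_height": {"type": "integer"},
        "person_hair_color": {"type": "string"},
        "dog_name": {"type": "string"},
        "dog_breed": {"type": "string"},
    },
    "required": ["person_name", "person_height"],
}

llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
chain = create_extraction_chain(schema, llm)
# Returns a list of dicts, one per extracted entity.
print(chain.run("Alex is 5 feet tall and blonde. His dog Frosty is a labrador."))
```
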
@@ -177,7 +177,7 @@
"\u001b[32;1m\u001b[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie {name: 'Top Gun'})\n",
"RETURN a.name\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}, {'a.name': 'Tom Cruise'}]\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m[{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
@@ -185,7 +185,7 @@
{
"data": {
"text/plain": [
"'Val Kilmer, Anthony Edwards, Meg Ryan, and Tom Cruise played in Top Gun.'"
"'Tom Cruise, Val Kilmer, Anthony Edwards, and Meg Ryan played in Top Gun.'"
]
},
"execution_count": 7,
@@ -197,180 +197,10 @@
"chain.run(\"Who played in Top Gun?\")"
]
},
{
"cell_type": "markdown",
"id": "2d28c4df",
"metadata": {},
"source": [
"## Limit the number of results\n",
"You can limit the number of results from the Cypher QA Chain using the `top_k` parameter.\n",
"The default is 10."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "df230946",
"metadata": {},
"outputs": [],
"source": [
"chain = GraphCypherQAChain.from_llm(\n",
" ChatOpenAI(temperature=0), graph=graph, verbose=True, top_k=2\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "3f1600ee",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie {name: 'Top Gun'})\n",
"RETURN a.name\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Val Kilmer and Anthony Edwards played in Top Gun.'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\"Who played in Top Gun?\")"
]
},
{
"cell_type": "markdown",
"id": "88c16206",
"metadata": {},
"source": [
"## Return intermediate results\n",
"You can return intermediate steps from the Cypher QA Chain using the `return_intermediate_steps` parameter."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "e412f36b",
"metadata": {},
"outputs": [],
"source": [
"chain = GraphCypherQAChain.from_llm(\n",
" ChatOpenAI(temperature=0), graph=graph, verbose=True, return_intermediate_steps=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "4f4699dc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie {name: 'Top Gun'})\n",
"RETURN a.name\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m[{'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}, {'a.name': 'Tom Cruise'}]\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n",
"Intermediate steps: [{'query': \"MATCH (a:Actor)-[:ACTED_IN]->(m:Movie {name: 'Top Gun'})\\nRETURN a.name\"}, {'context': [{'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}, {'a.name': 'Tom Cruise'}]}]\n",
"Final answer: Val Kilmer, Anthony Edwards, Meg Ryan, and Tom Cruise played in Top Gun.\n"
]
}
],
"source": [
"result = chain(\"Who played in Top Gun?\")\n",
"print(f\"Intermediate steps: {result['intermediate_steps']}\")\n",
"print(f\"Final answer: {result['result']}\")"
]
},
{
"cell_type": "markdown",
"id": "d6e1b054",
"metadata": {},
"source": [
"## Return direct results\n",
"You can return direct results from the Cypher QA Chain using the `return_direct` parameter."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "2d3acf10",
"metadata": {},
"outputs": [],
"source": [
"chain = GraphCypherQAChain.from_llm(\n",
" ChatOpenAI(temperature=0), graph=graph, verbose=True, return_direct=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "b0a9d143",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new GraphCypherQAChain chain...\u001b[0m\n",
"Generated Cypher:\n",
"\u001b[32;1m\u001b[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie {name: 'Top Gun'})\n",
"RETURN a.name\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"[{'a.name': 'Val Kilmer'},\n",
" {'a.name': 'Anthony Edwards'},\n",
" {'a.name': 'Meg Ryan'},\n",
" {'a.name': 'Tom Cruise'}]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\"Who played in Top Gun?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "74d0a36f",
"id": "b4825316",
"metadata": {},
"outputs": [],
"source": []
@@ -392,7 +222,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
"version": "3.9.1"
}
},
"nbformat": 4,
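
Taken together, the hunks above exercise three `GraphCypherQAChain` options: `top_k`, `return_intermediate_steps`, and `return_direct`. The following sketch collects them in one place; the Neo4j URL and credentials are placeholders, and the sketch assumes a `Neo4jGraph` connected to a database containing the Top Gun example data.

```python
# A compact sketch of the three GraphCypherQAChain options shown above.
# The url/username/password are placeholders, not values from this diff.
from langchain.chat_models import ChatOpenAI
from langchain.chains import GraphCypherQAChain
from langchain.graphs import Neo4jGraph

graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="password")
llm = ChatOpenAI(temperature=0)

# top_k caps how many database rows are passed to the LLM (default 10).
chain = GraphCypherQAChain.from_llm(llm, graph=graph, verbose=True, top_k=2)
print(chain.run("Who played in Top Gun?"))

# return_intermediate_steps exposes the generated Cypher and the raw context.
chain = GraphCypherQAChain.from_llm(
    llm, graph=graph, verbose=True, return_intermediate_steps=True
)
result = chain("Who played in Top Gun?")
print(result["intermediate_steps"], result["result"])

# return_direct skips answer generation and returns the raw query rows.
chain = GraphCypherQAChain.from_llm(llm, graph=graph, verbose=True, return_direct=True)
print(chain.run("Who played in Top Gun?"))
```
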
@@ -1,270 +0,0 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "c94240f5",
"metadata": {},
"source": [
"# NebulaGraphQAChain\n",
"\n",
"This notebook shows how to use LLMs to provide a natural language interface to the NebulaGraph database."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "dbc0ee68",
"metadata": {},
"source": [
"You will need a running NebulaGraph cluster; you can start a containerized cluster by running the following script:\n",
"\n",
"```bash\n",
"curl -fsSL nebula-up.siwei.io/install.sh | bash\n",
"```\n",
"\n",
"Other options are:\n",
"- Install as a [Docker Desktop Extension](https://www.docker.com/blog/distributed-cloud-native-graph-database-nebulagraph-docker-extension/). See [here](https://docs.nebula-graph.io/3.5.0/2.quick-start/1.quick-start-workflow/)\n",
"- NebulaGraph Cloud Service. See [here](https://www.nebula-graph.io/cloud)\n",
"- Deploy from package, source code, or via Kubernetes. See [here](https://docs.nebula-graph.io/)\n",
"\n",
"Once the cluster is running, we can create the SPACE and SCHEMA for the database."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c82f4141",
"metadata": {},
"outputs": [],
"source": [
"%pip install ipython-ngql\n",
"%load_ext ngql\n",
"\n",
"# connect ngql jupyter extension to nebulagraph\n",
"%ngql --address 127.0.0.1 --port 9669 --user root --password nebula\n",
"# create a new space\n",
"%ngql CREATE SPACE IF NOT EXISTS langchain(partition_num=1, replica_factor=1, vid_type=fixed_string(128));\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eda0809a",
"metadata": {},
"outputs": [],
"source": [
"# Wait for a few seconds for the space to be created.\n",
"%ngql USE langchain;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "119fe35c",
"metadata": {},
"source": [
"Create the schema; for the full dataset, refer [here](https://www.siwei.io/en/nebulagraph-etl-dbt/)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5aa796ee",
"metadata": {},
"outputs": [],
"source": [
"%%ngql\n",
"CREATE TAG IF NOT EXISTS movie(name string);\n",
"CREATE TAG IF NOT EXISTS person(name string, birthdate string);\n",
"CREATE EDGE IF NOT EXISTS acted_in();\n",
"CREATE TAG INDEX IF NOT EXISTS person_index ON person(name(128));\n",
"CREATE TAG INDEX IF NOT EXISTS movie_index ON movie(name(128));"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "66e4799a",
"metadata": {},
"source": [
"Wait for schema creation to complete, then we can insert some data."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "d8eea530",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"UsageError: Cell magic `%%ngql` not found.\n"
]
}
],
"source": [
"%%ngql\n",
"INSERT VERTEX person(name, birthdate) VALUES \"Al Pacino\":(\"Al Pacino\", \"1940-04-25\");\n",
"INSERT VERTEX movie(name) VALUES \"The Godfather II\":(\"The Godfather II\");\n",
"INSERT VERTEX movie(name) VALUES \"The Godfather Coda: The Death of Michael Corleone\":(\"The Godfather Coda: The Death of Michael Corleone\");\n",
"INSERT EDGE acted_in() VALUES \"Al Pacino\"->\"The Godfather II\":();\n",
"INSERT EDGE acted_in() VALUES \"Al Pacino\"->\"The Godfather Coda: The Death of Michael Corleone\":();"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "62812aad",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.chains import NebulaGraphQAChain\n",
"from langchain.graphs import NebulaGraph"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "0928915d",
"metadata": {},
"outputs": [],
"source": [
"graph = NebulaGraph(\n",
" space=\"langchain\",\n",
" username=\"root\",\n",
" password=\"nebula\",\n",
" address=\"127.0.0.1\",\n",
" port=9669,\n",
" session_pool_size=30,\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "58c1a8ea",
"metadata": {},
"source": [
"## Refresh graph schema information\n",
"\n",
"If the schema of the database changes, you can refresh the schema information needed to generate nGQL statements."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4e3de44f",
"metadata": {},
"outputs": [],
"source": [
"# graph.refresh_schema()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "1fe76ccd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Node properties: [{'tag': 'movie', 'properties': [('name', 'string')]}, {'tag': 'person', 'properties': [('name', 'string'), ('birthdate', 'string')]}]\n",
"Edge properties: [{'edge': 'acted_in', 'properties': []}]\n",
"Relationships: ['(:person)-[:acted_in]->(:movie)']\n",
"\n"
]
}
],
"source": [
"print(graph.get_schema)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "68a3c677",
"metadata": {},
"source": [
"## Querying the graph\n",
"\n",
"We can now use the graph Cypher QA chain to ask questions of the graph."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "7476ce98",
"metadata": {},
"outputs": [],
"source": [
"chain = NebulaGraphQAChain.from_llm(\n",
" ChatOpenAI(temperature=0), graph=graph, verbose=True\n",
")\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "ef8ee27b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new NebulaGraphQAChain chain...\u001b[0m\n",
"Generated nGQL:\n",
"\u001b[32;1m\u001b[1;3mMATCH (p:`person`)-[:acted_in]->(m:`movie`) WHERE m.`movie`.`name` == 'The Godfather II'\n",
"RETURN p.`person`.`name`\u001b[0m\n",
"Full Context:\n",
"\u001b[32;1m\u001b[1;3m{'p.person.name': ['Al Pacino']}\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'Al Pacino played in The Godfather II.'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\"Who played in The Godfather II?\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
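
Condensed from the deleted notebook above, the end-to-end query path is short. This sketch assumes the `langchain` space and sample data created earlier exist, and that an OpenAI key is set.

```python
# Connect to a local NebulaGraph cluster and query it in natural language.
from langchain.chat_models import ChatOpenAI
from langchain.chains import NebulaGraphQAChain
from langchain.graphs import NebulaGraph

graph = NebulaGraph(
    space="langchain",
    username="root",
    password="nebula",
    address="127.0.0.1",
    port=9669,
)
chain = NebulaGraphQAChain.from_llm(ChatOpenAI(temperature=0), graph=graph, verbose=True)
print(chain.run("Who played in The Godfather II?"))
```
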
@@ -1,389 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a13ea924",
"metadata": {},
"source": [
"# Tagging\n",
"\n",
"The tagging chain uses the OpenAI `functions` parameter to specify a schema to tag a document with. This helps us make sure that the model outputs exactly the tags that we want, with their appropriate types.\n",
"\n",
"The tagging chain is to be used when we want to tag a passage with a specific attribute (e.g., what is the sentiment of this message?)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "bafb496a",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.chains import create_tagging_chain, create_tagging_chain_pydantic\n",
"from langchain.prompts import ChatPromptTemplate"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "39f3ce3e",
"metadata": {},
"outputs": [],
"source": [
"llm = ChatOpenAI(\n",
" temperature=0, \n",
" model=\"gpt-3.5-turbo-0613\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "832ddcd9",
"metadata": {},
"source": [
"## Simplest approach, only specifying type"
]
},
{
"cell_type": "markdown",
"id": "4fc8d766",
"metadata": {},
"source": [
"We can start by specifying a few properties with their expected type in our schema."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "8329f943",
"metadata": {},
"outputs": [],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"sentiment\": {\"type\": \"string\"}, \n",
" \"aggressiveness\": {\"type\": \"integer\"},\n",
" \"language\": {\"type\": \"string\"},\n",
" }\n",
" }"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "6146ae70",
"metadata": {},
"outputs": [],
"source": [
"chain = create_tagging_chain(schema, llm)"
]
},
{
"cell_type": "markdown",
"id": "9e306ca3",
"metadata": {},
"source": [
"As we can see in the examples, it correctly interprets what we want, but the results vary, so we get, for example, sentiments in different languages ('positive', 'enojado', etc.).\n",
"\n",
"We will see how to control these results in the next section."
]
},
{
"cell_type": "code",
"execution_count": 59,
"id": "5509b6a6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'sentiment': 'positive', 'language': 'Spanish'}"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inp = \"Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!\"\n",
"chain.run(inp)"
]
},
{
"cell_type": "code",
"execution_count": 60,
"id": "9154474c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'sentiment': 'enojado', 'aggressiveness': 1, 'language': 'Spanish'}"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inp = \"Estoy muy enojado con vos! Te voy a dar tu merecido!\"\n",
"chain.run(inp)"
]
},
{
"cell_type": "code",
"execution_count": 61,
"id": "aae85b27",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'sentiment': 'positive', 'aggressiveness': 0, 'language': 'English'}"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inp = \"Weather is ok here, I can go outside without much more than a coat\"\n",
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "bebb2f83",
"metadata": {},
"source": [
"## More control\n",
"\n",
"By being smart about how we define our schema, we can have more control over the model's output. Specifically, we can define:\n",
"\n",
"- possible values for each property\n",
"- a description to make sure that the model understands the property\n",
"- required properties to be returned"
]
},
{
"cell_type": "markdown",
"id": "69ef0b9a",
"metadata": {},
"source": [
"Following is an example of how we can use _enum_, _description_ and _required_ to control each of the previously mentioned aspects:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "6a5f7961",
"metadata": {},
"outputs": [],
"source": [
"schema = {\n",
" \"properties\": {\n",
" \"sentiment\": {\"type\": \"string\", \"enum\": [\"happy\", \"neutral\", \"sad\"]}, \n",
" \"aggressiveness\": {\"type\": \"integer\", \"enum\": [1,2,3,4,5], \"description\": \"describes how aggressive the statement is, the higher the number the more aggressive\"},\n",
" \"language\": {\"type\": \"string\", \"enum\": [\"spanish\", \"english\", \"french\", \"german\", \"italian\"]},\n",
" },\n",
" \"required\": [\"language\", \"sentiment\", \"aggressiveness\"]\n",
" }"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "e5a5881f",
"metadata": {},
"outputs": [],
"source": [
"chain = create_tagging_chain(schema, llm)"
]
},
{
"cell_type": "markdown",
"id": "5ded2332",
"metadata": {},
"source": [
"Now the answers are much better!"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "d9b9d53d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'sentiment': 'happy', 'aggressiveness': 0, 'language': 'spanish'}"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inp = \"Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!\"\n",
"chain.run(inp)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "1c12fa00",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'sentiment': 'sad', 'aggressiveness': 10, 'language': 'spanish'}"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inp = \"Estoy muy enojado con vos! Te voy a dar tu merecido!\"\n",
"chain.run(inp)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "0bdfcb05",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'sentiment': 'neutral', 'aggressiveness': 0, 'language': 'english'}"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inp = \"Weather is ok here, I can go outside without much more than a coat\"\n",
"chain.run(inp)"
]
},
{
"cell_type": "markdown",
"id": "e68ad17e",
"metadata": {},
"source": [
"## Specifying schema with Pydantic"
]
},
{
"cell_type": "markdown",
"id": "2f5970ec",
"metadata": {},
"source": [
"We can also use a Pydantic schema to specify the required properties and types. We can also send other arguments, such as 'enum' or 'description', as can be seen in the example below.\n",
"\n",
"By using the `create_tagging_chain_pydantic` function, we can send a Pydantic schema as input and the output will be an instantiated object that respects our desired schema. \n",
"\n",
"In this way, we can specify our schema in the same manner that we would a new class or function in Python - with purely Pythonic types."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "bf1f367e",
"metadata": {},
"outputs": [],
"source": [
"from enum import Enum\n",
"from pydantic import BaseModel, Field"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "83a2e826",
"metadata": {},
"outputs": [],
"source": [
"class Tags(BaseModel):\n",
" sentiment: str = Field(..., enum=[\"happy\", \"neutral\", \"sad\"])\n",
" aggressiveness: int = Field(..., description=\"describes how aggressive the statement is, the higher the number the more aggressive\", enum=[1, 2, 3, 4, 5])\n",
" language: str = Field(..., enum=[\"spanish\", \"english\", \"french\", \"german\", \"italian\"])"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "6e404892",
"metadata": {},
"outputs": [],
"source": [
"chain = create_tagging_chain_pydantic(Tags, llm)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "b5fc43c4",
"metadata": {},
"outputs": [],
"source": [
"inp = \"Estoy muy enojado con vos! Te voy a dar tu merecido!\"\n",
"res = chain.run(inp)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "5074bcc3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Tags(sentiment='sad', aggressiveness=10, language='spanish')"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"res"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "general",
"language": "python",
"name": "general"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
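
The Pydantic variant of the tagging chain condenses to a few lines. This sketch assumes an OpenAI key is set; the enum values match the notebook's schema, and `chain.run` returns an instantiated `Tags` object rather than a plain dict.

```python
# A minimal sketch of Pydantic-based tagging, assuming an OpenAI key is set.
from langchain.chat_models import ChatOpenAI
from langchain.chains import create_tagging_chain_pydantic
from pydantic import BaseModel, Field

class Tags(BaseModel):
    sentiment: str = Field(..., enum=["happy", "neutral", "sad"])
    aggressiveness: int = Field(..., enum=[1, 2, 3, 4, 5])
    language: str = Field(..., enum=["spanish", "english", "french", "german", "italian"])

llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
chain = create_tagging_chain_pydantic(Tags, llm)
# Returns a Tags instance, e.g. Tags(sentiment='sad', aggressiveness=5, language='spanish')
print(chain.run("Estoy muy enojado con vos! Te voy a dar tu merecido!"))
```
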
@@ -137,6 +137,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "a178173b-b183-432a-a517-250fe3191173",
"metadata": {},
@@ -351,7 +352,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
"version": "3.10.10"
}
},
"nbformat": 4,

@@ -350,7 +350,7 @@
"\"\"\"\n",
"\n",
"\n",
"reduce_template_string = \"\"\"Given the following Python functions' names and their descriptions, answer the following question\n",
"reduce_template_string = \"\"\"Given the following following Python functions' names and their descriptions, answer the following question\n",
"{code_description}\n",
"Question: {question}\n",
"Answer:\n",

@@ -30,15 +30,12 @@ For detailed instructions on how to get set up with Unstructured, see installati
:maxdepth: 1
:glob:

./document_loaders/examples/airtable.ipynb
./document_loaders/examples/audio.ipynb
./document_loaders/examples/conll-u.ipynb
./document_loaders/examples/copypaste.ipynb
./document_loaders/examples/csv.ipynb
./document_loaders/examples/email.ipynb
./document_loaders/examples/epub.ipynb
./document_loaders/examples/evernote.ipynb
./document_loaders/examples/excel.ipynb
./document_loaders/examples/facebook_chat.ipynb
./document_loaders/examples/file_directory.ipynb
./document_loaders/examples/html.ipynb
@@ -117,7 +114,6 @@ We need access tokens and sometime other parameters to get access to these datas
./document_loaders/examples/discord_loader.ipynb
./document_loaders/examples/docugami.ipynb
./document_loaders/examples/duckdb.ipynb
./document_loaders/examples/fauna.ipynb
./document_loaders/examples/figma.ipynb
./document_loaders/examples/gitbook.ipynb
./document_loaders/examples/git.ipynb
@@ -139,7 +135,6 @@ We need access tokens and sometime other parameters to get access to these datas
./document_loaders/examples/reddit.ipynb
./document_loaders/examples/roam.ipynb
./document_loaders/examples/slack.ipynb
./document_loaders/examples/snowflake.ipynb
./document_loaders/examples/spreedly.ipynb
./document_loaders/examples/stripe.ipynb
./document_loaders/examples/tomarkdown.ipynb
@@ -1,75 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "e310c8dc-acd0-48d2-801c-f37ce99acd2d",
"metadata": {},
"source": [
"# acreom"
]
},
{
"cell_type": "markdown",
"id": "04a2c95d-4114-431e-904a-32d79005c28b",
"metadata": {},
"source": [
"[acreom](https://acreom.com) is a dev-first knowledge base with tasks running on local markdown files.\n",
"\n",
"Below is an example of how to load a local acreom vault into LangChain. As the local vault in acreom is a folder of plain text .md files, the loader requires the path to the directory. \n",
"\n",
"Vault files may contain some metadata, which is stored as a YAML header. These values will be added to the document’s metadata if `collect_metadata` is set to true. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0169bee5-aa7a-4ec7-b7e7-b3bb2e58f3bb",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import AcreomLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1b49ab3-616b-4149-bef5-7559d65d3d2b",
"metadata": {},
"outputs": [],
"source": [
"loader = AcreomLoader('<path-to-acreom-vault>', collect_metadata=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3127a018-9c1c-4886-8321-f5666d970a95",
"metadata": {},
"outputs": [],
"source": [
"docs = loader.load()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
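
The notebook above disables metadata collection; here is a short sketch of the opposite case, where the YAML front-matter of each vault file is collected into the document's metadata. The vault path is a placeholder.

```python
# Load an acreom vault with YAML front-matter collected into metadata.
from langchain.document_loaders import AcreomLoader

loader = AcreomLoader("<path-to-acreom-vault>", collect_metadata=True)
docs = loader.load()
for doc in docs[:3]:
    print(doc.metadata)  # YAML header values collected from each .md file
```
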
@@ -1,142 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "7ae421e6",
"metadata": {},
"source": [
"# Airtable"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "98aea00d",
"metadata": {},
"outputs": [],
"source": [
"! pip install pyairtable"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "592483eb",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import AirtableLoader"
]
},
{
"cell_type": "markdown",
"id": "637e1205",
"metadata": {},
"source": [
"* Get your API key [here](https://support.airtable.com/docs/creating-and-using-api-keys-and-access-tokens).\n",
"* Get the ID of your base [here](https://airtable.com/developers/web/api/introduction).\n",
"* Get your table ID from the table URL as shown [here](https://www.highviewapps.com/kb/where-can-i-find-the-airtable-base-id-and-table-id/#:~:text=Both%20the%20Airtable%20Base%20ID,URL%20that%20begins%20with%20tbl)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c12a7aff",
"metadata": {},
"outputs": [],
"source": [
"api_key = \"xxx\"\n",
"base_id = \"xxx\"\n",
"table_id = \"xxx\""
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "ccddd5a6",
"metadata": {},
"outputs": [],
"source": [
"loader = AirtableLoader(api_key, table_id, base_id)\n",
"docs = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "ae76c25c",
"metadata": {},
"source": [
"Each table row is returned as a `dict`."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "7abec7ce",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(docs)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "403c95da",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'id': 'recF3GbGZCuh9sXIQ',\n",
" 'createdTime': '2023-06-09T04:47:21.000Z',\n",
" 'fields': {'Priority': 'High',\n",
" 'Status': 'In progress',\n",
" 'Name': 'Document Splitters'}}"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"eval(docs[0].page_content)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
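
The notebook above recovers the row dict with `eval`. A safer, equivalent way to parse this repr-style payload is `ast.literal_eval`, which accepts only Python literals and cannot execute arbitrary code. A sketch, with placeholder credentials:

```python
# Parse AirtableLoader page_content without eval's code-execution risk.
import ast
from langchain.document_loaders import AirtableLoader

# Placeholders; see the links above for where to find these values.
api_key, base_id, table_id = "xxx", "xxx", "xxx"

loader = AirtableLoader(api_key, table_id, base_id)
docs = loader.load()
# page_content is the repr of a row dict; literal_eval parses it safely.
record = ast.literal_eval(docs[0].page_content)
print(record["fields"])
```
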
File diff suppressed because one or more lines are too long
@@ -29,6 +29,7 @@
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
@@ -44,6 +45,7 @@
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
@@ -74,6 +76,7 @@
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
@@ -93,6 +96,7 @@
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
@@ -148,211 +152,6 @@
"source": [
"print(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `UnstructuredCSVLoader`\n",
"\n",
"You can also load the table using the `UnstructuredCSVLoader`. One advantage of using `UnstructuredCSVLoader` is that if you use it in `\"elements\"` mode, an HTML representation of the table will be available in the metadata."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders.csv_loader import UnstructuredCSVLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"loader = UnstructuredCSVLoader(file_path='example_data/mlb_teams_2012.csv', mode=\"elements\")\n",
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<table border=\"1\" class=\"dataframe\">\n",
" <tbody>\n",
" <tr>\n",
" <td>Nationals</td>\n",
" <td>81.34</td>\n",
" <td>98</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Reds</td>\n",
" <td>82.20</td>\n",
" <td>97</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Yankees</td>\n",
" <td>197.96</td>\n",
" <td>95</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Giants</td>\n",
" <td>117.62</td>\n",
" <td>94</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Braves</td>\n",
" <td>83.31</td>\n",
" <td>94</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Athletics</td>\n",
" <td>55.37</td>\n",
" <td>94</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Rangers</td>\n",
" <td>120.51</td>\n",
" <td>93</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Orioles</td>\n",
" <td>81.43</td>\n",
" <td>93</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Rays</td>\n",
" <td>64.17</td>\n",
" <td>90</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Angels</td>\n",
" <td>154.49</td>\n",
" <td>89</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Tigers</td>\n",
" <td>132.30</td>\n",
" <td>88</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Cardinals</td>\n",
" <td>110.30</td>\n",
" <td>88</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Dodgers</td>\n",
" <td>95.14</td>\n",
" <td>86</td>\n",
" </tr>\n",
" <tr>\n",
" <td>White Sox</td>\n",
" <td>96.92</td>\n",
" <td>85</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Brewers</td>\n",
" <td>97.65</td>\n",
" <td>83</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Phillies</td>\n",
" <td>174.54</td>\n",
" <td>81</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Diamondbacks</td>\n",
" <td>74.28</td>\n",
" <td>81</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Pirates</td>\n",
" <td>63.43</td>\n",
" <td>79</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Padres</td>\n",
" <td>55.24</td>\n",
" <td>76</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Mariners</td>\n",
" <td>81.97</td>\n",
" <td>75</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Mets</td>\n",
" <td>93.35</td>\n",
" <td>74</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Blue Jays</td>\n",
" <td>75.48</td>\n",
" <td>73</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Royals</td>\n",
" <td>60.91</td>\n",
" <td>72</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Marlins</td>\n",
" <td>118.07</td>\n",
" <td>69</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Red Sox</td>\n",
" <td>173.18</td>\n",
" <td>69</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Indians</td>\n",
" <td>78.43</td>\n",
" <td>68</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Twins</td>\n",
" <td>94.08</td>\n",
" <td>66</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Rockies</td>\n",
" <td>78.06</td>\n",
" <td>64</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Cubs</td>\n",
" <td>88.19</td>\n",
" <td>61</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Astros</td>\n",
" <td>60.65</td>\n",
" <td>55</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n"
]
}
],
"source": [
"print(docs[0].metadata[\"text_as_html\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -371,7 +170,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
"version": "3.10.6"
}
},
"nbformat": 4,
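
Since `"elements"` mode exposes the table as HTML in the metadata, one natural follow-up is to parse it back into a DataFrame. A hedged sketch, assuming the same example CSV and that `pandas` (with an HTML parser such as `lxml` or `beautifulsoup4`) is installed:

```python
# Recover a DataFrame from the text_as_html metadata shown above.
import pandas as pd
from langchain.document_loaders.csv_loader import UnstructuredCSVLoader

loader = UnstructuredCSVLoader(file_path="example_data/mlb_teams_2012.csv", mode="elements")
docs = loader.load()
# read_html returns a list of DataFrames, one per <table> in the string.
tables = pd.read_html(docs[0].metadata["text_as_html"])
print(tables[0].head())
```
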
@@ -1,167 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"source": [
"# Embaas\n",
"[embaas](https://embaas.io) is a fully managed NLP API service that offers features like embedding generation, document text extraction, document-to-embeddings, and more. You can choose from a [variety of pre-trained models](https://embaas.io/docs/models/embeddings).\n",
"\n",
"### Prerequisites\n",
"Create a free embaas account at [https://embaas.io/register](https://embaas.io/register) and generate an [API key](https://embaas.io/dashboard/api-keys).\n",
"\n",
"### Document Text Extraction API\n",
"The document text extraction API allows you to extract the text from a given document. The API supports a variety of document formats, including PDF, mp3, mp4 and more. For a full list of supported formats, check out the API docs (link below)."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"import os\n",
"\n",
"# Set API key\n",
"embaas_api_key = \"YOUR_API_KEY\"\n",
"# or set environment variable\n",
"os.environ[\"EMBAAS_API_KEY\"] = \"YOUR_API_KEY\""
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"#### Using a blob (bytes)"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"from langchain.document_loaders.embaas import EmbaasBlobLoader\n",
"from langchain.document_loaders.blob_loaders import Blob"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"blob_loader = EmbaasBlobLoader()\n",
"blob = Blob.from_path(\"example.pdf\")\n",
"documents = blob_loader.load(blob)"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"# You can also directly create embeddings with your preferred embeddings model\n",
"blob_loader = EmbaasBlobLoader(params={\"model\": \"e5-large-v2\", \"should_embed\": True})\n",
"blob = Blob.from_path(\"example.pdf\")\n",
"documents = blob_loader.load(blob)\n",
"\n",
"print(documents[0].metadata[\"embedding\"])"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"start_time": "2023-06-12T22:19:48.366886Z",
"end_time": "2023-06-12T22:19:48.380467Z"
}
}
},
{
"cell_type": "markdown",
"source": [
"#### Using a file"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"from langchain.document_loaders.embaas import EmbaasLoader"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"file_loader = EmbaasLoader(file_path=\"example.pdf\")\n",
"documents = file_loader.load()"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 15,
"outputs": [],
"source": [
"# Disable automatic text splitting\n",
"file_loader = EmbaasLoader(file_path=\"example.mp3\", params={\"should_chunk\": False})\n",
"documents = file_loader.load()"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"start_time": "2023-06-12T22:24:31.880857Z",
"end_time": "2023-06-12T22:24:31.894665Z"
}
}
},
{
"cell_type": "markdown",
"source": [
"For more detailed information about the embaas document text extraction API, please refer to [the official embaas API documentation](https://embaas.io/api-reference)."
],
"metadata": {
"collapsed": false
}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
@@ -9,7 +9,7 @@
"\n",
">[EPUB](https://en.wikipedia.org/wiki/EPUB) is an e-book file format that uses the \".epub\" file extension. The term is short for electronic publication and is sometimes styled ePub. `EPUB` is supported by many e-readers, and compatible software is available for most smartphones, tablets, and computers.\n",
"\n",
"This covers how to load `.epub` documents into the Document format that we can use downstream. You'll need to install the [`pandoc`](https://pandoc.org/installing.html) package for this loader to work."
"This covers how to load `.epub` documents into the Document format that we can use downstream. You'll need to install the [`pandocs`](https://pandoc.org/installing.html) package for this loader to work."
]
},
{
@@ -19,7 +19,7 @@
"metadata": {},
"outputs": [],
"source": [
"#!pip install pandoc"
"#!pip install pandocs"
]
},
{
@@ -1,27 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<factbook>
  <country>
    <name>United States</name>
    <capital>Washington, DC</capital>
    <leader>Joe Biden</leader>
    <sport>Baseball</sport>
  </country>
  <country>
    <name>Canada</name>
    <capital>Ottawa</capital>
    <leader>Justin Trudeau</leader>
    <sport>Hockey</sport>
  </country>
  <country>
    <name>France</name>
    <capital>Paris</capital>
    <leader>Emmanuel Macron</leader>
    <sport>Soccer</sport>
  </country>
  <country>
    <name>Trinidad &amp; Tobago</name>
    <capital>Port of Spain</capital>
    <leader>Keith Rowley</leader>
    <sport>Track &amp; Field</sport>
  </country>
</factbook>
@@ -1,84 +0,0 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fauna\n",
"\n",
">[Fauna](https://fauna.com/) is a Document Database.\n",
"\n",
"Query `Fauna` documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#!pip install fauna"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Query data example"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders.fauna import FaunaLoader\n",
"\n",
"secret = \"<enter-valid-fauna-secret>\"\n",
"query = \"Item.all()\" # Fauna query. Assumes that the collection is called \"Item\"\n",
"field = \"text\" # The field that contains the page content. Assumes that the field is called \"text\"\n",
"\n",
"loader = FaunaLoader(query, field, secret)\n",
"docs = loader.lazy_load()\n",
"\n",
"for value in docs:\n",
" print(value)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Query with Pagination\n",
"You get an `after` value if there is more data. You can get the values after the cursor by passing the `after` string in the query. \n",
"\n",
"To learn more, follow [this link](https://fqlx-beta--fauna-docs.netlify.app/fqlx/beta/reference/schema_entities/set/static-paginate)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"query = \"\"\"\n",
"Item.paginate(\"hs+DzoPOg ... aY1hOohozrV7A\")\n",
"Item.all()\n",
"\"\"\"\n",
"loader = FaunaLoader(query, field, secret)"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
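
The pagination pattern above condenses to the sketch below. It assumes a valid Fauna secret and an "Item" collection; the `after` token is a placeholder for the value returned by the previous page of results.

```python
# A sketch of cursor-based pagination with FaunaLoader.
from langchain.document_loaders.fauna import FaunaLoader

secret = "<enter-valid-fauna-secret>"
field = "text"  # the field that becomes each document's page content
query = """
Item.paginate("<after-token-from-previous-page>")
Item.all()
"""
loader = FaunaLoader(query, field, secret)
for doc in loader.lazy_load():
    print(doc)
```
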
@@ -22,16 +22,6 @@
"Load .docx using `Docx2txt` into a document."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7b80ea891",
"metadata": {},
"outputs": [],
"source": [
"!pip install docx2txt "
]
},
{
"cell_type": "code",
"execution_count": 3,

@@ -33,7 +33,7 @@
"metadata": {},
"outputs": [],
"source": [
"#!wget -r -A.html -P rtdocs https://python.langchain.com/en/latest/"
"#!wget -r -A.html -P rtdocs https://langchain.readthedocs.io/en/latest/"
]
},
{

@@ -146,73 +146,6 @@
"documents[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Add custom scraping rules\n",
"\n",
"The `SitemapLoader` uses `beautifulsoup4` for the scraping process, and it scrapes every element on the page by default. The `SitemapLoader` constructor accepts a custom scraping function. This feature can be helpful to tailor the scraping process to your specific needs; for example, you might want to avoid scraping headers or navigation elements.\n",
"\n",
"The following example shows how to develop and use a custom function to avoid navigation and header elements."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Import the `beautifulsoup4` library and define the custom function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pip install beautifulsoup4"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from bs4 import BeautifulSoup\n",
"\n",
"def remove_nav_and_header_elements(content: BeautifulSoup) -> str:\n",
" # Find all 'nav' and 'header' elements in the BeautifulSoup object\n",
" nav_elements = content.find_all('nav')\n",
" header_elements = content.find_all('header')\n",
"\n",
" # Remove each 'nav' and 'header' element from the BeautifulSoup object\n",
" for element in nav_elements + header_elements:\n",
" element.decompose()\n",
"\n",
" return str(content.get_text())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Add your custom function to the `SitemapLoader` object."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loader = SitemapLoader(\n",
" \"https://langchain.readthedocs.io/sitemap.xml\",\n",
" filter_urls=[\"https://python.langchain.com/en/latest/\"],\n",
" parsing_function=remove_nav_and_header_elements\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -41,7 +41,7 @@
"source": [
"# Optionally set your Slack URL. This will give you proper URLs in the docs sources.\n",
"SLACK_WORKSPACE_URL = \"https://xxx.slack.com\"\n",
"LOCAL_ZIPFILE = \"\" # Paste the local path to your Slack zip file here.\n",
"LOCAL_ZIPFILE = \"\" # Paste the local paty to your Slack zip file here.\n",
"\n",
"loader = SlackDirectoryLoader(LOCAL_ZIPFILE, SLACK_WORKSPACE_URL)"
]
@@ -1,98 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Snowflake\n",
"\n",
"This notebook goes over how to load documents from Snowflake."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! pip install snowflake-connector-python"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import settings as s\n",
"from langchain.document_loaders import SnowflakeLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"QUERY = \"select text, survey_id from CLOUD_DATA_SOLUTIONS.HAPPY_OR_NOT.OPEN_FEEDBACK limit 10\"\n",
"snowflake_loader = SnowflakeLoader(\n",
" query=QUERY,\n",
" user=s.SNOWFLAKE_USER,\n",
" password=s.SNOWFLAKE_PASS,\n",
" account=s.SNOWFLAKE_ACCOUNT,\n",
" warehouse=s.SNOWFLAKE_WAREHOUSE,\n",
" role=s.SNOWFLAKE_ROLE,\n",
" database=s.SNOWFLAKE_DATABASE,\n",
" schema=s.SNOWFLAKE_SCHEMA\n",
")\n",
"snowflake_documents = snowflake_loader.load()\n",
"print(snowflake_documents)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from snowflakeLoader import SnowflakeLoader\n",
"import settings as s\n",
"QUERY = \"select text, survey_id as source from CLOUD_DATA_SOLUTIONS.HAPPY_OR_NOT.OPEN_FEEDBACK limit 10\"\n",
"snowflake_loader = SnowflakeLoader(\n",
" query=QUERY,\n",
" user=s.SNOWFLAKE_USER,\n",
" password=s.SNOWFLAKE_PASS,\n",
" account=s.SNOWFLAKE_ACCOUNT,\n",
" warehouse=s.SNOWFLAKE_WAREHOUSE,\n",
" role=s.SNOWFLAKE_ROLE,\n",
" database=s.SNOWFLAKE_DATABASE,\n",
" schema=s.SNOWFLAKE_SCHEMA,\n",
" metadata_columns=['source']\n",
")\n",
"snowflake_documents = snowflake_loader.load()\n",
"print(snowflake_documents)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -1,78 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "22a849cc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# XML\n",
|
||||
"\n",
|
||||
"The `UnstructuredXMLLoader` is used to load `XML` files. The loader works with `.xml` files. The page content will be the text extracted from the XML tags."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "e6616e3a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import UnstructuredXMLLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "a654e4d9",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Document(page_content='United States\\n\\nWashington, DC\\n\\nJoe Biden\\n\\nBaseball\\n\\nCanada\\n\\nOttawa\\n\\nJustin Trudeau\\n\\nHockey\\n\\nFrance\\n\\nParis\\n\\nEmmanuel Macron\\n\\nSoccer\\n\\nTrinidad & Tobado\\n\\nPort of Spain\\n\\nKeith Rowley\\n\\nTrack & Field', metadata={'source': 'example_data/factbook.xml'})"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"loader = UnstructuredXMLLoader(\n",
|
||||
" \"example_data/factbook.xml\",\n",
|
||||
")\n",
|
||||
"docs = loader.load()\n",
|
||||
"docs[0]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a54342bb",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.15"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,296 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "e48afb8d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Loading documents from a YouTube url\n",
|
||||
"\n",
|
||||
"Building chat or QA applications on YouTube videos is a topic of high interest.\n",
|
||||
"\n",
|
||||
"Below we show how to easily go from a YouTube url to text to chat!\n",
|
||||
"\n",
|
||||
"We wil use the `OpenAIWhisperParser`, which will use the OpenAI Whisper API to transcribe audio to text.\n",
|
||||
"\n",
|
||||
"Note: You will need to have an `OPENAI_API_KEY` supplied."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "5f34e934",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders.generic import GenericLoader\n",
|
||||
"from langchain.document_loaders.parsers import OpenAIWhisperParser\n",
|
||||
"from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "85fc12bd",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We will use `yt_dlp` to download audio for YouTube urls.\n",
|
||||
"\n",
|
||||
"We will use `pydub` to split downloaded audio files (such that we adhere to Whisper API's 25MB file size limit)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "fb5a6606",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! pip install yt_dlp\n",
|
||||
"! pip install pydub"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b0e119f4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### YouTube url to text\n",
|
||||
"\n",
|
||||
"Use `YoutubeAudioLoader` to fetch / download the audio files.\n",
|
||||
"\n",
|
||||
"Then, ues `OpenAIWhisperParser()` to transcribe them to text.\n",
|
||||
"\n",
|
||||
"Let's take the first lecture of Andrej Karpathy's YouTube course as an example! "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "23e1e134",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[youtube] Extracting URL: https://youtu.be/kCc8FmEb1nY\n",
|
||||
"[youtube] kCc8FmEb1nY: Downloading webpage\n",
|
||||
"[youtube] kCc8FmEb1nY: Downloading android player API JSON\n",
|
||||
"[info] kCc8FmEb1nY: Downloading 1 format(s): 140\n",
|
||||
"[dashsegments] Total fragments: 11\n",
|
||||
"[download] Destination: /Users/31treehaus/Desktop/AI/langchain-fork/docs/modules/indexes/document_loaders/examples/Let's build GPT: from scratch, in code, spelled out..m4a\n",
|
||||
"[download] 100% of 107.73MiB in 00:00:18 at 5.92MiB/s \n",
|
||||
"[FixupM4a] Correcting container of \"/Users/31treehaus/Desktop/AI/langchain-fork/docs/modules/indexes/document_loaders/examples/Let's build GPT: from scratch, in code, spelled out..m4a\"\n",
|
||||
"[ExtractAudio] Not converting audio /Users/31treehaus/Desktop/AI/langchain-fork/docs/modules/indexes/document_loaders/examples/Let's build GPT: from scratch, in code, spelled out..m4a; file is already in target format m4a\n",
|
||||
"[youtube] Extracting URL: https://youtu.be/VMj-3S1tku0\n",
|
||||
"[youtube] VMj-3S1tku0: Downloading webpage\n",
|
||||
"[youtube] VMj-3S1tku0: Downloading android player API JSON\n",
|
||||
"[info] VMj-3S1tku0: Downloading 1 format(s): 140\n",
|
||||
"[download] /Users/31treehaus/Desktop/AI/langchain-fork/docs/modules/indexes/document_loaders/examples/The spelled-out intro to neural networks and backpropagation: building micrograd.m4a has already been downloaded\n",
|
||||
"[download] 100% of 134.98MiB\n",
|
||||
"[ExtractAudio] Not converting audio /Users/31treehaus/Desktop/AI/langchain-fork/docs/modules/indexes/document_loaders/examples/The spelled-out intro to neural networks and backpropagation: building micrograd.m4a; file is already in target format m4a\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Two Karpathy lecture videos\n",
|
||||
"urls = [\"https://youtu.be/kCc8FmEb1nY\",\n",
|
||||
" \"https://youtu.be/VMj-3S1tku0\"]\n",
|
||||
"\n",
|
||||
"# Directory to save audio files \n",
|
||||
"save_dir = \"~/Downloads/YouTube\"\n",
|
||||
"\n",
|
||||
"# Transcribe the videos to text\n",
|
||||
"loader = GenericLoader(YoutubeAudioLoader(urls,save_dir),OpenAIWhisperParser())\n",
|
||||
"docs = loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "72a94fd8",
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\"Hello, my name is Andrej and I've been training deep neural networks for a bit more than a decade. And in this lecture I'd like to show you what neural network training looks like under the hood. So in particular we are going to start with a blank Jupyter notebook and by the end of this lecture we will define and train a neural net and you'll get to see everything that goes on under the hood and exactly sort of how that works on an intuitive level. Now specifically what I would like to do is I w\""
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Returns a list of Documents, which can be easily viewed or parsed\n",
|
||||
"docs[0].page_content[0:500]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "93be6b49",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Building a chat app from YouTube video\n",
|
||||
"\n",
|
||||
"Given `Documents`, we can easily enable chat / question+answering."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "1823f042",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.chains import RetrievalQA\n",
|
||||
"from langchain.vectorstores import FAISS\n",
|
||||
"from langchain.chat_models import ChatOpenAI\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import RecursiveCharacterTextSplitter"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "7257cda1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Combine doc\n",
|
||||
"combined_docs = [doc.page_content for doc in docs]\n",
|
||||
"text = \" \".join(combined_docs)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "147c0c55",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Split them\n",
|
||||
"text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1500, chunk_overlap = 150)\n",
|
||||
"splits = text_splitter.split_text(text)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "f3556703",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Build an index\n",
|
||||
"embeddings = OpenAIEmbeddings()\n",
|
||||
"vectordb = FAISS.from_texts(splits,embeddings)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "beaa99db",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Build a QA chain\n",
|
||||
"qa_chain = RetrievalQA.from_chain_type(llm = ChatOpenAI(model_name=\"gpt-3.5-turbo\", temperature=0),\n",
|
||||
" chain_type=\"stuff\",\n",
|
||||
" retriever=vectordb.as_retriever())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "f2239a62",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"\"We need to zero out the gradient before backprop at each step because the backward pass accumulates gradients in the grad attribute of each parameter. If we don't reset the grad to zero before each backward pass, the gradients will accumulate and add up, leading to incorrect updates and slower convergence. By resetting the grad to zero before each backward pass, we ensure that the gradients are calculated correctly and that the optimization process works as intended.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Ask a question!\n",
|
||||
"query = \"Why do we need to zero out the gradient before backprop at each step?\"\n",
|
||||
"qa_chain.run(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "a8d01098",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'In the context of transformers, an encoder is a component that reads in a sequence of input tokens and generates a sequence of hidden representations. On the other hand, a decoder is a component that takes in a sequence of hidden representations and generates a sequence of output tokens. The main difference between the two is that the encoder is used to encode the input sequence into a fixed-length representation, while the decoder is used to decode the fixed-length representation into an output sequence. In machine translation, for example, the encoder reads in the source language sentence and generates a fixed-length representation, which is then used by the decoder to generate the target language sentence.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = \"What is the difference between an encoder and decoder?\"\n",
|
||||
"qa_chain.run(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "fe1e77dd",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'For any token, x is the input vector that contains the private information of that token, k and q are the key and query vectors respectively, which are produced by forwarding linear modules on x, and v is the vector that is calculated by propagating the same linear module on x again. The key vector represents what the token contains, and the query vector represents what the token is looking for. The vector v is the information that the token will communicate to other tokens if it finds them interesting, and it gets aggregated for the purposes of the self-attention mechanism.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"query = \"For any token, what are x, k, v, and q?\"\n",
|
||||
"qa_chain.run(query)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
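A small follow-on sketch, assuming you want to reuse the FAISS index built above across sessions instead of re-transcribing the videos (`faiss_index` is an arbitrary directory name):

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Persist the index to disk, then reload it in a later session
vectordb.save_local("faiss_index")
vectordb = FAISS.load_local("faiss_index", OpenAIEmbeddings())
```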
@@ -1,7 +1,6 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "df770c72",
|
||||
"metadata": {},
|
||||
@@ -56,12 +55,11 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "6b278a1b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Add video info"
|
||||
"## Add video info"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -81,36 +79,20 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = YoutubeLoader.from_youtube_url(\"https://www.youtube.com/watch?v=QsYGlZkevEg\", add_video_info=True)\n",
|
||||
"loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "fc417e31",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Add language preferences\n",
|
||||
"\n",
|
||||
"Language param : It's a list of language codes in a descending priority, `en` by default.\n",
|
||||
"\n",
|
||||
"translation param : It's a translate preference when the youtube does'nt have your select language, `en` by default."
|
||||
"loader = YoutubeLoader.from_youtube_url(\"https://www.youtube.com/watch?v=QsYGlZkevEg\", add_video_info=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "08510625",
|
||||
"id": "97b98e92",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = YoutubeLoader.from_youtube_url(\"https://www.youtube.com/watch?v=QsYGlZkevEg\", add_video_info=True, language=['en','id'], translation='en')\n",
|
||||
"loader.load()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "65796cc5",
|
||||
"metadata": {},
|
||||
|
||||
@@ -1,90 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# AWS Kendra\n",
|
||||
"\n",
|
||||
"> AWS Kendra is an intelligent search service provided by Amazon Web Services (AWS). It utilizes advanced natural language processing (NLP) and machine learning algorithms to enable powerful search capabilities across various data sources within an organization. Kendra is designed to help users find the information they need quickly and accurately, improving productivity and decision-making.\n",
|
||||
"\n",
|
||||
"> With Kendra, users can search across a wide range of content types, including documents, FAQs, knowledge bases, manuals, and websites. It supports multiple languages and can understand complex queries, synonyms, and contextual meanings to provide highly relevant search results."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Using the AWS Kendra Index Retriever"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#!pip install boto3"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import boto3\n",
|
||||
"from langchain.retrievers import AwsKendraIndexRetriever"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Create New Retriever"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"kclient = boto3.client('kendra', region_name=\"us-east-1\")\n",
|
||||
"\n",
|
||||
"retriever = AwsKendraIndexRetriever(\n",
|
||||
" kclient=kclient,\n",
|
||||
" kendraindex=\"kendraindex\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now you can use retrieved documents from AWS Kendra Index"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever.get_relevant_documents(\"what is langchain\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
},
|
||||
"orig_nbformat": 4
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
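A minimal sketch of wiring the retriever above into a QA chain (assumes OpenAI credentials are configured; `retriever` is the instance created in the notebook):

```python
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Stuff the documents returned by Kendra into the prompt
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    chain_type="stuff",
    retriever=retriever,
)
qa.run("what is langchain")
```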
@@ -1,121 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "fc0db1bc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# LOTR (Merger Retriever)\n",
|
||||
"\n",
|
||||
"Lord of the Retrievers, also known as MergerRetriever, takes a list of retrievers as input and merges the results of their get_relevant_documents() methods into a single list. The merged results will be a list of documents that are relevant to the query and that have been ranked by the different retrievers.\n",
|
||||
"\n",
|
||||
"The MergerRetriever class can be used to improve the accuracy of document retrieval in a number of ways. First, it can combine the results of multiple retrievers, which can help to reduce the risk of bias in the results. Second, it can rank the results of the different retrievers, which can help to ensure that the most relevant documents are returned first."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9fbcc58f",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import chromadb\n",
|
||||
"from langchain.retrievers.merger_retriever import MergerRetriever\n",
|
||||
"from langchain.vectorstores import Chroma\n",
|
||||
"from langchain.embeddings import HuggingFaceEmbeddings\n",
|
||||
"from langchain.embeddings import OpenAIEmbeddings\n",
|
||||
"from langchain.document_transformers import EmbeddingsRedundantFilter\n",
|
||||
"from langchain.retrievers.document_compressors import DocumentCompressorPipeline\n",
|
||||
"from langchain.retrievers import ContextualCompressionRetriever\n",
|
||||
"\n",
|
||||
"# Get 3 diff embeddings.\n",
|
||||
"all_mini = HuggingFaceEmbeddings(model_name=\"all-MiniLM-L6-v2\")\n",
|
||||
"multi_qa_mini = HuggingFaceEmbeddings(model_name=\"multi-qa-MiniLM-L6-dot-v1\")\n",
|
||||
"filter_embeddings = OpenAIEmbeddings()\n",
|
||||
"\n",
|
||||
"ABS_PATH = os.path.dirname(os.path.abspath(__file__))\n",
|
||||
"DB_DIR = os.path.join(ABS_PATH, \"db\")\n",
|
||||
"\n",
|
||||
"# Instantiate 2 diff cromadb indexs, each one with a diff embedding.\n",
|
||||
"client_settings = chromadb.config.Settings(\n",
|
||||
" chroma_db_impl=\"duckdb+parquet\",\n",
|
||||
" persist_directory=DB_DIR,\n",
|
||||
" anonymized_telemetry=False,\n",
|
||||
")\n",
|
||||
"db_all = Chroma(\n",
|
||||
" collection_name=\"project_store_all\",\n",
|
||||
" persist_directory=DB_DIR,\n",
|
||||
" client_settings=client_settings,\n",
|
||||
" embedding_function=all_mini,\n",
|
||||
")\n",
|
||||
"db_multi_qa = Chroma(\n",
|
||||
" collection_name=\"project_store_multi\",\n",
|
||||
" persist_directory=DB_DIR,\n",
|
||||
" client_settings=client_settings,\n",
|
||||
" embedding_function=multi_qa_mini,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Define 2 diff retrievers with 2 diff embeddings and diff search type.\n",
|
||||
"retriever_all = db_all.as_retriever(\n",
|
||||
" search_type=\"similarity\", search_kwargs={\"k\": 5, \"include_metadata\": True}\n",
|
||||
")\n",
|
||||
"retriever_multi_qa = db_multi_qa.as_retriever(\n",
|
||||
" search_type=\"mmr\", search_kwargs={\"k\": 5, \"include_metadata\": True}\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# The Lord of the Retrievers will hold the ouput of boths retrievers and can be used as any other \n",
|
||||
"# retriever on different types of chains.\n",
|
||||
"lotr = MergerRetriever(retrievers=[retriever_all, retriever_multi_qa])\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "c152339d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Remove redundant results from the merged retrievers."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "039faea6",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"# We can remove redundant results from both retrievers using yet another embedding. \n",
|
||||
"# Using multiples embeddings in diff steps could help reduce biases.\n",
|
||||
"filter = EmbeddingsRedundantFilter(embeddings=filter_embeddings)\n",
|
||||
"pipeline = DocumentCompressorPipeline(transformers=[filter])\n",
|
||||
"compression_retriever = ContextualCompressionRetriever(\n",
|
||||
" base_compressor=pipeline, base_retriever=lotr\n",
|
||||
")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
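A usage sketch for the pipeline above; both `lotr` and `compression_retriever` expose the standard retriever interface (the query string is illustrative):

```python
# Query the merged retrievers directly...
docs = lotr.get_relevant_documents("What is stored in the project index?")

# ...or go through the redundancy filter defined above
filtered_docs = compression_retriever.get_relevant_documents(
    "What is stored in the project index?"
)
for doc in filtered_docs:
    print(doc.metadata, doc.page_content[:80])
```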
@@ -99,14 +99,13 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "2d958271",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Similarity Score Threshold Retrieval\n",
|
||||
"\n",
|
||||
"You can also use a retrieval method that sets a similarity score threshold and only returns documents with a score above that threshold"
|
||||
"You can also a retrieval method that sets a similarity score threshold and only returns documents with a score above that threshold"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
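A minimal sketch of such a retriever, assuming a vector store `db` as in the surrounding examples; `similarity_score_threshold` is the search type that enforces the cutoff:

```python
retriever = db.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.5},  # only documents scoring above 0.5 are returned
)
docs = retriever.get_relevant_documents("what did the president say about Ketanji Brown Jackson?")
```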
@@ -38,7 +38,6 @@ Usage examples for the text splitters:
|
||||
- `Recursive Character <./text_splitters/examples/recursive_text_splitter.html>`_
|
||||
- `spaCy <./text_splitters/examples/spacy.html>`_
|
||||
- `tiktoken (OpenAI) <./text_splitters/examples/tiktoken_splitter.html>`_
|
||||
- `Header Metadata <./text_splitters/examples/markdown_header_metadata.html>`_
|
||||
|
||||
|
||||
.. toctree::
|
||||
|
||||
@@ -1,457 +1,413 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CodeTextSplitter\n",
|
||||
"\n",
|
||||
"CodeTextSplitter allows you to split your code with multiple language support. Import enum `Language` and specify the language. "
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# CodeTextSplitter\n",
|
||||
"\n",
|
||||
"CodeTextSplitter allows you to split your code with multiple language support. Import enum `Language` and specify the language. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.text_splitter import (\n",
|
||||
" RecursiveCharacterTextSplitter,\n",
|
||||
" Language,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['cpp',\n",
|
||||
" 'go',\n",
|
||||
" 'java',\n",
|
||||
" 'js',\n",
|
||||
" 'php',\n",
|
||||
" 'proto',\n",
|
||||
" 'python',\n",
|
||||
" 'rst',\n",
|
||||
" 'ruby',\n",
|
||||
" 'rust',\n",
|
||||
" 'scala',\n",
|
||||
" 'swift',\n",
|
||||
" 'markdown',\n",
|
||||
" 'latex',\n",
|
||||
" 'html']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.text_splitter import (\n",
|
||||
" RecursiveCharacterTextSplitter,\n",
|
||||
" Language,\n",
|
||||
")"
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Full list of support languages\n",
|
||||
"[e.value for e in Language]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['\\nclass ', '\\ndef ', '\\n\\tdef ', '\\n\\n', '\\n', ' ', '']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['cpp',\n",
|
||||
" 'go',\n",
|
||||
" 'java',\n",
|
||||
" 'js',\n",
|
||||
" 'php',\n",
|
||||
" 'proto',\n",
|
||||
" 'python',\n",
|
||||
" 'rst',\n",
|
||||
" 'ruby',\n",
|
||||
" 'rust',\n",
|
||||
" 'scala',\n",
|
||||
" 'swift',\n",
|
||||
" 'markdown',\n",
|
||||
" 'latex',\n",
|
||||
" 'html',\n",
|
||||
" 'sol']"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Full list of support languages\n",
|
||||
"[e.value for e in Language]"
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# You can also see the separators used for a given language\n",
|
||||
"RecursiveCharacterTextSplitter.get_separators_for_language(Language.PYTHON)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Python\n",
|
||||
"\n",
|
||||
"Here's an example using the PythonTextSplitter"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='def hello_world():\\n print(\"Hello, World!\")', metadata={}),\n",
|
||||
" Document(page_content='# Call the function\\nhello_world()', metadata={})]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['\\nclass ', '\\ndef ', '\\n\\tdef ', '\\n\\n', '\\n', ' ', '']"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# You can also see the separators used for a given language\n",
|
||||
"RecursiveCharacterTextSplitter.get_separators_for_language(Language.PYTHON)"
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"PYTHON_CODE = \"\"\"\n",
|
||||
"def hello_world():\n",
|
||||
" print(\"Hello, World!\")\n",
|
||||
"\n",
|
||||
"# Call the function\n",
|
||||
"hello_world()\n",
|
||||
"\"\"\"\n",
|
||||
"python_splitter = RecursiveCharacterTextSplitter.from_language(\n",
|
||||
" language=Language.PYTHON, chunk_size=50, chunk_overlap=0\n",
|
||||
")\n",
|
||||
"python_docs = python_splitter.create_documents([PYTHON_CODE])\n",
|
||||
"python_docs"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## JS\n",
|
||||
"Here's an example using the JS text splitter"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='function helloWorld() {\\n console.log(\"Hello, World!\");\\n}', metadata={}),\n",
|
||||
" Document(page_content='// Call the function\\nhelloWorld();', metadata={})]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Python\n",
|
||||
"\n",
|
||||
"Here's an example using the PythonTextSplitter"
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"JS_CODE = \"\"\"\n",
|
||||
"function helloWorld() {\n",
|
||||
" console.log(\"Hello, World!\");\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"// Call the function\n",
|
||||
"helloWorld();\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"js_splitter = RecursiveCharacterTextSplitter.from_language(\n",
|
||||
" language=Language.JS, chunk_size=60, chunk_overlap=0\n",
|
||||
")\n",
|
||||
"js_docs = js_splitter.create_documents([JS_CODE])\n",
|
||||
"js_docs"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Markdown\n",
|
||||
"\n",
|
||||
"Here's an example using the Markdown text splitter."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"markdown_text = \"\"\"\n",
|
||||
"# 🦜️🔗 LangChain\n",
|
||||
"\n",
|
||||
"⚡ Building applications with LLMs through composability ⚡\n",
|
||||
"\n",
|
||||
"## Quick Install\n",
|
||||
"\n",
|
||||
"```bash\n",
|
||||
"# Hopefully this code block isn't split\n",
|
||||
"pip install langchain\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"As an open source project in a rapidly developing field, we are extremely open to contributions.\n",
|
||||
"\"\"\"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='# 🦜️🔗 LangChain', metadata={}),\n",
|
||||
" Document(page_content='⚡ Building applications with LLMs through composability ⚡', metadata={}),\n",
|
||||
" Document(page_content='## Quick Install', metadata={}),\n",
|
||||
" Document(page_content=\"```bash\\n# Hopefully this code block isn't split\", metadata={}),\n",
|
||||
" Document(page_content='pip install langchain', metadata={}),\n",
|
||||
" Document(page_content='```', metadata={}),\n",
|
||||
" Document(page_content='As an open source project in a rapidly developing field, we', metadata={}),\n",
|
||||
" Document(page_content='are extremely open to contributions.', metadata={})]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='def hello_world():\\n print(\"Hello, World!\")', metadata={}),\n",
|
||||
" Document(page_content='# Call the function\\nhello_world()', metadata={})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"PYTHON_CODE = \"\"\"\n",
|
||||
"def hello_world():\n",
|
||||
" print(\"Hello, World!\")\n",
|
||||
"\n",
|
||||
"# Call the function\n",
|
||||
"hello_world()\n",
|
||||
"\"\"\"\n",
|
||||
"python_splitter = RecursiveCharacterTextSplitter.from_language(\n",
|
||||
" language=Language.PYTHON, chunk_size=50, chunk_overlap=0\n",
|
||||
")\n",
|
||||
"python_docs = python_splitter.create_documents([PYTHON_CODE])\n",
|
||||
"python_docs"
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"md_splitter = RecursiveCharacterTextSplitter.from_language(\n",
|
||||
" language=Language.MARKDOWN, chunk_size=60, chunk_overlap=0\n",
|
||||
")\n",
|
||||
"md_docs = md_splitter.create_documents([markdown_text])\n",
|
||||
"md_docs"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Latex\n",
|
||||
"\n",
|
||||
"Here's an example on Latex text"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"latex_text = \"\"\"\n",
|
||||
"\\documentclass{article}\n",
|
||||
"\n",
|
||||
"\\begin{document}\n",
|
||||
"\n",
|
||||
"\\maketitle\n",
|
||||
"\n",
|
||||
"\\section{Introduction}\n",
|
||||
"Large language models (LLMs) are a type of machine learning model that can be trained on vast amounts of text data to generate human-like language. In recent years, LLMs have made significant advances in a variety of natural language processing tasks, including language translation, text generation, and sentiment analysis.\n",
|
||||
"\n",
|
||||
"\\subsection{History of LLMs}\n",
|
||||
"The earliest LLMs were developed in the 1980s and 1990s, but they were limited by the amount of data that could be processed and the computational power available at the time. In the past decade, however, advances in hardware and software have made it possible to train LLMs on massive datasets, leading to significant improvements in performance.\n",
|
||||
"\n",
|
||||
"\\subsection{Applications of LLMs}\n",
|
||||
"LLMs have many applications in industry, including chatbots, content creation, and virtual assistants. They can also be used in academia for research in linguistics, psychology, and computational linguistics.\n",
|
||||
"\n",
|
||||
"\\end{document}\n",
|
||||
"\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='\\\\documentclass{article}\\n\\n\\x08egin{document}\\n\\n\\\\maketitle', metadata={}),\n",
|
||||
" Document(page_content='\\\\section{Introduction}', metadata={}),\n",
|
||||
" Document(page_content='Large language models (LLMs) are a type of machine learning', metadata={}),\n",
|
||||
" Document(page_content='model that can be trained on vast amounts of text data to', metadata={}),\n",
|
||||
" Document(page_content='generate human-like language. In recent years, LLMs have', metadata={}),\n",
|
||||
" Document(page_content='made significant advances in a variety of natural language', metadata={}),\n",
|
||||
" Document(page_content='processing tasks, including language translation, text', metadata={}),\n",
|
||||
" Document(page_content='generation, and sentiment analysis.', metadata={}),\n",
|
||||
" Document(page_content='\\\\subsection{History of LLMs}', metadata={}),\n",
|
||||
" Document(page_content='The earliest LLMs were developed in the 1980s and 1990s,', metadata={}),\n",
|
||||
" Document(page_content='but they were limited by the amount of data that could be', metadata={}),\n",
|
||||
" Document(page_content='processed and the computational power available at the', metadata={}),\n",
|
||||
" Document(page_content='time. In the past decade, however, advances in hardware and', metadata={}),\n",
|
||||
" Document(page_content='software have made it possible to train LLMs on massive', metadata={}),\n",
|
||||
" Document(page_content='datasets, leading to significant improvements in', metadata={}),\n",
|
||||
" Document(page_content='performance.', metadata={}),\n",
|
||||
" Document(page_content='\\\\subsection{Applications of LLMs}', metadata={}),\n",
|
||||
" Document(page_content='LLMs have many applications in industry, including', metadata={}),\n",
|
||||
" Document(page_content='chatbots, content creation, and virtual assistants. They', metadata={}),\n",
|
||||
" Document(page_content='can also be used in academia for research in linguistics,', metadata={}),\n",
|
||||
" Document(page_content='psychology, and computational linguistics.', metadata={}),\n",
|
||||
" Document(page_content='\\\\end{document}', metadata={})]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## JS\n",
|
||||
"Here's an example using the JS text splitter"
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"latex_splitter = RecursiveCharacterTextSplitter.from_language(\n",
|
||||
" language=Language.MARKDOWN, chunk_size=60, chunk_overlap=0\n",
|
||||
")\n",
|
||||
"latex_docs = latex_splitter.create_documents([latex_text])\n",
|
||||
"latex_docs"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## HTML\n",
|
||||
"\n",
|
||||
"Here's an example using an HTML text splitter"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"html_text = \"\"\"\n",
|
||||
"<!DOCTYPE html>\n",
|
||||
"<html>\n",
|
||||
" <head>\n",
|
||||
" <title>🦜️🔗 LangChain</title>\n",
|
||||
" <style>\n",
|
||||
" body {\n",
|
||||
" font-family: Arial, sans-serif;\n",
|
||||
" }\n",
|
||||
" h1 {\n",
|
||||
" color: darkblue;\n",
|
||||
" }\n",
|
||||
" </style>\n",
|
||||
" </head>\n",
|
||||
" <body>\n",
|
||||
" <div>\n",
|
||||
" <h1>🦜️🔗 LangChain</h1>\n",
|
||||
" <p>⚡ Building applications with LLMs through composability ⚡</p>\n",
|
||||
" </div>\n",
|
||||
" <div>\n",
|
||||
" As an open source project in a rapidly developing field, we are extremely open to contributions.\n",
|
||||
" </div>\n",
|
||||
" </body>\n",
|
||||
"</html>\n",
|
||||
"\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='<!DOCTYPE html>\\n<html>\\n <head>', metadata={}),\n",
|
||||
" Document(page_content='<title>🦜️🔗 LangChain</title>\\n <style>', metadata={}),\n",
|
||||
" Document(page_content='body {', metadata={}),\n",
|
||||
" Document(page_content='font-family: Arial, sans-serif;', metadata={}),\n",
|
||||
" Document(page_content='}\\n h1 {', metadata={}),\n",
|
||||
" Document(page_content='color: darkblue;\\n }', metadata={}),\n",
|
||||
" Document(page_content='</style>\\n </head>\\n <body>\\n <div>', metadata={}),\n",
|
||||
" Document(page_content='<h1>🦜️🔗 LangChain</h1>', metadata={}),\n",
|
||||
" Document(page_content='<p>⚡ Building applications with LLMs through', metadata={}),\n",
|
||||
" Document(page_content='composability ⚡</p>', metadata={}),\n",
|
||||
" Document(page_content='</div>\\n <div>', metadata={}),\n",
|
||||
" Document(page_content='As an open source project in a rapidly', metadata={}),\n",
|
||||
" Document(page_content='developing field, we are extremely open to contributions.', metadata={}),\n",
|
||||
" Document(page_content='</div>\\n </body>\\n</html>', metadata={})]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='function helloWorld() {\\n console.log(\"Hello, World!\");\\n}', metadata={}),\n",
|
||||
" Document(page_content='// Call the function\\nhelloWorld();', metadata={})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"JS_CODE = \"\"\"\n",
|
||||
"function helloWorld() {\n",
|
||||
" console.log(\"Hello, World!\");\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"// Call the function\n",
|
||||
"helloWorld();\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"js_splitter = RecursiveCharacterTextSplitter.from_language(\n",
|
||||
" language=Language.JS, chunk_size=60, chunk_overlap=0\n",
|
||||
")\n",
|
||||
"js_docs = js_splitter.create_documents([JS_CODE])\n",
|
||||
"js_docs"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Solidity\n",
|
||||
"Here's an example using the Solidity text splitter"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='pragma solidity ^0.8.20;', metadata={}),\n",
|
||||
" Document(page_content='contract HelloWorld {\\n function add(uint a, uint b) pure public returns(uint) {\\n return a + b;\\n }\\n}', metadata={})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"SOL_CODE = \"\"\"\n",
|
||||
"pragma solidity ^0.8.20;",
|
||||
"\n",
|
||||
"contract HelloWorld {\n",
|
||||
" function add(uint a, uint b) pure public returns(uint) {\n",
|
||||
" return a + b;\n",
|
||||
" }\n",
|
||||
"}\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"sol_splitter = RecursiveCharacterTextSplitter.from_language(\n",
|
||||
" language=Language.SOL, chunk_size=128, chunk_overlap=0\n",
|
||||
")\n",
|
||||
"sol_docs = sol_splitter.create_documents([SOL_CODE])\n",
|
||||
"sol_docs"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Markdown\n",
|
||||
"\n",
|
||||
"Here's an example using the Markdown text splitter."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"markdown_text = \"\"\"\n",
|
||||
"# 🦜️🔗 LangChain\n",
|
||||
"\n",
|
||||
"⚡ Building applications with LLMs through composability ⚡\n",
|
||||
"\n",
|
||||
"## Quick Install\n",
|
||||
"\n",
|
||||
"```bash\n",
|
||||
"# Hopefully this code block isn't split\n",
|
||||
"pip install langchain\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"As an open source project in a rapidly developing field, we are extremely open to contributions.\n",
|
||||
"\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='# 🦜️🔗 LangChain', metadata={}),\n",
|
||||
" Document(page_content='⚡ Building applications with LLMs through composability ⚡', metadata={}),\n",
|
||||
" Document(page_content='## Quick Install', metadata={}),\n",
|
||||
" Document(page_content=\"```bash\\n# Hopefully this code block isn't split\", metadata={}),\n",
|
||||
" Document(page_content='pip install langchain', metadata={}),\n",
|
||||
" Document(page_content='```', metadata={}),\n",
|
||||
" Document(page_content='As an open source project in a rapidly developing field, we', metadata={}),\n",
|
||||
" Document(page_content='are extremely open to contributions.', metadata={})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"md_splitter = RecursiveCharacterTextSplitter.from_language(\n",
|
||||
" language=Language.MARKDOWN, chunk_size=60, chunk_overlap=0\n",
|
||||
")\n",
|
||||
"md_docs = md_splitter.create_documents([markdown_text])\n",
|
||||
"md_docs"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Latex\n",
|
||||
"\n",
|
||||
"Here's an example on Latex text"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"latex_text = \"\"\"\n",
|
||||
"\\documentclass{article}\n",
|
||||
"\n",
|
||||
"\\begin{document}\n",
|
||||
"\n",
|
||||
"\\maketitle\n",
|
||||
"\n",
|
||||
"\\section{Introduction}\n",
|
||||
"Large language models (LLMs) are a type of machine learning model that can be trained on vast amounts of text data to generate human-like language. In recent years, LLMs have made significant advances in a variety of natural language processing tasks, including language translation, text generation, and sentiment analysis.\n",
|
||||
"\n",
|
||||
"\\subsection{History of LLMs}\n",
|
||||
"The earliest LLMs were developed in the 1980s and 1990s, but they were limited by the amount of data that could be processed and the computational power available at the time. In the past decade, however, advances in hardware and software have made it possible to train LLMs on massive datasets, leading to significant improvements in performance.\n",
|
||||
"\n",
|
||||
"\\subsection{Applications of LLMs}\n",
|
||||
"LLMs have many applications in industry, including chatbots, content creation, and virtual assistants. They can also be used in academia for research in linguistics, psychology, and computational linguistics.\n",
|
||||
"\n",
|
||||
"\\end{document}\n",
|
||||
"\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='\\\\documentclass{article}\\n\\n\\x08egin{document}\\n\\n\\\\maketitle', metadata={}),\n",
|
||||
" Document(page_content='\\\\section{Introduction}', metadata={}),\n",
|
||||
" Document(page_content='Large language models (LLMs) are a type of machine learning', metadata={}),\n",
|
||||
" Document(page_content='model that can be trained on vast amounts of text data to', metadata={}),\n",
|
||||
" Document(page_content='generate human-like language. In recent years, LLMs have', metadata={}),\n",
|
||||
" Document(page_content='made significant advances in a variety of natural language', metadata={}),\n",
|
||||
" Document(page_content='processing tasks, including language translation, text', metadata={}),\n",
|
||||
" Document(page_content='generation, and sentiment analysis.', metadata={}),\n",
|
||||
" Document(page_content='\\\\subsection{History of LLMs}', metadata={}),\n",
|
||||
" Document(page_content='The earliest LLMs were developed in the 1980s and 1990s,', metadata={}),\n",
|
||||
" Document(page_content='but they were limited by the amount of data that could be', metadata={}),\n",
|
||||
" Document(page_content='processed and the computational power available at the', metadata={}),\n",
|
||||
" Document(page_content='time. In the past decade, however, advances in hardware and', metadata={}),\n",
|
||||
" Document(page_content='software have made it possible to train LLMs on massive', metadata={}),\n",
|
||||
" Document(page_content='datasets, leading to significant improvements in', metadata={}),\n",
|
||||
" Document(page_content='performance.', metadata={}),\n",
|
||||
" Document(page_content='\\\\subsection{Applications of LLMs}', metadata={}),\n",
|
||||
" Document(page_content='LLMs have many applications in industry, including', metadata={}),\n",
|
||||
" Document(page_content='chatbots, content creation, and virtual assistants. They', metadata={}),\n",
|
||||
" Document(page_content='can also be used in academia for research in linguistics,', metadata={}),\n",
|
||||
" Document(page_content='psychology, and computational linguistics.', metadata={}),\n",
|
||||
" Document(page_content='\\\\end{document}', metadata={})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"latex_splitter = RecursiveCharacterTextSplitter.from_language(\n",
|
||||
" language=Language.MARKDOWN, chunk_size=60, chunk_overlap=0\n",
|
||||
")\n",
|
||||
"latex_docs = latex_splitter.create_documents([latex_text])\n",
|
||||
"latex_docs"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## HTML\n",
|
||||
"\n",
|
||||
"Here's an example using an HTML text splitter"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"html_text = \"\"\"\n",
|
||||
"<!DOCTYPE html>\n",
|
||||
"<html>\n",
|
||||
" <head>\n",
|
||||
" <title>🦜️🔗 LangChain</title>\n",
|
||||
" <style>\n",
|
||||
" body {\n",
|
||||
" font-family: Arial, sans-serif;\n",
|
||||
" }\n",
|
||||
" h1 {\n",
|
||||
" color: darkblue;\n",
|
||||
" }\n",
|
||||
" </style>\n",
|
||||
" </head>\n",
|
||||
" <body>\n",
|
||||
" <div>\n",
|
||||
" <h1>🦜️🔗 LangChain</h1>\n",
|
||||
" <p>⚡ Building applications with LLMs through composability ⚡</p>\n",
|
||||
" </div>\n",
|
||||
" <div>\n",
|
||||
" As an open source project in a rapidly developing field, we are extremely open to contributions.\n",
|
||||
" </div>\n",
|
||||
" </body>\n",
|
||||
"</html>\n",
|
||||
"\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content='<!DOCTYPE html>\\n<html>\\n <head>', metadata={}),\n",
|
||||
" Document(page_content='<title>🦜️🔗 LangChain</title>\\n <style>', metadata={}),\n",
|
||||
" Document(page_content='body {', metadata={}),\n",
|
||||
" Document(page_content='font-family: Arial, sans-serif;', metadata={}),\n",
|
||||
" Document(page_content='}\\n h1 {', metadata={}),\n",
|
||||
" Document(page_content='color: darkblue;\\n }', metadata={}),\n",
|
||||
" Document(page_content='</style>\\n </head>\\n <body>\\n <div>', metadata={}),\n",
|
||||
" Document(page_content='<h1>🦜️🔗 LangChain</h1>', metadata={}),\n",
|
||||
" Document(page_content='<p>⚡ Building applications with LLMs through', metadata={}),\n",
|
||||
" Document(page_content='composability ⚡</p>', metadata={}),\n",
|
||||
" Document(page_content='</div>\\n <div>', metadata={}),\n",
|
||||
" Document(page_content='As an open source project in a rapidly', metadata={}),\n",
|
||||
" Document(page_content='developing field, we are extremely open to contributions.', metadata={}),\n",
|
||||
" Document(page_content='</div>\\n </body>\\n</html>', metadata={})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"html_splitter = RecursiveCharacterTextSplitter.from_language(\n",
|
||||
" language=Language.MARKDOWN, chunk_size=60, chunk_overlap=0\n",
|
||||
")\n",
|
||||
"html_docs = html_splitter.create_documents([html_text])\n",
|
||||
"html_docs"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"html_splitter = RecursiveCharacterTextSplitter.from_language(\n",
|
||||
" language=Language.MARKDOWN, chunk_size=60, chunk_overlap=0\n",
|
||||
")\n",
|
||||
"html_docs = html_splitter.create_documents([html_text])\n",
|
||||
"html_docs"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
|
||||
@@ -1,145 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "70e9b619",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# MarkdownHeaderTextSplitter\n",
|
||||
"\n",
|
||||
"This splits a markdown file by a specified set of headers. For example, if we want to split this markdown:\n",
|
||||
"```\n",
|
||||
"md = '# Foo\\n\\n ## Bar\\n\\nHi this is Jim \\nHi this is Joe\\n\\n ## Baz\\n\\n Hi this is Molly' \n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Headers to split on:\n",
|
||||
"```\n",
|
||||
"[(\"#\", \"Header 1\"),(\"##\", \"Header 2\")]\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Expected output:\n",
|
||||
"```\n",
|
||||
"{'content': 'Hi this is Jim \\nHi this is Joe', 'metadata': {'Header 1': 'Foo', 'Header 2': 'Bar'}}\n",
|
||||
"{'content': 'Hi this is Molly', 'metadata': {'Header 1': 'Foo', 'Header 2': 'Baz'}}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Optionally, this also includes `return_each_line` in case a user want to perform other types of aggregation. \n",
|
||||
"\n",
|
||||
"If `return_each_line=True`, each line and associated header metadata are simply returned. "
|
||||
]
|
||||
},
{
"cell_type": "code",
"execution_count": 2,
"id": "19c044f0",
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import MarkdownHeaderTextSplitter"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "2ae3649b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'content': 'Hi this is Jim \\nHi this is Joe', 'metadata': {'Header 1': 'Foo', 'Header 2': 'Bar'}}\n",
"{'content': 'Hi this is Lance', 'metadata': {'Header 1': 'Foo', 'Header 2': 'Bar', 'Header 3': 'Boo'}}\n",
"{'content': 'Hi this is Molly', 'metadata': {'Header 1': 'Foo', 'Header 2': 'Baz'}}\n"
]
}
],
"source": [
"markdown_document = '# Foo\\n\\n ## Bar\\n\\nHi this is Jim\\n\\nHi this is Joe\\n\\n ### Boo \\n\\n Hi this is Lance \\n\\n ## Baz\\n\\n Hi this is Molly' \n",
" \n",
"headers_to_split_on = [\n",
" (\"#\", \"Header 1\"),\n",
" (\"##\", \"Header 2\"),\n",
" (\"###\", \"Header 3\"),\n",
"]\n",
"\n",
"markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)\n",
"splits = markdown_splitter.split_text(markdown_document)\n",
"for split in splits:\n",
" print(split)"
]
},
{
"cell_type": "markdown",
"id": "2a32026a",
"metadata": {},
"source": [
"Here's an example with a larger file and `return_each_line=True` passed, allowing each line to be examined."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "8af8f9a2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'content': 'Markdown[9] is a lightweight markup language for creating formatted text using a plain-text editor. John Gruber created Markdown in 2004 as a markup language that is appealing to human readers in its source code form.[9]', 'metadata': {'Header 1': 'Intro', 'Header 2': 'History'}}\n",
"{'content': 'Markdown is widely used in blogging, instant messaging, online forums, collaborative software, documentation pages, and readme files.', 'metadata': {'Header 1': 'Intro', 'Header 2': 'History'}}\n",
"{'content': 'As Markdown popularity grew rapidly, many Markdown implementations appeared, driven mostly by the need for', 'metadata': {'Header 1': 'Intro', 'Header 2': 'Rise and divergence'}}\n",
"{'content': 'additional features such as tables, footnotes, definition lists,[note 1] and Markdown inside HTML blocks.', 'metadata': {'Header 1': 'Intro', 'Header 2': 'Rise and divergence'}}\n",
"{'content': 'From 2012, a group of people, including Jeff Atwood and John MacFarlane, launched what Atwood characterised as a standardisation effort.', 'metadata': {'Header 1': 'Intro', 'Header 2': 'Rise and divergence', 'Header 4': 'Standardization'}}\n",
"{'content': 'Implementations of Markdown are available for over a dozen programming languages.', 'metadata': {'Header 1': 'Intro', 'Header 2': 'Implementations'}}\n"
]
}
],
"source": [
"markdown_document = '# Intro \\n\\n ## History \\n\\n Markdown[9] is a lightweight markup language for creating formatted text using a plain-text editor. John Gruber created Markdown in 2004 as a markup language that is appealing to human readers in its source code form.[9] \\n\\n Markdown is widely used in blogging, instant messaging, online forums, collaborative software, documentation pages, and readme files. \\n\\n ## Rise and divergence \\n\\n As Markdown popularity grew rapidly, many Markdown implementations appeared, driven mostly by the need for \\n\\n additional features such as tables, footnotes, definition lists,[note 1] and Markdown inside HTML blocks. \\n\\n #### Standardization \\n\\n From 2012, a group of people, including Jeff Atwood and John MacFarlane, launched what Atwood characterised as a standardisation effort. \\n\\n ## Implementations \\n\\n Implementations of Markdown are available for over a dozen programming languages.'\n",
" \n",
"headers_to_split_on = [\n",
" (\"#\", \"Header 1\"),\n",
" (\"##\", \"Header 2\"),\n",
" (\"###\", \"Header 3\"),\n",
" (\"####\", \"Header 4\"),\n",
"]\n",
" \n",
"markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on,return_each_line=True)\n",
"splits = markdown_splitter.split_text(markdown_document)\n",
"for line in splits:\n",
" print(line)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "987183f2",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
@@ -1,131 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "73dbcdb9",
"metadata": {},
"source": [
"# SentenceTransformersTokenTextSplitter\n",
"\n",
"This notebook demonstrates how to use the `SentenceTransformersTokenTextSplitter` text splitter.\n",
"\n",
"Language models have a token limit, which you should not exceed. It is therefore a good idea to count tokens when you split your text into chunks. There are many tokenizers; when you count tokens in your text, use the same tokenizer that the language model uses.\n",
"\n",
"The `SentenceTransformersTokenTextSplitter` is a specialized text splitter for use with the sentence-transformer models. The default behaviour is to split the text into chunks that fit the token window of the sentence transformer model that you would like to use."
]
},
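As a hedged sketch of what that guarantee means in practice (it assumes the splitter's default sentence-transformer model), every chunk produced should fit the model's token window:

```python
# Sketch: each chunk should fit the model's token window (default model assumed).
from langchain.text_splitter import SentenceTransformersTokenTextSplitter

splitter = SentenceTransformersTokenTextSplitter(chunk_overlap=0)
chunks = splitter.split_text("Lorem " * 1000)
# count_tokens includes the tokenizer's start/stop tokens, hence the +2 allowance.
assert all(
    splitter.count_tokens(text=c) <= splitter.maximum_tokens_per_chunk + 2
    for c in chunks
)
```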
{
"cell_type": "code",
"execution_count": 1,
"id": "9dd5419e",
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import SentenceTransformersTokenTextSplitter"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "b43e5d54",
"metadata": {},
"outputs": [],
"source": [
"splitter = SentenceTransformersTokenTextSplitter(chunk_overlap=0)\n",
"text = \"Lorem \""
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "1df84cb4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2\n"
]
}
],
"source": [
"count_start_and_stop_tokens = 2\n",
"text_token_count = splitter.count_tokens(text=text) - count_start_and_stop_tokens\n",
"print(text_token_count)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "d7ad2213",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tokens in text to split: 514\n"
]
}
],
"source": [
"token_multiplier = splitter.maximum_tokens_per_chunk // text_token_count + 1\n",
"\n",
"# `text_to_split` does not fit in a single chunk\n",
"text_to_split = text * token_multiplier\n",
"\n",
"print(f\"tokens in text to split: {splitter.count_tokens(text=text_to_split)}\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "818aea04",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"lorem\n"
]
}
],
"source": [
"text_chunks = splitter.split_text(text=text_to_split)\n",
"\n",
"print(text_chunks[1])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e9ba4f23",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
@@ -12,8 +12,7 @@
"\n",
"- `length_function`: how the length of chunks is calculated. Defaults to just counting the number of characters, but it's pretty common to pass a token counter here.\n",
"- `chunk_size`: the maximum size of your chunks (as measured by the length function).\n",
"- `chunk_overlap`: the maximum overlap between chunks. It can be nice to have some overlap to maintain some continuity between chunks (e.g., a sliding window).\n",
"- `add_start_index`: whether to include the starting position of each chunk within the original document in the metadata. "
"- `chunk_overlap`: the maximum overlap between chunks. It can be nice to have some overlap to maintain some continuity between chunks (e.g., a sliding window)."
]
},
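As a hedged sketch of the token-counter case mentioned above (tiktoken and the encoding name are assumptions, not part of this notebook; the splitter class is the one this getting-started page demonstrates), the parameters fit together like this:

```python
# Sketch only: a token counter as the length_function (tiktoken is an assumption).
import tiktoken
from langchain.text_splitter import RecursiveCharacterTextSplitter

enc = tiktoken.get_encoding("cl100k_base")

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,        # max chunk size, as measured by length_function
    chunk_overlap=20,      # sliding-window overlap between adjacent chunks
    length_function=lambda text: len(enc.encode(text)),
    add_start_index=True,  # record each chunk's offset in the source document
)
```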
{
@@ -50,7 +49,6 @@
" chunk_size = 100,\n",
" chunk_overlap = 20,\n",
" length_function = len,\n",
" add_start_index = True,\n",
")"
]
},
@@ -64,8 +62,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and' metadata={'start_index': 0}\n",
"page_content='of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.' metadata={'start_index': 82}\n"
"page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and' lookup_str='' metadata={} lookup_index=0\n",
"page_content='of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.' lookup_str='' metadata={} lookup_index=0\n"
]
}
],
@@ -92,7 +90,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.9.1"
},
"vscode": {
"interpreter": {
@@ -1,194 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "833c4789",
"metadata": {},
"source": [
"# AwaDB\n",
"[AwaDB](https://github.com/awa-ai/awadb) is an AI Native database for the search and storage of embedding vectors used by LLM Applications.\n",
"This notebook shows how to use functionality related to AwaDB."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "252930ea",
"metadata": {},
"outputs": [],
"source": [
"!pip install awadb"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f2b71a47",
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import AwaDB\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "49be0bac",
"metadata": {},
"outputs": [],
"source": [
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "18714278",
"metadata": {},
"outputs": [],
"source": [
"db = AwaDB.from_documents(docs)\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = db.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "62b7a4c5",
"metadata": {},
"outputs": [],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "a9b4be48",
"metadata": {},
"source": [
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence."
]
},
{
"cell_type": "markdown",
"id": "87fec6b5",
"metadata": {},
"source": [
"## Similarity search with score"
]
},
{
"cell_type": "markdown",
"id": "17231924",
"metadata": {},
"source": [
"The returned distance score is between 0 and 1: 0 is dissimilar, 1 is the most similar."
]
},
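For reference, a hedged sketch of reading those scores; it assumes the `db` built earlier in this notebook and that `similarity_search_with_score` returns `(document, score)` tuples, as the sample output further below shows:

```python
# Sketch: unpack (document, score) pairs; for AwaDB a higher score is more similar.
docs_and_scores = db.similarity_search_with_score(query)
for doc, score in docs_and_scores:
    print(round(score, 3), doc.page_content[:40] + "...")
```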
{
"cell_type": "code",
"execution_count": null,
"id": "f40ddae1",
"metadata": {},
"outputs": [],
"source": [
"docs = db.similarity_search_with_score(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0045583",
"metadata": {},
"outputs": [],
"source": [
"print(docs[0])"
]
},
{
"cell_type": "markdown",
"id": "8c2da99d",
"metadata": {},
"source": [
"(Document(page_content='And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'}), 0.561813814013747)"
]
},
{
"cell_type": "markdown",
"id": "0b49fb59",
"metadata": {},
"source": [
"## Restore a previously created table with its data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1bfa6e25",
"metadata": {},
"outputs": [],
"source": [
"AwaDB automatically persists added document data"
]
},
{
"cell_type": "markdown",
"id": "2a0f3b35",
"metadata": {},
"source": [
"If you want to restore a table that you previously created and populated, you can do so as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1fd4b5b0",
"metadata": {},
"outputs": [],
"source": [
"import awadb\n",
"\n",
"awadb_client = awadb.Client()\n",
"ret = awadb_client.Load('langchain_awadb')\n",
"if ret: print('awadb load table success')\n",
"else:\n",
" print('awadb load table failed')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5ae9a9dd",
"metadata": {},
"outputs": [],
"source": [
"awadb load table success"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
@@ -1,245 +0,0 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure Cognitive Search"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install the Azure Cognitive Search SDK"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install --index-url=https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/ azure-search-documents==11.4.0a20230509004\n",
"!pip install azure-identity"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import required libraries"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"import os, json\n",
"import openai\n",
"from dotenv import load_dotenv\n",
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.schema import BaseRetriever\n",
"from langchain.vectorstores.azuresearch import AzureSearch"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure OpenAI settings\n",
"Configure the OpenAI settings to use Azure OpenAI or OpenAI"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# Load environment variables from a .env file using load_dotenv():\n",
"load_dotenv()\n",
"\n",
"openai.api_type = \"azure\"\n",
"openai.api_base = \"YOUR_OPENAI_ENDPOINT\"\n",
"openai.api_version = \"2023-05-15\"\n",
"openai.api_key = \"YOUR_OPENAI_API_KEY\"\n",
"model: str = \"text-embedding-ada-002\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure vector store settings\n",
" \n",
"Set up the vector store settings using environment variables:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"vector_store_address: str = 'YOUR_AZURE_SEARCH_ENDPOINT'\n",
"vector_store_password: str = 'YOUR_AZURE_SEARCH_ADMIN_KEY'\n",
"index_name: str = \"langchain-vector-demo\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create embeddings and vector store instances\n",
" \n",
"Create instances of the OpenAIEmbeddings and AzureSearch classes:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"embeddings: OpenAIEmbeddings = OpenAIEmbeddings(model=model, chunk_size=1) \n",
"vector_store: AzureSearch = AzureSearch(azure_search_endpoint=vector_store_address, \n",
" azure_search_key=vector_store_password, \n",
" index_name=index_name, \n",
" embedding_function=embeddings.embed_query) \n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Insert text and embeddings into vector store\n",
" \n",
"Add texts and metadata from the JSON data to the vector store:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"loader = TextLoader('../../../state_of_the_union.txt', encoding='utf-8')\n",
"\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"vector_store.add_documents(documents=docs)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Perform a vector similarity search\n",
" \n",
"Execute a pure vector similarity search using the similarity_search() method:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
]
}
],
"source": [
"# Perform a similarity search\n",
"docs = vector_store.similarity_search(query=\"What did the president say about Ketanji Brown Jackson\", k=3, search_type='similarity')\n",
"print(docs[0].page_content)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Perform a Hybrid Search\n",
"\n",
"Execute a hybrid search using the similarity_search() method (hybrid is the default search type):"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
]
}
],
"source": [
"# Perform a hybrid search \n",
"docs = vector_store.similarity_search(query=\"What did the president say about Ketanji Brown Jackson\", k=3)\n",
"print(docs[0].page_content)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.13 ('.venv': venv)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "645053d6307d413a1a75681b5ebb6449bb2babba4bcb0bf65a1ddc3dbefb108a"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -151,15 +151,6 @@
"## Similarity search with score"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "346347d7",
"metadata": {},
"source": [
"The returned distance score is cosine distance. Therefore, a lower score is better."
]
},
{
"cell_type": "code",
"execution_count": 10,
@@ -1,399 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
"source": [
"# ClickHouse Vector Search\n",
"\n",
"> [ClickHouse](https://clickhouse.com/) is the fastest and most resource efficient open-source database for real-time apps and analytics with full SQL support and a wide range of functions to assist users in writing analytical queries. Recently added data structures and distance search functions (like `L2Distance`) as well as [approximate nearest neighbor search indexes](https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/annindexes) enable ClickHouse to be used as a high performance and scalable vector database to store and search vectors with SQL.\n",
"\n",
"This notebook shows how to use functionality related to the `ClickHouse` vector search."
]
},
{
"cell_type": "markdown",
"id": "43ead5d5-2c1f-4dce-a69a-cb00e4f9d6f0",
"metadata": {},
"source": [
"## Setting up environments"
]
},
{
"cell_type": "markdown",
"id": "b2c434bc",
"metadata": {},
"source": [
"Setting up a local ClickHouse server with Docker (optional)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "249a7751",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-03T08:43:43.035606Z",
"start_time": "2023-06-03T08:43:42.618531Z"
}
},
"outputs": [],
"source": [
"! docker run -d -p 8123:8123 -p9000:9000 --name langchain-clickhouse-server --ulimit nofile=262144:262144 clickhouse/clickhouse-server:23.4.2.11"
]
},
{
"cell_type": "markdown",
"id": "7bd3c1c0",
"metadata": {},
"source": [
"Set up the ClickHouse client driver"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d614bf8",
"metadata": {},
"outputs": [],
"source": [
"!pip install clickhouse-connect"
]
},
{
"cell_type": "markdown",
"id": "15a1d477-9cdb-4d82-b019-96951ecb2b72",
"metadata": {},
"source": [
"We want to use OpenAIEmbeddings, so we have to get the OpenAI API key."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "91003ea5-0c8c-436c-a5de-aaeaeef2f458",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-03T08:49:35.383673Z",
"start_time": "2023-06-03T08:49:33.984547Z"
}
},
"outputs": [],
"source": [
"import os\n",
"import getpass\n",
"\n",
"if not os.environ.get('OPENAI_API_KEY'):\n",
" os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "aac9563e",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-03T08:33:31.554934Z",
"start_time": "2023-06-03T08:33:31.549590Z"
},
"tags": []
},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import Clickhouse, ClickhouseSettings"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a3c3999a",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-03T08:33:32.527387Z",
"start_time": "2023-06-03T08:33:32.501312Z"
},
"tags": []
},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "6e104aee",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-03T08:33:35.503823Z",
"start_time": "2023-06-03T08:33:33.745832Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Inserting data...: 100%|██████████| 42/42 [00:00<00:00, 2801.49it/s]\n"
]
}
],
"source": [
"for d in docs:\n",
" d.metadata = {'some': 'metadata'}\n",
"settings = ClickhouseSettings(table=\"clickhouse_vector_search_example\")\n",
"docsearch = Clickhouse.from_documents(docs, embeddings, config=settings)\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = docsearch.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "9c608226",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
"\n",
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
]
}
],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "e3a8b105",
"metadata": {},
"source": [
"## Get connection info and data schema"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "69996818",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-03T08:28:58.252991Z",
"start_time": "2023-06-03T08:28:58.197560Z"
},
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[92m\u001b[1mdefault.clickhouse_vector_search_example @ localhost:8123\u001b[0m\n",
"\n",
"\u001b[1musername: None\u001b[0m\n",
"\n",
"Table Schema:\n",
"---------------------------------------------------\n",
"|\u001b[94mid \u001b[0m|\u001b[96mNullable(String) \u001b[0m|\n",
"|\u001b[94mdocument \u001b[0m|\u001b[96mNullable(String) \u001b[0m|\n",
"|\u001b[94membedding \u001b[0m|\u001b[96mArray(Float32) \u001b[0m|\n",
"|\u001b[94mmetadata \u001b[0m|\u001b[96mObject('json') \u001b[0m|\n",
"|\u001b[94muuid \u001b[0m|\u001b[96mUUID \u001b[0m|\n",
"---------------------------------------------------\n",
"\n"
]
}
],
"source": [
"print(str(docsearch))"
]
},
{
"cell_type": "markdown",
"id": "324ac147",
"metadata": {},
"source": [
"### ClickHouse table schema"
]
},
{
"cell_type": "markdown",
"id": "b5bd7c5b",
"metadata": {},
"source": [
"> The ClickHouse table will be created automatically if it does not exist. Advanced users can pre-create the table with optimized settings. For a distributed ClickHouse cluster with sharding, the table engine should be configured as `Distributed`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "54f4f561",
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Clickhouse Table DDL:\n",
"\n",
"CREATE TABLE IF NOT EXISTS default.clickhouse_vector_search_example(\n",
" id Nullable(String),\n",
" document Nullable(String),\n",
" embedding Array(Float32),\n",
" metadata JSON,\n",
" uuid UUID DEFAULT generateUUIDv4(),\n",
" CONSTRAINT cons_vec_len CHECK length(embedding) = 1536,\n",
" INDEX vec_idx embedding TYPE annoy(100,'L2Distance') GRANULARITY 1000\n",
") ENGINE = MergeTree ORDER BY uuid SETTINGS index_granularity = 8192\n"
]
}
],
"source": [
"print(f\"Clickhouse Table DDL:\\n\\n{docsearch.schema}\")"
]
},
{
"cell_type": "markdown",
"id": "f59360c0",
"metadata": {},
"source": [
"## Filtering\n",
"\n",
"You have direct access to the ClickHouse SQL `WHERE` statement and can write a `WHERE` clause following standard SQL.\n",
"\n",
"**NOTE**: Please be aware of SQL injection; this interface must not be called directly by the end user.\n",
"\n",
"If you customized your `column_map` in your settings, you can search with a filter like this:"
]
},
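As a hedged illustration of such a customized `column_map` (the attribute and its keys are assumed from the `ClickhouseSettings` defaults in this langchain version; verify against your installed release):

```python
# Sketch only: remap vector-store fields to custom ClickHouse column names.
from langchain.vectorstores import Clickhouse, ClickhouseSettings

settings = ClickhouseSettings(
    table="clickhouse_custom_columns",
    column_map={
        "id": "id",
        "uuid": "uuid",
        "document": "text",     # page content stored in a column named `text`
        "embedding": "vector",   # embeddings stored in a column named `vector`
        "metadata": "meta",      # metadata JSON stored in a column named `meta`
    },
)
docsearch = Clickhouse.from_documents(docs, embeddings, config=settings)
```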
{
"cell_type": "code",
"execution_count": 9,
"id": "232055f6",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-03T08:29:36.680805Z",
"start_time": "2023-06-03T08:29:34.963676Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Inserting data...: 100%|██████████| 42/42 [00:00<00:00, 6939.56it/s]\n"
]
}
],
"source": [
"from langchain.vectorstores import Clickhouse, ClickhouseSettings\n",
"from langchain.document_loaders import TextLoader\n",
"\n",
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()\n",
"\n",
"for i, d in enumerate(docs):\n",
" d.metadata = {'doc_id': i}\n",
"\n",
"docsearch = Clickhouse.from_documents(docs, embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "ddbcee77",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-03T08:29:43.487436Z",
"start_time": "2023-06-03T08:29:43.040831Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.6779101415357189 {'doc_id': 0} Madam Speaker, Madam...\n",
"0.6997970363474885 {'doc_id': 8} And so many families...\n",
"0.7044504914336727 {'doc_id': 1} Groups of citizens b...\n",
"0.7053558702165094 {'doc_id': 6} And I’m taking robus...\n"
]
}
],
"source": [
"meta = docsearch.metadata_column\n",
"output = docsearch.similarity_search_with_relevance_scores('What did the president say about Ketanji Brown Jackson?', \n",
" k=4, where_str=f\"{meta}.doc_id<10\")\n",
"for d, dist in output:\n",
" print(dist, d.metadata, d.page_content[:20] + '...')"
]
},
{
"cell_type": "markdown",
"id": "a359ed74",
"metadata": {},
"source": [
"## Deleting your data"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "fb6a9d36",
"metadata": {
"ExecuteTime": {
"end_time": "2023-06-03T08:30:24.822384Z",
"start_time": "2023-06-03T08:30:24.798571Z"
}
},
"outputs": [],
"source": [
"docsearch.drop()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
@@ -1,7 +1,6 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "2ce41f46-5711-4311-b04d-2fe233ac5b1b",
"metadata": {},
@@ -14,7 +13,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "7ee37d28",
"metadata": {},
@@ -57,7 +55,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "8dbb6de2",
"metadata": {
@@ -101,7 +98,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "ed6f905b-4853-4a44-9730-614aa8e22b78",
"metadata": {},
@@ -149,7 +145,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "3febb987-e903-416f-af26-6897d84c8d61",
"metadata": {},
@@ -157,15 +152,6 @@
"### Similarity search with score"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "bb1df11a",
"metadata": {},
"source": [
"The returned distance score is cosine distance. Therefore, a lower score is better."
]
},
{
"cell_type": "code",
"execution_count": 7,
@@ -1,7 +1,6 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "a3afefb0-7e99-4912-a222-c6b186da11af",
"metadata": {},
@@ -14,7 +13,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5031a3ec",
"metadata": {},
@@ -56,7 +54,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "6e57a389-f637-4b8f-9ab2-759ae7485f78",
"metadata": {},
@@ -98,7 +95,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "efbb6684-3846-4332-a624-ddd4d75844c1",
"metadata": {},
@@ -146,7 +142,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "43896697-f99e-47b6-9117-47a25e9afa9c",
"metadata": {},
@@ -154,15 +149,6 @@
"### Similarity search with score"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "414a9bc9",
"metadata": {},
"source": [
"The returned distance score is cosine distance. Therefore, a lower score is better."
]
},
{
"cell_type": "code",
"execution_count": 7,
@@ -1,7 +1,6 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
@@ -30,7 +29,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "38237514-b3fa-44a4-9cff-30cd6bf50073",
"metadata": {},
@@ -40,12 +38,20 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 10,
"id": "47f9b495-88f1-4286-8d5d-1416103931a7",
"metadata": {
"tags": []
},
"outputs": [],
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
"OpenAI API Key: ········\n"
]
}
],
"source": [
"import os\n",
"import getpass\n",
@@ -58,7 +64,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 2,
"id": "aac9563e",
"metadata": {
"tags": []
@@ -73,7 +79,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 7,
"id": "a3c3999a",
"metadata": {
"tags": []
@@ -91,7 +97,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 8,
"id": "5eabdb75",
"metadata": {
"tags": []
@@ -106,7 +112,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 9,
"id": "4b172de8",
"metadata": {
"tags": []
@@ -131,18 +137,17 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "f13473b5",
"metadata": {},
"source": [
"## Similarity Search with score\n",
"There are some FAISS specific methods. One of them is `similarity_search_with_score`, which allows you to return not only the documents but also the distance score of the query to them. The returned distance score is L2 distance. Therefore, a lower score is better."
"There are some FAISS specific methods. One of them is `similarity_search_with_score`, which allows you to return not only the documents but also the similarity score of the query to them."
]
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 6,
"id": "186ee1d8",
"metadata": {},
"outputs": [],
@@ -152,18 +157,18 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 7,
"id": "284e04b5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'}),\n",
" 0.36913747)"
"(Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen. \\n\\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0),\n",
" 0.3914415)"
]
},
"execution_count": 14,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
@@ -173,7 +178,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "f34420cf",
"metadata": {},
@@ -183,7 +187,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 9,
"id": "b558ebb7",
"metadata": {},
"outputs": [],
@@ -193,7 +197,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "31bda7fd",
"metadata": {},
@@ -204,7 +207,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 10,
"id": "428a6816",
"metadata": {},
"outputs": [],
@@ -214,7 +217,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 11,
"id": "56d1841c",
"metadata": {},
"outputs": [],
@@ -224,7 +227,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 12,
"id": "39055525",
"metadata": {},
"outputs": [],
@@ -234,17 +237,17 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 13,
"id": "98378c4e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'})"
"Document(page_content='In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections. \\n\\nWe cannot let this happen. \\n\\nTonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', lookup_str='', metadata={'source': '../../state_of_the_union.txt'}, lookup_index=0)"
]
},
"execution_count": 19,
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
@@ -254,7 +257,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "57da60d4",
"metadata": {},
@@ -265,7 +267,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 6,
"id": "6dfd2b78",
"metadata": {},
"outputs": [],
@@ -276,17 +278,17 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 8,
"id": "29960da7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'068c473b-d420-487a-806b-fb0ccea7f711': Document(page_content='foo', metadata={})}"
"{'e0b74348-6c93-4893-8764-943139ec1d17': Document(page_content='foo', lookup_str='', metadata={}, lookup_index=0)}"
]
},
"execution_count": 21,
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
@@ -297,17 +299,17 @@
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": 9,
"id": "83392605",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'807e0c63-13f6-4070-9774-5c6f0fbb9866': Document(page_content='bar', metadata={})}"
"{'bdc50ae3-a1bb-4678-9260-1b0979578f40': Document(page_content='bar', lookup_str='', metadata={}, lookup_index=0)}"
]
},
"execution_count": 22,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
@@ -318,7 +320,7 @@
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": 10,
"id": "a3fcc1c7",
"metadata": {},
"outputs": [],
@@ -328,18 +330,18 @@
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": 11,
"id": "41c51f89",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'068c473b-d420-487a-806b-fb0ccea7f711': Document(page_content='foo', metadata={}),\n",
" '807e0c63-13f6-4070-9774-5c6f0fbb9866': Document(page_content='bar', metadata={})}"
"{'e0b74348-6c93-4893-8764-943139ec1d17': Document(page_content='foo', lookup_str='', metadata={}, lookup_index=0),\n",
" 'd5211050-c777-493d-8825-4800e74cfdb6': Document(page_content='bar', lookup_str='', metadata={}, lookup_index=0)}"
]
},
"execution_count": 24,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
@@ -348,140 +350,13 @@
"db1.docstore._dict"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "f4294b96",
"metadata": {},
"source": [
"## Similarity Search with filtering\n",
"The FAISS vectorstore can also support filtering; since FAISS does not support filtering natively, it has to be done manually. This is done by first fetching more results than `k` and then filtering them. You can filter the documents based on metadata. You can also set the `fetch_k` parameter when calling any search method to set how many documents you want to fetch before filtering. Here is a small example:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "d5bf812c",
"execution_count": null,
"id": "f80b60de",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Content: foo, Metadata: {'page': 1}, Score: 5.159960813797904e-15\n",
"Content: foo, Metadata: {'page': 2}, Score: 5.159960813797904e-15\n",
"Content: foo, Metadata: {'page': 3}, Score: 5.159960813797904e-15\n",
"Content: foo, Metadata: {'page': 4}, Score: 5.159960813797904e-15\n"
]
}
],
"source": [
"from langchain.schema import Document\n",
"list_of_documents = [\n",
" Document(page_content=\"foo\", metadata=dict(page=1)),\n",
" Document(page_content=\"bar\", metadata=dict(page=1)),\n",
" Document(page_content=\"foo\", metadata=dict(page=2)),\n",
" Document(page_content=\"barbar\", metadata=dict(page=2)),\n",
" Document(page_content=\"foo\", metadata=dict(page=3)),\n",
" Document(page_content=\"bar burr\", metadata=dict(page=3)),\n",
" Document(page_content=\"foo\", metadata=dict(page=4)),\n",
" Document(page_content=\"bar bruh\", metadata=dict(page=4))\n",
"]\n",
"db = FAISS.from_documents(list_of_documents, embeddings)\n",
"results_with_scores = db.similarity_search_with_score(\"foo\")\n",
"for doc, score in results_with_scores:\n",
" print(f\"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "3d33c126",
"metadata": {},
"source": [
"Now we make the same query call, but we filter for only `page = 1`."
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "83159330",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Content: foo, Metadata: {'page': 1}, Score: 5.159960813797904e-15\n",
"Content: bar, Metadata: {'page': 1}, Score: 0.3131446838378906\n"
]
}
],
"source": [
"results_with_scores = db.similarity_search_with_score(\"foo\", filter=dict(page=1))\n",
"for doc, score in results_with_scores:\n",
" print(f\"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "0be136e0",
"metadata": {},
"source": [
"The same thing can be done with `max_marginal_relevance_search` as well."
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "432c6980",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Content: foo, Metadata: {'page': 1}\n",
"Content: bar, Metadata: {'page': 1}\n"
]
}
],
"source": [
"results = db.max_marginal_relevance_search(\"foo\", filter=dict(page=1))\n",
"for doc in results:\n",
" print(f\"Content: {doc.page_content}, Metadata: {doc.metadata}\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "1b4ecd86",
"metadata": {},
"source": [
"Here is an example of how to set the `fetch_k` parameter when calling `similarity_search`. Usually you want `fetch_k` >> `k`, because `fetch_k` is the number of documents fetched before filtering; if you set it too low, you might not get enough documents to filter from."
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "1fd60fd1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Content: foo, Metadata: {'page': 1}, Score: 5.159960813797904e-15\n",
"Content: bar, Metadata: {'page': 1}, Score: 0.3131446838378906\n"
]
}
],
"source": [
"results = db.similarity_search(\"foo\", filter=dict(page=1), k=1, fetch_k=4)\n",
"for doc in results:\n",
" print(f\"Content: {doc.page_content}, Metadata: {doc.metadata}\")"
]
"outputs": [],
"source": []
}
],
"metadata": {
@@ -500,7 +375,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.10.6"
}
},
"nbformat": 4,
Binary file not shown.
@@ -1,157 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Hologres\n",
|
||||
"\n",
|
||||
">[Hologres](https://www.alibabacloud.com/help/en/hologres/latest/introduction) is a unified real-time data warehousing service developed by Alibaba Cloud. You can use Hologres to write, update, process, and analyze large amounts of data in real time. \n",
|
||||
">Hologres supports standard SQL syntax, is compatible with PostgreSQL, and supports most PostgreSQL functions. Hologres supports online analytical processing (OLAP) and ad hoc analysis for up to petabytes of data, and provides high-concurrency and low-latency online data services. \n",
|
||||
"\n",
|
||||
">Hologres provides **vector database** functionality by adopting [Proxima](https://www.alibabacloud.com/help/en/hologres/latest/vector-processing).\n",
|
||||
">Proxima is a high-performance software library developed by Alibaba DAMO Academy. It allows you to search for the nearest neighbors of vectors. Proxima provides higher stability and performance than similar open source software such as Faiss. Proxima allows you to search for similar text or image embeddings with high throughput and low latency. Hologres is deeply integrated with Proxima to provide a high-performance vector search service.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use functionality related to the `Hologres Proxima` vector database.\n",
|
||||
"Click [here](https://www.alibabacloud.com/zh/product/hologres) to fast deploy a Hologres cloud instance."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import Hologres"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Split documents and get embeddings by call OpenAI API"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"\n",
|
||||
"loader = TextLoader(\"../../../state_of_the_union.txt\")\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Connect to Hologres by setting related ENVIRONMENTS.\n",
|
||||
"```\n",
|
||||
"export PG_HOST={host}\n",
|
||||
"export PG_PORT={port} # Optional, default is 80\n",
|
||||
"export PG_DATABASE={db_name} # Optional, default is postgres\n",
|
||||
"export PG_USER={username}\n",
|
||||
"export PG_PASSWORD={password}\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Then store your embeddings and documents into Hologres"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"connection_string = Hologres.connection_string_from_db_params(\n",
|
||||
" host=os.environ.get(\"PGHOST\", \"localhost\"),\n",
|
||||
" port=int(os.environ.get(\"PGPORT\", \"80\")),\n",
|
||||
" database=os.environ.get(\"PGDATABASE\", \"postgres\"),\n",
|
||||
" user=os.environ.get(\"PGUSER\", \"postgres\"),\n",
|
||||
" password=os.environ.get(\"PGPASSWORD\", \"postgres\"),\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"vector_db = Hologres.from_documents(\n",
|
||||
" docs,\n",
|
||||
" embeddings,\n",
|
||||
" connection_string=connection_string,\n",
|
||||
" table_name=\"langchain_example_embeddings\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Query and retrieve data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = vector_db.similarity_search(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
@@ -5,18 +5,16 @@
|
||||
"id": "683953b3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Commented out until further notice\n",
|
||||
"# MongoDB Atlas Vector Search\n",
|
||||
"\n",
|
||||
"MongoDB Atlas Vector Search\n",
|
||||
">[MongoDB Atlas](https://www.mongodb.com/docs/atlas/) is a document database managed in the cloud. It also enables Lucene and its vector search feature.\n",
|
||||
"\n",
|
||||
">[MongoDB Atlas](https://www.mongodb.com/docs/atlas/) is a fully-managed cloud database available in AWS , Azure, and GCP. It now has support for native Vector Search on your MongoDB document data.\n",
|
||||
"This notebook shows how to use the functionality related to the `MongoDB Atlas Vector Search` feature where you can store your embeddings in MongoDB documents and create a Lucene vector index to perform a KNN search.\n",
|
||||
"\n",
|
||||
"This notebook shows how to use `MongoDB Atlas Vector Search` to store your embeddings in MongoDB documents, create a vector search index, and perform KNN search with an approximate nearest neighbor algorithm.\n",
|
||||
"It uses the [knnBeta Operator](https://www.mongodb.com/docs/atlas/atlas-search/knn-beta) available in MongoDB Atlas Search. This feature is in early access and available only for evaluation purposes, to validate functionality, and to gather feedback from a small closed group of early access users. It is not recommended for production deployments as we may introduce breaking changes.\n",
|
||||
"\n",
|
||||
"It uses the [knnBeta Operator](https://www.mongodb.com/docs/atlas/atlas-search/knn-beta) available in MongoDB Atlas Search. This feature is in Public Preview and available for evaluation purposes, to validate functionality, and to gather feedback from public preview users. It is not recommended for production deployments as we may introduce breaking changes.\n",
|
||||
"\n",
|
||||
"To use MongoDB Atlas, you must first deploy a cluster. We have a Forever-Free tier of clusters available. \n",
|
||||
"To get started head over to Atlas here: [quick start](https://www.mongodb.com/docs/atlas/getting-started/)."
|
||||
"To use MongoDB Atlas, you must have first deployed a cluster. Free clusters are available. \n",
|
||||
"Here is the MongoDB Atlas [quick start](https://www.mongodb.com/docs/atlas/getting-started/)."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -39,29 +37,16 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"MONGODB_ATLAS_CLUSTER_URI = getpass.getpass('MongoDB Atlas Cluster URI:')\n",
|
||||
"MONGODB_ATLAS_CLUSTER_URI = os.environ['MONGODB_ATLAS_CLUSTER_URI']"
|
||||
"MONGODB_ATLAS_URI = os.environ['MONGODB_ATLAS_URI']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "457ace44-1d95-4001-9dd5-78811ab208ad",
|
||||
"id": "320af802-9271-46ee-948f-d2453933d44b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We want to use `OpenAIEmbeddings` so we need to set up our OpenAI API Key. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "2d8f240d",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')\n",
|
||||
"OPENAI_API_KEY = os.environ['OPENAI_API_KEY']"
|
||||
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key. Make sure the environment variable `OPENAI_API_KEY` is set up before proceeding."
|
||||
]
|
||||
},
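If you are running this interactively and the key is not already exported, a minimal sketch for setting it from a prompt (the same getpass pattern used in the other notebooks in this repo) looks like this:

```python
import getpass
import os

# Expose the key to OpenAIEmbeddings via the environment.
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
```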
|
||||
{
|
||||
@@ -69,8 +54,8 @@
|
||||
"id": "1f3ecc42",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now, let's create a vector search index on your cluster. In the below example, `embedding` is the name of the field that contains the embedding vector. Please refer to the [documentation](https://www.mongodb.com/docs/atlas/atlas-search/define-field-mappings-for-vector-search) to get more details on how to define an Atlas Vector Search index.\n",
|
||||
"You can name the index `langchain_demo` and create the index on the namespace `lanchain_db.langchain_col`. Finally, write the following definition in the JSON editor on MongoDB Atlas:\n",
|
||||
"Now, let's create a Lucene vector index on your cluster. In the below example, `embedding` is the name of the field that contains the embedding vector. Please refer to the [documentation](https://www.mongodb.com/docs/atlas/atlas-search/define-field-mappings-for-vector-search) to get more details on how to define an Atlas Search index.\n",
|
||||
"You can name the index `langchain_demo` and create the index on the namespace `lanchain_db.langchain_col`. Finally, write the following definition in the JSON editor:\n",
|
||||
"\n",
|
||||
"```json\n",
|
||||
"{\n",
|
||||
@@ -129,7 +114,7 @@
|
||||
"from pymongo import MongoClient\n",
|
||||
"\n",
|
||||
"# initialize MongoDB python client\n",
|
||||
"client = MongoClient(MONGODB_ATLAS_CLUSTER_URI)\n",
|
||||
"client = MongoClient(MONGODB_ATLAS_CONNECTION_STRING)\n",
|
||||
"\n",
|
||||
"db_name = \"lanchain_db\"\n",
|
||||
"collection_name = \"langchain_col\"\n",
|
||||
@@ -158,47 +143,6 @@
|
||||
"source": [
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "851a2ec9-9390-49a4-8412-3e132c9f789d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can reuse the vector search index you created, make sure the `OPENAI_API_KEY` environment variable is set up, then execute another query."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "6336fe79-3e73-48be-b20a-0ff1bb6a4399",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from pymongo import MongoClient\n",
|
||||
"from langchain.vectorstores import MongoDBAtlasVectorSearch\n",
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"import os\n",
|
||||
"\n",
|
||||
"MONGODB_ATLAS_URI = os.environ['MONGODB_ATLAS_URI']\n",
|
||||
"\n",
|
||||
"# initialize MongoDB python client\n",
|
||||
"client = MongoClient(MONGODB_ATLAS_URI)\n",
|
||||
"\n",
|
||||
"db_name = \"langchain_db\"\n",
|
||||
"collection_name = \"langchain_col\"\n",
|
||||
"collection = client[db_name][collection_name]\n",
|
||||
"index_name = \"langchain_index\"\n",
|
||||
"\n",
|
||||
"# initialize vector store\n",
|
||||
"vectorStore = MongoDBAtlasVectorSearch(\n",
|
||||
" collection, OpenAIEmbeddings(), index_name=index_name)\n",
|
||||
"\n",
|
||||
"# perform a similarity search between the embedding of the query and the embeddings of the documents\n",
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = vectorStore.similarity_search(query)\n",
|
||||
"\n",
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -217,7 +161,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -1,7 +1,6 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "683953b3",
|
||||
"metadata": {},
|
||||
@@ -14,7 +13,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "43ead5d5-2c1f-4dce-a69a-cb00e4f9d6f0",
|
||||
"metadata": {},
|
||||
@@ -35,7 +33,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "15a1d477-9cdb-4d82-b019-96951ecb2b72",
|
||||
"metadata": {},
|
||||
@@ -57,7 +54,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "a9d16fa3",
|
||||
"metadata": {},
|
||||
@@ -173,7 +169,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "e3a8b105",
|
||||
"metadata": {},
|
||||
@@ -192,7 +187,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "f59360c0",
|
||||
"metadata": {},
|
||||
@@ -237,24 +231,6 @@
|
||||
"docsearch = MyScale.from_documents(docs, embeddings)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "8d867b05",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Similarity search with score"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "9ec25cc5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The returned distance score is cosine distance. Therefore, a lower score is better."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
@@ -281,7 +257,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "a359ed74",
|
||||
"metadata": {},
|
||||
|
||||
@@ -24,7 +24,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install pinecone-client openai tiktoken"
|
||||
"!pip install pinecone-client"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -70,7 +70,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 2,
|
||||
"id": "aac9563e",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
@@ -85,7 +85,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 2,
|
||||
"id": "a3c3999a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -135,51 +135,13 @@
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "d46d1452",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Maximal Marginal Relevance Searches\n",
|
||||
"\n",
|
||||
"In addition to using similarity search in the retriever object, you can also use `mmr` as retriever.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "a359ed74",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"retriever = docsearch.as_retriever(search_type=\"mmr\")\n",
|
||||
"matched_docs = retriever.get_relevant_documents(query)\n",
|
||||
"for i, d in enumerate(matched_docs):\n",
|
||||
" print(f\"\\n## Document {i}\\n\")\n",
|
||||
" print(d.page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "7c477287",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Or use `max_marginal_relevance_search` directly:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "9ca82740",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"found_docs = docsearch.max_marginal_relevance_search(query, k=2, fetch_k=10)\n",
|
||||
"for i, doc in enumerate(found_docs):\n",
|
||||
" print(f\"{i + 1}.\", doc.page_content, \"\\n\")"
|
||||
]
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
||||
@@ -1,7 +1,6 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "683953b3",
|
||||
"metadata": {},
|
||||
@@ -34,7 +33,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "7b2f111b-357a-4f42-9730-ef0603bdc1b5",
|
||||
"metadata": {},
|
||||
@@ -51,7 +49,7 @@
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"name": "stdin",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"OpenAI API Key: ········\n"
|
||||
@@ -106,7 +104,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "eeead681",
|
||||
"metadata": {},
|
||||
@@ -143,7 +140,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "59f0b954",
|
||||
"metadata": {},
|
||||
@@ -174,7 +170,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "749658ce",
|
||||
"metadata": {},
|
||||
@@ -205,7 +200,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "c9e21ce9",
|
||||
"metadata": {},
|
||||
@@ -237,7 +231,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "93540013",
|
||||
"metadata": {},
|
||||
@@ -286,7 +279,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "1f9215c8",
|
||||
"metadata": {
|
||||
@@ -349,15 +341,13 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "1bda9bf5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Similarity search with score\n",
|
||||
"\n",
|
||||
"Sometimes we might want to perform the search, but also obtain a relevancy score to know how good is a particular result. \n",
|
||||
"The returned distance score is cosine distance. Therefore, a lower score is better."
|
||||
"Sometimes we might want to perform the search, but also obtain a relevancy score to know how good is a particular result."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -410,7 +400,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "525e3582",
|
||||
"metadata": {},
|
||||
@@ -421,7 +410,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "1c2c58dc",
|
||||
"metadata": {},
|
||||
@@ -435,7 +423,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "c58c30bf",
|
||||
"metadata": {
|
||||
@@ -516,7 +503,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "691a82d6",
|
||||
"metadata": {},
|
||||
@@ -554,7 +540,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "0c851b4f",
|
||||
"metadata": {},
|
||||
@@ -617,7 +602,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "0358ecde",
|
||||
"metadata": {},
|
||||
|
||||
@@ -1,139 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "2b9582dc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# SingleStoreDB vector search\n",
|
||||
"[SingleStore DB](https://singlestore.com) is a high-performance distributed database that supports deployment both in the [cloud](https://www.singlestore.com/cloud/) and on-premises. For a significant duration, it has provided support for vector functions such as [dot_product](https://docs.singlestore.com/managed-service/en/reference/sql-reference/vector-functions/dot_product.html), thereby positioning itself as an ideal solution for AI applications that require text similarity matching. \n",
|
||||
"This tutorial illustrates how to utilize the features of the SingleStore DB Vector Store."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "e4a61a4d",
|
||||
"metadata": {
|
||||
"scrolled": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Establishing a connection to the database is facilitated through the singlestoredb Python connector.\n",
|
||||
"# Please ensure that this connector is installed in your working environment.\n",
|
||||
"!pip install singlestoredb"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "39a0132a",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"# We want to use OpenAIEmbeddings so we have to get the OpenAI API Key.\n",
|
||||
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "6104fde8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import SingleStoreDB\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "7b45113c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Load text samples \n",
|
||||
"from langchain.document_loaders import TextLoader\n",
|
||||
"loader = TextLoader('../../../state_of_the_union.txt')\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "535b2687",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"There are several ways to establish a [connection](https://singlestoredb-python.labs.singlestore.com/generated/singlestoredb.connect.html) to the database. You can either set up environment variables or pass named parameters to the `SingleStoreDB constructor`. Alternatively, you may provide these parameters to the `from_documents` and `from_texts` methods."
|
||||
]
|
||||
},
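As an alternative to the environment-variable route shown in the next cell, here is a sketch of passing the connection details as named parameters. It assumes `from_documents` forwards standard `singlestoredb` connection parameters (`host`, `port`, `user`, `password`, `database`); verify the exact names against the connector documentation:

```python
# Parameter names are assumptions -- check the singlestoredb connector docs.
docsearch = SingleStoreDB.from_documents(
    docs,
    embeddings,
    host="localhost",
    port=3306,
    user="root",
    password="pass",
    database="db",
    table_name="notebook",
)
```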
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "d0b316bf",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Setup connection url as environment variable\n",
|
||||
"os.environ['SINGLESTOREDB_URL'] = 'root:pass@localhost:3306/db'\n",
|
||||
"\n",
|
||||
"# Load documents to the store\n",
|
||||
"docsearch = SingleStoreDB.from_documents(\n",
|
||||
" docs,\n",
|
||||
" embeddings,\n",
|
||||
" table_name = \"noteook\", # use table with a custom name \n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "0eaa4297",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"docs = docsearch.similarity_search(query) # Find documents that correspond to the query\n",
|
||||
"print(docs[0].page_content)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "86efff90",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.2"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
@@ -1,7 +1,6 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "683953b3",
|
||||
"metadata": {},
|
||||
@@ -10,7 +9,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "cc80fa84-1f2f-48b4-bd39-3e6412f012f1",
|
||||
"metadata": {},
|
||||
@@ -87,7 +85,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "69bff365-3039-4ff8-a641-aa190166179d",
|
||||
"metadata": {},
|
||||
@@ -239,7 +236,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "18152965",
|
||||
"metadata": {},
|
||||
@@ -247,15 +243,6 @@
|
||||
"## Similarity search with score\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "ea13e80a",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The returned distance score is cosine distance. Therefore, a lower score is better."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
@@ -289,7 +276,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "794a7552",
|
||||
"metadata": {},
|
||||
|
||||
@@ -1,199 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"# Tigris\n",
|
||||
"\n",
|
||||
"> [Tigris](htttps://tigrisdata.com) is an open source Serverless NoSQL Database and Search Platform designed to simplify building high-performance vector search applications.\n",
|
||||
"> Tigris eliminates the infrastructure complexity of managing, operating, and synchronizing multiple tools, allowing you to focus on building great applications instead."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"This notebook guides you how to use Tigris as your VectorStore"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"**Pre requisites**\n",
|
||||
"1. An OpenAI account. You can sign up for an account [here](https://platform.openai.com/)\n",
|
||||
"2. [Sign up for a free Tigris account](https://console.preview.tigrisdata.cloud). Once you have signed up for the Tigris account, create a new project called `vectordemo`. Next, make a note of the *Uri* for the region you've created your project in, the **clientId** and **clientSecret**. You can get all this information from the **Application Keys** section of the project."
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"Let's first install our dependencies:"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install tigrisdb openapi-schema-pydantic openai tiktoken"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"We will load the `OpenAI` api key and `Tigris` credentials in our environment"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"import getpass\n",
|
||||
"\n",
|
||||
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')\n",
|
||||
"os.environ['TIGRIS_PROJECT'] = getpass.getpass('Tigris Project Name:')\n",
|
||||
"os.environ['TIGRIS_CLIENT_ID'] = getpass.getpass('Tigris Client Id:')\n",
|
||||
"os.environ['TIGRIS_CLIENT_SECRET'] = getpass.getpass('Tigris Client Secret:')"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
|
||||
"from langchain.text_splitter import CharacterTextSplitter\n",
|
||||
"from langchain.vectorstores import Tigris\n",
|
||||
"from langchain.document_loaders import TextLoader"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"### Initialize Tigris vector store\n",
|
||||
"Let's import our test dataset:"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = TextLoader('../../../state_of_the_union.txt')\n",
|
||||
"documents = loader.load()\n",
|
||||
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
|
||||
"docs = text_splitter.split_documents(documents)\n",
|
||||
"\n",
|
||||
"embeddings = OpenAIEmbeddings()"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"vector_store = Tigris.from_documents(docs, embeddings, index_name=\"my_embeddings\")"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"### Similarity Search"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"found_docs = vector_store.similarity_search(query)\n",
|
||||
"print(found_docs)"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"### Similarity Search with score (vector distance)"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"result = vector_store.similarity_search_with_score(query)\n",
|
||||
"for (doc, score) in result:\n",
|
||||
" print(f\"document={doc}, score={score}\")"
|
||||
],
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
}
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 2
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython2",
|
||||
"version": "2.7.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
||||
@@ -1,23 +1,21 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "683953b3",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Vectara\n",
|
||||
"\n",
|
||||
">[Vectara](https://vectara.com/) is a API platform for building LLM-powered applications. It provides a simple to use API for document indexing and query that is managed by Vectara and is optimized for performance and accuracy. \n",
|
||||
">[Vectara](https://Vectara.com/docs/) is a API platform for building LLM-powered applications. It provides a simple to use API for document indexing and query that is managed by Vectara and is optimized for performance and accuracy. \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"This notebook shows how to use functionality related to the `Vectara` vector database. \n",
|
||||
"\n",
|
||||
"See the [Vectara API documentation ](https://docs.vectara.com/docs/) for more information on how to use the API."
|
||||
"See the [Vectara API documentation ](https://Vectara.com/docs/) for more information on how to use the API."
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "7b2f111b-357a-4f42-9730-ef0603bdc1b5",
|
||||
"metadata": {},
|
||||
@@ -89,7 +87,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "eeead681",
|
||||
"metadata": {},
|
||||
@@ -101,7 +98,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 5,
|
||||
"id": "8429667e",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
@@ -116,7 +113,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "1f9215c8",
|
||||
"metadata": {
|
||||
@@ -133,7 +129,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 6,
|
||||
"id": "a8c513ab",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
@@ -145,12 +141,12 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
|
||||
"found_docs = vectara.similarity_search(query, n_sentence_context=0)"
|
||||
"found_docs = vectara.similarity_search(query)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 7,
|
||||
"id": "fc516993",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
@@ -164,13 +160,7 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. A former top litigator in private practice. A former federal public defender.\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -179,7 +169,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "1bda9bf5",
|
||||
"metadata": {},
|
||||
@@ -191,7 +180,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 8,
|
||||
"id": "8804a21d",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
@@ -207,7 +196,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": 9,
|
||||
"id": "756a6887",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
@@ -220,15 +209,9 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. A former top litigator in private practice. A former federal public defender.\n",
|
||||
"\n",
|
||||
"Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n",
|
||||
"\n",
|
||||
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
|
||||
"\n",
|
||||
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n",
|
||||
"\n",
|
||||
"Score: 0.7129974\n"
|
||||
"Score: 1.0046461\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
@@ -239,7 +222,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "691a82d6",
|
||||
"metadata": {},
|
||||
@@ -251,7 +233,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": 11,
|
||||
"id": "9427195f",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
@@ -263,10 +245,10 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"VectaraRetriever(vectorstore=<langchain.vectorstores.vectara.Vectara object at 0x122db2830>, search_type='similarity', search_kwargs={'lambda_val': 0.025, 'k': 5, 'filter': '', 'n_sentence_context': '0'})"
|
||||
"VectorStoreRetriever(vectorstore=<langchain.vectorstores.vectara.Vectara object at 0x156d3e830>, search_type='similarity', search_kwargs={})"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -278,7 +260,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 15,
|
||||
"id": "f3c70c31",
|
||||
"metadata": {
|
||||
"ExecuteTime": {
|
||||
@@ -290,10 +272,10 @@
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \\n\\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \\n\\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \\n\\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../../state_of_the_union.txt'})"
|
||||
"Document(page_content='Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence. A former top litigator in private practice. A former federal public defender.', metadata={'source': '../../modules/state_of_the_union.txt'})"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -328,7 +310,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -1,7 +1,6 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "683953b3",
|
||||
"metadata": {},
|
||||
@@ -48,7 +47,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "6b34828d-e627-4d85-aabd-eeb15d9f4b00",
|
||||
"metadata": {},
|
||||
@@ -167,7 +165,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "a15863ee",
|
||||
"metadata": {},
|
||||
@@ -175,16 +172,6 @@
|
||||
"## Similarity search with score"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "64e03db8",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Sometimes we might want to perform the search, but also obtain a relevancy score to know how good is a particular result. \n",
|
||||
"The returned distance score is cosine distance. Therefore, a lower score is better."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
@@ -209,6 +196,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "8fc3487b",
|
||||
"metadata": {},
|
||||
@@ -217,6 +205,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "281c0fcc",
|
||||
"metadata": {},
|
||||
@@ -225,7 +214,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "05fd146c",
|
||||
"metadata": {},
|
||||
@@ -234,6 +222,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "503e2e75",
|
||||
"metadata": {},
|
||||
@@ -270,6 +259,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "fbd7a6cb",
|
||||
"metadata": {},
|
||||
@@ -278,6 +268,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "f349acb9",
|
||||
"metadata": {},
|
||||
@@ -379,7 +370,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
"version": "3.9.16"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -121,7 +121,7 @@
|
||||
"\n",
|
||||
"Human: Hi there my friend\n",
|
||||
"AI: Hi there, how are you doing today?\n",
|
||||
"Human: Not too bad - how are you?\n",
|
||||
"Human: Not to bad - how are you?\n",
|
||||
"Chatbot:\u001b[0m\n",
|
||||
"\n",
|
||||
"\u001b[1m> Finished LLMChain chain.\u001b[0m\n"
|
||||
|
||||
@@ -118,29 +118,6 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "955f1b15",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## DynamoDBChatMessageHistory with Custom Endpoint URL\n",
|
||||
"\n",
|
||||
"Sometimes it is useful to specify the URL to the AWS endpoint to connect to. For instance, when you are running locally against [Localstack](https://localstack.cloud/). For those cases you can specify the URL via the `endpoint_url` parameter in the constructor."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "225713c8",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.memory.chat_message_histories import DynamoDBChatMessageHistory\n",
|
||||
"\n",
|
||||
"history = DynamoDBChatMessageHistory(table_name=\"SessionTable\", session_id=\"0\", endpoint_url=\"http://localhost.localstack.cloud:4566\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "3b33c988",
|
||||
"metadata": {},
|
||||
|
||||
@@ -1,7 +1,6 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "d31df93e",
|
||||
"metadata": {},
|
||||
@@ -10,7 +9,7 @@
|
||||
"\n",
|
||||
"This notebook walks through how LangChain thinks about memory. \n",
|
||||
"\n",
|
||||
"Memory involves keeping a concept of state around throughout a user's interactions with a language model. A user's interactions with a language model are captured in the concept of ChatMessages, so this boils down to ingesting, capturing, transforming and extracting knowledge from a sequence of chat messages. There are many different ways to do this, each of which exists as its own memory type.\n",
|
||||
"Memory involves keeping a concept of state around throughout a user's interactions with an language model. A user's interactions with a language model are captured in the concept of ChatMessages, so this boils down to ingesting, capturing, transforming and extracting knowledge from a sequence of chat messages. There are many different ways to do this, each of which exists as its own memory type.\n",
|
||||
"\n",
|
||||
"In general, for each type of memory there are two ways to understanding using memory. These are the standalone functions which extract information from a sequence of messages, and then there is the way you can use this type of memory in a chain. \n",
|
||||
"\n",
|
||||
@@ -26,7 +25,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 1,
|
||||
"id": "87235cf1",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -42,18 +41,18 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 5,
|
||||
"id": "be030822",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[HumanMessage(content='hi!', additional_kwargs={}, example=False),\n",
|
||||
" AIMessage(content='whats up?', additional_kwargs={}, example=False)]"
|
||||
"[HumanMessage(content='hi!', additional_kwargs={}),\n",
|
||||
" AIMessage(content='whats up?', additional_kwargs={})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -76,7 +75,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 7,
|
||||
"id": "a382b160",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -86,7 +85,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 10,
|
||||
"id": "a280d337",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -98,7 +97,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"execution_count": 12,
|
||||
"id": "1b739c0a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -108,7 +107,7 @@
|
||||
"{'history': 'Human: hi!\\nAI: whats up?'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -127,7 +126,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": 13,
|
||||
"id": "798ceb1c",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -139,18 +138,18 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": 14,
|
||||
"id": "698688fd",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'history': [HumanMessage(content='hi!', additional_kwargs={}, example=False),\n",
|
||||
" AIMessage(content='whats up?', additional_kwargs={}, example=False)]}"
|
||||
"{'history': [HumanMessage(content='hi!', additional_kwargs={}),\n",
|
||||
" AIMessage(content='whats up?', additional_kwargs={})]}"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -170,7 +169,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"execution_count": 15,
|
||||
"id": "54301321",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -189,7 +188,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": 16,
|
||||
"id": "ae046bff",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -217,7 +216,7 @@
|
||||
"\" Hi there! It's nice to meet you. How can I help you today?\""
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -228,7 +227,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"execution_count": 17,
|
||||
"id": "d8e2a6ff",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -257,7 +256,7 @@
|
||||
"\" That's great! It's always nice to have a conversation with someone new. What would you like to talk about?\""
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -268,7 +267,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"execution_count": 18,
|
||||
"id": "15eda316",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@@ -299,7 +298,7 @@
|
||||
"\" Sure! I'm an AI created to help people with their everyday tasks. I'm programmed to understand natural language and provide helpful information. I'm also constantly learning and updating my knowledge base so I can provide more accurate and helpful answers.\""
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -320,7 +319,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"execution_count": 1,
|
||||
"id": "b5acbc4b",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -339,7 +338,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"execution_count": 2,
|
||||
"id": "7812ee21",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -349,20 +348,18 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"execution_count": 3,
|
||||
"id": "3ed6e6a0",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{'type': 'human',\n",
|
||||
" 'data': {'content': 'hi!', 'additional_kwargs': {}, 'example': False}},\n",
|
||||
" {'type': 'ai',\n",
|
||||
" 'data': {'content': 'whats up?', 'additional_kwargs': {}, 'example': False}}]"
|
||||
"[{'type': 'human', 'data': {'content': 'hi!', 'additional_kwargs': {}}},\n",
|
||||
" {'type': 'ai', 'data': {'content': 'whats up?', 'additional_kwargs': {}}}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 16,
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -373,7 +370,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"execution_count": 4,
|
||||
"id": "cdf4ebd2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -383,18 +380,18 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"execution_count": 5,
|
||||
"id": "9724e24b",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[HumanMessage(content='hi!', additional_kwargs={}, example=False),\n",
|
||||
" AIMessage(content='whats up?', additional_kwargs={}, example=False)]"
|
||||
"[HumanMessage(content='hi!', additional_kwargs={}),\n",
|
||||
" AIMessage(content='whats up?', additional_kwargs={})]"
|
||||
]
|
||||
},
|
||||
"execution_count": 18,
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@@ -410,6 +407,14 @@
|
||||
"source": [
|
||||
"And that's it for the getting started! There are plenty of different types of memory, check out our examples to see them all"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "3dd37d93",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -428,7 +433,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.9"
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -7,12 +7,7 @@
|
||||
"source": [
|
||||
"# Anthropic\n",
|
||||
"\n",
|
||||
"\n",
|
||||
">[Anthropic](https://en.wikipedia.org/wiki/Anthropic) is an American artificial intelligence (AI) startup and \n",
|
||||
"> public-benefit corporation, founded by former members of OpenAI. `Anthropic` specializes in developing general AI \n",
|
||||
"> systems and language models, with a company ethos of responsible AI usage.\n",
|
||||
"> `Anthropic` develops a chatbot, named `Claude`. Similar to `ChatGPT`, `Claude` uses a messaging \n",
|
||||
"> interface where users can submit questions or requests and receive highly detailed and relevant responses.\n"
|
||||
"This notebook covers how to get started with Anthropic chat models."
|
||||
]
|
||||
},
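A minimal sketch of the getting-started pattern for Anthropic chat models in LangChain, assuming `ANTHROPIC_API_KEY` is set in your environment:

```python
from langchain.chat_models import ChatAnthropic
from langchain.schema import HumanMessage

# ChatAnthropic picks up ANTHROPIC_API_KEY from the environment by default.
chat = ChatAnthropic()
response = chat([HumanMessage(content="Say hello in French.")])
print(response.content)
```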
|
||||
{
|
||||
@@ -176,7 +171,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.11.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
||||
@@ -4,14 +4,9 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Google Vertex AI PaLM \n",
|
||||
"# Google Cloud Platform Vertex AI PaLM \n",
|
||||
"\n",
|
||||
">[Vertex AI](https://cloud.google.com/vertex-ai/docs/start/introduction-unified-platform) is a machine learning (ML) \n",
|
||||
"> platform that lets you train and deploy ML models and AI applications. \n",
|
||||
"> `Vertex AI` combines data engineering, data science, and ML engineering workflows, enabling your teams to \n",
|
||||
"> collaborate using a common toolset.\n",
|
||||
"\n",
|
||||
"**Note:** This is seperate from the Google PaLM integration. Google has chosen to offer an enterprise version of PaLM through GCP, and this supports the models made available through there. \n",
|
||||
"Note: This is seperate from the Google PaLM integration. Google has chosen to offer an enterprise version of PaLM through GCP, and this supports the models made available through there. \n",
|
||||
"\n",
|
||||
"PaLM API on Vertex AI is a Preview offering, subject to the Pre-GA Offerings Terms of the [GCP Service Specific Terms](https://cloud.google.com/terms/service-terms). \n",
|
||||
"\n",
|
||||
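A minimal sketch of calling the Vertex AI PaLM text model through LangChain, assuming the `google-cloud-aiplatform` SDK is installed and application-default credentials are configured (the model name is illustrative):

```python
from langchain.llms import VertexAI

# "text-bison" is the PaLM text model exposed through Vertex AI.
llm = VertexAI(model_name="text-bison")
print(llm("What are some tips for deploying LLMs in production?"))
```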
@@ -162,7 +157,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.9.1"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
|
||||
@@ -1,19 +1,18 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "959300d4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# PromptLayer ChatOpenAI\n",
|
||||
"\n",
|
||||
">[PromptLayer](https://docs.promptlayer.com/what-is-promptlayer/wxpF9EZkUwvdkwvVE9XEvC/how-promptlayer-works/dvgGSxNe6nB1jj8mUVbG8r) \n",
|
||||
"> is a devtool that allows you to track, manage, and share your GPT prompt engineering. \n",
|
||||
"> It acts as a middleware between your code and OpenAI's python library, recording all your API requests \n",
|
||||
"> and saving relevant metadata for easy exploration and search in the [PromptLayer](https://www.promptlayer.com) dashboard."
|
||||
"This example showcases how to connect to [PromptLayer](https://www.promptlayer.com) to start recording your ChatOpenAI requests."
|
||||
]
|
||||
},
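A minimal sketch of the recording flow described above, assuming a `PROMPTLAYER_API_KEY` (and an OpenAI key) are set in the environment:

```python
from langchain.chat_models import PromptLayerChatOpenAI
from langchain.schema import HumanMessage

# Drop-in replacement for ChatOpenAI; each request is logged to PromptLayer,
# tagged so it is easy to find in the dashboard.
chat = PromptLayerChatOpenAI(pl_tags=["langchain"])
chat([HumanMessage(content="I am a cat and I want")])
```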
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "6a45943e",
|
||||
"metadata": {},
|
||||
@@ -57,6 +56,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "8564ce7d",
|
||||
"metadata": {},
|
||||
@@ -78,6 +78,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "bf0294de",
|
||||
"metadata": {},
|
||||
@@ -109,6 +110,7 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "a2d76826",
|
||||
"metadata": {},
|
||||
@@ -123,6 +125,7 @@
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "c43803d1",
|
||||
"metadata": {},
|
||||
@@ -158,7 +161,7 @@
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"display_name": "base",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
@@ -172,7 +175,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.10.6"
|
||||
"version": "3.8.8 (default, Apr 13 2021, 12:59:45) \n[Clang 10.0.0 ]"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
|
||||
@@ -631,7 +631,7 @@
"id": "56ea6a08",
"metadata": {},
"source": [
"You'll need to get a Momento auth token to use this class. This can either be passed in to a momento.CacheClient if you'd like to instantiate that directly, as a named parameter `auth_token` to `MomentoChatMessageHistory.from_client_params`, or can just be set as an environment variable `MOMENTO_AUTH_TOKEN`."
"You'll need to get a Momemto auth token to use this class. This can either be passed in to a momento.CacheClient if you'd like to instantiate that directly, as a named parameter `auth_token` to `MomentoChatMessageHistory.from_client_params`, or can just be set as an environment variable `MOMENTO_AUTH_TOKEN`."
]
},
{
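A minimal sketch of the two simplest ways to supply that token, per the cell above (the `from_client_params` argument order is inferred from the text and is an assumption; cache and session names are placeholders):

```python
# Minimal sketch; the from_client_params argument order is an assumption.
import os
from datetime import timedelta

from langchain.memory import MomentoChatMessageHistory

# Option 1: set the environment variable and let the class pick it up.
os.environ["MOMENTO_AUTH_TOKEN"] = "<your-token>"
history = MomentoChatMessageHistory.from_client_params(
    "session-1", "langchain-cache", ttl=timedelta(days=1)
)

# Option 2: pass the token explicitly as the named parameter.
history = MomentoChatMessageHistory.from_client_params(
    "session-1", "langchain-cache", ttl=timedelta(days=1), auth_token="<your-token>"
)
# Option 3 (not shown): instantiate momento.CacheClient directly and pass it in.
```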
@@ -1,103 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "9597802c",
"metadata": {},
"source": [
"# Aviary\n",
"\n",
"[Aviary](https://www.anyscale.com/) is an open source toolkit for evaluating and deploying production open source LLMs. \n",
"\n",
"This example goes over how to use LangChain to interact with `Aviary`. You can try Aviary out [here](https://aviary.anyscale.com).\n",
"\n",
"You can find out more about Aviary at https://github.com/ray-project/aviary. \n",
"\n",
"One Aviary instance can serve multiple models. You can get a list of the available models by using the CLI:\n",
"\n",
"`% aviary models`\n",
"\n",
"Or you can connect directly to the endpoint and get a list of available models by using the `/models` endpoint.\n",
"\n",
"The constructor requires a url for an Aviary backend, and optionally a token to validate the connection. \n",
"\n"
]
},
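A hypothetical sketch of querying that `/models` endpoint directly (the route name comes from the text above; the exact URL shape and auth header are assumptions about the Aviary backend):

```python
# Hypothetical sketch; the Bearer-token header is an assumption.
import os

import requests

resp = requests.get(
    os.environ["AVIARY_URL"].rstrip("/") + "/models",
    headers={"Authorization": f"Bearer {os.environ['AVIARY_TOKEN']}"},
)
print(resp.json())  # models this Aviary instance can serve
```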
{
"cell_type": "code",
"execution_count": 4,
"id": "6fb585dd",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import os\n",
"from langchain.llms import Aviary\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "3fec5a59",
"metadata": {},
"outputs": [],
"source": [
"llm = Aviary(model='amazon/LightGPT', aviary_url=os.environ['AVIARY_URL'], aviary_token=os.environ['AVIARY_TOKEN'])"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "4efd54dd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Love is an emotion that involves feelings of attraction, affection and empathy for another person. It can also refer to a deep bond between two people or groups of people. Love can be expressed in many different ways, such as through words, actions, gestures, music, art, literature, and other forms of communication.\n"
]
}
],
"source": [
"result = llm.predict('What is the meaning of love?')\n",
"print(result)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "27e526b6",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.15"
},
"vscode": {
"interpreter": {
"hash": "a0a0263b650d907a3bfe41c0f8d6a63a071b884df3cfdc1579f00cdc1aed6b03"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}
@@ -1,196 +0,0 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Baseten\n",
"\n",
"[Baseten](https://baseten.co) provides all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently.\n",
"\n",
"This example demonstrates using LangChain with models deployed on Baseten."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Setup\n",
"\n",
"To run this notebook, you'll need a [Baseten account](https://baseten.co) and an [API key](https://docs.baseten.co/settings/api-keys).\n",
"\n",
"You'll also need to install the Baseten Python package:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install baseten"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import baseten\n",
"\n",
"baseten.login(\"YOUR_API_KEY\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Single model call\n",
"\n",
"First, you'll need to deploy a model to Baseten.\n",
"\n",
"You can deploy foundation models like WizardLM and Alpaca with one click from the [Baseten model library](https://app.baseten.co/explore/) or if you have your own model, [deploy it with this tutorial](https://docs.baseten.co/deploying-models/deploy).\n",
"\n",
"In this example, we'll work with WizardLM. [Deploy WizardLM here](https://app.baseten.co/explore/llama) and follow along with the deployed [model's version ID](https://docs.baseten.co/managing-models/manage)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import Baseten"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load the model\n",
"wizardlm = Baseten(model=\"MODEL_VERSION_ID\", verbose=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Prompt the model\n",
"\n",
"wizardlm(\"What is the difference between a Wizard and a Sorcerer?\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Chained model calls\n",
"\n",
"We can chain together multiple calls to one or multiple models, which is the whole point of LangChain!\n",
"\n",
"This example uses WizardLM to plan a meal with an entree, three sides, and an alcoholic and non-alcoholic beverage pairing."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import SimpleSequentialChain\n",
"from langchain import PromptTemplate, LLMChain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Build the first link in the chain\n",
"\n",
"prompt = PromptTemplate(\n",
"    input_variables=[\"cuisine\"],\n",
"    template=\"Name a complex entree for a {cuisine} dinner. Respond with just the name of a single dish.\",\n",
")\n",
"\n",
"link_one = LLMChain(llm=wizardlm, prompt=prompt)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Build the second link in the chain\n",
"\n",
"prompt = PromptTemplate(\n",
"    input_variables=[\"entree\"],\n",
"    template=\"What are three sides that would go with {entree}. Respond with only a list of the sides.\",\n",
")\n",
"\n",
"link_two = LLMChain(llm=wizardlm, prompt=prompt)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Build the third link in the chain\n",
"\n",
"prompt = PromptTemplate(\n",
"    input_variables=[\"sides\"],\n",
"    template=\"What is one alcoholic and one non-alcoholic beverage that would go well with this list of sides: {sides}. Respond with only the names of the beverages.\",\n",
")\n",
"\n",
"link_three = LLMChain(llm=wizardlm, prompt=prompt)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Run the full chain!\n",
"\n",
"menu_maker = SimpleSequentialChain(chains=[link_one, link_two, link_three], verbose=True)\n",
"menu_maker.run(\"South Indian\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
@@ -6,11 +6,7 @@
"id": "J-yvaDTmTTza"
},
"source": [
"# Beam\n",
"\n",
">[Beam](https://docs.beam.cloud/introduction) makes it easy to run code on GPUs, deploy scalable web APIs, \n",
"> schedule cron jobs, and run massively parallel workloads — without managing any infrastructure.\n",
"\n",
"# Beam integration for langchain\n",
"\n",
"Calls the Beam API wrapper to deploy an instance of the gpt2 LLM in a cloud deployment and make subsequent calls to it. This requires installing the Beam library and registering a Beam Client ID and Client Secret. Calling the wrapper creates and runs an instance of the model, returning text related to the prompt. Additional calls can then be made by calling the Beam API directly.\n",
"\n",
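A hypothetical sketch of the deploy-then-call flow that paragraph describes (the constructor parameters and the `_deploy`/`_call` method names are assumptions, not confirmed by this diff; the Beam CLI must already be configured with your Client ID and Secret):

```python
# Hypothetical sketch; parameter and method names are assumptions.
from langchain.llms.beam import Beam

llm = Beam(
    model_name="gpt2",        # the LLM to deploy on Beam
    name="langchain-gpt2",    # name given to the Beam app
    max_length="50",
    verbose=False,
)
llm._deploy()                                      # create and run the instance
print(llm._call("What is the meaning of life?"))   # call the deployed app
```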
@@ -155,9 +151,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
"nbformat_minor": 1
}

@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Bedrock"
"# Amazon Bedrock"
]
},
{
Some files were not shown because too many files have changed in this diff.