mirror of
https://github.com/hwchase17/langchain.git
synced 2025-09-02 19:47:13 +00:00
community[minor]: add maritalk chat (#17675)
**Description:** Adds the MariTalk chat model, which is based on an LLM specially trained for Portuguese. **Twitter handle:** @MaritacaAI
201
docs/docs/integrations/chat/maritalk.ipynb
Normal file
@@ -0,0 +1,201 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/integrations/chat/maritalk.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "\n",
    "# Maritalk\n",
    "\n",
    "## Introduction\n",
    "\n",
    "MariTalk is an assistant developed by the Brazilian company [Maritaca AI](https://www.maritaca.ai).\n",
    "MariTalk is based on language models that have been specially trained to understand Portuguese well.\n",
    "\n",
    "This notebook demonstrates how to use MariTalk with LangChain through two examples:\n",
    "\n",
    "1. A simple example of how to use MariTalk to perform a task.\n",
    "2. LLM + RAG: The second example shows how to answer a question whose answer is found in a long document that does not fit within MariTalk's token limit. For this, we will use a simple searcher (BM25) to first retrieve the most relevant sections of the document and then feed them to MariTalk for answering."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Installation\n",
    "First, install the LangChain library (and all its dependencies) using the following command:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install langchain-core langchain-community"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## API Key\n",
    "You will need an API key, which can be obtained from chat.maritaca.ai (\"Chaves da API\" section)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### Example 1 - Pet Name Suggestions\n",
    "\n",
    "Let's define our language model, ChatMaritalk, and configure it with your API key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.prompts.chat import ChatPromptTemplate\n",
    "from langchain_community.chat_models import ChatMaritalk\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "\n",
    "llm = ChatMaritalk(\n",
    "    api_key=\"\",  # Insert your API key here\n",
    "    temperature=0.7,\n",
    "    max_tokens=100,\n",
    ")\n",
    "\n",
    "output_parser = StrOutputParser()\n",
    "\n",
    "chat_prompt = ChatPromptTemplate.from_messages(\n",
    "    [\n",
    "        (\n",
    "            \"system\",\n",
    "            \"You are an assistant specialized in suggesting pet names. Given the animal, you must suggest 4 names.\",\n",
    "        ),\n",
    "        (\"human\", \"I have a {animal}\"),\n",
    "    ]\n",
    ")\n",
    "\n",
    "chain = chat_prompt | llm | output_parser\n",
    "\n",
    "response = chain.invoke({\"animal\": \"dog\"})\n",
    "print(response)  # should answer something like \"1. Max\\n2. Bella\\n3. Charlie\\n4. Rocky\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example 2 - RAG + LLM: UNICAMP 2024 Entrance Exam Question Answering System\n",
    "For this example, we need to install some extra libraries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install unstructured rank_bm25 pdf2image pdfminer-six pikepdf pypdf unstructured_inference fastapi kaleido uvicorn \"pillow<10.1.0\" pillow_heif -q"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Loading the database\n",
    "\n",
    "The first step is to create a database with the information from the notice. For this, we will download the notice from the COMVEST website and segment the extracted text into 500-character windows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import OnlinePDFLoader\n",
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "\n",
    "# Loading the COMVEST 2024 notice\n",
    "loader = OnlinePDFLoader(\n",
    "    \"https://www.comvest.unicamp.br/wp-content/uploads/2023/10/31-2023-Dispoe-sobre-o-Vestibular-Unicamp-2024_com-retificacao.pdf\"\n",
    ")\n",
    "data = loader.load()\n",
    "\n",
    "text_splitter = RecursiveCharacterTextSplitter(\n",
    "    chunk_size=500, chunk_overlap=100, separators=[\"\\n\", \" \", \"\"]\n",
    ")\n",
    "texts = text_splitter.split_documents(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Creating a Searcher\n",
    "Now that we have our database, we need a searcher. For this example, we will use simple BM25 as the search system, but this could be replaced by any other searcher (such as search via embeddings)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.retrievers import BM25Retriever\n",
    "\n",
    "retriever = BM25Retriever.from_documents(texts)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Combining Search System + LLM\n",
    "Now that we have our searcher, we just need to implement a prompt specifying the task and invoke the chain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chains.question_answering import load_qa_chain\n",
    "\n",
    "prompt = \"\"\"Baseado nos seguintes documentos, responda a pergunta abaixo.\n",
    "\n",
    "{context}\n",
    "\n",
    "Pergunta: {query}\n",
    "\"\"\"\n",
    "\n",
    "qa_prompt = ChatPromptTemplate.from_messages([(\"human\", prompt)])\n",
    "\n",
    "chain = load_qa_chain(llm, chain_type=\"stuff\", verbose=True, prompt=qa_prompt)\n",
    "\n",
    "# \"What is the maximum time allowed to complete the exam?\"\n",
    "query = \"Qual o tempo máximo para realização da prova?\"\n",
    "\n",
    "docs = retriever.get_relevant_documents(query)\n",
    "\n",
    "chain.invoke(\n",
    "    {\"input_documents\": docs, \"query\": query}\n",
    ")  # Should output something like: \"O tempo máximo para realização da prova é de 5 horas.\""
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}