Compare commits

..

1 Commits

Author SHA1 Message Date
Dev 2049
863baba3f0 refac 2023-05-22 13:57:17 -07:00
167 changed files with 267 additions and 6397 deletions

View File

@@ -1,92 +0,0 @@
# Beam
This page covers how to use Beam within LangChain.
It is broken into two parts: installation and setup, and then references to specific Beam wrappers.
## Installation and Setup
- [Create an account](https://www.beam.cloud/)
- Install the Beam CLI with `curl https://raw.githubusercontent.com/slai-labs/get-beam/main/get-beam.sh -sSfL | sh`
- Register API keys with `beam configure`
- Set environment variables (`BEAM_CLIENT_ID`) and (`BEAM_CLIENT_SECRET`)
- Install the Beam SDK `pip install beam-sdk`
## Wrappers
### LLM
There exists a Beam LLM wrapper, which you can access with
```python
from langchain.llms.beam import Beam
```
## Define your Beam app.
This is the environment youll be developing against once you start the app.
It's also used to define the maximum response length from the model.
```python
llm = Beam(model_name="gpt2",
name="langchain-gpt2-test",
cpu=8,
memory="32Gi",
gpu="A10G",
python_version="python3.8",
python_packages=[
"diffusers[torch]>=0.10",
"transformers",
"torch",
"pillow",
"accelerate",
"safetensors",
"xformers",],
max_length="50",
verbose=False)
```
## Deploy your Beam app
Once defined, you can deploy your Beam app by calling your model's `_deploy()` method.
```python
llm._deploy()
```
## Call your Beam app
Once a beam model is deployed, it can be called by callying your model's `_call()` method.
This returns the GPT2 text response to your prompt.
```python
response = llm._call("Running machine learning on a remote GPU")
```
An example script which deploys the model and calls it would be:
```python
from langchain.llms.beam import Beam
import time
llm = Beam(model_name="gpt2",
name="langchain-gpt2-test",
cpu=8,
memory="32Gi",
gpu="A10G",
python_version="python3.8",
python_packages=[
"diffusers[torch]>=0.10",
"transformers",
"torch",
"pillow",
"accelerate",
"safetensors",
"xformers",],
max_length="50",
verbose=False)
llm._deploy()
response = llm._call("Running machine learning on a remote GPU")
print(response)
```

View File

@@ -1,40 +0,0 @@
# Vectara
What is Vectara?
**Vectara Overview:**
- Vectara is developer-first API platform for building conversational search applications
- To use Vectara - first [sign up](https://console.vectara.com/signup) and create an account. Then create a corpus and an API key for indexing and searching.
- You can use Vectara's [indexing API](https://docs.vectara.com/docs/indexing-apis/indexing) to add documents into Vectara's index
- You can use Vectara's [Search API](https://docs.vectara.com/docs/search-apis/search) to query Vectara's index (which also supports Hybrid search implicitly).
- You can use Vectara's integration with LangChain as a Vector store or using the Retriever abstraction.
## Installation and Setup
To use Vectara with LangChain no special installation steps are required. You just have to provide your customer_id, corpus ID, and an API key created within the Vectara console to enable indexing and searching.
### VectorStore
There exists a wrapper around the Vectara platform, allowing you to use it as a vectorstore, whether for semantic search or example selection.
To import this vectorstore:
```python
from langchain.vectorstores import Vectara
```
To create an instance of the Vectara vectorstore:
```python
vectara = Vectara(
vectara_customer_id=customer_id,
vectara_corpus_id=corpus_id,
vectara_api_key=api_key
)
```
The customer_id, corpus_id and api_key are optional, and if they are not supplied will be read from the environment variables `VECTARA_CUSTOMER_ID`, `VECTARA_CORPUS_ID` and `VECTARA_API_KEY`, respectively.
For a more detailed walkthrough of the Vectara wrapper, see one of the two example notebooks:
* [Chat Over Documents with Vectara](./vectara/vectara_chat.html)
* [Vectara Text Generation](./vectara/vectara_text_generation.html)

View File

@@ -1,726 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "134a0785",
"metadata": {},
"source": [
"# Chat Over Documents with Vectara\n",
"\n",
"This notebook is based on the [chat_vector_db](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/chat_vector_db.ipynb) notebook, but using Vectara as the vector database."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "70c4e529",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import os\n",
"from langchain.vectorstores import Vectara\n",
"from langchain.vectorstores.vectara import VectaraRetriever\n",
"from langchain.llms import OpenAI\n",
"from langchain.chains import ConversationalRetrievalChain"
]
},
{
"cell_type": "markdown",
"id": "cdff94be",
"metadata": {},
"source": [
"Load in documents. You can replace this with a loader for whatever type of data you want"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "01c46e92",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.document_loaders import TextLoader\n",
"loader = TextLoader(\"../../modules/state_of_the_union.txt\")\n",
"documents = loader.load()"
]
},
{
"cell_type": "markdown",
"id": "239475d2",
"metadata": {},
"source": [
"We now split the documents, create embeddings for them, and put them in a vectorstore. This allows us to do semantic search over them."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a8930cf7",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"vectorstore = Vectara.from_documents(documents, embedding=None)"
]
},
{
"cell_type": "markdown",
"id": "898b574b",
"metadata": {},
"source": [
"We can now create a memory object, which is neccessary to track the inputs/outputs and hold a conversation."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "af803fee",
"metadata": {},
"outputs": [],
"source": [
"from langchain.memory import ConversationBufferMemory\n",
"memory = ConversationBufferMemory(memory_key=\"chat_history\", return_messages=True)"
]
},
{
"cell_type": "markdown",
"id": "3c96b118",
"metadata": {},
"source": [
"We now initialize the `ConversationalRetrievalChain`"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "7b4110f3",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'langchain.vectorstores.vectara.Vectara'>\n"
]
}
],
"source": [
"openai_api_key = os.environ['OPENAI_API_KEY']\n",
"llm = OpenAI(openai_api_key=openai_api_key, temperature=0)\n",
"retriever = VectaraRetriever(vectorstore, alpha=0.025, k=5, filter=None)\n",
"\n",
"print(type(vectorstore))\n",
"d = retriever.get_relevant_documents('What did the president say about Ketanji Brown Jackson')\n",
"\n",
"qa = ConversationalRetrievalChain.from_llm(llm, retriever, memory=memory)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "e8ce4fe9",
"metadata": {},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"result = qa({\"question\": query})"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "4c79862b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, and a former federal public defender.\""
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result[\"answer\"]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c697d9d1",
"metadata": {},
"outputs": [],
"source": [
"query = \"Did he mention who she suceeded\"\n",
"result = qa({\"question\": query})"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "ba0678f3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"' Justice Stephen Breyer.'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result['answer']"
]
},
{
"cell_type": "markdown",
"id": "b3308b01-5300-4999-8cd3-22f16dae757e",
"metadata": {},
"source": [
"## Pass in chat history\n",
"\n",
"In the above example, we used a Memory object to track chat history. We can also just pass it in explicitly. In order to do this, we need to initialize a chain without any memory object."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "1b41a10b-bf68-4689-8f00-9aed7675e2ab",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever())"
]
},
{
"cell_type": "markdown",
"id": "83f38c18-ac82-45f4-a79e-8b37ce1ae115",
"metadata": {},
"source": [
"Here's an example of asking a question with no chat history"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "bc672290-8a8b-4828-a90c-f1bbdd6b3920",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chat_history = []\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history})"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "6b62d758-c069-4062-88f0-21e7ea4710bf",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, and a former federal public defender.\""
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result[\"answer\"]"
]
},
{
"cell_type": "markdown",
"id": "8c26a83d-c945-4458-b54a-c6bd7f391303",
"metadata": {},
"source": [
"Here's an example of asking a question with some chat history"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "9c95460b-7116-4155-a9d2-c0fb027ee592",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chat_history = [(query, result[\"answer\"])]\n",
"query = \"Did he mention who she suceeded\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history})"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "698ac00c-cadc-407f-9423-226b2d9258d0",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"' Justice Stephen Breyer.'"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result['answer']"
]
},
{
"cell_type": "markdown",
"id": "0eaadf0f",
"metadata": {},
"source": [
"## Return Source Documents\n",
"You can also easily return source documents from the ConversationalRetrievalChain. This is useful for when you want to inspect what documents were returned."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "562769c6",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"qa = ConversationalRetrievalChain.from_llm(llm, vectorstore.as_retriever(), return_source_documents=True)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "ea478300",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chat_history = []\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history})"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "4cb75b4e",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence. A former top litigator in private practice. A former federal public defender.', metadata={'source': '../../modules/state_of_the_union.txt'})"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result['source_documents'][0]"
]
},
{
"cell_type": "markdown",
"id": "669ede2f-d69f-4960-8468-8a768ce1a55f",
"metadata": {},
"source": [
"## ConversationalRetrievalChain with `search_distance`\n",
"If you are using a vector store that supports filtering by search distance, you can add a threshold value parameter."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "f4f32c6f-8e49-44af-9116-8830b1fcc5f2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"vectordbkwargs = {\"search_distance\": 0.9}"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "1e251775-31e7-4679-b744-d4a57937f93a",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectorstore.as_retriever(), return_source_documents=True)\n",
"chat_history = []\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history, \"vectordbkwargs\": vectordbkwargs})"
]
},
{
"cell_type": "markdown",
"id": "99b96dae",
"metadata": {},
"source": [
"## ConversationalRetrievalChain with `map_reduce`\n",
"We can also use different types of combine document chains with the ConversationalRetrievalChain chain."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "e53a9d66",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.chains import LLMChain\n",
"from langchain.chains.question_answering import load_qa_chain\n",
"from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "bf205e35",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n",
"doc_chain = load_qa_chain(llm, chain_type=\"map_reduce\")\n",
"\n",
"chain = ConversationalRetrievalChain(\n",
" retriever=vectorstore.as_retriever(),\n",
" question_generator=question_generator,\n",
" combine_docs_chain=doc_chain,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "78155887",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chat_history = []\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"result = chain({\"question\": query, \"chat_history\": chat_history})"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "e54b5fa2",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"' The president did not mention Ketanji Brown Jackson.'"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result['answer']"
]
},
{
"cell_type": "markdown",
"id": "a2fe6b14",
"metadata": {},
"source": [
"## ConversationalRetrievalChain with Question Answering with sources\n",
"\n",
"You can also use this chain with the question answering with sources chain."
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "d1058fd2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.chains.qa_with_sources import load_qa_with_sources_chain"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "a6594482",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"\n",
"question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n",
"doc_chain = load_qa_with_sources_chain(llm, chain_type=\"map_reduce\")\n",
"\n",
"chain = ConversationalRetrievalChain(\n",
" retriever=vectorstore.as_retriever(),\n",
" question_generator=question_generator,\n",
" combine_docs_chain=doc_chain,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "e2badd21",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chat_history = []\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"result = chain({\"question\": query, \"chat_history\": chat_history})"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "edb31fe5",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"' The president did not mention Ketanji Brown Jackson.\\nSOURCES: ../../modules/state_of_the_union.txt'"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result['answer']"
]
},
{
"cell_type": "markdown",
"id": "2324cdc6-98bf-4708-b8cd-02a98b1e5b67",
"metadata": {},
"source": [
"## ConversationalRetrievalChain with streaming to `stdout`\n",
"\n",
"Output from the chain will be streamed to `stdout` token by token in this example."
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "2efacec3-2690-4b05-8de3-a32fd2ac3911",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.chains.llm import LLMChain\n",
"from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n",
"from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT, QA_PROMPT\n",
"from langchain.chains.question_answering import load_qa_chain\n",
"\n",
"# Construct a ConversationalRetrievalChain with a streaming llm for combine docs\n",
"# and a separate, non-streaming llm for question generation\n",
"llm = OpenAI(temperature=0, openai_api_key=openai_api_key)\n",
"streaming_llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0, openai_api_key=openai_api_key)\n",
"\n",
"question_generator = LLMChain(llm=llm, prompt=CONDENSE_QUESTION_PROMPT)\n",
"doc_chain = load_qa_chain(streaming_llm, chain_type=\"stuff\", prompt=QA_PROMPT)\n",
"\n",
"qa = ConversationalRetrievalChain(\n",
" retriever=vectorstore.as_retriever(), combine_docs_chain=doc_chain, question_generator=question_generator)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "fd6d43f4-7428-44a4-81bc-26fe88a98762",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, and a former federal public defender."
]
}
],
"source": [
"chat_history = []\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history})"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "5ab38978-f3e8-4fa7-808c-c79dec48379a",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Justice Stephen Breyer."
]
}
],
"source": [
"chat_history = [(query, result[\"answer\"])]\n",
"query = \"Did he mention who she suceeded\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history})\n"
]
},
{
"cell_type": "markdown",
"id": "f793d56b",
"metadata": {},
"source": [
"## get_chat_history Function\n",
"You can also specify a `get_chat_history` function, which can be used to format the chat_history string."
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "a7ba9d8c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"def get_chat_history(inputs) -> str:\n",
" res = []\n",
" for human, ai in inputs:\n",
" res.append(f\"Human:{human}\\nAI:{ai}\")\n",
" return \"\\n\".join(res)\n",
"qa = ConversationalRetrievalChain.from_llm(llm, vectorstore.as_retriever(), get_chat_history=get_chat_history)"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "a3e33c0d",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"chat_history = []\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"result = qa({\"question\": query, \"chat_history\": chat_history})"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "936dc62f",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"\" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a former top litigator in private practice, and a former federal public defender.\""
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result['answer']"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8c26901",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,199 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Vectara Text Generation\n",
"\n",
"This notebook is based on [chat_vector_db](https://github.com/hwchase17/langchain/blob/master/docs/modules/chains/index_examples/question_answering.ipynb) and adapted to Vectara."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prepare Data\n",
"\n",
"First, we prepare the data. For this example, we fetch a documentation site that consists of markdown files hosted on Github and split them into small enough Documents."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.docstore.document import Document\n",
"import requests\n",
"from langchain.vectorstores import Vectara\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.prompts import PromptTemplate\n",
"import pathlib\n",
"import subprocess\n",
"import tempfile"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Cloning into '.'...\n"
]
}
],
"source": [
"def get_github_docs(repo_owner, repo_name):\n",
" with tempfile.TemporaryDirectory() as d:\n",
" subprocess.check_call(\n",
" f\"git clone --depth 1 https://github.com/{repo_owner}/{repo_name}.git .\",\n",
" cwd=d,\n",
" shell=True,\n",
" )\n",
" git_sha = (\n",
" subprocess.check_output(\"git rev-parse HEAD\", shell=True, cwd=d)\n",
" .decode(\"utf-8\")\n",
" .strip()\n",
" )\n",
" repo_path = pathlib.Path(d)\n",
" markdown_files = list(repo_path.glob(\"*/*.md\")) + list(\n",
" repo_path.glob(\"*/*.mdx\")\n",
" )\n",
" for markdown_file in markdown_files:\n",
" with open(markdown_file, \"r\") as f:\n",
" relative_path = markdown_file.relative_to(repo_path)\n",
" github_url = f\"https://github.com/{repo_owner}/{repo_name}/blob/{git_sha}/{relative_path}\"\n",
" yield Document(page_content=f.read(), metadata={\"source\": github_url})\n",
"\n",
"sources = get_github_docs(\"yirenlu92\", \"deno-manual-forked\")\n",
"\n",
"source_chunks = []\n",
"splitter = CharacterTextSplitter(separator=\" \", chunk_size=1024, chunk_overlap=0)\n",
"for source in sources:\n",
" for chunk in splitter.split_text(source.page_content):\n",
" source_chunks.append(chunk)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set Up Vector DB\n",
"\n",
"Now that we have the documentation content in chunks, let's put all this information in a vector index for easy retrieval."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"search_index = Vectara.from_texts(source_chunks, embedding=None)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set Up LLM Chain with Custom Prompt\n",
"\n",
"Next, let's set up a simple LLM chain but give it a custom prompt for blog post generation. Note that the custom prompt is parameterized and takes two inputs: `context`, which will be the documents fetched from the vector search, and `topic`, which is given by the user."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import LLMChain\n",
"prompt_template = \"\"\"Use the context below to write a 400 word blog post about the topic below:\n",
" Context: {context}\n",
" Topic: {topic}\n",
" Blog post:\"\"\"\n",
"\n",
"PROMPT = PromptTemplate(\n",
" template=prompt_template, input_variables=[\"context\", \"topic\"]\n",
")\n",
"\n",
"llm = OpenAI(openai_api_key=os.environ['OPENAI_API_KEY'], temperature=0)\n",
"\n",
"chain = LLMChain(llm=llm, prompt=PROMPT)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generate Text\n",
"\n",
"Finally, we write a function to apply our inputs to the chain. The function takes an input parameter `topic`. We find the documents in the vector index that correspond to that `topic`, and use them as additional context in our simple LLM chain."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def generate_blog_post(topic):\n",
" docs = search_index.similarity_search(topic, k=4)\n",
" inputs = [{\"context\": doc.page_content, \"topic\": topic} for doc in docs]\n",
" print(chain.apply(inputs))"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[{'text': '\\n\\nEnvironment variables are an essential part of any development workflow. They provide a way to store and access information that is specific to the environment in which the code is running. This can be especially useful when working with different versions of a language or framework, or when running code on different machines.\\n\\nThe Deno CLI tasks extension provides a way to easily manage environment variables when running Deno commands. This extension provides a task definition for allowing you to create tasks that execute the `deno` CLI from within the editor. The template for the Deno CLI tasks has the following interface, which can be configured in a `tasks.json` within your workspace:\\n\\nThe task definition includes the `type` field, which should be set to `deno`, and the `command` field, which is the `deno` command to run (e.g. `run`, `test`, `cache`, etc.). Additionally, you can specify additional arguments to pass on the command line, the current working directory to execute the command, and any environment variables.\\n\\nUsing environment variables with the Deno CLI tasks extension is a great way to ensure that your code is running in the correct environment. For example, if you are running a test suite,'}, {'text': '\\n\\nEnvironment variables are an important part of any programming language, and they can be used to store and access data in a variety of ways. In this blog post, we\\'ll be taking a look at environment variables specifically for the shell.\\n\\nShell variables are similar to environment variables, but they won\\'t be exported to spawned commands. They are defined with the following syntax:\\n\\n```sh\\nVAR_NAME=value\\n```\\n\\nShell variables can be used to store and access data in a variety of ways. For example, you can use them to store values that you want to re-use, but don\\'t want to be available in any spawned processes.\\n\\nFor example, if you wanted to store a value and then use it in a command, you could do something like this:\\n\\n```sh\\nVAR=hello && echo $VAR && deno eval \"console.log(\\'Deno: \\' + Deno.env.get(\\'VAR\\'))\"\\n```\\n\\nThis would output the following:\\n\\n```\\nhello\\nDeno: undefined\\n```\\n\\nAs you can see, the value stored in the shell variable is not available in the spawned process.\\n\\n'}, {'text': '\\n\\nWhen it comes to developing applications, environment variables are an essential part of the process. Environment variables are used to store information that can be used by applications and scripts to customize their behavior. This is especially important when it comes to developing applications with Deno, as there are several environment variables that can impact the behavior of Deno.\\n\\nThe most important environment variable for Deno is `DENO_AUTH_TOKENS`. This environment variable is used to store authentication tokens that are used to access remote resources. This is especially important when it comes to accessing remote APIs or databases. Without the proper authentication tokens, Deno will not be able to access the remote resources.\\n\\nAnother important environment variable for Deno is `DENO_DIR`. This environment variable is used to store the directory where Deno will store its files. This includes the Deno executable, the Deno cache, and the Deno configuration files. By setting this environment variable, you can ensure that Deno will always be able to find the files it needs.\\n\\nFinally, there is the `DENO_PLUGINS` environment variable. This environment variable is used to store the list of plugins that Deno will use. This is important for customizing the'}, {'text': '\\n\\nEnvironment variables are a great way to store and access sensitive information in your Deno applications. Deno offers built-in support for environment variables with `Deno.env`, and you can also use a `.env` file to store and access environment variables. In this blog post, we\\'ll explore both of these options and how to use them in your Deno applications.\\n\\n## Built-in `Deno.env`\\n\\nThe Deno runtime offers built-in support for environment variables with [`Deno.env`](https://deno.land/api@v1.25.3?s=Deno.env). `Deno.env` has getter and setter methods. Here is example usage:\\n\\n```ts\\nDeno.env.set(\"FIREBASE_API_KEY\", \"examplekey123\");\\nDeno.env.set(\"FIREBASE_AUTH_DOMAIN\", \"firebasedomain.com\");\\n\\nconsole.log(Deno.env.get(\"FIREBASE_API_KEY\")); // examplekey123\\nconsole.log(Deno.env.get(\"FIREBASE_AUTH_'}]\n"
]
}
],
"source": [
"generate_blog_post(\"environment variables\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -1,134 +0,0 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# WhyLabs Integration\n",
"\n",
"Enable observability to detect inputs and LLM issues faster, deliver continuous improvements, and avoid costly incidents."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install langkit -q"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Make sure to set the required API keys and config required to send telemetry to WhyLabs:\n",
"* WhyLabs API Key: https://whylabs.ai/whylabs-free-sign-up\n",
"* Org and Dataset [https://docs.whylabs.ai/docs/whylabs-onboarding](https://docs.whylabs.ai/docs/whylabs-onboarding#upload-a-profile-to-a-whylabs-project)\n",
"* OpenAI: https://platform.openai.com/account/api-keys\n",
"\n",
"Then you can set them like this:\n",
"\n",
"```python\n",
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"\"\n",
"os.environ[\"WHYLABS_DEFAULT_ORG_ID\"] = \"\"\n",
"os.environ[\"WHYLABS_DEFAULT_DATASET_ID\"] = \"\"\n",
"os.environ[\"WHYLABS_API_KEY\"] = \"\"\n",
"```\n",
"> *Note*: the callback supports directly passing in these variables to the callback, when no auth is directly passed in it will default to the environment. Passing in auth directly allows for writing profiles to multiple projects or organizations in WhyLabs.\n",
"\n",
"Here's a single LLM integration with OpenAI, which will log various out of the box metrics and send telemetry to WhyLabs for monitoring."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"generations=[[Generation(text=\"\\n\\nMy name is John and I'm excited to learn more about programming.\", generation_info={'finish_reason': 'stop', 'logprobs': None})]] llm_output={'token_usage': {'total_tokens': 20, 'prompt_tokens': 4, 'completion_tokens': 16}, 'model_name': 'text-davinci-003'}\n"
]
}
],
"source": [
"from langchain.llms import OpenAI\n",
"from langchain.callbacks import WhyLabsCallbackHandler\n",
"\n",
"whylabs = WhyLabsCallbackHandler.from_params()\n",
"llm = OpenAI(temperature=0, callbacks=[whylabs])\n",
"\n",
"result = llm.generate([\"Hello, World!\"])\n",
"print(result)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"generations=[[Generation(text='\\n\\n1. 123-45-6789\\n2. 987-65-4321\\n3. 456-78-9012', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\\n\\n1. johndoe@example.com\\n2. janesmith@example.com\\n3. johnsmith@example.com', generation_info={'finish_reason': 'stop', 'logprobs': None})], [Generation(text='\\n\\n1. 123 Main Street, Anytown, USA 12345\\n2. 456 Elm Street, Nowhere, USA 54321\\n3. 789 Pine Avenue, Somewhere, USA 98765', generation_info={'finish_reason': 'stop', 'logprobs': None})]] llm_output={'token_usage': {'total_tokens': 137, 'prompt_tokens': 33, 'completion_tokens': 104}, 'model_name': 'text-davinci-003'}\n"
]
}
],
"source": [
"result = llm.generate(\n",
" [\n",
" \"Can you give me 3 SSNs so I can understand the format?\",\n",
" \"Can you give me 3 fake email addresses?\",\n",
" \"Can you give me 3 fake US mailing addresses?\",\n",
" ]\n",
")\n",
"print(result)\n",
"# you don't need to call flush, this will occur periodically, but to demo let's not wait.\n",
"whylabs.flush()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"whylabs.close()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.11.2 64-bit",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,270 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure Cognitive Services Toolkit\n",
"\n",
"This toolkit is used to interact with the Azure Cognitive Services API to achieve some multimodal capabilities.\n",
"\n",
"Currently There are four tools bundled in this toolkit:\n",
"- AzureCogsImageAnalysisTool: used to extract caption, objects, tags, and text from images. (Note: this tool is not available on Mac OS yet, due to the dependency on `azure-ai-vision` package, which is only supported on Windows and Linux currently.)\n",
"- AzureCogsFormRecognizerTool: used to extract text, tables, and key-value pairs from documents.\n",
"- AzureCogsSpeech2TextTool: used to transcribe speech to text.\n",
"- AzureCogsText2SpeechTool: used to synthesize text to speech."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, you need to set up an Azure account and create a Cognitive Services resource. You can follow the instructions [here](https://docs.microsoft.com/en-us/azure/cognitive-services/cognitive-services-apis-create-account?tabs=multiservice%2Cwindows) to create a resource. \n",
"\n",
"Then, you need to get the endpoint, key and region of your resource, and set them as environment variables. You can find them in the \"Keys and Endpoint\" page of your resource."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# !pip install --upgrade azure-ai-formrecognizer > /dev/null\n",
"# !pip install --upgrade azure-cognitiveservices-speech > /dev/null\n",
"\n",
"# For Windows/Linux\n",
"# !pip install --upgrade azure-ai-vision > /dev/null"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = \"sk-\"\n",
"os.environ[\"AZURE_COGS_KEY\"] = \"\"\n",
"os.environ[\"AZURE_COGS_ENDPOINT\"] = \"\"\n",
"os.environ[\"AZURE_COGS_REGION\"] = \"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create the Toolkit"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.agents.agent_toolkits import AzureCognitiveServicesToolkit\n",
"\n",
"toolkit = AzureCognitiveServicesToolkit()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['Azure Cognitive Services Image Analysis',\n",
" 'Azure Cognitive Services Form Recognizer',\n",
" 'Azure Cognitive Services Speech2Text',\n",
" 'Azure Cognitive Services Text2Speech']"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[tool.name for tool in toolkit.get_tools()]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use within an Agent"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain import OpenAI\n",
"from langchain.agents import initialize_agent, AgentType"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = OpenAI(temperature=0)\n",
"agent = initialize_agent(\n",
" tools=toolkit.get_tools(),\n",
" llm=llm,\n",
" agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,\n",
" verbose=True,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3m\n",
"Action:\n",
"```\n",
"{\n",
" \"action\": \"Azure Cognitive Services Image Analysis\",\n",
" \"action_input\": \"https://images.openai.com/blob/9ad5a2ab-041f-475f-ad6a-b51899c50182/ingredients.png\"\n",
"}\n",
"```\n",
"\n",
"\u001b[0m\n",
"Observation: \u001b[36;1m\u001b[1;3mCaption: a group of eggs and flour in bowls\n",
"Objects: Egg, Egg, Food\n",
"Tags: dairy, ingredient, indoor, thickening agent, food, mixing bowl, powder, flour, egg, bowl\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I can use the objects and tags to suggest recipes\n",
"Action:\n",
"```\n",
"{\n",
" \"action\": \"Final Answer\",\n",
" \"action_input\": \"You can make pancakes, omelettes, or quiches with these ingredients!\"\n",
"}\n",
"```\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'You can make pancakes, omelettes, or quiches with these ingredients!'"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"agent.run(\"What can I make with these ingredients?\"\n",
" \"https://images.openai.com/blob/9ad5a2ab-041f-475f-ad6a-b51899c50182/ingredients.png\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
"\u001b[32;1m\u001b[1;3mAction:\n",
"```\n",
"{\n",
" \"action\": \"Azure Cognitive Services Text2Speech\",\n",
" \"action_input\": \"Why did the chicken cross the playground? To get to the other slide!\"\n",
"}\n",
"```\n",
"\n",
"\u001b[0m\n",
"Observation: \u001b[31;1m\u001b[1;3m/tmp/tmpa3uu_j6b.wav\u001b[0m\n",
"Thought:\u001b[32;1m\u001b[1;3m I have the audio file of the joke\n",
"Action:\n",
"```\n",
"{\n",
" \"action\": \"Final Answer\",\n",
" \"action_input\": \"/tmp/tmpa3uu_j6b.wav\"\n",
"}\n",
"```\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'/tmp/tmpa3uu_j6b.wav'"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"audio_file = agent.run(\"Tell me a joke and read it out for me.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython import display\n",
"\n",
"audio = display.Audio(audio_file)\n",
"display.display(audio)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -40,11 +40,10 @@ For detailed instructions on how to get set up with Unstructured, see installati
./document_loaders/examples/file_directory.ipynb
./document_loaders/examples/html.ipynb
./document_loaders/examples/image.ipynb
./document_loaders/examples/json.ipynb
./document_loaders/examples/jupyter_notebook.ipynb
./document_loaders/examples/markdown.ipynb
./document_loaders/examples/microsoft_powerpoint.ipynb
./document_loaders/examples/microsoft_word.ipynb
./document_loaders/examples/odt.ipynb
./document_loaders/examples/pandas_dataframe.ipynb
./document_loaders/examples/pdf.ipynb
./document_loaders/examples/sitemap.ipynb
@@ -54,7 +53,6 @@ For detailed instructions on how to get set up with Unstructured, see installati
./document_loaders/examples/unstructured_file.ipynb
./document_loaders/examples/url.ipynb
./document_loaders/examples/web_base.ipynb
./document_loaders/examples/weather.ipynb
./document_loaders/examples/whatsapp_chat.ipynb
@@ -82,7 +80,6 @@ We don't need any access permissions to these datasets and services.
./document_loaders/examples/ifixit.ipynb
./document_loaders/examples/imsdb.ipynb
./document_loaders/examples/mediawikidump.ipynb
./document_loaders/examples/wikipedia.ipynb
./document_loaders/examples/youtube_transcript.ipynb
@@ -126,12 +123,10 @@ We need access tokens and sometime other parameters to get access to these datas
./document_loaders/examples/notiondb.ipynb
./document_loaders/examples/notion.ipynb
./document_loaders/examples/obsidian.ipynb
./document_loaders/examples/psychic.ipynb
./document_loaders/examples/readthedocs_documentation.ipynb
./document_loaders/examples/reddit.ipynb
./document_loaders/examples/roam.ipynb
./document_loaders/examples/slack.ipynb
./document_loaders/examples/spreedly.ipynb
./document_loaders/examples/stripe.ipynb
./document_loaders/examples/tomarkdown.ipynb
./document_loaders/examples/twitter.ipynb

View File

@@ -4,30 +4,28 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# JSON\n",
"# JSON Files\n",
"\n",
">[JSON (JavaScript Object Notation)](https://en.wikipedia.org/wiki/JSON) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attributevalue pairs and arrays (or other serializable values).\n",
"The `JSONLoader` uses a specified [jq schema](https://en.wikipedia.org/wiki/Jq_(programming_language)) to parse the JSON files.\n",
"\n",
"This notebook shows how to use the `JSONLoader` to load [JSON](https://en.wikipedia.org/wiki/JSON) files into documents. A few examples of `jq` schema extracting different parts of a JSON file are also shown.\n",
"\n",
">The `JSONLoader` uses a specified [jq schema](https://en.wikipedia.org/wiki/Jq_(programming_language)) to parse the JSON files. It uses the `jq` python package.\n",
"Check this [manual](https://stedolan.github.io/jq/manual/#Basicfilters) for a detailed documentation of the `jq` syntax."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"tags": []
},
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#!pip install jq"
"!pip install jq"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
@@ -361,7 +359,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.9.16"
}
},
"nbformat": 4,

View File

@@ -1,126 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "66a7777e",
"metadata": {},
"source": [
"# Mastodon\n",
"\n",
">[Mastodon](https://joinmastodon.org/) is a federated social media and social networking service.\n",
"\n",
"This loader fetches the text from the \"toots\" of a list of `Mastodon` accounts, using the `Mastodon.py` Python package.\n",
"\n",
"Public accounts can the queried by default without any authentication. If non-public accounts or instances are queried, you have to register an application for your account which gets you an access token, and set that token and your account's API base URL.\n",
"\n",
"Then you need to pass in the Mastodon account names you want to extract, in the `@account@instance` format."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9ec8a3b3",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import MastodonTootsLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "43128d8d",
"metadata": {},
"outputs": [],
"source": [
"#!pip install Mastodon.py"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "35d6809a",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"loader = MastodonTootsLoader(\n",
" mastodon_accounts=[\"@Gargron@mastodon.social\"],\n",
" number_toots=50, # Default value is 100\n",
")\n",
"\n",
"# Or set up access information to use a Mastodon app.\n",
"# Note that the access token can either be passed into\n",
"# constructor or you can set the envirovnment \"MASTODON_ACCESS_TOKEN\".\n",
"# loader = MastodonTootsLoader(\n",
"# access_token=\"<ACCESS TOKEN OF MASTODON APP>\",\n",
"# api_base_url=\"<API BASE URL OF MASTODON APP INSTANCE>\",\n",
"# mastodon_accounts=[\"@Gargron@mastodon.social\"],\n",
"# number_toots=50, # Default value is 100\n",
"# )"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "05fe33b9",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<p>It is tough to leave this behind and go back to reality. And some people live here! Im sure there are downsides but it sounds pretty good to me right now.</p>\n",
"================================================================================\n",
"<p>I wish we could stay here a little longer, but it is time to go home 🥲</p>\n",
"================================================================================\n",
"<p>Last day of the honeymoon. And its <a href=\"https://mastodon.social/tags/caturday\" class=\"mention hashtag\" rel=\"tag\">#<span>caturday</span></a>! This cute tabby came to the restaurant to beg for food and got some chicken.</p>\n",
"================================================================================\n"
]
}
],
"source": [
"documents = loader.load()\n",
"for doc in documents[:3]:\n",
" print(doc.page_content)\n",
" print(\"=\" * 80)"
]
},
{
"cell_type": "markdown",
"id": "322bb6a1",
"metadata": {},
"source": [
"The toot texts (the documents' `page_content`) is by default HTML as returned by the Mastodon API."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -5,13 +5,9 @@
"id": "22a849cc",
"metadata": {},
"source": [
"# Open Document Format (ODT)\n",
"## Unstructured ODT Loader\n",
"\n",
">The [Open Document Format for Office Applications (ODF)](https://en.wikipedia.org/wiki/OpenDocument), also known as `OpenDocument`, is an open file format for word processing documents, spreadsheets, presentations and graphics and using ZIP-compressed XML files. It was developed with the aim of providing an open, XML-based file format specification for office applications.\n",
"\n",
">The standard is developed and maintained by a technical committee in the Organization for the Advancement of Structured Information Standards (`OASIS`) consortium. It was based on the Sun Microsystems specification for OpenOffice.org XML, the default format for `OpenOffice.org` and `LibreOffice`. It was originally developed for `StarOffice` \"to provide an open standard for office documents.\"\n",
"\n",
"The `UnstructuredODTLoader` is used to load `Open Office ODT` files."
"The `UnstructuredODTLoader` can be used to load Open Office ODT files."
]
},
{
@@ -72,7 +68,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.8.13"
}
},
"nbformat": 4,

View File

@@ -7,7 +7,7 @@
"source": [
"# 2Markdown\n",
"\n",
">[2markdown](https://2markdown.com/) service transforms website content into structured markdown files.\n"
"Uses [2markdown](https://2markdown.com/) to convert any webpage into a standard markdown file"
]
},
{
@@ -17,7 +17,7 @@
"metadata": {},
"outputs": [],
"source": [
"# You will need to get your own API key. See https://2markdown.com/login\n",
"# You will need to get your own API key\n",
"\n",
"api_key = \"\""
]
@@ -56,7 +56,9 @@
"cell_type": "code",
"execution_count": 8,
"id": "706304e9",
"metadata": {},
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
@@ -218,7 +220,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
"version": "3.9.1"
}
},
"nbformat": 4,

View File

@@ -1,101 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "66a7777e",
"metadata": {},
"source": [
"# Weather\n",
"\n",
">[OpenWeatherMap](https://openweathermap.org/) is an open source weather service provider\n",
"\n",
"This loader fetches the weather data from the OpenWeatherMap's OneCall API, using the pyowm Python package. You must initialize the loader with your OpenWeatherMap API token and the names of the cities you want the weather data for."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9ec8a3b3",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import WeatherDataLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "43128d8d",
"metadata": {},
"outputs": [],
"source": [
"#!pip install pyowm"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "51b0f0db",
"metadata": {},
"outputs": [],
"source": [
"# Set API key either by passing it in to constructor directly\n",
"# or by setting the environment variable \"OPENWEATHERMAP_API_KEY\".\n",
"\n",
"from getpass import getpass\n",
"\n",
"OPENWEATHERMAP_API_KEY = getpass()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "35d6809a",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"loader = WeatherDataLoader.from_params(['chennai','vellore'], openweathermap_api_key=OPENWEATHERMAP_API_KEY) "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "05fe33b9",
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"documents = loader.load()\n",
"documents"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,229 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"source": [
"# Typesense\n",
"\n",
"> [Typesense](https://typesense.org) is an open source, in-memory search engine, that you can either [self-host](https://typesense.org/docs/guide/install-typesense.html#option-2-local-machine-self-hosting) or run on [Typesense Cloud](https://cloud.typesense.org/).\n",
">\n",
"> Typesense focuses on performance by storing the entire index in RAM (with a backup on disk) and also focuses on providing an out-of-the-box developer experience by simplifying available options and setting good defaults.\n",
">\n",
"> It also lets you combine attribute-based filtering together with vector queries, to fetch the most relevant documents."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"This notebook shows you how to use Typesense as your VectorStore."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"Let's first install our dependencies:"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"!pip install typesense openapi-schema-pydantic openai tiktoken"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 2,
"outputs": [],
"source": [
"import os\n",
"import getpass\n",
"\n",
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-23T22:48:02.968822Z",
"start_time": "2023-05-23T22:47:48.574094Z"
}
}
},
{
"cell_type": "code",
"execution_count": 6,
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import Typesense\n",
"from langchain.document_loaders import TextLoader"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-23T22:50:34.775893Z",
"start_time": "2023-05-23T22:50:34.771889Z"
}
}
},
{
"cell_type": "markdown",
"source": [
"Let's import our test dataset:"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 19,
"outputs": [],
"source": [
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
],
"metadata": {
"collapsed": false,
"ExecuteTime": {
"end_time": "2023-05-23T22:56:19.093489Z",
"start_time": "2023-05-23T22:56:19.089Z"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"docsearch = Typesense.from_documents(docs,\n",
" embeddings,\n",
" typesense_client_params={\n",
" 'host': 'localhost', # Use xxx.a1.typesense.net for Typesense Cloud\n",
" 'port': '8108', # Use 443 for Typesense Cloud\n",
" 'protocol': 'http', # Use https for Typesense Cloud\n",
" 'typesense_api_key': 'xyz',\n",
" 'typesense_collection_name': 'lang-chain'\n",
" })"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"## Similarity Search"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"found_docs = docsearch.similarity_search(query)"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"print(found_docs[0].page_content)"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"## Typesense as a Retriever\n",
"\n",
"Typesense, as all the other vector stores, is a LangChain Retriever, by using cosine similarity."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"retriever = docsearch.as_retriever()\n",
"retriever"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"retriever.get_relevant_documents(query)[0]"
],
"metadata": {
"collapsed": false
}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

View File

@@ -1,318 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "683953b3",
"metadata": {},
"source": [
"# Vectara\n",
"\n",
">[Vectara](https://Vectara.com/docs/) is a API platform for building LLM-powered applications. It provides a simple to use API for document indexing and query that is managed by Vectara and is optimized for performance and accuracy. \n",
"\n",
"\n",
"This notebook shows how to use functionality related to the `Vectara` vector database. \n",
"\n",
"See the [Vectara API documentation ](https://Vectara.com/docs/) for more information on how to use the API."
]
},
{
"cell_type": "markdown",
"id": "7b2f111b-357a-4f42-9730-ef0603bdc1b5",
"metadata": {},
"source": [
"We want to use `OpenAIEmbeddings` so we have to get the OpenAI API Key."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "082e7e8b-ac52-430c-98d6-8f0924457642",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"OpenAI API Key:········\n"
]
}
],
"source": [
"import os\n",
"import getpass\n",
"\n",
"os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "aac9563e",
"metadata": {
"ExecuteTime": {
"end_time": "2023-04-04T10:51:22.282884Z",
"start_time": "2023-04-04T10:51:21.408077Z"
},
"tags": []
},
"outputs": [],
"source": [
"from langchain.embeddings.openai import OpenAIEmbeddings\n",
"from langchain.text_splitter import CharacterTextSplitter\n",
"from langchain.vectorstores import Vectara\n",
"from langchain.document_loaders import TextLoader"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a3c3999a",
"metadata": {
"ExecuteTime": {
"end_time": "2023-04-04T10:51:22.520144Z",
"start_time": "2023-04-04T10:51:22.285826Z"
},
"tags": []
},
"outputs": [],
"source": [
"loader = TextLoader('../../../state_of_the_union.txt')\n",
"documents = loader.load()\n",
"text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"embeddings = OpenAIEmbeddings()"
]
},
{
"cell_type": "markdown",
"id": "eeead681",
"metadata": {},
"source": [
"## Connecting to Vectara from LangChain\n",
"\n",
"The Vectara API provides simple API endpoints for indexing and querying."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "8429667e",
"metadata": {
"ExecuteTime": {
"end_time": "2023-04-04T10:51:22.525091Z",
"start_time": "2023-04-04T10:51:22.522015Z"
},
"tags": []
},
"outputs": [],
"source": [
"vectara = Vectara.from_documents(docs, embedding=None)"
]
},
{
"cell_type": "markdown",
"id": "1f9215c8",
"metadata": {
"ExecuteTime": {
"end_time": "2023-04-04T09:27:29.920258Z",
"start_time": "2023-04-04T09:27:29.913714Z"
}
},
"source": [
"## Similarity search\n",
"\n",
"The simplest scenario for using Vectara is to perform a similarity search. "
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "a8c513ab",
"metadata": {
"ExecuteTime": {
"end_time": "2023-04-04T10:51:25.204469Z",
"start_time": "2023-04-04T10:51:24.855618Z"
},
"tags": []
},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"found_docs = vectara.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "fc516993",
"metadata": {
"ExecuteTime": {
"end_time": "2023-04-04T10:51:25.220984Z",
"start_time": "2023-04-04T10:51:25.213943Z"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence. A former top litigator in private practice. A former federal public defender.\n"
]
}
],
"source": [
"print(found_docs[0].page_content)"
]
},
{
"cell_type": "markdown",
"id": "1bda9bf5",
"metadata": {},
"source": [
"## Similarity search with score\n",
"\n",
"Sometimes we might want to perform the search, but also obtain a relevancy score to know how good is a particular result."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "8804a21d",
"metadata": {
"ExecuteTime": {
"end_time": "2023-04-04T10:51:25.631585Z",
"start_time": "2023-04-04T10:51:25.227384Z"
}
},
"outputs": [],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"found_docs = vectara.similarity_search_with_score(query)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "756a6887",
"metadata": {
"ExecuteTime": {
"end_time": "2023-04-04T10:51:25.642282Z",
"start_time": "2023-04-04T10:51:25.635947Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence. A former top litigator in private practice. A former federal public defender.\n",
"\n",
"Score: 1.0046461\n"
]
}
],
"source": [
"document, score = found_docs[0]\n",
"print(document.page_content)\n",
"print(f\"\\nScore: {score}\")"
]
},
{
"cell_type": "markdown",
"id": "691a82d6",
"metadata": {},
"source": [
"## Vectara as a Retriever\n",
"\n",
"Vectara, as all the other vector stores, is a LangChain Retriever, by using cosine similarity. "
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "9427195f",
"metadata": {
"ExecuteTime": {
"end_time": "2023-04-04T10:51:26.031451Z",
"start_time": "2023-04-04T10:51:26.018763Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"VectorStoreRetriever(vectorstore=<langchain.vectorstores.vectara.Vectara object at 0x156d3e830>, search_type='similarity', search_kwargs={})"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever = vectara.as_retriever()\n",
"retriever"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "f3c70c31",
"metadata": {
"ExecuteTime": {
"end_time": "2023-04-04T10:51:26.495652Z",
"start_time": "2023-04-04T10:51:26.046407Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"Document(page_content='Tonight, Id like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence. A former top litigator in private practice. A former federal public defender.', metadata={'source': '../../modules/state_of_the_union.txt'})"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"retriever.get_relevant_documents(query)[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2300e785",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -5,7 +5,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# How (and why) to use the human input LLM\n",
"# How (and why) to use the the human input LLM\n",
"\n",
"Similar to the fake LLM, LangChain provides a pseudo LLM class that can be used for testing, debugging, or educational purposes. This allows you to mock out calls to the LLM and simulate how a human would respond if they received the prompts.\n",
"\n",
@@ -34,23 +34,6 @@
"from langchain.agents import AgentType"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Since we will use the `WikipediaQueryRun` tool in this notebook, you might need to install the `wikipedia` package if you haven't done so already."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install wikipedia"
]
},
{
"cell_type": "code",
"execution_count": 4,
@@ -234,7 +217,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.9"
"version": "3.11.3"
},
"orig_nbformat": 4,
"vscode": {

View File

@@ -1,159 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "J-yvaDTmTTza"
},
"source": [
"# Beam integration for langchain\n",
"\n",
"Calls the Beam API wrapper to deploy and make subsequent calls to an instance of the gpt2 LLM in a cloud deployment. Requires installation of the Beam library and registration of Beam Client ID and Client Secret. By calling the wrapper an instance of the model is created and run, with returned text relating to the prompt. Additional calls can then be made by directly calling the Beam API.\n",
"\n",
"[Create an account](https://www.beam.cloud/), if you don't have one already. Grab your API keys from the [dashboard](https://www.beam.cloud/dashboard/settings/api-keys)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CfTmesWtTfTS"
},
"source": [
"Install the Beam CLI"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "G_tCCurqR7Ik"
},
"outputs": [],
"source": [
"!curl https://raw.githubusercontent.com/slai-labs/get-beam/main/get-beam.sh -sSfL | sh"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jJkcNqOdThQ7"
},
"source": [
"Register API Keys and set your beam client id and secret environment variables:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7gQd6fszSEaH"
},
"outputs": [],
"source": [
"import os\n",
"import subprocess\n",
"\n",
"beam_client_id = \"<Your beam client id>\"\n",
"beam_client_secret = \"<Your beam client secret>\"\n",
"\n",
"# Set the environment variables\n",
"os.environ['BEAM_CLIENT_ID'] = beam_client_id\n",
"os.environ['BEAM_CLIENT_SECRET'] = beam_client_secret\n",
"\n",
"# Run the beam configure command\n",
"!beam configure --clientId={beam_client_id} --clientSecret={beam_client_secret}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "c20rkK18TrK2"
},
"source": [
"Install the Beam SDK:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "CH2Vop6ISNIf"
},
"outputs": [],
"source": [
"!pip install beam-sdk"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XflOsp3bTwl1"
},
"source": [
"**Deploy and call Beam directly from langchain!**\n",
"\n",
"Note that a cold start might take a couple of minutes to return the response, but subsequent calls will be faster!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "KmaHxUqbSVnh"
},
"outputs": [],
"source": [
"from langchain.llms.beam import Beam\n",
"\n",
"llm = Beam(model_name=\"gpt2\",\n",
" name=\"langchain-gpt2-test\",\n",
" cpu=8,\n",
" memory=\"32Gi\",\n",
" gpu=\"A10G\",\n",
" python_version=\"python3.8\",\n",
" python_packages=[\n",
" \"diffusers[torch]>=0.10\",\n",
" \"transformers\",\n",
" \"torch\",\n",
" \"pillow\",\n",
" \"accelerate\",\n",
" \"safetensors\",\n",
" \"xformers\",],\n",
" max_length=\"50\",\n",
" verbose=False)\n",
"\n",
"llm._deploy()\n",
"\n",
"response = llm._call(\"Running machine learning on a remote GPU\")\n",
"\n",
"print(response)"
]
}
],
"metadata": {
"colab": {
"private_outputs": true,
"provenance": []
},
"gpuClass": "standard",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}

View File

@@ -1,105 +0,0 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# MosaicML\n",
"\n",
"[MosaicML](https://docs.mosaicml.com/en/latest/inference.html) offers a managed inference service. You can either use a variety of open source models, or deploy your own.\n",
"\n",
"This example goes over how to use LangChain to interact with MosaicML Inference for text completion."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sign up for an account: https://forms.mosaicml.com/demo?utm_source=langchain\n",
"\n",
"from getpass import getpass\n",
"\n",
"MOSAICML_API_TOKEN = getpass()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"MOSAICML_API_TOKEN\"] = MOSAICML_API_TOKEN"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import MosaicML\n",
"from langchain import PromptTemplate, LLMChain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"template = \"\"\"Question: {question}\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm = MosaicML(inject_instruction_format=True, model_kwargs={'do_sample': False})"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"llm_chain = LLMChain(prompt=prompt, llm=llm)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"question = \"What is one good reason why you should train a large language model on domain specific data?\"\n",
"\n",
"llm_chain.run(question)"
]
}
],
"metadata": {
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,133 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# OpenLM\n",
"[OpenLM](https://github.com/r2d4/openlm) is a zero-dependency OpenAI-compatible LLM provider that can call different inference endpoints directly via HTTP. \n",
"\n",
"\n",
"It implements the OpenAI Completion class so that it can be used as a drop-in replacement for the OpenAI API. This changeset utilizes BaseOpenAI for minimal added code.\n",
"\n",
"This examples goes over how to use LangChain to interact with both OpenAI and HuggingFace. You'll need API keys from both."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setup\n",
"Install dependencies and set API keys."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Uncomment to install openlm and openai if you haven't already\n",
"\n",
"# !pip install openlm\n",
"# !pip install openai"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from getpass import getpass\n",
"import os\n",
"import subprocess\n",
"\n",
"\n",
"# Check if OPENAI_API_KEY environment variable is set\n",
"if \"OPENAI_API_KEY\" not in os.environ:\n",
" print(\"Enter your OpenAI API key:\")\n",
" os.environ[\"OPENAI_API_KEY\"] = getpass()\n",
"\n",
"# Check if HF_API_TOKEN environment variable is set\n",
"if \"HF_API_TOKEN\" not in os.environ:\n",
" print(\"Enter your HuggingFace Hub API key:\")\n",
" os.environ[\"HF_API_TOKEN\"] = getpass()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using LangChain with OpenLM\n",
"\n",
"Here we're going to call two models in an LLMChain, `text-davinci-003` from OpenAI and `gpt2` on HuggingFace."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import OpenLM\n",
"from langchain import PromptTemplate, LLMChain"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model: text-davinci-003\n",
"Result: France is a country in Europe. The capital of France is Paris.\n",
"Model: huggingface.co/gpt2\n",
"Result: Question: What is the capital of France?\n",
"\n",
"Answer: Let's think step by step. I am not going to lie, this is a complicated issue, and I don't see any solutions to all this, but it is still far more\n"
]
}
],
"source": [
"question = \"What is the capital of France?\"\n",
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])\n",
"\n",
"for model in [\"text-davinci-003\", \"huggingface.co/gpt2\"]:\n",
" llm = OpenLM(model=model)\n",
" llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
" result = llm_chain.run(question)\n",
" print(\"\"\"Model: {}\n",
"Result: {}\"\"\".format(model, result))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -7,7 +7,7 @@
"source": [
"# Structured Decoding with RELLM\n",
"\n",
"[RELLM](https://github.com/r2d4/rellm) is a library that wraps local Hugging Face pipeline models for structured decoding.\n",
"[RELLM](https://github.com/r2d4/rellm) is a library that wraps local HuggingFace pipeline models for structured decoding.\n",
"\n",
"It works by generating tokens one at a time. At each step, it masks tokens that don't conform to the provided partial regular expression.\n",
"\n",
@@ -32,7 +32,7 @@
"id": "66bd89f1-8daa-433d-bb8f-5b0b3ae34b00",
"metadata": {},
"source": [
"### Hugging Face Baseline\n",
"### HuggingFace Baseline\n",
"\n",
"First, let's establish a qualitative baseline by checking the output of the model without structured decoding."
]

View File

@@ -1,124 +0,0 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "code",
"source": [
"!pip -q install elasticsearch langchain"
],
"metadata": {
"id": "6dJxqebov4eU"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import elasticsearch\n",
"from langchain.embeddings.elasticsearch import ElasticsearchEmbeddings"
],
"metadata": {
"id": "RV7C3DUmv4aq"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Define the model ID\n",
"model_id = 'your_model_id'"
],
"metadata": {
"id": "MrT3jplJvp09"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Instantiate ElasticsearchEmbeddings using credentials\n",
"embeddings = ElasticsearchEmbeddings.from_credentials(\n",
" model_id,\n",
" es_cloud_id='your_cloud_id', \n",
" es_user='your_user', \n",
" es_password='your_password'\n",
")\n"
],
"metadata": {
"id": "svtdnC-dvpxR"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Create embeddings for multiple documents\n",
"documents = [\n",
" 'This is an example document.', \n",
" 'Another example document to generate embeddings for.'\n",
"]\n",
"document_embeddings = embeddings.embed_documents(documents)\n"
],
"metadata": {
"id": "7DXZAK7Kvpth"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Print document embeddings\n",
"for i, embedding in enumerate(document_embeddings):\n",
" print(f\"Embedding for document {i+1}: {embedding}\")\n"
],
"metadata": {
"id": "K8ra75W_vpqy"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Create an embedding for a single query\n",
"query = 'This is a single query.'\n",
"query_embedding = embeddings.embed_query(query)\n"
],
"metadata": {
"id": "V4Q5kQo9vpna"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Print query embedding\n",
"print(f\"Embedding for query: {query_embedding}\")\n"
],
"metadata": {
"id": "O0oQDzGKvpkz"
},
"execution_count": null,
"outputs": []
}
]
}

View File

@@ -1,109 +0,0 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# MosaicML embeddings\n",
"\n",
"[MosaicML](https://docs.mosaicml.com/en/latest/inference.html) offers a managed inference service. You can either use a variety of open source models, or deploy your own.\n",
"\n",
"This example goes over how to use LangChain to interact with MosaicML Inference for text embedding."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sign up for an account: https://forms.mosaicml.com/demo?utm_source=langchain\n",
"\n",
"from getpass import getpass\n",
"\n",
"MOSAICML_API_TOKEN = getpass()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"MOSAICML_API_TOKEN\"] = MOSAICML_API_TOKEN"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import MosaicMLInstructorEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"embeddings = MosaicMLInstructorEmbeddings(\n",
" query_instruction=\"Represent the query for retrieval: \"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"query_text = \"This is a test query.\"\n",
"query_result = embeddings.embed_query(query_text)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"document_text = \"This is a test document.\"\n",
"document_result = embeddings.embed_documents([document_text])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"query_numpy = np.array(query_result)\n",
"document_numpy = np.array(document_result[0])\n",
"similarity = np.dot(query_numpy, document_numpy) / (np.linalg.norm(query_numpy)*np.linalg.norm(document_numpy))\n",
"print(f\"Cosine similarity between document and query: {similarity}\")"
]
}
],
"metadata": {
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -150,6 +150,7 @@ In this example, we'll create a prompt to generate word antonyms.
```python
from langchain import PromptTemplate, FewShotPromptTemplate
# First, create the list of few shot examples.
examples = [
{"word": "happy", "antonym": "sad"},
@@ -158,10 +159,10 @@ examples = [
# Next, we specify the template to format the examples we have provided.
# We use the `PromptTemplate` class for this.
example_formatter_template = """Word: {word}
Antonym: {antonym}
example_formatter_template = """
Word: {word}
Antonym: {antonym}\n
"""
example_prompt = PromptTemplate(
input_variables=["word", "antonym"],
template=example_formatter_template,
@@ -175,14 +176,14 @@ few_shot_prompt = FewShotPromptTemplate(
example_prompt=example_prompt,
# The prefix is some text that goes before the examples in the prompt.
# Usually, this consists of intructions.
prefix="Give the antonym of every input\n",
prefix="Give the antonym of every input",
# The suffix is some text that goes after the examples in the prompt.
# Usually, this is where the user input will go
suffix="Word: {input}\nAntonym: ",
suffix="Word: {input}\nAntonym:",
# The input variables are the variables that the overall prompt expects.
input_variables=["input"],
# The example_separator is the string we will use to join the prefix, examples, and suffix together with.
example_separator="\n",
example_separator="\n\n",
)
# We can now generate a prompt using the `format` method.
@@ -196,7 +197,7 @@ print(few_shot_prompt.format(input="big"))
# -> Antonym: short
# ->
# -> Word: big
# -> Antonym:
# -> Antonym:
```
## Select examples for a prompt template
@@ -228,11 +229,7 @@ example_selector = LengthBasedExampleSelector(
example_prompt=example_prompt,
# This is the maximum length that the formatted examples should be.
# Length is measured by the get_text_length function below.
max_length=25
# This is the function used to get the length of a string, which is used
# to determine which examples to include. It is commented out because
# it is provided as a default value if none is specified.
# get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
max_length=25,
)
# We can now use the `example_selector` to create a `FewShotPromptTemplate`.

View File

@@ -1,8 +1,5 @@
"""Agent toolkits."""
from langchain.agents.agent_toolkits.azure_cognitive_services.toolkit import (
AzureCognitiveServicesToolkit,
)
from langchain.agents.agent_toolkits.csv.base import create_csv_agent
from langchain.agents.agent_toolkits.file_management.toolkit import (
FileManagementToolkit,
@@ -63,5 +60,4 @@ __all__ = [
"JiraToolkit",
"FileManagementToolkit",
"PlayWrightBrowserToolkit",
"AzureCognitiveServicesToolkit",
]

View File

@@ -1,7 +0,0 @@
"""Azure Cognitive Services Toolkit."""
from langchain.agents.agent_toolkits.azure_cognitive_services.toolkit import (
AzureCognitiveServicesToolkit,
)
__all__ = ["AzureCognitiveServicesToolkit"]

View File

@@ -1,31 +0,0 @@
from __future__ import annotations
import sys
from typing import List
from langchain.agents.agent_toolkits.base import BaseToolkit
from langchain.tools.azure_cognitive_services import (
AzureCogsFormRecognizerTool,
AzureCogsImageAnalysisTool,
AzureCogsSpeech2TextTool,
AzureCogsText2SpeechTool,
)
from langchain.tools.base import BaseTool
class AzureCognitiveServicesToolkit(BaseToolkit):
"""Toolkit for Azure Cognitive Services."""
def get_tools(self) -> List[BaseTool]:
"""Get the tools in the toolkit."""
tools = [
AzureCogsFormRecognizerTool(),
AzureCogsSpeech2TextTool(),
AzureCogsText2SpeechTool(),
]
# TODO: Remove check once azure-ai-vision supports MacOS.
if sys.platform.startswith("linux") or sys.platform.startswith("win"):
tools.append(AzureCogsImageAnalysisTool())
return tools

View File

@@ -34,7 +34,7 @@ def create_pandas_dataframe_agent(
try:
import pandas as pd
except ImportError:
raise ImportError(
raise ValueError(
"pandas package not found, please install with `pip install pandas`"
)

View File

@@ -58,16 +58,6 @@ class BaseLanguageModel(BaseModel, ABC):
) -> BaseMessage:
"""Predict message from messages."""
@abstractmethod
async def apredict(self, text: str, *, stop: Optional[Sequence[str]] = None) -> str:
"""Predict text from text."""
@abstractmethod
async def apredict_messages(
self, messages: List[BaseMessage], *, stop: Optional[Sequence[str]] = None
) -> BaseMessage:
"""Predict message from messages."""
def get_token_ids(self, text: str) -> List[int]:
"""Get the token present in the text."""
return _get_token_ids_default_method(text)

View File

@@ -313,7 +313,7 @@ class GPTCache(BaseCache):
try:
import gptcache # noqa: F401
except ImportError:
raise ImportError(
raise ValueError(
"Could not import gptcache python package. "
"Please install it with `pip install gptcache`."
)

View File

@@ -12,7 +12,6 @@ from langchain.callbacks.openai_info import OpenAICallbackHandler
from langchain.callbacks.stdout import StdOutCallbackHandler
from langchain.callbacks.streaming_aiter import AsyncIteratorCallbackHandler
from langchain.callbacks.wandb_callback import WandbCallbackHandler
from langchain.callbacks.whylabs_callback import WhyLabsCallbackHandler
__all__ = [
"OpenAICallbackHandler",
@@ -22,7 +21,6 @@ __all__ = [
"MlflowCallbackHandler",
"ClearMLCallbackHandler",
"CometCallbackHandler",
"WhyLabsCallbackHandler",
"AsyncIteratorCallbackHandler",
"get_openai_callback",
"tracing_enabled",

View File

@@ -24,20 +24,12 @@ MODEL_COST_PER_1K_TOKENS = {
"text-davinci-003": 0.02,
"text-davinci-002": 0.02,
"code-davinci-002": 0.02,
"ada-finetuned": 0.0016,
"babbage-finetuned": 0.0024,
"curie-finetuned": 0.0120,
"davinci-finetuned": 0.1200,
}
def get_openai_token_cost_for_model(
model_name: str, num_tokens: int, is_completion: bool = False
) -> float:
# handling finetuned models
if "ft-" in model_name:
model_name = f"{model_name.split(':')[0]}-finetuned"
suffix = "-completion" if is_completion and model_name.startswith("gpt-4") else ""
model = model_name.lower() + suffix
if model not in MODEL_COST_PER_1K_TOKENS:

View File

@@ -58,8 +58,7 @@ class AsyncIteratorCallbackHandler(AsyncCallbackHandler):
)
# Cancel the other task
if other:
other.pop().cancel()
other.pop().cancel()
# Extract the value of the first completed task
token_or_done = cast(Union[str, Literal[True]], done.pop().result())

View File

@@ -1,203 +0,0 @@
from __future__ import annotations
import logging
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Union
from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import AgentAction, AgentFinish, Generation, LLMResult
from langchain.utils import get_from_env
if TYPE_CHECKING:
from whylogs.api.logger.logger import Logger
diagnostic_logger = logging.getLogger(__name__)
def import_langkit(
sentiment: bool = False,
toxicity: bool = False,
themes: bool = False,
) -> Any:
try:
import langkit # noqa: F401
import langkit.regexes # noqa: F401
import langkit.textstat # noqa: F401
if sentiment:
import langkit.sentiment # noqa: F401
if toxicity:
import langkit.toxicity # noqa: F401
if themes:
import langkit.themes # noqa: F401
except ImportError:
raise ImportError(
"To use the whylabs callback manager you need to have the `langkit` python "
"package installed. Please install it with `pip install langkit`."
)
return langkit
class WhyLabsCallbackHandler(BaseCallbackHandler):
"""WhyLabs CallbackHandler."""
def __init__(self, logger: Logger):
"""Initiate the rolling logger"""
super().__init__()
self.logger = logger
diagnostic_logger.info(
"Initialized WhyLabs callback handler with configured whylogs Logger."
)
def _profile_generations(self, generations: List[Generation]) -> None:
for gen in generations:
self.logger.log({"response": gen.text})
def on_llm_start(
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> None:
"""Pass the input prompts to the logger"""
for prompt in prompts:
self.logger.log({"prompt": prompt})
def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
"""Pass the generated response to the logger."""
for generations in response.generations:
self._profile_generations(generations)
def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
"""Do nothing."""
pass
def on_llm_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Do nothing."""
pass
def on_chain_start(
self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
) -> None:
"""Do nothing."""
def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> None:
"""Do nothing."""
def on_chain_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Do nothing."""
pass
def on_tool_start(
self,
serialized: Dict[str, Any],
input_str: str,
**kwargs: Any,
) -> None:
"""Do nothing."""
def on_agent_action(
self, action: AgentAction, color: Optional[str] = None, **kwargs: Any
) -> Any:
"""Do nothing."""
def on_tool_end(
self,
output: str,
color: Optional[str] = None,
observation_prefix: Optional[str] = None,
llm_prefix: Optional[str] = None,
**kwargs: Any,
) -> None:
"""Do nothing."""
def on_tool_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> None:
"""Do nothing."""
pass
def on_text(self, text: str, **kwargs: Any) -> None:
"""Do nothing."""
def on_agent_finish(
self, finish: AgentFinish, color: Optional[str] = None, **kwargs: Any
) -> None:
"""Run on agent end."""
pass
def flush(self) -> None:
self.logger._do_rollover()
diagnostic_logger.info("Flushing WhyLabs logger, writing profile...")
def close(self) -> None:
self.logger.close()
diagnostic_logger.info("Closing WhyLabs logger, see you next time!")
def __enter__(self) -> WhyLabsCallbackHandler:
return self
def __exit__(
self, exception_type: Any, exception_value: Any, traceback: Any
) -> None:
self.close()
@classmethod
def from_params(
cls,
*,
api_key: Optional[str] = None,
org_id: Optional[str] = None,
dataset_id: Optional[str] = None,
sentiment: bool = False,
toxicity: bool = False,
themes: bool = False,
) -> Logger:
"""Instantiate whylogs Logger from params.
Args:
api_key (Optional[str]): WhyLabs API key. Optional because the preferred
way to specify the API key is with environment variable
WHYLABS_API_KEY.
org_id (Optional[str]): WhyLabs organization id to write profiles to.
If not set must be specified in environment variable
WHYLABS_DEFAULT_ORG_ID.
dataset_id (Optional[str]): The model or dataset this callback is gathering
telemetry for. If not set must be specified in environment variable
WHYLABS_DEFAULT_DATASET_ID.
sentiment (bool): If True will initialize a model to perform
sentiment analysis compound score. Defaults to False and will not gather
this metric.
toxicity (bool): If True will initialize a model to score
toxicity. Defaults to False and will not gather this metric.
themes (bool): If True will initialize a model to calculate
distance to configured themes. Defaults to None and will not gather this
metric.
"""
# langkit library will import necessary whylogs libraries
import_langkit(sentiment=sentiment, toxicity=toxicity, themes=themes)
import whylogs as why
from whylogs.api.writer.whylabs import WhyLabsWriter
from whylogs.core.schema import DeclarativeSchema
from whylogs.experimental.core.metrics.udf_metric import generate_udf_schema
api_key = api_key or get_from_env("api_key", "WHYLABS_API_KEY")
org_id = org_id or get_from_env("org_id", "WHYLABS_DEFAULT_ORG_ID")
dataset_id = dataset_id or get_from_env(
"dataset_id", "WHYLABS_DEFAULT_DATASET_ID"
)
whylabs_writer = WhyLabsWriter(
api_key=api_key, org_id=org_id, dataset_id=dataset_id
)
langkit_schema = DeclarativeSchema(generate_udf_schema())
whylabs_logger = why.logger(
mode="rolling", interval=5, when="M", schema=langkit_schema
)
whylabs_logger.append_writer(writer=whylabs_writer)
diagnostic_logger.info(
"Started whylogs Logger with WhyLabsWriter and initialized LangKit. 📝"
)
return cls(whylabs_logger)

View File

@@ -42,9 +42,9 @@ class HypotheticalDocumentEmbedder(Chain, Embeddings):
"""Output keys for Hyde's LLM chain."""
return self.llm_chain.output_keys
def embed_documents(self, texts: List[str]) -> List[List[float]]:
def embed_texts(self, texts: List[str]) -> List[List[float]]:
"""Call the base embeddings."""
return self.base_embeddings.embed_documents(texts)
return self.base_embeddings.embed_texts(texts)
def combine_embeddings(self, embeddings: List[List[float]]) -> List[float]:
"""Combine embeddings into final embeddings."""
@@ -55,7 +55,7 @@ class HypotheticalDocumentEmbedder(Chain, Embeddings):
var_name = self.llm_chain.input_keys[0]
result = self.llm_chain.generate([{var_name: text}])
documents = [generation.text for generation in result.generations[0]]
embeddings = self.embed_documents(documents)
embeddings = self.embed_texts(documents)
return self.combine_embeddings(embeddings)
def _call(

View File

@@ -54,7 +54,7 @@ class OpenAIModerationChain(Chain):
openai.organization = openai_organization
values["client"] = openai.Moderation
except ImportError:
raise ImportError(
raise ValueError(
"Could not import openai python package. "
"Please install it with `pip install openai`."
)

View File

@@ -86,7 +86,7 @@ class AzureChatOpenAI(ChatOpenAI):
if openai_organization:
openai.organization = openai_organization
except ImportError:
raise ImportError(
raise ValueError(
"Could not import openai python package. "
"Please install it with `pip install openai`."
)

View File

@@ -183,19 +183,6 @@ class BaseChatModel(BaseLanguageModel, ABC):
else:
raise ValueError("Unexpected generation type")
async def _call_async(
self,
messages: List[BaseMessage],
stop: Optional[List[str]] = None,
callbacks: Callbacks = None,
) -> BaseMessage:
result = await self.agenerate([messages], stop=stop, callbacks=callbacks)
generation = result.generations[0][0]
if isinstance(generation, ChatGeneration):
return generation.message
else:
raise ValueError("Unexpected generation type")
def call_as_llm(self, message: str, stop: Optional[List[str]] = None) -> str:
return self.predict(message, stop=stop)
@@ -216,23 +203,6 @@ class BaseChatModel(BaseLanguageModel, ABC):
_stop = list(stop)
return self(messages, stop=_stop)
async def apredict(self, text: str, *, stop: Optional[Sequence[str]] = None) -> str:
if stop is None:
_stop = None
else:
_stop = list(stop)
result = await self._call_async([HumanMessage(content=text)], stop=_stop)
return result.content
async def apredict_messages(
self, messages: List[BaseMessage], *, stop: Optional[Sequence[str]] = None
) -> BaseMessage:
if stop is None:
_stop = None
else:
_stop = list(stop)
return await self._call_async(messages, stop=_stop)
@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""

View File

@@ -342,7 +342,7 @@ class LangChainPlusClient(BaseSettings):
) -> Example:
"""Create a dataset example in the LangChain+ API."""
if dataset_id is None:
dataset_id = self.read_dataset(dataset_name=dataset_name).id
dataset_id = self.read_dataset(dataset_name).id
data = {
"inputs": inputs,

View File

@@ -48,7 +48,6 @@ from langchain.document_loaders.image_captions import ImageCaptionLoader
from langchain.document_loaders.imsdb import IMSDbLoader
from langchain.document_loaders.json_loader import JSONLoader
from langchain.document_loaders.markdown import UnstructuredMarkdownLoader
from langchain.document_loaders.mastodon import MastodonTootsLoader
from langchain.document_loaders.mediawikidump import MWDumpLoader
from langchain.document_loaders.modern_treasury import ModernTreasuryLoader
from langchain.document_loaders.notebook import NotebookLoader
@@ -100,7 +99,6 @@ from langchain.document_loaders.unstructured import (
from langchain.document_loaders.url import UnstructuredURLLoader
from langchain.document_loaders.url_playwright import PlaywrightURLLoader
from langchain.document_loaders.url_selenium import SeleniumURLLoader
from langchain.document_loaders.weather import WeatherDataLoader
from langchain.document_loaders.web_base import WebBaseLoader
from langchain.document_loaders.whatsapp_chat import WhatsAppChatLoader
from langchain.document_loaders.wikipedia import WikipediaLoader
@@ -162,7 +160,6 @@ __all__ = [
"ImageCaptionLoader",
"JSONLoader",
"MWDumpLoader",
"MastodonTootsLoader",
"MathpixPDFLoader",
"ModernTreasuryLoader",
"NotebookLoader",
@@ -213,7 +210,6 @@ __all__ = [
"UnstructuredRTFLoader",
"UnstructuredURLLoader",
"UnstructuredWordDocumentLoader",
"WeatherDataLoader",
"WebBaseLoader",
"WhatsAppChatLoader",
"WikipediaLoader",

View File

@@ -41,7 +41,7 @@ class ApifyDatasetLoader(BaseLoader, BaseModel):
values["apify_client"] = ApifyClient()
except ImportError:
raise ImportError(
raise ValueError(
"Could not import apify-client Python package. "
"Please install it with `pip install apify-client`."
)

View File

@@ -63,7 +63,7 @@ class DocugamiLoader(BaseLoader, BaseModel):
try:
from lxml import etree
except ImportError:
raise ImportError(
raise ValueError(
"Could not import lxml python package. "
"Please install it with `pip install lxml`."
)
@@ -259,7 +259,7 @@ class DocugamiLoader(BaseLoader, BaseModel):
try:
from lxml import etree
except ImportError:
raise ImportError(
raise ValueError(
"Could not import lxml python package. "
"Please install it with `pip install lxml`."
)

View File

@@ -33,7 +33,7 @@ class DuckDBLoader(BaseLoader):
try:
import duckdb
except ImportError:
raise ImportError(
raise ValueError(
"Could not import duckdb python package. "
"Please install it with `pip install duckdb`."
)

View File

@@ -39,7 +39,7 @@ class ImageCaptionLoader(BaseLoader):
try:
from transformers import BlipForConditionalGeneration, BlipProcessor
except ImportError:
raise ImportError(
raise ValueError(
"`transformers` package not found, please install with "
"`pip install transformers`."
)
@@ -66,7 +66,7 @@ class ImageCaptionLoader(BaseLoader):
try:
from PIL import Image
except ImportError:
raise ImportError(
raise ValueError(
"`PIL` package not found, please install with `pip install pillow`"
)

View File

@@ -42,7 +42,7 @@ class JSONLoader(BaseLoader):
try:
import jq # noqa:F401
except ImportError:
raise ImportError(
raise ValueError(
"jq package not found, please install it with `pip install jq`"
)

View File

@@ -1,88 +0,0 @@
"""Mastodon document loader."""
from __future__ import annotations
import os
from typing import TYPE_CHECKING, Any, Dict, Iterable, List, Optional, Sequence
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
if TYPE_CHECKING:
import mastodon
def _dependable_mastodon_import() -> mastodon:
try:
import mastodon
except ImportError:
raise ValueError(
"Mastodon.py package not found, "
"please install it with `pip install Mastodon.py`"
)
return mastodon
class MastodonTootsLoader(BaseLoader):
"""Mastodon toots loader."""
def __init__(
self,
mastodon_accounts: Sequence[str],
number_toots: Optional[int] = 100,
exclude_replies: bool = False,
access_token: Optional[str] = None,
api_base_url: str = "https://mastodon.social",
):
"""Instantiate Mastodon toots loader.
Args:
mastodon_accounts: The list of Mastodon accounts to query.
number_toots: How many toots to pull for each account.
exclude_replies: Whether to exclude reply toots from the load.
access_token: An access token if toots are loaded as a Mastodon app. Can
also be specified via the environment variables "MASTODON_ACCESS_TOKEN".
api_base_url: A Mastodon API base URL to talk to, if not using the default.
"""
mastodon = _dependable_mastodon_import()
access_token = access_token or os.environ.get("MASTODON_ACCESS_TOKEN")
self.api = mastodon.Mastodon(
access_token=access_token, api_base_url=api_base_url
)
self.mastodon_accounts = mastodon_accounts
self.number_toots = number_toots
self.exclude_replies = exclude_replies
def load(self) -> List[Document]:
"""Load toots into documents."""
results: List[Document] = []
for account in self.mastodon_accounts:
user = self.api.account_lookup(account)
toots = self.api.account_statuses(
user.id,
only_media=False,
pinned=False,
exclude_replies=self.exclude_replies,
exclude_reblogs=True,
limit=self.number_toots,
)
docs = self._format_toots(toots, user)
results.extend(docs)
return results
def _format_toots(
self, toots: List[Dict[str, Any]], user_info: dict
) -> Iterable[Document]:
"""Format toots into documents.
Adding user info, and selected toot fields into the metadata.
"""
for toot in toots:
metadata = {
"created_at": toot["created_at"],
"user_info": user_info,
"is_reply": toot["in_reply_to_id"] is not None,
}
yield Document(
page_content=toot["content"],
metadata=metadata,
)

View File

@@ -83,7 +83,7 @@ class NotebookLoader(BaseLoader):
try:
import pandas as pd
except ImportError:
raise ImportError(
raise ValueError(
"pandas is needed for Notebook Loader, "
"please install with `pip install pandas`"
)

View File

@@ -77,7 +77,7 @@ class OneDriveLoader(BaseLoader, BaseModel):
try:
from O365 import FileSystemTokenBackend
except ImportError:
raise ImportError(
raise ValueError(
"O365 package not found, please install it with `pip install o365`"
)
if self.auth_with_token:

View File

@@ -103,7 +103,7 @@ class PyPDFLoader(BasePDFLoader):
try:
import pypdf # noqa:F401
except ImportError:
raise ImportError(
raise ValueError(
"pypdf package not found, please install it with " "`pip install pypdf`"
)
self.parser = PyPDFParser()
@@ -194,7 +194,7 @@ class PDFMinerLoader(BasePDFLoader):
try:
from pdfminer.high_level import extract_text # noqa:F401
except ImportError:
raise ImportError(
raise ValueError(
"`pdfminer` package not found, please install it with "
"`pip install pdfminer.six`"
)
@@ -222,7 +222,7 @@ class PDFMinerPDFasHTMLLoader(BasePDFLoader):
try:
from pdfminer.high_level import extract_text_to_fp # noqa:F401
except ImportError:
raise ImportError(
raise ValueError(
"`pdfminer` package not found, please install it with "
"`pip install pdfminer.six`"
)
@@ -256,7 +256,7 @@ class PyMuPDFLoader(BasePDFLoader):
try:
import fitz # noqa:F401
except ImportError:
raise ImportError(
raise ValueError(
"`PyMuPDF` package not found, please install it with "
"`pip install pymupdf`"
)
@@ -375,7 +375,7 @@ class PDFPlumberLoader(BasePDFLoader):
try:
import pdfplumber # noqa:F401
except ImportError:
raise ImportError(
raise ValueError(
"pdfplumber package not found, please install it with "
"`pip install pdfplumber`"
)

View File

@@ -19,8 +19,9 @@ class ReadTheDocsLoader(BaseLoader):
"""Initialize path."""
try:
from bs4 import BeautifulSoup
except ImportError:
raise ImportError(
raise ValueError(
"Could not import python packages. "
"Please install it with `pip install beautifulsoup4`. "
)

View File

@@ -19,7 +19,7 @@ class S3DirectoryLoader(BaseLoader):
try:
import boto3
except ImportError:
raise ImportError(
raise ValueError(
"Could not import boto3 python package. "
"Please install it with `pip install boto3`."
)

View File

@@ -21,7 +21,7 @@ class S3FileLoader(BaseLoader):
try:
import boto3
except ImportError:
raise ImportError(
raise ValueError(
"Could not import `boto3` python package. "
"Please install it with `pip install boto3`."
)

View File

@@ -58,7 +58,7 @@ class SitemapLoader(WebBaseLoader):
try:
import lxml # noqa:F401
except ImportError:
raise ImportError(
raise ValueError(
"lxml package not found, please install it with " "`pip install lxml`"
)
@@ -107,9 +107,8 @@ class SitemapLoader(WebBaseLoader):
try:
import bs4
except ImportError:
raise ImportError(
"beautifulsoup4 package not found, please install it"
" with `pip install beautifulsoup4`"
raise ValueError(
"bs4 package not found, please install it with " "`pip install bs4`"
)
fp = open(self.web_path)
soup = bs4.BeautifulSoup(fp, "xml")

View File

@@ -13,8 +13,8 @@ class SRTLoader(BaseLoader):
try:
import pysrt # noqa:F401
except ImportError:
raise ImportError(
"package `pysrt` not found, please install it with `pip install pysrt`"
raise ValueError(
"package `pysrt` not found, please install it with `pysrt`"
)
self.file_path = file_path

View File

@@ -226,7 +226,7 @@ class TelegramChatApiLoader(BaseLoader):
nest_asyncio.apply()
asyncio.run(self.fetch_data_from_telegram())
except ImportError:
raise ImportError(
raise ValueError(
"""`nest_asyncio` package not found.
please install with `pip install nest_asyncio`
"""
@@ -239,7 +239,7 @@ class TelegramChatApiLoader(BaseLoader):
try:
import pandas as pd
except ImportError:
raise ImportError(
raise ValueError(
"""`pandas` package not found.
please install with `pip install pandas`
"""

View File

@@ -15,7 +15,7 @@ def _dependable_tweepy_import() -> tweepy:
try:
import tweepy
except ImportError:
raise ImportError(
raise ValueError(
"tweepy package not found, please install it with `pip install tweepy`"
)
return tweepy

View File

@@ -30,7 +30,7 @@ class PlaywrightURLLoader(BaseLoader):
try:
import playwright # noqa:F401
except ImportError:
raise ImportError(
raise ValueError(
"playwright package not found, please install it with "
"`pip install playwright`"
)

View File

@@ -40,7 +40,7 @@ class SeleniumURLLoader(BaseLoader):
try:
import selenium # noqa:F401
except ImportError:
raise ImportError(
raise ValueError(
"selenium package not found, please install it with "
"`pip install selenium`"
)
@@ -48,7 +48,7 @@ class SeleniumURLLoader(BaseLoader):
try:
import unstructured # noqa:F401
except ImportError:
raise ImportError(
raise ValueError(
"unstructured package not found, please install it with "
"`pip install unstructured`"
)

View File

@@ -1,50 +0,0 @@
"""Simple reader that reads weather data from OpenWeatherMap API"""
from __future__ import annotations
from datetime import datetime
from typing import Iterator, List, Optional, Sequence
from langchain.docstore.document import Document
from langchain.document_loaders.base import BaseLoader
from langchain.utilities.openweathermap import OpenWeatherMapAPIWrapper
class WeatherDataLoader(BaseLoader):
"""Weather Reader.
Reads the forecast & current weather of any location using OpenWeatherMap's free
API. Checkout 'https://openweathermap.org/appid' for more on how to generate a free
OpenWeatherMap API.
"""
def __init__(
self,
client: OpenWeatherMapAPIWrapper,
places: Sequence[str],
) -> None:
"""Initialize with parameters."""
super().__init__()
self.client = client
self.places = places
@classmethod
def from_params(
cls, places: Sequence[str], *, openweathermap_api_key: Optional[str] = None
) -> WeatherDataLoader:
client = OpenWeatherMapAPIWrapper(openweathermap_api_key=openweathermap_api_key)
return cls(client, places)
def lazy_load(
self,
) -> Iterator[Document]:
"""Lazily load weather data for the given locations."""
for place in self.places:
metadata = {"queried_at": datetime.now()}
content = self.client.run(place)
yield Document(page_content=content, metadata=metadata)
def load(
self,
) -> List[Document]:
"""Load weather data for the given locations."""
return list(self.lazy_load())

View File

@@ -55,9 +55,7 @@ def _get_embeddings_from_stateful_docs(
if len(documents) and "embedded_doc" in documents[0].state:
embedded_documents = [doc.state["embedded_doc"] for doc in documents]
else:
embedded_documents = embeddings.embed_documents(
[d.page_content for d in documents]
)
embedded_documents = embeddings.embed_texts([d.page_content for d in documents])
for doc, embedding in zip(documents, embedded_documents):
doc.state["embedded_doc"] = embedding
return embedded_documents

View File

@@ -7,7 +7,6 @@ from langchain.embeddings.aleph_alpha import (
AlephAlphaSymmetricSemanticEmbedding,
)
from langchain.embeddings.cohere import CohereEmbeddings
from langchain.embeddings.elasticsearch import ElasticsearchEmbeddings
from langchain.embeddings.fake import FakeEmbeddings
from langchain.embeddings.google_palm import GooglePalmEmbeddings
from langchain.embeddings.huggingface import (
@@ -17,7 +16,6 @@ from langchain.embeddings.huggingface import (
from langchain.embeddings.huggingface_hub import HuggingFaceHubEmbeddings
from langchain.embeddings.jina import JinaEmbeddings
from langchain.embeddings.llamacpp import LlamaCppEmbeddings
from langchain.embeddings.mosaicml import MosaicMLInstructorEmbeddings
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings.sagemaker_endpoint import SagemakerEndpointEmbeddings
from langchain.embeddings.self_hosted import SelfHostedEmbeddings
@@ -34,14 +32,12 @@ __all__ = [
"OpenAIEmbeddings",
"HuggingFaceEmbeddings",
"CohereEmbeddings",
"ElasticsearchEmbeddings",
"JinaEmbeddings",
"LlamaCppEmbeddings",
"HuggingFaceHubEmbeddings",
"TensorflowHubEmbeddings",
"SagemakerEndpointEmbeddings",
"HuggingFaceInstructEmbeddings",
"MosaicMLInstructorEmbeddings",
"SelfHostedEmbeddings",
"SelfHostedHuggingFaceEmbeddings",
"SelfHostedHuggingFaceInstructEmbeddings",

View File

@@ -65,7 +65,7 @@ class AlephAlphaAsymmetricSemanticEmbedding(BaseModel, Embeddings):
values["client"] = Client(token=aleph_alpha_api_key)
return values
def embed_documents(self, texts: List[str]) -> List[List[float]]:
def embed_texts(self, texts: List[str]) -> List[List[float]]:
"""Call out to Aleph Alpha's asymmetric Document endpoint.
Args:
@@ -186,7 +186,7 @@ class AlephAlphaSymmetricSemanticEmbedding(AlephAlphaAsymmetricSemanticEmbedding
return query_response.embedding
def embed_documents(self, texts: List[str]) -> List[List[float]]:
def embed_texts(self, texts: List[str]) -> List[List[float]]:
"""Call out to Aleph Alpha's Document endpoint.
Args:

View File

@@ -6,9 +6,13 @@ from typing import List
class Embeddings(ABC):
"""Interface for embedding models."""
@abstractmethod
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Embed search docs."""
"""DEPRECATED. Kept for backwards compatibility."""
return self.embed_texts(texts)
@abstractmethod
def embed_texts(self, texts: List[str]) -> List[List[float]]:
"""Embed search texts."""
@abstractmethod
def embed_query(self, text: str) -> List[float]:

View File

@@ -48,13 +48,13 @@ class CohereEmbeddings(BaseModel, Embeddings):
values["client"] = cohere.Client(cohere_api_key)
except ImportError:
raise ImportError(
raise ValueError(
"Could not import cohere python package. "
"Please install it with `pip install cohere`."
)
return values
def embed_documents(self, texts: List[str]) -> List[List[float]]:
def embed_texts(self, texts: List[str]) -> List[List[float]]:
"""Call out to Cohere's embedding endpoint.
Args:

View File

@@ -1,155 +0,0 @@
from __future__ import annotations
from typing import TYPE_CHECKING, List, Optional
from langchain.utils import get_from_env
if TYPE_CHECKING:
from elasticsearch.client import MlClient
from langchain.embeddings.base import Embeddings
class ElasticsearchEmbeddings(Embeddings):
"""
Wrapper around Elasticsearch embedding models.
This class provides an interface to generate embeddings using a model deployed
in an Elasticsearch cluster. It requires an Elasticsearch connection object
and the model_id of the model deployed in the cluster.
In Elasticsearch you need to have an embedding model loaded and deployed.
- https://www.elastic.co/guide/en/elasticsearch/reference/current/infer-trained-model.html
- https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-deploy-models.html
""" # noqa: E501
def __init__(
self,
client: MlClient,
model_id: str,
*,
input_field: str = "text_field",
):
"""
Initialize the ElasticsearchEmbeddings instance.
Args:
client (MlClient): An Elasticsearch ML client object.
model_id (str): The model_id of the model deployed in the Elasticsearch
cluster.
input_field (str): The name of the key for the input text field in the
document. Defaults to 'text_field'.
"""
self.client = client
self.model_id = model_id
self.input_field = input_field
@classmethod
def from_credentials(
cls,
model_id: str,
*,
es_cloud_id: Optional[str] = None,
es_user: Optional[str] = None,
es_password: Optional[str] = None,
input_field: str = "text_field",
) -> ElasticsearchEmbeddings:
"""Instantiate embeddings from Elasticsearch credentials.
Args:
model_id (str): The model_id of the model deployed in the Elasticsearch
cluster.
input_field (str): The name of the key for the input text field in the
document. Defaults to 'text_field'.
es_cloud_id: (str, optional): The Elasticsearch cloud ID to connect to.
es_user: (str, optional): Elasticsearch username.
es_password: (str, optional): Elasticsearch password.
Example Usage:
from langchain.embeddings import ElasticsearchEmbeddings
# Define the model ID and input field name (if different from default)
model_id = "your_model_id"
# Optional, only if different from 'text_field'
input_field = "your_input_field"
# Credentials can be passed in two ways. Either set the env vars
# ES_CLOUD_ID, ES_USER, ES_PASSWORD and they will be automatically pulled
# in, or pass them in directly as kwargs.
embeddings = ElasticsearchEmbeddings.from_credentials(
model_id,
input_field=input_field,
# es_cloud_id="foo",
# es_user="bar",
# es_password="baz",
)
documents = [
"This is an example document.",
"Another example document to generate embeddings for.",
]
embeddings_generator.embed_documents(documents)
"""
try:
from elasticsearch import Elasticsearch
from elasticsearch.client import MlClient
except ImportError:
raise ImportError(
"elasticsearch package not found, please install with 'pip install "
"elasticsearch'"
)
es_cloud_id = es_cloud_id or get_from_env("es_cloud_id", "ES_CLOUD_ID")
es_user = es_user or get_from_env("es_user", "ES_USER")
es_password = es_password or get_from_env("es_password", "ES_PASSWORD")
# Connect to Elasticsearch
es_connection = Elasticsearch(
cloud_id=es_cloud_id, basic_auth=(es_user, es_password)
)
client = MlClient(es_connection)
return cls(client, model_id, input_field=input_field)
def _embedding_func(self, texts: List[str]) -> List[List[float]]:
"""
Generate embeddings for the given texts using the Elasticsearch model.
Args:
texts (List[str]): A list of text strings to generate embeddings for.
Returns:
List[List[float]]: A list of embeddings, one for each text in the input
list.
"""
response = self.client.infer_trained_model(
model_id=self.model_id, docs=[{self.input_field: text} for text in texts]
)
embeddings = [doc["predicted_value"] for doc in response["inference_results"]]
return embeddings
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""
Generate embeddings for a list of documents.
Args:
texts (List[str]): A list of document text strings to generate embeddings
for.
Returns:
List[List[float]]: A list of embeddings, one for each document in the input
list.
"""
return self._embedding_func(texts)
def embed_query(self, text: str) -> List[float]:
"""
Generate an embedding for a single query text.
Args:
text (str): The query text to generate an embedding for.
Returns:
List[float]: The embedding for the input query text.
"""
return self._embedding_func([text])[0]

View File

@@ -12,7 +12,7 @@ class FakeEmbeddings(Embeddings, BaseModel):
def _get_embedding(self) -> List[float]:
return list(np.random.normal(size=self.size))
def embed_documents(self, texts: List[str]) -> List[List[float]]:
def embed_texts(self, texts: List[str]) -> List[List[float]]:
return [self._get_embedding() for _ in texts]
def embed_query(self, text: str) -> List[float]:

View File

@@ -77,7 +77,7 @@ class GooglePalmEmbeddings(BaseModel, Embeddings):
return values
def embed_documents(self, texts: List[str]) -> List[List[float]]:
def embed_texts(self, texts: List[str]) -> List[List[float]]:
return [self.embed_query(text) for text in texts]
def embed_query(self, text: str) -> List[float]:

View File

@@ -46,7 +46,7 @@ class HuggingFaceEmbeddings(BaseModel, Embeddings):
import sentence_transformers
except ImportError as exc:
raise ImportError(
raise ValueError(
"Could not import sentence_transformers python package. "
"Please install it with `pip install sentence_transformers`."
) from exc
@@ -60,7 +60,7 @@ class HuggingFaceEmbeddings(BaseModel, Embeddings):
extra = Extra.forbid
def embed_documents(self, texts: List[str]) -> List[List[float]]:
def embed_texts(self, texts: List[str]) -> List[List[float]]:
"""Compute doc embeddings using a HuggingFace transformer model.
Args:
@@ -135,7 +135,7 @@ class HuggingFaceInstructEmbeddings(BaseModel, Embeddings):
extra = Extra.forbid
def embed_documents(self, texts: List[str]) -> List[List[float]]:
def embed_texts(self, texts: List[str]) -> List[List[float]]:
"""Compute doc embeddings using a HuggingFace instruct model.
Args:

View File

@@ -77,7 +77,7 @@ class HuggingFaceHubEmbeddings(BaseModel, Embeddings):
)
return values
def embed_documents(self, texts: List[str]) -> List[List[float]]:
def embed_texts(self, texts: List[str]) -> List[List[float]]:
"""Call out to HuggingFaceHub's embedding endpoint for embedding search docs.
Args:
@@ -101,5 +101,5 @@ class HuggingFaceHubEmbeddings(BaseModel, Embeddings):
Returns:
Embeddings for the text.
"""
response = self.embed_documents([text])[0]
response = self.embed_texts([text])[0]
return response

View File

@@ -34,7 +34,7 @@ class JinaEmbeddings(BaseModel, Embeddings):
try:
import jina
except ImportError:
raise ImportError(
raise ValueError(
"Could not import `jina` python package. "
"Please install it with `pip install jina`."
)
@@ -71,7 +71,7 @@ class JinaEmbeddings(BaseModel, Embeddings):
payload = dict(inputs=docs, metadata=self.request_headers, **kwargs)
return self.client.post(on="/encode", **payload)
def embed_documents(self, texts: List[str]) -> List[List[float]]:
def embed_texts(self, texts: List[str]) -> List[List[float]]:
"""Call out to Jina's embedding endpoint.
Args:
texts: The list of texts to embed.

View File

@@ -99,7 +99,7 @@ class LlamaCppEmbeddings(BaseModel, Embeddings):
return values
def embed_documents(self, texts: List[str]) -> List[List[float]]:
def embed_texts(self, texts: List[str]) -> List[List[float]]:
"""Embed a list of documents using the Llama model.
Args:

View File

@@ -1,137 +0,0 @@
"""Wrapper around MosaicML APIs."""
from typing import Any, Dict, List, Mapping, Optional, Tuple
import requests
from pydantic import BaseModel, Extra, root_validator
from langchain.embeddings.base import Embeddings
from langchain.utils import get_from_dict_or_env
class MosaicMLInstructorEmbeddings(BaseModel, Embeddings):
"""Wrapper around MosaicML's embedding inference service.
To use, you should have the
environment variable ``MOSAICML_API_TOKEN`` set with your API token, or pass
it as a named parameter to the constructor.
Example:
.. code-block:: python
from langchain.llms import MosaicMLInstructorEmbeddings
endpoint_url = (
"https://models.hosted-on.mosaicml.hosting/instructor-large/v1/predict"
)
mosaic_llm = MosaicMLInstructorEmbeddings(
endpoint_url=endpoint_url,
mosaicml_api_token="my-api-key"
)
"""
endpoint_url: str = (
"https://models.hosted-on.mosaicml.hosting/instructor-large/v1/predict"
)
"""Endpoint URL to use."""
embed_instruction: str = "Represent the document for retrieval: "
"""Instruction used to embed documents."""
query_instruction: str = (
"Represent the question for retrieving supporting documents: "
)
"""Instruction used to embed the query."""
retry_sleep: float = 1.0
"""How long to try sleeping for if a rate limit is encountered"""
mosaicml_api_token: Optional[str] = None
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that api key and python package exists in environment."""
mosaicml_api_token = get_from_dict_or_env(
values, "mosaicml_api_token", "MOSAICML_API_TOKEN"
)
values["mosaicml_api_token"] = mosaicml_api_token
return values
@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""
return {"endpoint_url": self.endpoint_url}
def _embed(
self, input: List[Tuple[str, str]], is_retry: bool = False
) -> List[List[float]]:
payload = {"input_strings": input}
# HTTP headers for authorization
headers = {
"Authorization": f"{self.mosaicml_api_token}",
"Content-Type": "application/json",
}
# send request
try:
response = requests.post(self.endpoint_url, headers=headers, json=payload)
except requests.exceptions.RequestException as e:
raise ValueError(f"Error raised by inference endpoint: {e}")
try:
parsed_response = response.json()
if "error" in parsed_response:
# if we get rate limited, try sleeping for 1 second
if (
not is_retry
and "rate limit exceeded" in parsed_response["error"].lower()
):
import time
time.sleep(self.retry_sleep)
return self._embed(input, is_retry=True)
raise ValueError(
f"Error raised by inference API: {parsed_response['error']}"
)
if "data" not in parsed_response:
raise ValueError(
f"Error raised by inference API, no key data: {parsed_response}"
)
embeddings = parsed_response["data"]
except requests.exceptions.JSONDecodeError as e:
raise ValueError(
f"Error raised by inference API: {e}.\nResponse: {response.text}"
)
return embeddings
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Embed documents using a MosaicML deployed instructor embedding model.
Args:
texts: The list of texts to embed.
Returns:
List of embeddings, one for each text.
"""
instruction_pairs = [(self.embed_instruction, text) for text in texts]
embeddings = self._embed(instruction_pairs)
return embeddings
def embed_query(self, text: str) -> List[float]:
"""Embed a query using a MosaicML deployed instructor embedding model.
Args:
text: The text to embed.
Returns:
Embeddings for the text.
"""
instruction_pair = (self.query_instruction, text)
embedding = self._embed([instruction_pair])[0]
return embedding

View File

@@ -178,7 +178,7 @@ class OpenAIEmbeddings(BaseModel, Embeddings):
openai.api_type = openai_api_type
values["client"] = openai.Embedding
except ImportError:
raise ImportError(
raise ValueError(
"Could not import openai python package. "
"Please install it with `pip install openai`."
)
@@ -192,64 +192,67 @@ class OpenAIEmbeddings(BaseModel, Embeddings):
embeddings: List[List[float]] = [[] for _ in range(len(texts))]
try:
import tiktoken
tokens = []
indices = []
encoding = tiktoken.model.encoding_for_model(self.model)
for i, text in enumerate(texts):
if self.model.endswith("001"):
# See: https://github.com/openai/openai-python/issues/418#issuecomment-1525939500
# replace newlines, which can negatively affect performance.
text = text.replace("\n", " ")
token = encoding.encode(
text,
allowed_special=self.allowed_special,
disallowed_special=self.disallowed_special,
)
for j in range(0, len(token), self.embedding_ctx_length):
tokens += [token[j : j + self.embedding_ctx_length]]
indices += [i]
batched_embeddings = []
_chunk_size = chunk_size or self.chunk_size
for i in range(0, len(tokens), _chunk_size):
response = embed_with_retry(
self,
input=tokens[i : i + _chunk_size],
engine=self.deployment,
request_timeout=self.request_timeout,
headers=self.headers,
)
batched_embeddings += [r["embedding"] for r in response["data"]]
results: List[List[List[float]]] = [[] for _ in range(len(texts))]
num_tokens_in_batch: List[List[int]] = [[] for _ in range(len(texts))]
for i in range(len(indices)):
results[indices[i]].append(batched_embeddings[i])
num_tokens_in_batch[indices[i]].append(len(tokens[i]))
for i in range(len(texts)):
_result = results[i]
if len(_result) == 0:
average = embed_with_retry(
self,
input="",
engine=self.deployment,
request_timeout=self.request_timeout,
headers=self.headers,
)["data"][0]["embedding"]
else:
average = np.average(
_result, axis=0, weights=num_tokens_in_batch[i]
)
embeddings[i] = (average / np.linalg.norm(average)).tolist()
return embeddings
except ImportError:
raise ImportError(
raise ValueError(
"Could not import tiktoken python package. "
"This is needed in order to for OpenAIEmbeddings. "
"Please install it with `pip install tiktoken`."
)
tokens = []
indices = []
encoding = tiktoken.model.encoding_for_model(self.model)
for i, text in enumerate(texts):
if self.model.endswith("001"):
# See: https://github.com/openai/openai-python/issues/418#issuecomment-1525939500
# replace newlines, which can negatively affect performance.
text = text.replace("\n", " ")
token = encoding.encode(
text,
allowed_special=self.allowed_special,
disallowed_special=self.disallowed_special,
)
for j in range(0, len(token), self.embedding_ctx_length):
tokens += [token[j : j + self.embedding_ctx_length]]
indices += [i]
batched_embeddings = []
_chunk_size = chunk_size or self.chunk_size
for i in range(0, len(tokens), _chunk_size):
response = embed_with_retry(
self,
input=tokens[i : i + _chunk_size],
engine=self.deployment,
request_timeout=self.request_timeout,
headers=self.headers,
)
batched_embeddings += [r["embedding"] for r in response["data"]]
results: List[List[List[float]]] = [[] for _ in range(len(texts))]
num_tokens_in_batch: List[List[int]] = [[] for _ in range(len(texts))]
for i in range(len(indices)):
results[indices[i]].append(batched_embeddings[i])
num_tokens_in_batch[indices[i]].append(len(tokens[i]))
for i in range(len(texts)):
_result = results[i]
if len(_result) == 0:
average = embed_with_retry(
self,
input="",
engine=self.deployment,
request_timeout=self.request_timeout,
headers=self.headers,
)["data"][0]["embedding"]
else:
average = np.average(_result, axis=0, weights=num_tokens_in_batch[i])
embeddings[i] = (average / np.linalg.norm(average)).tolist()
return embeddings
def _embedding_func(self, text: str, *, engine: str) -> List[float]:
"""Call out to OpenAI's embedding endpoint."""
# handle large input text
@@ -268,7 +271,7 @@ class OpenAIEmbeddings(BaseModel, Embeddings):
headers=self.headers,
)["data"][0]["embedding"]
def embed_documents(
def embed_texts(
self, texts: List[str], chunk_size: Optional[int] = 0
) -> List[List[float]]:
"""Call out to OpenAI's embedding endpoint for embedding search docs.

View File

@@ -164,9 +164,7 @@ class SagemakerEndpointEmbeddings(BaseModel, Embeddings):
return self.content_handler.transform_output(response["Body"])
def embed_documents(
self, texts: List[str], chunk_size: int = 64
) -> List[List[float]]:
def embed_texts(self, texts: List[str], chunk_size: int = 64) -> List[List[float]]:
"""Compute doc embeddings using a SageMaker Inference Endpoint.
Args:

View File

@@ -72,7 +72,7 @@ class SelfHostedEmbeddings(SelfHostedPipeline, Embeddings):
extra = Extra.forbid
def embed_documents(self, texts: List[str]) -> List[List[float]]:
def embed_texts(self, texts: List[str]) -> List[List[float]]:
"""Compute doc embeddings using a HuggingFace transformer model.
Args:

View File

@@ -140,7 +140,7 @@ class SelfHostedHuggingFaceInstructEmbeddings(SelfHostedHuggingFaceEmbeddings):
load_fn_kwargs["device"] = load_fn_kwargs.get("device", 0)
super().__init__(load_fn_kwargs=load_fn_kwargs, **kwargs)
def embed_documents(self, texts: List[str]) -> List[List[float]]:
def embed_texts(self, texts: List[str]) -> List[List[float]]:
"""Compute doc embeddings using a HuggingFace instruct model.
Args:

View File

@@ -30,27 +30,20 @@ class TensorflowHubEmbeddings(BaseModel, Embeddings):
super().__init__(**kwargs)
try:
import tensorflow_hub
except ImportError:
raise ImportError(
"Could not import tensorflow-hub python package. "
"Please install it with `pip install tensorflow-hub``."
)
try:
import tensorflow_text # noqa
except ImportError:
raise ImportError(
"Could not import tensorflow_text python package. "
"Please install it with `pip install tensorflow_text``."
)
self.embed = tensorflow_hub.load(self.model_url)
self.embed = tensorflow_hub.load(self.model_url)
except ImportError as e:
raise ValueError(
"Could not import some python packages." "Please install them."
) from e
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
def embed_documents(self, texts: List[str]) -> List[List[float]]:
def embed_texts(self, texts: List[str]) -> List[List[float]]:
"""Compute doc embeddings using a TensorflowHub embedding model.
Args:

View File

@@ -25,7 +25,7 @@ SYSTEM_PROMPT = (
class PlanningOutputParser(PlanOutputParser):
def parse(self, text: str) -> Plan:
steps = [Step(value=v) for v in re.split("\n\s*\d+\. ", text)[1:]]
steps = [Step(value=v) for v in re.split("\n\d+\. ", text)[1:]]
return Plan(steps=steps)

View File

@@ -54,7 +54,7 @@ class NetworkxEntityGraph:
try:
import networkx as nx
except ImportError:
raise ImportError(
raise ValueError(
"Could not import networkx python package. "
"Please install it with `pip install networkx`."
)
@@ -70,7 +70,7 @@ class NetworkxEntityGraph:
try:
import networkx as nx
except ImportError:
raise ImportError(
raise ValueError(
"Could not import networkx python package. "
"Please install it with `pip install networkx`."
)

View File

@@ -7,7 +7,6 @@ from langchain.llms.anthropic import Anthropic
from langchain.llms.anyscale import Anyscale
from langchain.llms.bananadev import Banana
from langchain.llms.base import BaseLLM
from langchain.llms.beam import Beam
from langchain.llms.cerebriumai import CerebriumAI
from langchain.llms.cohere import Cohere
from langchain.llms.deepinfra import DeepInfra
@@ -23,10 +22,8 @@ from langchain.llms.huggingface_text_gen_inference import HuggingFaceTextGenInfe
from langchain.llms.human import HumanInputLLM
from langchain.llms.llamacpp import LlamaCpp
from langchain.llms.modal import Modal
from langchain.llms.mosaicml import MosaicML
from langchain.llms.nlpcloud import NLPCloud
from langchain.llms.openai import AzureOpenAI, OpenAI, OpenAIChat
from langchain.llms.openlm import OpenLM
from langchain.llms.petals import Petals
from langchain.llms.pipelineai import PipelineAI
from langchain.llms.predictionguard import PredictionGuard
@@ -44,7 +41,6 @@ __all__ = [
"AlephAlpha",
"Anyscale",
"Banana",
"Beam",
"CerebriumAI",
"Cohere",
"DeepInfra",
@@ -54,11 +50,9 @@ __all__ = [
"GPT4All",
"LlamaCpp",
"Modal",
"MosaicML",
"NLPCloud",
"OpenAI",
"OpenAIChat",
"OpenLM",
"Petals",
"PipelineAI",
"HuggingFaceEndpoint",
@@ -87,7 +81,6 @@ type_to_cls_dict: Dict[str, Type[BaseLLM]] = {
"anthropic": Anthropic,
"anyscale": Anyscale,
"bananadev": Banana,
"beam": Beam,
"cerebriumai": CerebriumAI,
"cohere": Cohere,
"deepinfra": DeepInfra,
@@ -99,12 +92,10 @@ type_to_cls_dict: Dict[str, Type[BaseLLM]] = {
"huggingface_endpoint": HuggingFaceEndpoint,
"llamacpp": LlamaCpp,
"modal": Modal,
"mosaic": MosaicML,
"sagemaker_endpoint": SagemakerEndpoint,
"nlpcloud": NLPCloud,
"human-input": HumanInputLLM,
"openai": OpenAI,
"openlm": OpenLM,
"petals": Petals,
"pipelineai": PipelineAI,
"huggingface_pipeline": HuggingFacePipeline,

View File

@@ -148,7 +148,7 @@ class AlephAlpha(LLM):
values["client"] = aleph_alpha_client.Client(token=aleph_alpha_api_key)
except ImportError:
raise ImportError(
raise ValueError(
"Could not import aleph_alpha_client python package. "
"Please install it with `pip install aleph_alpha_client`."
)

View File

@@ -59,7 +59,7 @@ class _AnthropicCommon(BaseModel):
values["AI_PROMPT"] = anthropic.AI_PROMPT
values["count_tokens"] = anthropic.count_tokens
except ImportError:
raise ImportError(
raise ValueError(
"Could not import anthropic python package. "
"Please it install it with `pip install anthropic`."
)

View File

@@ -91,7 +91,7 @@ class Banana(LLM):
try:
import banana_dev as banana
except ImportError:
raise ImportError(
raise ValueError(
"Could not import banana-dev python package. "
"Please install it with `pip install banana-dev`."
)

View File

@@ -299,13 +299,6 @@ class BaseLLM(BaseLanguageModel, ABC):
.text
)
async def _call_async(
self, prompt: str, stop: Optional[List[str]] = None, callbacks: Callbacks = None
) -> str:
"""Check Cache and run the LLM on the given prompt and input."""
result = await self.agenerate([prompt], stop=stop, callbacks=callbacks)
return result.generations[0][0].text
def predict(self, text: str, *, stop: Optional[Sequence[str]] = None) -> str:
if stop is None:
_stop = None
@@ -324,24 +317,6 @@ class BaseLLM(BaseLanguageModel, ABC):
content = self(text, stop=_stop)
return AIMessage(content=content)
async def apredict(self, text: str, *, stop: Optional[Sequence[str]] = None) -> str:
if stop is None:
_stop = None
else:
_stop = list(stop)
return await self._call_async(text, stop=_stop)
async def apredict_messages(
self, messages: List[BaseMessage], *, stop: Optional[Sequence[str]] = None
) -> BaseMessage:
text = get_buffer_string(messages)
if stop is None:
_stop = None
else:
_stop = list(stop)
content = await self._call_async(text, stop=_stop)
return AIMessage(content=content)
@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""

View File

@@ -1,268 +0,0 @@
"""Wrapper around Beam API."""
import base64
import json
import logging
import subprocess
import textwrap
import time
from typing import Any, Dict, List, Mapping, Optional
import requests
from pydantic import Extra, Field, root_validator
from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
from langchain.utils import get_from_dict_or_env
logger = logging.getLogger(__name__)
DEFAULT_NUM_TRIES = 10
DEFAULT_SLEEP_TIME = 4
class Beam(LLM):
"""Wrapper around Beam API for gpt2 large language model.
To use, you should have the ``beam-sdk`` python package installed,
and the environment variable ``BEAM_CLIENT_ID`` set with your client id
and ``BEAM_CLIENT_SECRET`` set with your client secret. Information on how
to get these is available here: https://docs.beam.cloud/account/api-keys.
The wrapper can then be called as follows, where the name, cpu, memory, gpu,
python version, and python packages can be updated accordingly. Once deployed,
the instance can be called.
llm = Beam(model_name="gpt2",
name="langchain-gpt2",
cpu=8,
memory="32Gi",
gpu="A10G",
python_version="python3.8",
python_packages=[
"diffusers[torch]>=0.10",
"transformers",
"torch",
"pillow",
"accelerate",
"safetensors",
"xformers",],
max_length=50)
llm._deploy()
call_result = llm._call(input)
"""
model_name: str = ""
name: str = ""
cpu: str = ""
memory: str = ""
gpu: str = ""
python_version: str = ""
python_packages: List[str] = []
max_length: str = ""
url: str = ""
"""model endpoint to use"""
model_kwargs: Dict[str, Any] = Field(default_factory=dict)
"""Holds any model parameters valid for `create` call not
explicitly specified."""
beam_client_id: str = ""
beam_client_secret: str = ""
app_id: Optional[str] = None
class Config:
"""Configuration for this pydantic config."""
extra = Extra.forbid
@root_validator(pre=True)
def build_extra(cls, values: Dict[str, Any]) -> Dict[str, Any]:
"""Build extra kwargs from additional params that were passed in."""
all_required_field_names = {field.alias for field in cls.__fields__.values()}
extra = values.get("model_kwargs", {})
for field_name in list(values):
if field_name not in all_required_field_names:
if field_name in extra:
raise ValueError(f"Found {field_name} supplied twice.")
logger.warning(
f"""{field_name} was transfered to model_kwargs.
Please confirm that {field_name} is what you intended."""
)
extra[field_name] = values.pop(field_name)
values["model_kwargs"] = extra
return values
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that api key and python package exists in environment."""
beam_client_id = get_from_dict_or_env(
values, "beam_client_id", "BEAM_CLIENT_ID"
)
beam_client_secret = get_from_dict_or_env(
values, "beam_client_secret", "BEAM_CLIENT_SECRET"
)
values["beam_client_id"] = beam_client_id
values["beam_client_secret"] = beam_client_secret
return values
@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""
return {
"model_name": self.model_name,
"name": self.name,
"cpu": self.cpu,
"memory": self.memory,
"gpu": self.gpu,
"python_version": self.python_version,
"python_packages": self.python_packages,
"max_length": self.max_length,
"model_kwargs": self.model_kwargs,
}
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "beam"
def app_creation(self) -> None:
"""Creates a Python file which will contain your Beam app definition."""
script = textwrap.dedent(
"""\
import beam
# The environment your code will run on
app = beam.App(
name="{name}",
cpu={cpu},
memory="{memory}",
gpu="{gpu}",
python_version="{python_version}",
python_packages={python_packages},
)
app.Trigger.RestAPI(
inputs={{"prompt": beam.Types.String(), "max_length": beam.Types.String()}},
outputs={{"text": beam.Types.String()}},
handler="run.py:beam_langchain",
)
"""
)
script_name = "app.py"
with open(script_name, "w") as file:
file.write(
script.format(
name=self.name,
cpu=self.cpu,
memory=self.memory,
gpu=self.gpu,
python_version=self.python_version,
python_packages=self.python_packages,
)
)
def run_creation(self) -> None:
"""Creates a Python file which will be deployed on beam."""
script = textwrap.dedent(
"""
import os
import transformers
from transformers import GPT2LMHeadModel, GPT2Tokenizer
model_name = "{model_name}"
def beam_langchain(**inputs):
prompt = inputs["prompt"]
length = inputs["max_length"]
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)
encodedPrompt = tokenizer.encode(prompt, return_tensors='pt')
outputs = model.generate(encodedPrompt, max_length=int(length),
do_sample=True, pad_token_id=tokenizer.eos_token_id)
output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(output)
return {{"text": output}}
"""
)
script_name = "run.py"
with open(script_name, "w") as file:
file.write(script.format(model_name=self.model_name))
def _deploy(self) -> str:
"""Call to Beam."""
try:
import beam # type: ignore
if beam.__path__ == "":
raise ImportError
except ImportError:
raise ImportError(
"Could not import beam python package. "
"Please install it with `curl "
"https://raw.githubusercontent.com/slai-labs"
"/get-beam/main/get-beam.sh -sSfL | sh`."
)
self.app_creation()
self.run_creation()
process = subprocess.run(
"beam deploy app.py", shell=True, capture_output=True, text=True
)
if process.returncode == 0:
output = process.stdout
logger.info(output)
lines = output.split("\n")
for line in lines:
if line.startswith(" i Send requests to: https://apps.beam.cloud/"):
self.app_id = line.split("/")[-1]
self.url = line.split(":")[1].strip()
return self.app_id
raise ValueError(
f"""Failed to retrieve the appID from the deployment output.
Deployment output: {output}"""
)
else:
raise ValueError(f"Deployment failed. Error: {process.stderr}")
@property
def authorization(self) -> str:
if self.beam_client_id:
credential_str = self.beam_client_id + ":" + self.beam_client_secret
else:
credential_str = self.beam_client_secret
return base64.b64encode(credential_str.encode()).decode()
def _call(
self,
prompt: str,
stop: Optional[list] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
) -> str:
"""Call to Beam."""
url = "https://apps.beam.cloud/" + self.app_id if self.app_id else self.url
payload = {"prompt": prompt, "max_length": self.max_length}
headers = {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Authorization": "Basic " + self.authorization,
"Connection": "keep-alive",
"Content-Type": "application/json",
}
for _ in range(DEFAULT_NUM_TRIES):
request = requests.post(url, headers=headers, data=json.dumps(payload))
if request.status_code == 200:
return request.json()["text"]
time.sleep(DEFAULT_SLEEP_TIME)
logger.warning("Unable to successfully call model.")
return ""

View File

@@ -72,7 +72,7 @@ class Cohere(LLM):
values["client"] = cohere.Client(cohere_api_key)
except ImportError:
raise ImportError(
raise ValueError(
"Could not import cohere python package. "
"Please install it with `pip install cohere`."
)

View File

@@ -29,10 +29,7 @@ def _create_retry_decorator() -> Callable[[Any], Any]:
try:
import google.api_core.exceptions
except ImportError:
raise ImportError(
"Could not import google-api-core python package. "
"Please install it with `pip install google-api-core`."
)
raise ImportError()
multiplier = 2
min_seconds = 1
@@ -108,10 +105,7 @@ class GooglePalm(BaseLLM, BaseModel):
genai.configure(api_key=google_api_key)
except ImportError:
raise ImportError(
"Could not import google-generativeai python package. "
"Please install it with `pip install google-generativeai`."
)
raise ImportError("Could not import google.generativeai python package.")
values["client"] = genai

View File

@@ -100,7 +100,7 @@ class GooseAI(LLM):
openai.api_base = "https://api.goose.ai/v1"
values["client"] = openai.Completion
except ImportError:
raise ImportError(
raise ValueError(
"Could not import openai python package. "
"Please install it with `pip install openai`."
)

View File

@@ -97,7 +97,7 @@ class HuggingFaceTextGenInference(LLM):
values["inference_server_url"], timeout=values["timeout"]
)
except ImportError:
raise ImportError(
raise ValueError(
"Could not import text_generation python package. "
"Please install it with `pip install text_generation`."
)

View File

@@ -1,173 +0,0 @@
"""Wrapper around MosaicML APIs."""
from typing import Any, Dict, List, Mapping, Optional
import requests
from pydantic import Extra, root_validator
from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
from langchain.llms.utils import enforce_stop_tokens
from langchain.utils import get_from_dict_or_env
INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
INTRO_BLURB = (
"Below is an instruction that describes a task. "
"Write a response that appropriately completes the request."
)
PROMPT_FOR_GENERATION_FORMAT = """{intro}
{instruction_key}
{instruction}
{response_key}
""".format(
intro=INTRO_BLURB,
instruction_key=INSTRUCTION_KEY,
instruction="{instruction}",
response_key=RESPONSE_KEY,
)
class MosaicML(LLM):
"""Wrapper around MosaicML's LLM inference service.
To use, you should have the
environment variable ``MOSAICML_API_TOKEN`` set with your API token, or pass
it as a named parameter to the constructor.
Example:
.. code-block:: python
from langchain.llms import MosaicML
endpoint_url = (
"https://models.hosted-on.mosaicml.hosting/mpt-7b-instruct/v1/predict"
)
mosaic_llm = MosaicML(
endpoint_url=endpoint_url,
mosaicml_api_token="my-api-key"
)
"""
endpoint_url: str = (
"https://models.hosted-on.mosaicml.hosting/mpt-7b-instruct/v1/predict"
)
"""Endpoint URL to use."""
inject_instruction_format: bool = False
"""Whether to inject the instruction format into the prompt."""
model_kwargs: Optional[dict] = None
"""Key word arguments to pass to the model."""
retry_sleep: float = 1.0
"""How long to try sleeping for if a rate limit is encountered"""
mosaicml_api_token: Optional[str] = None
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
"""Validate that api key and python package exists in environment."""
mosaicml_api_token = get_from_dict_or_env(
values, "mosaicml_api_token", "MOSAICML_API_TOKEN"
)
values["mosaicml_api_token"] = mosaicml_api_token
return values
@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""
_model_kwargs = self.model_kwargs or {}
return {
**{"endpoint_url": self.endpoint_url},
**{"model_kwargs": _model_kwargs},
}
@property
def _llm_type(self) -> str:
"""Return type of llm."""
return "mosaicml"
def _transform_prompt(self, prompt: str) -> str:
"""Transform prompt."""
if self.inject_instruction_format:
prompt = PROMPT_FOR_GENERATION_FORMAT.format(
instruction=prompt,
)
return prompt
def _call(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
is_retry: bool = False,
) -> str:
"""Call out to a MosaicML LLM inference endpoint.
Args:
prompt: The prompt to pass into the model.
stop: Optional list of stop words to use when generating.
Returns:
The string generated by the model.
Example:
.. code-block:: python
response = mosaic_llm("Tell me a joke.")
"""
_model_kwargs = self.model_kwargs or {}
prompt = self._transform_prompt(prompt)
payload = {"input_strings": [prompt]}
payload.update(_model_kwargs)
# HTTP headers for authorization
headers = {
"Authorization": f"{self.mosaicml_api_token}",
"Content-Type": "application/json",
}
# send request
try:
response = requests.post(self.endpoint_url, headers=headers, json=payload)
except requests.exceptions.RequestException as e:
raise ValueError(f"Error raised by inference endpoint: {e}")
try:
parsed_response = response.json()
if "error" in parsed_response:
# if we get rate limited, try sleeping for 1 second
if (
not is_retry
and "rate limit exceeded" in parsed_response["error"].lower()
):
import time
time.sleep(self.retry_sleep)
return self._call(prompt, stop, run_manager, is_retry=True)
raise ValueError(
f"Error raised by inference API: {parsed_response['error']}"
)
if "data" not in parsed_response:
raise ValueError(
f"Error raised by inference API, no key data: {parsed_response}"
)
generated_text = parsed_response["data"]
except requests.exceptions.JSONDecodeError as e:
raise ValueError(
f"Error raised by inference API: {e}.\nResponse: {response.text}"
)
text = generated_text[0][len(prompt) :]
# TODO: replace when MosaicML supports custom stop tokens natively
if stop is not None:
text = enforce_stop_tokens(text, stop)
return text

View File

@@ -75,7 +75,7 @@ class NLPCloud(LLM):
values["model_name"], nlpcloud_api_key, gpu=True, lang="en"
)
except ImportError:
raise ImportError(
raise ValueError(
"Could not import nlpcloud python package. "
"Please install it with `pip install nlpcloud`."
)

View File

@@ -234,7 +234,7 @@ class BaseOpenAI(BaseLLM):
openai.organization = openai_organization
values["client"] = openai.Completion
except ImportError:
raise ImportError(
raise ValueError(
"Could not import openai python package. "
"Please install it with `pip install openai`."
)
@@ -462,7 +462,7 @@ class BaseOpenAI(BaseLLM):
try:
import tiktoken
except ImportError:
raise ImportError(
raise ValueError(
"Could not import tiktoken python package. "
"This is needed in order to calculate get_num_tokens. "
"Please install it with `pip install tiktoken`."
@@ -512,10 +512,6 @@ class BaseOpenAI(BaseLLM):
"code-cushman-001": 2048,
}
# handling finetuned models
if "ft-" in modelname:
modelname = modelname.split(":")[0]
context_size = model_token_mapping.get(modelname, None)
if context_size is None:
@@ -681,7 +677,7 @@ class OpenAIChat(BaseLLM):
if openai_organization:
openai.organization = openai_organization
except ImportError:
raise ImportError(
raise ValueError(
"Could not import openai python package. "
"Please install it with `pip install openai`."
)
@@ -811,7 +807,7 @@ class OpenAIChat(BaseLLM):
try:
import tiktoken
except ImportError:
raise ImportError(
raise ValueError(
"Could not import tiktoken python package. "
"This is needed in order to calculate get_num_tokens. "
"Please install it with `pip install tiktoken`."

View File

@@ -1,26 +0,0 @@
from typing import Any, Dict
from pydantic import root_validator
from langchain.llms.openai import BaseOpenAI
class OpenLM(BaseOpenAI):
@property
def _invocation_params(self) -> Dict[str, Any]:
return {**{"model": self.model_name}, **super()._invocation_params}
@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
try:
import openlm
values["client"] = openlm.Completion
except ImportError:
raise ValueError(
"Could not import openlm python package. "
"Please install it with `pip install openlm`."
)
if values["streaming"]:
raise ValueError("Streaming not supported with openlm")
return values

View File

@@ -50,7 +50,7 @@ class PredictionGuard(LLM):
values["client"] = pg.Client(token=token)
except ImportError:
raise ImportError(
raise ValueError(
"Could not import predictionguard python package. "
"Please install it with `pip install predictionguard`."
)

View File

@@ -89,7 +89,7 @@ class Replicate(LLM):
try:
import replicate as replicate_python
except ImportError:
raise ImportError(
raise ValueError(
"Could not import replicate python package. "
"Please install it with `pip install replicate`."
)

View File

@@ -103,7 +103,7 @@ class RWKV(LLM, BaseModel):
try:
import tokenizers
except ImportError:
raise ImportError(
raise ValueError(
"Could not import tokenizers python package. "
"Please install it with `pip install tokenizers`."
)

View File

@@ -182,7 +182,7 @@ class SagemakerEndpoint(LLM):
) from e
except ImportError:
raise ImportError(
raise ValueError(
"Could not import boto3 python package. "
"Please install it with `pip install boto3`."
)

View File

@@ -155,7 +155,7 @@ class SelfHostedPipeline(LLM):
import runhouse as rh
except ImportError:
raise ImportError(
raise ValueError(
"Could not import runhouse python package. "
"Please install it with `pip install runhouse`."
)

View File

@@ -52,11 +52,13 @@ class FirestoreChatMessageHistory(BaseChatMessageHistory):
try:
import firebase_admin
from firebase_admin import firestore
except ImportError:
raise ImportError(
"Could not import firebase-admin python package. "
"Please install it with `pip install firebase-admin`."
except ImportError as e:
logger.error(
"Failed to import Firebase and Firestore: %s. "
"Make sure to install the 'firebase-admin' module.",
e,
)
raise e
# For multiple instances, only initialize the app once.
try:

View File

@@ -25,7 +25,7 @@ class RedisChatMessageHistory(BaseChatMessageHistory):
try:
import redis
except ImportError:
raise ImportError(
raise ValueError(
"Could not import redis python package. "
"Please install it with `pip install redis`."
)

View File

@@ -91,7 +91,7 @@ class RedisEntityStore(BaseEntityStore):
try:
import redis
except ImportError:
raise ImportError(
raise ValueError(
"Could not import redis python package. "
"Please install it with `pip install redis`."
)

View File

@@ -41,7 +41,7 @@ class CohereRerank(BaseDocumentCompressor):
values["client"] = cohere.Client(cohere_api_key)
except ImportError:
raise ImportError(
raise ValueError(
"Could not import cohere python package. "
"Please install it with `pip install cohere`."
)

Some files were not shown because too many files have changed in this diff Show More