{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# BigDL-LLM\n",
"\n",
"> [BigDL-LLM](https://github.com/intel-analytics/BigDL/) is a low-bit LLM optimization library on Intel XPU (Xeon/Core/Flex/Arc/Max). It can make LLMs run extremely fast and consume much less memory on Intel platforms. It is open sourced under Apache 2.0 License.\n",
"\n",
"This example goes over how to use LangChain to interact with BigDL-LLM for text generation. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Update Langchain\n",
"\n",
"%pip install -qU langchain langchain-community"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Install BigDL-LLM for running LLMs locally on Intel CPU."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Install BigDL\n",
"%pip install --pre --upgrade bigdl-llm[all]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Usage"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from langchain.chains import LLMChain\n",
"from langchain_community.llms.bigdl import BigdlLLM\n",
"from langchain_core.prompts import PromptTemplate"
]
},
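{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, disable tokenizer parallelism before generation. This step is not in the original example; it simply pre-empts the `huggingface/tokenizers` fork warning that appears in the generation output below, using the environment variable that warning itself suggests:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Avoid the \"process just got forked\" warning from huggingface tokenizers\n",
"os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\""
]
},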
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"template = \"USER: {question}\\nASSISTANT:\"\n",
"prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
]
},
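{
"cell_type": "markdown",
"metadata": {},
"source": [
"The template follows the Vicuna conversation format. As a quick sanity check (an illustrative step, not part of the original example), you can render the template with a sample question:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Render the template to inspect the exact prompt string the model will see\n",
"prompt.format(question=\"What is AI?\")"
]
},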
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load Model: "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "69e018750ffb4de1af22ce49cd6957f4",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-02-23 18:10:22,896 - INFO - Converting the current model to sym_int4 format......\n",
"2024-02-23 18:10:25,415 - INFO - BIGDL_OPT_IPEX: False\n"
]
}
],
"source": [
"llm = BigdlLLM.from_model_id(\n",
" model_id=\"lmsys/vicuna-7b-v1.5\",\n",
" model_kwargs={\"temperature\": 0, \"max_length\": 64, \"trust_remote_code\": True},\n",
")"
]
},
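{
"cell_type": "markdown",
"metadata": {},
"source": [
"`BigdlLLM` implements LangChain's standard LLM interface, so it can also be called directly, without a chain. The cell below is a minimal sketch of that usage (the question is an arbitrary example):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Call the model directly; invoke() accepts a plain prompt string\n",
"response = llm.invoke(prompt.format(question=\"What is AI?\"))\n",
"print(response)"
]
},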
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use it in Chains:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/opt/anaconda3/envs/shane-langchain2/lib/python3.9/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The function `run` was deprecated in LangChain 0.1.0 and will be removed in 0.2.0. Use invoke instead.\n",
" warn_deprecated(\n",
"/opt/anaconda3/envs/shane-langchain2/lib/python3.9/site-packages/transformers/generation/utils.py:1369: UserWarning: Using `max_length`'s default (4096) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.\n",
" warnings.warn(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
"To disable this warning, you can either:\n",
"\t- Avoid using `tokenizers` before the fork if possible\n",
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
"AI stands for \"Artificial Intelligence.\" It refers to the development of computer systems that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation. AI can be achieved through a combination of techniques such as machine learning, natural language processing, computer vision, and robotics. The ultimate goal of AI research is to create machines that can think and learn like humans, and can even exceed human capabilities in certain areas.\n"
]
}
],
"source": [
"llm_chain = LLMChain(prompt=prompt, llm=llm)\n",
"\n",
"question = \"What is AI?\"\n",
"output = llm_chain.run(question)"
]
}
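{
"cell_type": "markdown",
"metadata": {},
"source": [
"In recent LangChain versions, `LLMChain` itself is deprecated in favor of LCEL composition. The cell below is a minimal sketch of the equivalent pipeline, assuming a LangChain version with LCEL support:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# LCEL equivalent of the LLMChain above: pipe the prompt into the model\n",
"chain = prompt | llm\n",
"output = chain.invoke({\"question\": \"What is AI?\"})"
]
}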
],
"metadata": {
"kernelspec": {
"display_name": "shane-diffusion",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
}
},
"nbformat": 4,
"nbformat_minor": 2
}