{
"cells": [
{
"cell_type": "markdown",
"id": "959300d4",
"metadata": {},
"source": [
"# OpenVINO\n",
"\n",
"[OpenVINO™](https://github.com/openvinotoolkit/openvino) is an open-source toolkit for optimizing and deploying AI inference. OpenVINO™ Runtime can run the same optimized model across various hardware [devices](https://github.com/openvinotoolkit/openvino?tab=readme-ov-file#supported-hardware-matrix). Accelerate your deep learning performance across use cases like language + LLMs, computer vision, automatic speech recognition, and more.\n",
"\n",
"OpenVINO models can be run locally through the `HuggingFacePipeline` [class](https://python.langchain.com/docs/integrations/llms/huggingface_pipeline). To deploy a model with OpenVINO, you can specify the `backend=\"openvino\"` parameter to use OpenVINO as the backend inference framework."
]
},
{
"cell_type": "markdown",
"id": "4c1b8450-5eaf-4d34-8341-2d785448a1ff",
"metadata": {
"tags": []
},
"source": [
"To use, you should have the ``optimum-intel`` Python [package installed](https://github.com/huggingface/optimum-intel?tab=readme-ov-file#installation) with the OpenVINO accelerator extras."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d772b637-de00-4663-bd77-9bc96d798db2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%pip install --upgrade-strategy eager \"optimum[openvino,nncf]\" --quiet"
]
},
{
"cell_type": "markdown",
"id": "91ad075f-71d5-4bc8-ab91-cc0ad5ef16bb",
"metadata": {},
"source": [
"### Model Loading\n",
"\n",
"Models can be loaded by specifying the model parameters using the `from_model_id` method.\n",
"\n",
"If you have an Intel GPU, you can specify `model_kwargs={\"device\": \"GPU\"}` to run inference on it."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "165ae236-962a-4763-8052-c4836d78a5d2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline\n",
"\n",
"ov_config = {\"PERFORMANCE_HINT\": \"LATENCY\", \"NUM_STREAMS\": \"1\", \"CACHE_DIR\": \"\"}\n",
"\n",
"ov_llm = HuggingFacePipeline.from_model_id(\n",
"    model_id=\"gpt2\",\n",
"    task=\"text-generation\",\n",
"    backend=\"openvino\",\n",
"    model_kwargs={\"device\": \"CPU\", \"ov_config\": ov_config},\n",
"    pipeline_kwargs={\"max_new_tokens\": 10},\n",
")"
]
},
{
"cell_type": "markdown",
"id": "00104b27-0c15-4a97-b198-4512337ee211",
"metadata": {},
"source": [
"They can also be loaded by passing in an existing [`optimum-intel`](https://huggingface.co/docs/optimum/main/en/intel/inference) pipeline directly."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7f426a4f",
"metadata": {},
"outputs": [],
"source": [
"from optimum.intel.openvino import OVModelForCausalLM\n",
"from transformers import AutoTokenizer, pipeline\n",
"\n",
"model_id = \"gpt2\"\n",
"device = \"CPU\"\n",
"tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
"ov_model = OVModelForCausalLM.from_pretrained(\n",
"    model_id, device=device, ov_config=ov_config\n",
")\n",
"ov_pipe = pipeline(\n",
"    \"text-generation\", model=ov_model, tokenizer=tokenizer, max_new_tokens=10\n",
")\n",
"ov_llm = HuggingFacePipeline(pipeline=ov_pipe)"
]
},
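{
"cell_type": "markdown",
"id": "b7b0c1a9",
"metadata": {},
"source": [
"The wrapped pipeline behaves like any other LangChain LLM, so it can also be invoked directly with a string prompt. A minimal sanity check (the prompt text here is just an illustrative example):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4e2f6d0",
"metadata": {},
"outputs": [],
"source": [
"# Call the OpenVINO-backed LLM directly with a plain string prompt\n",
"print(ov_llm.invoke(\"What is OpenVINO?\"))"
]
},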
{
"cell_type": "markdown",
"id": "60e7ba8d",
"metadata": {},
"source": [
"### Create Chain\n",
"\n",
"With the model loaded into memory, you can compose it with a prompt to\n",
"form a chain."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3acf0069",
"metadata": {},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate\n",
"\n",
"template = \"\"\"Question: {question}\n",
"\n",
"Answer: Let's think step by step.\"\"\"\n",
"prompt = PromptTemplate.from_template(template)\n",
"\n",
"chain = prompt | ov_llm\n",
"\n",
"question = \"What is electroencephalography?\"\n",
"\n",
"print(chain.invoke({\"question\": question}))"
]
},
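{
"cell_type": "markdown",
"id": "f8a31c2e",
"metadata": {},
"source": [
"Because `prompt | ov_llm` is a standard runnable, you can also stream the output instead of waiting for the full completion. A minimal sketch; depending on the installed LangChain version, this may yield the completion as a single chunk rather than token by token:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e5d9b4a7",
"metadata": {},
"outputs": [],
"source": [
"# Stream chunks from the chain as they are produced\n",
"for chunk in chain.stream({\"question\": question}):\n",
"    print(chunk, end=\"\", flush=True)"
]
},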
{
"cell_type": "markdown",
"id": "12524837-e9ab-455a-86be-66b95f4f893a",
"metadata": {},
"source": [
"### Inference with local OpenVINO model\n",
"\n",
"It is possible to [export your model](https://github.com/huggingface/optimum-intel?tab=readme-ov-file#export) to the OpenVINO IR format with the CLI and load the model from a local folder.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3d1104a2-79c7-43a6-aa1c-8076a5ad7747",
"metadata": {},
"outputs": [],
"source": [
"!optimum-cli export openvino --model gpt2 ov_model_dir"
]
},
{
"cell_type": "markdown",
"id": "0f7a6d21",
"metadata": {},
"source": [
"It is recommended to apply 8-bit or 4-bit weight quantization to reduce inference latency and model footprint using `--weight-format`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "97088ea0",
"metadata": {},
"outputs": [],
"source": [
"!optimum-cli export openvino --model gpt2 --weight-format int8 ov_model_dir # for 8-bit quantization\n",
"\n",
"!optimum-cli export openvino --model gpt2 --weight-format int4 ov_model_dir # for 4-bit quantization"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ac71e60d-5595-454e-8602-03ebb0248205",
"metadata": {},
"outputs": [],
"source": [
"ov_llm = HuggingFacePipeline.from_model_id(\n",
"    model_id=\"ov_model_dir\",\n",
"    task=\"text-generation\",\n",
"    backend=\"openvino\",\n",
"    model_kwargs={\"device\": \"CPU\", \"ov_config\": ov_config},\n",
"    pipeline_kwargs={\"max_new_tokens\": 10},\n",
")\n",
"\n",
"ov_chain = prompt | ov_llm\n",
"\n",
"question = \"What is electroencephalography?\"\n",
"\n",
"print(ov_chain.invoke({\"question\": question}))"
]
},
{
"cell_type": "markdown",
"id": "a2c5726c",
"metadata": {},
"source": [
"You can get an additional inference speed improvement with dynamic quantization of activations and KV-cache quantization. These options can be enabled with `ov_config` as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1f9c2c5",
"metadata": {},
"outputs": [],
"source": [
"ov_config = {\n",
"    \"KV_CACHE_PRECISION\": \"u8\",\n",
"    \"DYNAMIC_QUANTIZATION_GROUP_SIZE\": \"32\",\n",
"    \"PERFORMANCE_HINT\": \"LATENCY\",\n",
"    \"NUM_STREAMS\": \"1\",\n",
"    \"CACHE_DIR\": \"\",\n",
"}"
]
},
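{
"cell_type": "markdown",
"id": "d3c8e1b6",
"metadata": {},
"source": [
"For these options to take effect, the updated `ov_config` has to be passed when the model is (re)loaded. A minimal sketch that reuses the `ov_model_dir` export from above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b6f4e21",
"metadata": {},
"outputs": [],
"source": [
"# Reload the local OpenVINO model with the quantization-enabled ov_config\n",
"ov_llm = HuggingFacePipeline.from_model_id(\n",
"    model_id=\"ov_model_dir\",\n",
"    task=\"text-generation\",\n",
"    backend=\"openvino\",\n",
"    model_kwargs={\"device\": \"CPU\", \"ov_config\": ov_config},\n",
"    pipeline_kwargs={\"max_new_tokens\": 10},\n",
")"
]
},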
{
"cell_type": "markdown",
"id": "da9a9239",
"metadata": {},
"source": [
"For more information refer to:\n",
"\n",
"* [OpenVINO LLM guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html).\n",
"\n",
"* [OpenVINO Documentation](https://docs.openvino.ai/2024/home.html).\n",
"\n",
"* [OpenVINO Get Started Guide](https://www.intel.com/content/www/us/en/content-details/819067/openvino-get-started-guide.html).\n",
"\n",
"* [RAG Notebook with LangChain](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-chatbot)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}