docs: llamacpp minor fixes (#8738)
- Description: minor updates to the llama.cpp doc
parent f437311eef
commit 91a0817e39
@@ -4,12 +4,12 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"# Llama-cpp\n",
+"# Llama.cpp\n",
 "\n",
-"[llama-cpp](https://github.com/abetlen/llama-cpp-python) is a Python binding for [llama.cpp](https://github.com/ggerganov/llama.cpp). \n",
+"[llama-cpp-python](https://github.com/abetlen/llama-cpp-python) is a Python binding for [llama.cpp](https://github.com/ggerganov/llama.cpp). \n",
 "It supports [several LLMs](https://github.com/ggerganov/llama.cpp).\n",
 "\n",
-"This notebook goes over how to run `llama-cpp` within LangChain."
+"This notebook goes over how to run `llama-cpp-python` within LangChain."
 ]
 },
 {
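For orientation, the pattern this notebook demonstrates is small. Here is a minimal sketch against the LangChain API of this era; the model path is a hypothetical local file, and any GGML model supported by llama.cpp would do:

```
from langchain.llms import LlamaCpp

# Hypothetical path to a local GGML model file supported by llama.cpp.
llm = LlamaCpp(model_path="./models/llama-7b.ggmlv3.q4_0.bin")

print(llm("Q: Name the planets in the solar system. A: "))
```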
@@ -18,7 +18,7 @@
 "source": [
 "## Installation\n",
 "\n",
-"There is a bunch of options how to install the llama-cpp package: \n",
+"There are different options for installing the llama-cpp package: \n",
 "- only CPU usage\n",
 "- CPU + GPU (using one of many BLAS backends)\n",
 "- Metal GPU (MacOS with Apple Silicon Chip) \n",
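As a rough sketch of the first two options (the exact CMake flags tracked llama-cpp-python at the time and may have changed since), installation from a notebook cell looked like:

```
# CPU only
!pip install llama-cpp-python

# CPU + GPU via the cuBLAS BLAS backend (assumes the CUDA toolkit is installed)
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
```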
@@ -61,7 +61,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"**IMPORTANT**: If you have already installed a cpu only version of the package, you need to reinstall it from scratch: consider the following command: "
+"**IMPORTANT**: If you have already installed the CPU-only version of the package, you need to reinstall it from scratch. Consider the following command: "
 ]
 },
 {
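The command the cell refers to is presumably a forced reinstall; pip's `--force-reinstall` and `--no-cache-dir` flags ensure the previously built CPU-only wheel is not reused. A sketch, assuming the cuBLAS backend:

```
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
```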
@@ -79,7 +79,7 @@
 "source": [
 "### Installation with Metal\n",
 "\n",
-"`lama.cpp` supports Apple silicon first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks. Use the `FORCE_CMAKE=1` environment variable to force the use of cmake and install the pip package for the Metal support ([source](https://github.com/abetlen/llama-cpp-python/blob/main/docs/install/macos.md)).\n",
+"`llama.cpp` supports Apple silicon as a first-class citizen - optimized via the ARM NEON, Accelerate and Metal frameworks. Use the `FORCE_CMAKE=1` environment variable to force the use of cmake and install the pip package with Metal support ([source](https://github.com/abetlen/llama-cpp-python/blob/main/docs/install/macos.md)).\n",
 "\n",
 "Example installation with Metal Support:"
 ]
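Per the linked macOS install notes, the Metal build is selected with a CMake flag; a sketch of the install cell that follows in the notebook:

```
!CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python
```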
@@ -143,7 +143,7 @@
 "\n",
 "#### Compiling and installing\n",
 "\n",
-"In the same command prompt (anaconda prompt) you set the variables, you can cd into `llama-cpp-python` directory and run the following commands.\n",
+"In the same command prompt (anaconda prompt) where you set the variables, you can `cd` into the `llama-cpp-python` directory and run the following commands.\n",
 "\n",
 "```\n",
 "python setup.py clean\n",
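The fenced block is cut off by the diff context; presumably it continues with the install step, along the lines of:

```
python setup.py clean
python setup.py install
```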
@@ -164,7 +164,9 @@
 "source": [
 "Make sure you are following all instructions to [install all necessary model files](https://github.com/ggerganov/llama.cpp).\n",
 "\n",
-"You don't need an `API_TOKEN`!"
+"You don't need an `API_TOKEN` as you will run the LLM locally.\n",
+"\n",
+"It is worth understanding which models are suitable to be used on the desired machine."
 ]
 },
 {
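Since the model runs locally, streaming tokens to stdout is the natural way to watch generation. A sketch of the prompt and callback setup the example cells below build on (these classes existed under these import paths in the LangChain of this commit):

```
from langchain import PromptTemplate
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

template = """Question: {question}

Answer: Let's work this out in a step by step way to be sure we have the right answer."""
prompt = PromptTemplate(template=template, input_variables=["question"])

# Streams each generated token to stdout as the local model produces it.
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
```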
@@ -227,7 +229,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"`Llama-v2`"
+"Example using a LLaMA 2 7B model"
 ]
 },
 {
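Putting the pieces together for the LLaMA 2 cell, reusing the `prompt` and `callback_manager` from the setup sketch above; the model path is hypothetical:

```
from langchain import LLMChain
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/llama-2-7b.ggmlv3.q4_0.bin",  # hypothetical local path
    callback_manager=callback_manager,
    verbose=True,
)
llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"
llm_chain.run(question)
```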
@@ -304,7 +306,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"`Llama-v1`"
+"Example using a LLaMA v1 model"
 ]
 },
 {
@@ -381,7 +383,7 @@
 "source": [
 "### GPU\n",
 "\n",
-"If the installation with BLAS backend was correct, you will see an `BLAS = 1` indicator in model properties.\n",
+"If the installation with BLAS backend was correct, you will see a `BLAS = 1` indicator in model properties.\n",
 "\n",
 "Two of the most important parameters for use with GPU are:\n",
 "\n",
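A sketch of those two parameters in the `LlamaCpp` constructor (both are real parameters of this class; the values and path are illustrative and should be tuned to the card's VRAM):

```
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/llama-2-7b.ggmlv3.q4_0.bin",  # hypothetical local path
    n_gpu_layers=40,  # number of model layers offloaded to the GPU
    n_batch=512,      # tokens processed in parallel; keep within VRAM limits
)
```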
@@ -473,22 +475,15 @@
 "llm_chain.run(question)"
 ]
 },
 {
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": []
-},
-{
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "### Metal\n",
 "\n",
-"If the installation with Metal was correct, you will see an `NEON = 1` indicator in model properties.\n",
+"If the installation with Metal was correct, you will see a `NEON = 1` indicator in model properties.\n",
 "\n",
-"Two of the most important parameters for use with GPU are:\n",
+"Two of the most important GPU parameters are:\n",
 "\n",
 "- `n_gpu_layers` - determines how many layers of the model are offloaded to your Metal GPU; in most cases, setting it to `1` is enough for Metal\n",
 "- `n_batch` - how many tokens are processed in parallel; the default is 8, so set it to a bigger number.\n",
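For Metal the same constructor applies; a sketch with the values those bullets suggest (`f16_kv=True` is a real `LlamaCpp` parameter carried over as an assumption from the surrounding notebook, which recommends it for Metal):

```
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/llama-2-7b.ggmlv3.q4_0.bin",  # hypothetical local path
    n_gpu_layers=1,  # one offloaded layer is typically enough on Metal
    n_batch=512,     # raise from the default of 8
    f16_kv=True,     # half-precision key/value cache; assumed per the notebook's Metal advice
)
```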
@@ -522,7 +517,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The rest are almost same as GPU, the console log will show the following log to indicate the Metal was enable properly.\n",
+"The console log will show the following output to indicate that Metal was enabled properly.\n",
 "\n",
 "```\n",
 "ggml_metal_init: allocating\n",
@@ -530,7 +525,9 @@
 "...\n",
 "```\n",
 "\n",
-"You also could check the `Activity Monitor` by watching the % GPU of the process, the % CPU will drop dramatically after turn on `n_gpu_layers=1`. Also for the first time call LLM, the performance might be slow due to the model compilation in Metal GPU."
+"You can also check `Activity Monitor` by watching the GPU usage of the process; the CPU usage will drop dramatically after turning on `n_gpu_layers=1`. \n",
+"\n",
+"For the first call to the LLM, performance may be slow due to model compilation on the Metal GPU."
 ]
 }
 ],