[HuggingFace Pipeline] add streaming support (#23852)

This commit is contained in:
Ethan Yang
2024-07-10 05:02:00 +08:00
committed by GitHub
parent 34a02efcf9
commit 13855ef0c3
5 changed files with 167 additions and 23 deletions


@@ -143,6 +143,25 @@
"print(chain.invoke({\"question\": question}))"
]
},
{
"cell_type": "markdown",
"id": "5141dc4d",
"metadata": {},
"source": [
"Streaming repsonse."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f1819250-2db9-4143-b88a-12e92d4e2386",
"metadata": {},
"outputs": [],
"source": [
"for chunk in chain.stream(question):\n",
" print(chunk, end=\"\", flush=True)"
]
},
{
"cell_type": "markdown",
"id": "dbbc3a37",


@@ -245,7 +245,7 @@
"source": [
"### Streaming\n",
"\n",
"To get streaming of LLM output, you can create a Huggingface `TextIteratorStreamer` for `_forward_params`."
"You can use `stream` method to get a streaming of LLM output, "
]
},
{
@@ -255,24 +255,11 @@
"metadata": {},
"outputs": [],
"source": [
"from threading import Thread\n",
"generation_config = {\"skip_prompt\": True, \"pipeline_kwargs\": {\"max_new_tokens\": 100}}\n",
"chain = prompt | ov_llm.bind(**generation_config)\n",
"\n",
"from transformers import TextIteratorStreamer\n",
"\n",
"streamer = TextIteratorStreamer(\n",
" ov_llm.pipeline.tokenizer,\n",
" timeout=30.0,\n",
" skip_prompt=True,\n",
" skip_special_tokens=True,\n",
")\n",
"pipeline_kwargs = {\"pipeline_kwargs\": {\"streamer\": streamer, \"max_new_tokens\": 100}}\n",
"chain = prompt | ov_llm.bind(**pipeline_kwargs)\n",
"\n",
"t1 = Thread(target=chain.invoke, args=({\"question\": question},))\n",
"t1.start()\n",
"\n",
"for new_text in streamer:\n",
" print(new_text, end=\"\", flush=True)"
"for chunk in chain.stream(question):\n",
" print(chunk, end=\"\", flush=True)"
]
},
{
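Since `stream` is part of LangChain's standard Runnable interface, the asynchronous counterpart `astream` is also available on the same chain. A hedged sketch of its use, assuming the `chain` and `question` defined in the notebook cells above (this async path is not exercised by the diff itself):

```python
import asyncio

# Sketch (assumption): astream mirrors stream asynchronously via the standard
# Runnable interface; `chain` and `question` are assumed from the cells above.
async def stream_answer() -> None:
    async for chunk in chain.astream(question):
        print(chunk, end="", flush=True)

# In a notebook, you would typically `await stream_answer()` directly instead.
asyncio.run(stream_answer())
```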