community[patch]: invoke callback prior to yielding token (llama.cpp) (#19392)

**Description:** Invoke callback prior to yielding token for llama.cpp

**Issue:** [Callback for on_llm_new_token should be invoked before the token is yielded by the model #16913](https://github.com/langchain-ai/langchain/issues/16913)

**Dependencies:** None
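Why the order matters can be illustrated with a minimal, self-contained sketch. It does not use LangChain or llama.cpp; the names `stream_callback_after`, `stream_callback_before`, and `first_token` are hypothetical stand-ins for the streaming method and a consumer that stops early. The assumption is that the consumer may stop iterating after receiving a token, in which case any code placed after `yield` never runs for that token.

```python
from typing import Callable, Iterable, Iterator, List, Optional


def stream_callback_after(
    tokens: Iterable[str], on_new_token: Callable[[str], None]
) -> Iterator[str]:
    """Old ordering: yield first, then notify the callback."""
    for tok in tokens:
        yield tok
        # Only reached if the consumer asks for the next token.
        on_new_token(tok)


def stream_callback_before(
    tokens: Iterable[str], on_new_token: Callable[[str], None]
) -> Iterator[str]:
    """New ordering: notify the callback, then yield."""
    for tok in tokens:
        on_new_token(tok)
        yield tok


def first_token(stream: Iterator[str]) -> Optional[str]:
    """Hypothetical consumer that stops after the first token."""
    for tok in stream:
        return tok
    return None


seen_after: List[str] = []
seen_before: List[str] = []

first_token(stream_callback_after(["a", "b"], seen_after.append))
first_token(stream_callback_before(["a", "b"], seen_before.append))

print(seen_after)   # [] : the callback never saw "a"
print(seen_before)  # ['a'] : the callback saw "a" before the consumer did
```

With the callback placed before `yield`, a handler observes every token the consumer receives, and observes it no later than the consumer does.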
2025-09-22 02:50:31 +00:00 · 2024-03-23 01:47:56 +05:30
parent c4599444ee
commit 4856a87261
1 changed file with 1 addition and 1 deletion
--- a/libs/community/langchain_community/llms/llamacpp.py
+++ b/libs/community/langchain_community/llms/llamacpp.py
@@ -344,11 +344,11 @@ class LlamaCpp(LLM):
                text=part["choices"][0]["text"],
                generation_info={"logprobs": logprobs},
            )
-            yield chunk
            if run_manager:
                run_manager.on_llm_new_token(
                    token=chunk.text, verbose=self.verbose, log_probs=logprobs
                )
+            yield chunk

    def get_num_tokens(self, text: str) -> int:
        tokenized_text = self.client.tokenize(text.encode("utf-8"))