mirror of
https://github.com/hwchase17/langchain.git
synced 2026-02-21 06:33:41 +00:00
## Background
This PR fixes this error when there are special tokens when querying the
chain:
```
Encountered text corresponding to disallowed special token '<|endofprompt|>'.
If you want this text to be encoded as a special token, pass it to `allowed_special`, e.g. `allowed_special={'<|endofprompt|>', ...}`.
If you want this text to be encoded as normal text, disable the check for this token by passing `disallowed_special=(enc.special_tokens_set - {'<|endofprompt|>'})`.
To disable this check for all special tokens, pass `disallowed_special=()`.
```
Refer to the code snippet below, it breaks in the chain line.
```
chain = ConversationalRetrievalChain.from_llm(
ChatOpenAI(openai_api_key=OPENAI_API_KEY),
retriever=vectorstore.as_retriever(),
qa_prompt=prompt,
condense_question_prompt=condense_prompt,
)
answer = chain({"question": f"{question}"})
```
However `ChatOpenAI` class is not accepting `allowed_special` and
`disallowed_special` at the moment so they cannot be passed to the
`encode()` in `get_num_tokens` method to avoid the errors.
## Change
- Add `allowed_special` and `disallowed_special` attributes to
`BaseOpenAI` class.
- Pass in `allowed_special` and `disallowed_special` as arguments of
`encode()` in tiktoken.
---------
Co-authored-by: samcarmen <“carmen.samkahman@gmail.com”>