mirror of
https://github.com/hwchase17/langchain.git
synced 2026-07-01 14:47:02 +00:00
feat(groq): add performance service tier (#38339)
Groq's API now exposes a fourth service tier, `performance`, their highest tier, which provides reliable low latency for the most critical production applications. `ChatGroq.service_tier` previously accepted only `on_demand`, `flex`, and `auto`, so users who wanted to route requests to the performance tier had no type-safe way to do so.

This widens the `service_tier` `Literal` to include `performance` and documents it alongside the existing tiers. The value is passed straight through to the Groq SDK as a constrained enum, so no validation or mapping logic changes were needed. Reference: [Groq service tiers documentation](https://console.groq.com/docs/service-tiers).

An integration test case was added to `test_setting_service_tier_class`, mirroring the existing per-tier assertions; it exercises a live request and so runs only with a Groq API key.
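
To illustrate what widening a `Literal` buys, here is a minimal standard-library sketch. It is not the `ChatGroq` implementation: `ServiceTier`, `ChatConfig`, and `request_params` are hypothetical names invented for this example, and the explicit runtime check stands in for whatever validation the real model class performs. It only shows the idea described above: the set of allowed tiers lives in the type annotation, and the chosen value is forwarded unchanged.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Hypothetical alias mirroring the widened annotation from this commit.
ServiceTier = Literal["on_demand", "flex", "auto", "performance"]


@dataclass
class ChatConfig:
    """Hypothetical stand-in for a chat model config; not the real ChatGroq."""

    model: str
    service_tier: ServiceTier = "on_demand"

    def __post_init__(self) -> None:
        # Plain dataclasses do not enforce Literal at runtime, so check explicitly.
        allowed = get_args(ServiceTier)
        if self.service_tier not in allowed:
            raise ValueError(
                f"service_tier must be one of {allowed}, got {self.service_tier!r}"
            )

    def request_params(self) -> dict[str, str]:
        # The tier is forwarded as-is; there is no mapping layer in between.
        return {"model": self.model, "service_tier": self.service_tier}


config = ChatConfig(model="example-model", service_tier="performance")
print(config.request_params())
```

With the real class, the equivalent call is the one exercised by the test in this commit: `ChatGroq(model=DEFAULT_MODEL_NAME, service_tier="performance")`. Before the change, a static type checker would be expected to flag that value as outside the declared `Literal`.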
@@ -430,7 +430,9 @@ class ChatGroq(BaseChatModel):
     max_tokens: int | None = None
     """Maximum number of tokens to generate."""
 
-    service_tier: Literal["on_demand", "flex", "auto"] = Field(default="on_demand")
+    service_tier: Literal["on_demand", "flex", "auto", "performance"] = Field(
+        default="on_demand"
+    )
     """Optional parameter that you can include to specify the service tier you'd like to
     use for requests.
 
@@ -440,6 +442,8 @@ class ChatGroq(BaseChatModel):
         reliability for workloads that don't require guaranteed processing.
     - `'auto'`: Uses on-demand rate limits, then falls back to `'flex'` if those
         limits are exceeded
+    - `'performance'`: Highest tier, providing reliable low latency for the most
+        critical production applications.
 
     See the [Groq documentation](https://console.groq.com/docs/flex-processing) for more
     details and a list of service tiers and descriptions.
@@ -574,6 +574,9 @@ def test_setting_service_tier_class() -> None:
     response = chat.invoke([message])
     assert response.response_metadata.get("service_tier") == "on_demand"
 
+    chat = ChatGroq(model=DEFAULT_MODEL_NAME, service_tier="performance")
+    assert chat.service_tier == "performance"
+
     chat = ChatGroq(model=DEFAULT_MODEL_NAME)
     assert chat.service_tier == "on_demand"
     response = chat.invoke([message])