Merge branch 'master' into eugene/foo_foo

This commit is contained in:
Eugene Yurtsev 2024-10-24 13:46:20 -04:00 committed by GitHub
commit a5cfcbc849
GPG Key ID: B5690EEEBB952194
24 changed files with 187 additions and 109 deletions

View File

@@ -66,7 +66,7 @@ Several more powerful methods that utilizes native features in the model provide
Many [model providers support](/docs/integrations/chat/) tool calling, a concept discussed in more detail in our [tool calling guide](/docs/concepts/tool_calling/).
In short, tool calling involves binding a tool to a model and, when appropriate, the model can *decide* to call this tool and ensure its response conforms to the tool's schema.
-With this in mind, the central concept is strightforward: *simply bind our schema to a model as a tool!*
+With this in mind, the central concept is straightforward: *simply bind our schema to a model as a tool!*
Here is an example using the `ResponseFormatter` schema defined above:
```python
@@ -106,7 +106,7 @@ ai_msg.content
```
One important point to flag: the model *still* returns a string, which needs to be parsed into a JSON object.
-This can, of course, simply use the `json` library or a JSON output parser if you need more adavanced functionality.
+This can, of course, simply use the `json` library or a JSON output parser if you need more advanced functionality.
See this [how-to guide on the JSON output parser](/docs/how_to/output_parser_json) for more details.
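The parsing step just described can be sketched with the standard library alone; the message content below is a hypothetical stand-in for a real model response, not output from an actual model:

```python
import json

# Hypothetical stand-in for the string content of an AI message that was
# prompted to respond with JSON matching the ResponseFormatter schema.
ai_msg_content = (
    '{"answer": "The powerhouse of the cell is the mitochondrion.",'
    ' "followup_question": "What do mitochondria do?"}'
)

# The model returns a string, so it must be parsed into a JSON object.
json_object = json.loads(ai_msg_content)
```

A dedicated JSON output parser becomes useful beyond this point, e.g. for streaming or more tolerant parsing.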
```python
@@ -117,7 +117,7 @@ json_object = json.loads(ai_msg.content)
## Structured output method
-There a few challenges when producing structured output with the above methods:
+There are a few challenges when producing structured output with the above methods:
(1) If using tool calling, tool call arguments need to be parsed from a dictionary back to the original schema.
@@ -145,4 +145,4 @@ ResponseFormatter(answer="The powerhouse of the cell is the mitochondrion. Mitoc
For more details on usage, see our [how-to guide](/docs/how_to/structured_output/#the-with_structured_output-method).
:::
:::
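The first challenge noted above, parsing tool-call arguments from a dictionary back into the original schema, amounts to unpacking a dict into the schema class. A minimal sketch, using a plain dataclass as a stand-in for the Pydantic `ResponseFormatter` and a hypothetical argument dict:

```python
from dataclasses import dataclass


@dataclass
class ResponseFormatter:
    """Plain-dataclass stand-in for the Pydantic schema from the guide."""

    answer: str
    followup_question: str


# Hypothetical tool-call arguments as a provider might return them.
tool_call_args = {
    "answer": "The powerhouse of the cell is the mitochondrion.",
    "followup_question": "What do mitochondria do?",
}

# Unpacking the dict rehydrates the original schema object.
formatted = ResponseFormatter(**tool_call_args)
```

The `with_structured_output` method handles this round-trip for you, which is why the guide recommends it.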

View File

@@ -35,7 +35,7 @@
"- Creating a [Retriever](/docs/concepts/retrievers) to expose specific information to our agent\n",
"- Using a Search [Tool](/docs/concepts/tools) to look up things online\n",
"- [`Chat History`](/docs/concepts/chat_history), which allows a chatbot to \"remember\" past interactions and take them into account when responding to follow-up questions. \n",
-"- Debugging and tracing your application using [LangSmith](/docs/concepts/#langsmith)\n",
+"- Debugging and tracing your application using [LangSmith](https://docs.smith.langchain.com/)\n",
"\n",
"## Setup\n",
"\n",

View File

@@ -13,7 +13,7 @@
"<Prerequisites titlesAndLinks={[\n",
" [\"Chat models\", \"/docs/concepts/chat_models\"],\n",
" [\"Few-shot-prompting\", \"/docs/concepts/few-shot-prompting\"],\n",
-" [\"LangSmith\", \"/docs/concepts/#langsmith\"],\n",
+" [\"LangSmith\", \"https://docs.smith.langchain.com/\"],\n",
"]} />\n",
"\n",
"\n",

View File

@@ -23,7 +23,7 @@
"- [Prompt templates](/docs/concepts/prompt_templates)\n",
"- [Example selectors](/docs/concepts/example_selectors)\n",
"- [LLMs](/docs/concepts/text_llms)\n",
-"- [Vectorstores](/docs/concepts/#vector-stores)\n",
+"- [Vectorstores](/docs/concepts/vectorstores)\n",
"\n",
":::\n",
"\n",

View File

@@ -23,7 +23,7 @@
"- [Prompt templates](/docs/concepts/prompt_templates)\n",
"- [Example selectors](/docs/concepts/example_selectors)\n",
"- [Chat models](/docs/concepts/chat_models)\n",
-"- [Vectorstores](/docs/concepts/#vector-stores)\n",
+"- [Vectorstores](/docs/concepts/vectorstores)\n",
"\n",
":::\n",
"\n",

View File

@@ -159,7 +159,7 @@ What LangChain calls [LLMs](/docs/concepts/text_llms) are older forms of languag
### Vector stores
-[Vector stores](/docs/concepts/#vector-stores) are databases that can efficiently store and retrieve embeddings.
+[Vector stores](/docs/concepts/vectorstores) are databases that can efficiently store and retrieve embeddings.
- [How to: use a vector store to retrieve data](/docs/how_to/vectorstores)
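As a rough, illustrative sketch of what a vector store does under the hood (a toy in-memory index, not LangChain's actual API; the embeddings are made up):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Keep (embedding, text) pairs and retrieve the most similar texts."""

    def __init__(self) -> None:
        self._entries: list[tuple[list[float], str]] = []

    def add(self, embedding: list[float], text: str) -> None:
        self._entries.append((embedding, text))

    def search(self, query: list[float], k: int = 1) -> list[str]:
        # Rank stored entries by similarity to the query embedding.
        ranked = sorted(
            self._entries,
            key=lambda entry: cosine_similarity(query, entry[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]


store = ToyVectorStore()
store.add([1.0, 0.0], "cats")
store.add([0.0, 1.0], "finance")
closest = store.search([0.9, 0.1])  # the query embedding is made up
```

Real vector stores add persistence, approximate-nearest-neighbor indexing, and metadata filtering on top of this basic idea.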

View File

@@ -214,7 +214,7 @@
"id": "3ca23082-c602-4ee8-af8c-a185b1f42bd1",
"metadata": {},
"source": [
-"While the PydanticOutputParser cannot:"
+"While the `PydanticOutputParser` cannot:"
]
},
{

View File

@@ -18,7 +18,7 @@
"\n",
":::\n",
"\n",
-"LLMs from different providers often have different strengths depending on the specific data they are trianed on. This also means that some may be \"better\" and more reliable at generating output in formats other than JSON.\n",
+"LLMs from different providers often have different strengths depending on the specific data they are trained on. This also means that some may be \"better\" and more reliable at generating output in formats other than JSON.\n",
"\n",
"This guide shows you how to use the [`XMLOutputParser`](https://python.langchain.com/api_reference/core/output_parsers/langchain_core.output_parsers.xml.XMLOutputParser.html) to prompt models for XML output, and then parse that output into a usable format.\n",
"\n",

View File

@@ -18,7 +18,7 @@
"\n",
":::\n",
"\n",
-"LLMs from different providers often have different strengths depending on the specific data they are trianed on. This also means that some may be \"better\" and more reliable at generating output in formats other than JSON.\n",
+"LLMs from different providers often have different strengths depending on the specific data they are trained on. This also means that some may be \"better\" and more reliable at generating output in formats other than JSON.\n",
"\n",
"This output parser allows users to specify an arbitrary schema and query LLMs for outputs that conform to that schema, using YAML to format their response.\n",
"\n",

View File

@@ -153,7 +153,7 @@
"\n",
"## Next steps\n",
"\n",
-"Now you've learned how to pass data through your chains to help to help format the data flowing through your chains.\n",
+"Now you've learned how to pass data through your chains to help format the data flowing through your chains.\n",
"\n",
"To learn more, see the other how-to guides on runnables in this section."
]

View File

@@ -16,7 +16,7 @@ Retrievers accept a string query as input and return a list of [Documents](https
For specifics on how to use retrievers, see the [relevant how-to guides here](/docs/how_to/#retrievers).
-Note that all [vector stores](/docs/concepts/#vector-stores) can be [cast to retrievers](/docs/how_to/vectorstore_retriever/).
+Note that all [vector stores](/docs/concepts/vectorstores) can be [cast to retrievers](/docs/how_to/vectorstore_retriever/).
Refer to the vector store [integration docs](/docs/integrations/vectorstores/) for available vector stores.
This page lists custom retrievers, implemented via subclassing [BaseRetriever](/docs/how_to/custom_retriever/).

View File

@@ -7,7 +7,7 @@ sidebar_class_name: hidden
import { CategoryTable, IndexTable } from "@theme/FeatureTables";
-A [vector store](/docs/concepts/#vector-stores) stores [embedded](/docs/concepts/embedding_models) data and performs similarity search.
+A [vector store](/docs/concepts/vectorstores) stores [embedded](/docs/concepts/embedding_models) data and performs similarity search.
<CategoryTable category="vectorstores" />

View File

@@ -27,7 +27,7 @@
"\n",
"- Using [LangChain Expression Language (LCEL)](/docs/concepts/lcel) to chain components together\n",
"\n",
-"- Debugging and tracing your application using [LangSmith](/docs/concepts/#langsmith)\n",
+"- Debugging and tracing your application using [LangSmith](https://docs.smith.langchain.com/)\n",
"\n",
"- Deploying your application with [LangServe](/docs/concepts/#langserve)\n",
"\n",

View File

@@ -14,7 +14,7 @@
"- [Chat Models](/docs/concepts/chat_models)\n",
"- [Chaining runnables](/docs/how_to/sequence/)\n",
"- [Embeddings](/docs/concepts/embedding_models)\n",
-"- [Vector stores](/docs/concepts/#vector-stores)\n",
+"- [Vector stores](/docs/concepts/vectorstores)\n",
"- [Retrieval-augmented generation](/docs/tutorials/rag/)\n",
"\n",
":::\n",

View File

@@ -26,7 +26,7 @@
"- [Document loaders](/docs/concepts/document_loaders)\n",
"- [Chat models](/docs/concepts/chat_models)\n",
"- [Embeddings](/docs/concepts/embedding_models)\n",
-"- [Vector stores](/docs/concepts/#vector-stores)\n",
+"- [Vector stores](/docs/concepts/vectorstores)\n",
"- [Retrieval-augmented generation](/docs/tutorials/rag/)\n",
"\n",
":::\n",
@@ -117,7 +117,7 @@
"\n",
"## Question answering with RAG\n",
"\n",
-"Next, you'll prepare the loaded documents for later retrieval. Using a [text splitter](/docs/concepts/text_splitters), you'll split your loaded documents into smaller documents that can more easily fit into an LLM's context window, then load them into a [vector store](/docs/concepts/#vector-stores). You can then create a [retriever](/docs/concepts/retrievers) from the vector store for use in our RAG chain:\n",
+"Next, you'll prepare the loaded documents for later retrieval. Using a [text splitter](/docs/concepts/text_splitters), you'll split your loaded documents into smaller documents that can more easily fit into an LLM's context window, then load them into a [vector store](/docs/concepts/vectorstores). You can then create a [retriever](/docs/concepts/retrievers) from the vector store for use in our RAG chain:\n",
"\n",
"import ChatModelTabs from \"@theme/ChatModelTabs\";\n",
"\n",

View File

@@ -24,7 +24,7 @@
"- [Chat history](/docs/concepts/chat_history)\n",
"- [Chat models](/docs/concepts/chat_models)\n",
"- [Embeddings](/docs/concepts/embedding_models)\n",
-"- [Vector stores](/docs/concepts/#vector-stores)\n",
+"- [Vector stores](/docs/concepts/vectorstores)\n",
"- [Retrieval-augmented generation](/docs/tutorials/rag/)\n",
"- [Tools](/docs/concepts/tools)\n",
"- [Agents](/docs/concepts/agents)\n",

View File

@@ -24,7 +24,7 @@
"- [Document loaders](/docs/concepts/document_loaders)\n",
"- [Chat models](/docs/concepts/chat_models)\n",
"- [Embeddings](/docs/concepts/embedding_models)\n",
-"- [Vector stores](/docs/concepts/#vector-stores)\n",
+"- [Vector stores](/docs/concepts/vectorstores)\n",
"- [Retrieval](/docs/concepts/retrieval)\n",
"\n",
":::\n",

View File

@@ -41,7 +41,7 @@
"### Indexing\n",
"1. **Load**: First we need to load our data. This is done with [Document Loaders](/docs/concepts/document_loaders).\n",
"2. **Split**: [Text splitters](/docs/concepts/text_splitters) break large `Documents` into smaller chunks. This is useful both for indexing data and for passing it in to a model, since large chunks are harder to search over and won't fit in a model's finite context window.\n",
-"3. **Store**: We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a [VectorStore](/docs/concepts/#vector-stores) and [Embeddings](/docs/concepts/embedding_models) model.\n",
+"3. **Store**: We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a [VectorStore](/docs/concepts/vectorstores) and [Embeddings](/docs/concepts/embedding_models) model.\n",
"\n",
"![index_diagram](../../static/img/rag_indexing.png)\n",
"\n",
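The Split step in the indexing pipeline above can be sketched with a simple character-based chunker; this is a toy stand-in for LangChain's text splitters, with illustrative parameters:

```python
def split_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Break a long string into overlapping character chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start : start + chunk_size])
        start += chunk_size - overlap
    return chunks


document = "word " * 100  # stand-in for a loaded document
chunks = split_text(document, chunk_size=100, overlap=20)
```

Each chunk shares its last `overlap` characters with the start of the next, which helps retrieval keep context that straddles a chunk boundary.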

View File

@@ -3,47 +3,59 @@ import multiprocessing
import re
import sys
from pathlib import Path
+from typing import Optional
+
+# List of 4-tuples (integration_name, display_name, concept_page, how_to_fragment)
+INTEGRATION_INFO = [
+    ("chat", "Chat model", "chat_models", "chat-models"),
+    ("llms", "LLM", "text_llms", "llms"),
+    ("text_embedding", "Embedding model", "embedding_models", "embedding-models"),
+    ("document_loaders", "Document loader", "document_loaders", "document-loaders"),
+    ("vectorstores", "Vector store", "vectorstores", "vector-stores"),
+    ("retrievers", "Retriever", "retrievers", "retrievers"),
+    ("tools", "Tool", "tools", "tools"),
+    # stores is a special case because there are no key-value store how-tos yet
+    # this is a placeholder for when we do have them
+    # for now the related links section will only contain the conceptual guide.
+    ("stores", "Key-value store", "key_value_stores", "key-value-stores"),
+]
+
+# Create a dictionary with key being the first element (integration_name) and value being the rest of the tuple
+INTEGRATION_INFO_DICT = {
+    integration_name: rest for integration_name, *rest in INTEGRATION_INFO
+}
+
+RELATED_LINKS_SECTION = """## Related
+
+- {concept_display_name} [conceptual guide](/docs/concepts/{concept_page})
+- {concept_display_name} [how-to guides](/docs/how_to/#{how_to_fragment})
+"""
+
+RELATED_LINKS_WITHOUT_HOW_TO_SECTION = """## Related
+
+- {concept_display_name} [conceptual guide](/docs/concepts/{concept_page})
+"""
+
-def _generate_related_links_section(integration_type: str, notebook_name: str):
-    concept_display_name = None
-    concept_heading = None
-    if integration_type == "chat":
-        concept_display_name = "Chat model"
-        concept_heading = "chat-models"
-    elif integration_type == "llms":
-        concept_display_name = "LLM"
-        concept_heading = "llms"
-    elif integration_type == "text_embedding":
-        concept_display_name = "Embedding model"
-        concept_heading = "embedding-models"
-    elif integration_type == "document_loaders":
-        concept_display_name = "Document loader"
-        concept_heading = "document-loaders"
-    elif integration_type == "vectorstores":
-        concept_display_name = "Vector store"
-        concept_heading = "vector-stores"
-    elif integration_type == "retrievers":
-        concept_display_name = "Retriever"
-        concept_heading = "retrievers"
-    elif integration_type == "tools":
-        concept_display_name = "Tool"
-        concept_heading = "tools"
-    elif integration_type == "stores":
-        concept_display_name = "Key-value store"
-        concept_heading = "key-value-stores"
-        # Special case because there are no key-value store how-tos yet
-        return f"""## Related
-
-- [{concept_display_name} conceptual guide](/docs/concepts/#{concept_heading})
-"""
-    else:
-        return None
-    return f"""## Related
-
-- {concept_display_name} [conceptual guide](/docs/concepts/#{concept_heading})
-- {concept_display_name} [how-to guides](/docs/how_to/#{concept_heading})
-"""
+def _generate_related_links_section(
+    integration_type: str, notebook_name: str
+) -> Optional[str]:
+    if integration_type not in INTEGRATION_INFO_DICT:
+        return None
+    concept_display_name, concept_page, how_to_fragment = INTEGRATION_INFO_DICT[
+        integration_type
+    ]
+    # Special case because there are no key-value store how-tos yet
+    if integration_type == "stores":
+        return RELATED_LINKS_WITHOUT_HOW_TO_SECTION.format(
+            concept_display_name=concept_display_name,
+            concept_page=concept_page,
+        )
+    return RELATED_LINKS_SECTION.format(
+        concept_display_name=concept_display_name,
+        concept_page=concept_page,
+        how_to_fragment=how_to_fragment,
+    )
def _process_path(doc_path: Path):
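The refactor above replaces an if/elif chain with a dict built by star-unpacking. A standalone sketch of the same pattern, using a trimmed copy of the data:

```python
# Tuples of (integration_name, display_name, concept_page, how_to_fragment),
# mirroring the shape used by the script above.
INTEGRATION_INFO = [
    ("chat", "Chat model", "chat_models", "chat-models"),
    ("vectorstores", "Vector store", "vectorstores", "vector-stores"),
]

# `integration_name, *rest` binds the first element of each tuple to the key
# and collects the remaining elements into a list used as the value.
INTEGRATION_INFO_DICT = {
    integration_name: rest for integration_name, *rest in INTEGRATION_INFO
}

# The value list can then be unpacked back into named variables.
display_name, concept_page, how_to_fragment = INTEGRATION_INFO_DICT["chat"]
```

Keeping the mapping in data rather than control flow means adding an integration type is a one-line change.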

View File

@@ -95,3 +95,4 @@ xmltodict>=0.13.0,<0.14
nanopq==0.2.1
mlflow[genai]>=2.14.0
databricks-sdk>=0.30.0
+websocket>=0.2.1,<1

View File

@@ -300,34 +300,6 @@ class ChatSparkLLM(BaseChatModel):
populate_by_name=True,
)
-    @model_validator(mode="before")
-    @classmethod
-    def build_extra(cls, values: Dict[str, Any]) -> Any:
-        """Build extra kwargs from additional params that were passed in."""
-        all_required_field_names = get_pydantic_field_names(cls)
-        extra = values.get("model_kwargs", {})
-        for field_name in list(values):
-            if field_name in extra:
-                raise ValueError(f"Found {field_name} supplied twice.")
-            if field_name not in all_required_field_names:
-                logger.warning(
-                    f"""WARNING! {field_name} is not default parameter.
-                    {field_name} was transferred to model_kwargs.
-                    Please confirm that {field_name} is what you intended."""
-                )
-                extra[field_name] = values.pop(field_name)
-        invalid_model_kwargs = all_required_field_names.intersection(extra.keys())
-        if invalid_model_kwargs:
-            raise ValueError(
-                f"Parameters {invalid_model_kwargs} should be specified explicitly. "
-                f"Instead they were passed in as part of `model_kwargs` parameter."
-            )
-        values["model_kwargs"] = extra
-        return values
@model_validator(mode="before")
@classmethod
def validate_environment(cls, values: Dict) -> Any:
@@ -378,6 +350,38 @@ class ChatSparkLLM(BaseChatModel):
)
return values
+    # When using Pydantic V2
+    # The execution order of multiple @model_validator decorators is opposite to
+    # their declaration order. https://github.com/pydantic/pydantic/discussions/7434
+    @model_validator(mode="before")
+    @classmethod
+    def build_extra(cls, values: Dict[str, Any]) -> Any:
+        """Build extra kwargs from additional params that were passed in."""
+        all_required_field_names = get_pydantic_field_names(cls)
+        extra = values.get("model_kwargs", {})
+        for field_name in list(values):
+            if field_name in extra:
+                raise ValueError(f"Found {field_name} supplied twice.")
+            if field_name not in all_required_field_names:
+                logger.warning(
+                    f"""WARNING! {field_name} is not default parameter.
+                    {field_name} was transferred to model_kwargs.
+                    Please confirm that {field_name} is what you intended."""
+                )
+                extra[field_name] = values.pop(field_name)
+        invalid_model_kwargs = all_required_field_names.intersection(extra.keys())
+        if invalid_model_kwargs:
+            raise ValueError(
+                f"Parameters {invalid_model_kwargs} should be specified explicitly. "
+                f"Instead they were passed in as part of `model_kwargs` parameter."
+            )
+        values["model_kwargs"] = extra
+        return values
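The relocated `build_extra` validator implements a general pattern: route unrecognized keyword arguments into a `model_kwargs` dict and reject duplicates or explicitly declared fields smuggled in through it. A simplified, framework-free sketch (field names are illustrative):

```python
def build_extra(values: dict, known_fields: set) -> dict:
    """Move unrecognized keys in `values` into a `model_kwargs` dict."""
    extra = values.get("model_kwargs", {})
    for field_name in list(values):
        if field_name == "model_kwargs":
            continue
        if field_name in extra:
            raise ValueError(f"Found {field_name} supplied twice.")
        if field_name not in known_fields:
            extra[field_name] = values.pop(field_name)
    # Known fields must be passed explicitly, never inside model_kwargs.
    invalid = known_fields.intersection(extra)
    if invalid:
        raise ValueError(f"Parameters {invalid} should be specified explicitly.")
    values["model_kwargs"] = extra
    return values


# `temperature` is a recognized field here; `custom_flag` is not.
result = build_extra({"temperature": 0.1, "custom_flag": True}, {"temperature"})
```

In the real class the same logic runs as a Pydantic `@model_validator(mode="before")`, where `values` is the raw constructor input.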
def _stream(
self,
messages: List[BaseMessage],

View File

@@ -1,3 +1,4 @@
+import pytest
from langchain_core.messages import (
AIMessage,
HumanMessage,
@@ -8,6 +9,7 @@ from langchain_core.output_parsers.openai_tools import (
)
from langchain_community.chat_models.sparkllm import (
ChatSparkLLM,
+    convert_dict_to_message,
convert_message_to_dict,
)
@@ -83,3 +85,25 @@ def test__convert_message_to_dict_system() -> None:
result = convert_message_to_dict(message)
expected_output = {"role": "system", "content": "foo"}
assert result == expected_output
+@pytest.mark.requires("websocket")
+def test__chat_spark_llm_initialization() -> None:
+    chat = ChatSparkLLM(
+        app_id="IFLYTEK_SPARK_APP_ID",
+        api_key="IFLYTEK_SPARK_API_KEY",
+        api_secret="IFLYTEK_SPARK_API_SECRET",
+        api_url="IFLYTEK_SPARK_API_URL",
+        model="IFLYTEK_SPARK_LLM_DOMAIN",
+        timeout=40,
+        temperature=0.1,
+        top_k=3,
+    )
+    assert chat.spark_app_id == "IFLYTEK_SPARK_APP_ID"
+    assert chat.spark_api_key == "IFLYTEK_SPARK_API_KEY"
+    assert chat.spark_api_secret == "IFLYTEK_SPARK_API_SECRET"
+    assert chat.spark_api_url == "IFLYTEK_SPARK_API_URL"
+    assert chat.spark_llm_domain == "IFLYTEK_SPARK_LLM_DOMAIN"
+    assert chat.request_timeout == 40
+    assert chat.temperature == 0.1
+    assert chat.top_k == 3

View File

@@ -10,6 +10,7 @@ from typing import (
)
from pydantic import BaseModel, ConfigDict
+from pydantic.fields import FieldInfo
from typing_extensions import NotRequired
@@ -77,10 +78,23 @@ def try_neq_default(value: Any, key: str, model: BaseModel) -> bool:
Raises:
Exception: If the key is not in the model.
"""
+    field = model.model_fields[key]
+    return _try_neq_default(value, field)
+
+
+def _try_neq_default(value: Any, field: FieldInfo) -> bool:
+    # Handle edge case: inequality of two objects does not evaluate to a bool (e.g. two
+    # Pandas DataFrames).
    try:
-        return model.model_fields[key].get_default() != value
-    except Exception:
-        return True
+        return bool(field.get_default() != value)
+    except Exception as _:
+        try:
+            return all(field.get_default() != value)
+        except Exception as _:
+            try:
+                return value is not field.default
+            except Exception as _:
+                return False
class Serializable(BaseModel, ABC):
@@ -297,18 +311,7 @@ def _is_field_useful(inst: Serializable, key: str, value: Any) -> bool:
if field.default_factory is list and isinstance(value, list):
return False
-    # Handle edge case: inequality of two objects does not evaluate to a bool (e.g. two
-    # Pandas DataFrames).
-    try:
-        value_neq_default = bool(field.get_default() != value)
-    except Exception as _:
-        try:
-            value_neq_default = all(field.get_default() != value)
-        except Exception as _:
-            try:
-                value_neq_default = value is not field.default
-            except Exception as _:
-                value_neq_default = False
+    value_neq_default = _try_neq_default(value, field)
# If value is falsy and does not match the default
return value_is_truthy or value_neq_default
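The fallback chain factored into `_try_neq_default` exists because some objects (e.g. NumPy arrays or Pandas DataFrames) return elementwise results from comparisons, so a plain `bool(...)` raises. A self-contained sketch of the same idea; the `ArrayLike` and `ElementwiseResult` classes are illustrative stand-ins, not library code:

```python
class ElementwiseResult(list):
    """A comparison result whose truthiness is ambiguous, like a NumPy array."""

    def __bool__(self) -> bool:
        raise ValueError("truth value of an elementwise comparison is ambiguous")


class ArrayLike:
    """Illustrative container whose != returns an elementwise result."""

    def __init__(self, values) -> None:
        self.values = list(values)

    def __ne__(self, other):
        return ElementwiseResult(a != b for a, b in zip(self.values, other.values))


def neq_default(value, default) -> bool:
    # First try the plain boolean inequality ...
    try:
        return bool(default != value)
    except Exception:
        # ... then an elementwise comparison for array-like results ...
        try:
            return all(default != value)
        except Exception:
            # ... and finally fall back to an identity check, mirroring the
            # defensive structure of the library code.
            try:
                return value is not default
            except Exception:
                return False
```

For ordinary values the first branch answers immediately; the array-like case falls through to `all(...)`.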

View File

@@ -4,6 +4,22 @@ from langchain_core.load import Serializable, dumpd, load
from langchain_core.load.serializable import _is_field_useful
+class NonBoolObj:
+    def __bool__(self) -> bool:
+        msg = "Truthiness can't be determined"
+        raise ValueError(msg)
+
+    def __eq__(self, other: object) -> bool:
+        msg = "Equality can't be determined"
+        raise ValueError(msg)
+
+    def __str__(self) -> str:
+        return self.__class__.__name__
+
+    def __repr__(self) -> str:
+        return self.__class__.__name__
def test_simple_serialization() -> None:
class Foo(Serializable):
bar: int
@@ -82,15 +98,6 @@ def test__is_field_useful() -> None:
def __eq__(self, other: object) -> bool:
return self # type: ignore[return-value]
-    class NonBoolObj:
-        def __bool__(self) -> bool:
-            msg = "Truthiness can't be determined"
-            raise ValueError(msg)
-
-        def __eq__(self, other: object) -> bool:
-            msg = "Equality can't be determined"
-            raise ValueError(msg)
default_x = ArrayObj()
default_y = NonBoolObj()
@@ -169,3 +176,30 @@ def test_simple_deserialization_with_additional_imports() -> None:
},
)
assert isinstance(new_foo, Foo2)
+class Foo3(Serializable):
+    model_config = ConfigDict(arbitrary_types_allowed=True)
+
+    content: str
+    non_bool: NonBoolObj
+
+    @classmethod
+    def is_lc_serializable(cls) -> bool:
+        return True
+
+
+def test_repr() -> None:
+    foo = Foo3(
+        content="repr",
+        non_bool=NonBoolObj(),
+    )
+    assert repr(foo) == "Foo3(content='repr', non_bool=NonBoolObj)"
+
+
+def test_str() -> None:
+    foo = Foo3(
+        content="str",
+        non_bool=NonBoolObj(),
+    )
+    assert str(foo) == "content='str' non_bool=NonBoolObj"