community: better support of pathlib paths in document loaders (#18396)

This arose from the problem, seen in
https://github.com/langchain-ai/langchain/pull/18397, of document
loaders not supporting `pathlib.Path`.

This pull request provides more uniform support for `Path` as an argument.
The core ideas of this change:
- if a local file path is used as an argument, it should be
supported as `pathlib.Path`
- if there are external calls that may or may not support `pathlib`,
the argument is immediately converted to `str`
- if `self.file_path` is used in a way that allows it to stay a
`pathlib.Path` without conversion, it is only converted for the metadata.
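
The three rules above can be sketched as follows. This is a minimal illustration, not the code from this pull request: `SketchCSVLoader`, `external_suffix`, and `demo` are hypothetical names, and the loader returns plain dictionaries instead of `Document` objects so that it runs with the standard library alone.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterator, List, Union


def external_suffix(path: str) -> str:
    # Hypothetical stand-in for an external call that only understands str.
    return path.rsplit(".", 1)[-1]


class SketchCSVLoader:
    """Hypothetical loader that illustrates the three rules; not the real CSVLoader."""

    def __init__(self, file_path: Union[str, Path]) -> None:
        # Rule 1: a local file path may be given as str or pathlib.Path.
        self.file_path = file_path
        # Rule 2: convert to str immediately before handing the path to an
        # external call that may not accept pathlib.Path.
        self.extension = external_suffix(str(file_path))

    def lazy_load(self) -> Iterator[Dict[str, Any]]:
        # Rule 3: open() accepts path-like objects, so the path stays as-is here.
        with open(self.file_path, newline="", encoding="utf-8") as f:
            for i, row in enumerate(csv.DictReader(f)):
                yield {
                    "page_content": "\n".join(f"{k}: {v}" for k, v in row.items()),
                    # The metadata always carries a plain string.
                    "metadata": {"source": str(self.file_path), "row": i},
                }


def demo() -> List[Dict[str, Any]]:
    # Write a one-row CSV to a temporary directory and load it through a Path.
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = Path(tmp) / "people.csv"
        csv_path.write_text("name,age\nAda,36\n", encoding="utf-8")
        return list(SketchCSVLoader(csv_path).lazy_load())
```

Passing a `str` instead of a `Path` to `SketchCSVLoader` yields the same records, since `source` is converted with `str()` either way.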

Twitter handle: https://twitter.com/mwmajewsk
mwmajewsk
2024-03-26 16:51:52 +01:00
committed by GitHub
parent 94b869a974
commit f7a1fd91b8
32 changed files with 147 additions and 80 deletions


@@ -1,6 +1,7 @@
 import csv
 from io import TextIOWrapper
-from typing import Any, Dict, Iterator, List, Optional, Sequence
+from pathlib import Path
+from typing import Any, Dict, Iterator, List, Optional, Sequence, Union
 from langchain_core.documents import Document
@@ -35,7 +36,7 @@ class CSVLoader(BaseLoader):
     def __init__(
         self,
-        file_path: str,
+        file_path: Union[str, Path],
         source_column: Optional[str] = None,
         metadata_columns: Sequence[str] = (),
         csv_args: Optional[Dict] = None,
@@ -89,7 +90,7 @@ class CSVLoader(BaseLoader):
                 source = (
                     row[self.source_column]
                     if self.source_column is not None
-                    else self.file_path
+                    else str(self.file_path)
                 )
             except KeyError:
                 raise ValueError(