community: better support of pathlib paths in document loaders (#18396)

This arose from the problem, seen in
https://github.com/langchain-ai/langchain/pull/18397, of document
loaders not supporting `pathlib.Path`.

This pull request provides more uniform support for Path as an argument.
The core ideas for this upgrade: 
- if there is a local file path used as an argument, it should be
supported as `pathlib.Path`
- if there are external calls that may or may not support `pathlib.Path`,
the argument is immediately converted to `str`
- if `self.file_path` is used in a way that allows it to stay a
`pathlib.Path` without conversion, it is only converted for the metadata.
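
The three rules above can be sketched roughly as follows. This is a minimal illustration, not code from the PR: `third_party_read`, `StrConvertingLoader`, and `PathKeepingLoader` are hypothetical names, and plain dicts stand in for `Document` objects so the snippet has no dependencies.

```python
from pathlib import Path
from typing import Dict, Union


def third_party_read(path: str) -> str:
    """Hypothetical stand-in for an external call that only accepts str paths."""
    if not isinstance(path, str):
        raise TypeError("this helper only accepts str paths")
    with open(path, encoding="utf-8") as f:
        return f.read()


class StrConvertingLoader:
    """Rules 1 and 2: accept str or Path, convert to str right away."""

    def __init__(self, file_path: Union[str, Path]):
        # The external call may not support Path, so convert immediately.
        self.file_path = str(file_path)

    def load(self) -> Dict[str, str]:
        text = third_party_read(self.file_path)
        return {"text": text, "source": self.file_path}


class PathKeepingLoader:
    """Rules 1 and 3: keep a Path internally, convert only for metadata."""

    def __init__(self, file_path: Union[str, Path]):
        # Assumption for this sketch: normalize to Path so Path methods work.
        self.file_path = Path(file_path)

    def load(self) -> Dict[str, str]:
        text = self.file_path.read_text(encoding="utf-8")
        # Metadata stays a plain str in both loaders.
        return {"text": text, "source": str(self.file_path)}
```

Either way, callers can pass a `str` or a `Path`, and the `source` metadata ends up as a plain `str`.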

Twitter handle: https://twitter.com/mwmajewsk
mwmajewsk
2024-03-26 16:51:52 +01:00
committed by GitHub
parent 94b869a974
commit f7a1fd91b8
32 changed files with 147 additions and 80 deletions

@@ -5,8 +5,9 @@ https://gist.github.com/foxmask/7b29c43a161e001ff04afdb2f181e31c
 import hashlib
 import logging
 from base64 import b64decode
+from pathlib import Path
 from time import strptime
-from typing import Any, Dict, Iterator, List, Optional
+from typing import Any, Dict, Iterator, List, Optional, Union
 from langchain_core.documents import Document
@@ -35,9 +36,9 @@ class EverNoteLoader(BaseLoader):
     the 'source' which contains the file name of the export.
     """ # noqa: E501
-    def __init__(self, file_path: str, load_single_document: bool = True):
+    def __init__(self, file_path: Union[str, Path], load_single_document: bool = True):
         """Initialize with file path."""
-        self.file_path = file_path
+        self.file_path = str(file_path)
         self.load_single_document = load_single_document
     def _lazy_load(self) -> Iterator[Document]: