# File Directory

This covers how to use the `DirectoryLoader` to load all documents in a directory. Under the hood, it uses the [UnstructuredLoader](./unstructured_file.ipynb) by default.

In [1]:
from langchain.document_loaders import DirectoryLoader

We can use the `glob` parameter to control which files to load. Note that the pattern below matches only `.md` files, so the `.rst` file and the `.ipynb` files are not loaded.

In [2]:
loader = DirectoryLoader('../', glob="**/*.md")

In [3]:
docs = loader.load()

In [4]:
len(docs)

1
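To preview which files a pattern such as `**/*.md` selects, independent of any loader, the standard library's `pathlib` can evaluate the same kind of pattern. The following is a minimal sketch, assuming the pattern is interpreted the way `Path.glob` interprets it; the helper `match_files` and the temporary file layout are hypothetical and exist only for illustration.

```python
import tempfile
from pathlib import Path


def match_files(root: Path, pattern: str) -> list[str]:
    """Return the sorted, root-relative paths of files matching a glob pattern."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.glob(pattern)
        if p.is_file()
    )


# Build a throwaway directory tree (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    for name in ["README.md", "index.rst", "docs/guide.md", "docs/demo.ipynb"]:
        (root / name).write_text("placeholder", encoding="utf-8")

    # "**" matches zero or more directories, so top-level files are included.
    markdown_files = match_files(root, "**/*.md")
    notebook_files = match_files(root, "**/*.ipynb")

print(markdown_files)
print(notebook_files)
```

Only the files whose names match the pattern are returned; the `.rst` and `.ipynb` files are left out of the Markdown result.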

## Show a progress bar

By default a progress bar will not be shown. To show a progress bar, install the `tqdm` library (e.g. `pip install tqdm`), and set the `show_progress` parameter to `True`.

In [10]:
%pip install tqdm
loader = DirectoryLoader('../', glob="**/*.md", show_progress=True)
docs = loader.load()



0it [00:00, ?it/s]


## Use multithreading

By default, loading happens in a single thread. To use several threads, set the `use_multithreading` flag to `True`.

In [None]:
loader = DirectoryLoader('../', glob="**/*.md", use_multithreading=True)
docs = loader.load()
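The general idea behind loading files on several threads can be sketched with the standard library alone. This is not the `DirectoryLoader` implementation, just an illustration of the thread-pool pattern; `read_text_file`, the file names, and the worker count are assumptions made for the example.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_text_file(path: Path) -> dict:
    """Read one file and return its content with its source name (hypothetical helper)."""
    return {"source": path.name, "text": path.read_text(encoding="utf-8")}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for i in range(5):
        (root / f"note_{i}.md").write_text(f"note {i}", encoding="utf-8")

    paths = sorted(root.glob("*.md"))

    # Reads may overlap in time, but executor.map returns results in input order.
    with ThreadPoolExecutor(max_workers=4) as executor:
        loaded = list(executor.map(read_text_file, paths))

print([doc["source"] for doc in loaded])
```

Threads mostly help when loading is dominated by I/O, such as reading many files from disk or a network share.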

## Change loader class
By default this uses the `UnstructuredLoader` class. To use a different loader, pass its class through the `loader_cls` parameter.

In [15]:
from langchain.document_loaders import TextLoader

In [6]:
loader = DirectoryLoader('../', glob="**/*.md", loader_cls=TextLoader)

In [7]:
docs = loader.load()

In [8]:
len(docs)

1

If you need to load Python source code files, use the `PythonLoader`.

In [14]:
from langchain.document_loaders import PythonLoader

In [13]:
loader = DirectoryLoader('../../../../../', glob="**/*.py", loader_cls=PythonLoader)

In [14]:
docs = loader.load()

In [15]:
len(docs)

691

## Auto detect file encodings with TextLoader

In this example we will see some strategies that can be useful when loading a big list of arbitrary files from a directory using the `TextLoader` class.

First, to illustrate the problem, let's try to load multiple text files with arbitrary encodings.

In [16]:
path = '../../../../../tests/integration_tests/examples'
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader)

### A. Default Behavior

In [19]:
loader.load()

The file `example-non-utf8.txt` uses a different encoding, so the `load()` function fails with a helpful message indicating which file failed decoding.

With the default behavior of `TextLoader`, a failure to load any single document fails the whole loading process, and no documents are loaded.

### B. Silent fail

We can pass the parameter `silent_errors` to the `DirectoryLoader` to skip the files which could not be loaded and continue the load process.

In [30]:
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader, silent_errors=True)
docs = loader.load()

Error loading ../../../../../tests/integration_tests/examples/example-non-utf8.txt


In [35]:
doc_sources = [doc.metadata['source']  for doc in docs]
doc_sources

['../../../../../tests/integration_tests/examples/whatsapp_chat.txt',
 '../../../../../tests/integration_tests/examples/example-utf8.txt']
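The difference between failing fast and skipping unreadable files can be illustrated without the library. The sketch below is a simplified stand-in, not the `DirectoryLoader` code: `load_all` and its `silent_errors` argument are hypothetical names that mirror the behavior described above, and the two test files are created on the fly.

```python
import tempfile
from pathlib import Path


def load_all(paths, silent_errors=False):
    """Read every path as UTF-8; optionally skip files that fail to decode."""
    docs = []
    for path in paths:
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            if not silent_errors:
                raise  # default: one bad file aborts the whole load
            print(f"Error loading {path.name}")
            continue  # silent mode: report the file and move on
        docs.append({"source": path.name, "text": text})
    return docs


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "good.txt").write_text("hello", encoding="utf-8")
    # Latin-1 bytes that are not valid UTF-8.
    (root / "bad.txt").write_bytes("caf\xe9".encode("latin-1"))
    paths = sorted(root.glob("*.txt"))

    skipped = load_all(paths, silent_errors=True)

    try:
        load_all(paths)
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

print([doc["source"] for doc in skipped], strict_failed)
```

Skipping is convenient for bulk loads, but the skipped files are absent from the result, so it is worth checking the reported errors afterwards.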

### C. Auto detect encodings

We can also ask `TextLoader` to auto-detect the file encoding before failing, by passing `autodetect_encoding` to the loader class through `loader_kwargs`.

In [37]:
text_loader_kwargs={'autodetect_encoding': True}
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
docs = loader.load()


In [38]:
doc_sources = [doc.metadata['source']  for doc in docs]
doc_sources

['../../../../../tests/integration_tests/examples/example-non-utf8.txt',
 '../../../../../tests/integration_tests/examples/whatsapp_chat.txt',
 '../../../../../tests/integration_tests/examples/example-utf8.txt']
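To give a feel for what "detect the encoding before failing" can mean, here is a deliberately simple fallback strategy using only the standard library. It is an assumption-laden sketch, not how `TextLoader` detects encodings: `decode_with_fallback` and the candidate list are hypothetical, and real detection usually relies on statistical analysis rather than trial and error.

```python
def decode_with_fallback(data: bytes, candidates=("utf-8", "cp1252", "latin-1")):
    """Try each candidate encoding in order; return (text, encoding) for the first that works."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Valid UTF-8 is decoded by the first candidate.
utf8_text, utf8_encoding = decode_with_fallback("caf\xe9".encode("utf-8"))

# cp1252 bytes are not valid UTF-8 here, so the second candidate is used.
legacy_text, legacy_encoding = decode_with_fallback("caf\xe9".encode("cp1252"))

print(utf8_encoding, legacy_encoding)
```

The order of candidates matters: `latin-1` can decode any byte sequence, so it belongs last, and a successful decode does not guarantee the characters are the ones the author intended.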