# Unstructured

> The `unstructured` package from [Unstructured.IO](https://www.unstructured.io/) extracts clean text from raw source documents like PDFs and Word documents.

This page covers how to use the [`unstructured`](https://github.com/Unstructured-IO/unstructured) ecosystem within LangChain.

## Installation and Setup

If you are using a loader that runs locally, use the following steps to get `unstructured` and its dependencies running locally.

- Install the Python SDK with `pip install unstructured`.
  - You can install document-specific dependencies with extras, e.g. `pip install "unstructured[docx]"`.
  - To install the dependencies for all document types, use `pip install "unstructured[all-docs]"`.
- Install the following system dependencies if they are not already available on your system. Depending on what document types you're parsing, you may not need all of these.
  - `libmagic-dev` (filetype detection)
  - `poppler-utils` (images and PDFs)
  - `tesseract-ocr` (images and PDFs)
  - `libreoffice` (MS Office docs)
  - `pandoc` (EPUBs)

If you want to get up and running with less setup, you can simply run `pip install unstructured` and use `UnstructuredAPIFileLoader` or `UnstructuredAPIFileIOLoader`. That will process your document using the hosted Unstructured API.

The `Unstructured API` requires API keys to make requests. You can request an API key [here](https://unstructured.io/api-key-hosted) and start using it today! Check out the README [here](https://github.com/Unstructured-IO/unstructured-api) to get started making API calls. We'd love to hear your feedback; let us know how it goes in our [community slack](https://join.slack.com/t/unstructuredw-kbe4326/shared_invite/zt-1x7cgo0pg-PTptXWylzPQF9xZolzCnwQ). And stay tuned for improvements to both quality and performance!

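
Before running a local loader, it can help to see which of the system dependencies listed under Installation and Setup are already on the machine. The sketch below is only an illustration: the executable names (`pdftotext`, `tesseract`, `soffice`, `pandoc`) are assumptions about what each package typically installs, and `check_system_tools` is a hypothetical helper, not part of `unstructured` or LangChain. `libmagic-dev` is a library rather than a command, so it is not covered here.

```python
import shutil

# Assumed mapping from system package to an executable it usually provides.
# These names are illustrative guesses and may differ on your platform.
SYSTEM_TOOLS = {
    "poppler-utils": "pdftotext",
    "tesseract-ocr": "tesseract",
    "libreoffice": "soffice",
    "pandoc": "pandoc",
}


def check_system_tools(tools=SYSTEM_TOOLS):
    """Return {package: bool}, True when the package's executable is on PATH."""
    return {package: shutil.which(exe) is not None for package, exe in tools.items()}


if __name__ == "__main__":
    for package, found in check_system_tools().items():
        print(f"{package}: {'found' if found else 'missing'}")
```

A missing entry only matters if you parse the document types that package serves; for example, `pandoc` is needed for EPUBs only.
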
Check out the instructions [here](https://github.com/Unstructured-IO/unstructured-api#dizzy-instructions-for-using-the-docker-image) if you'd like to self-host the Unstructured API or run it locally.

## Data Loaders

The primary use of `unstructured` is in data loaders.

### UnstructuredAPIFileIOLoader

See a [usage example](/docs/integrations/document_loaders/unstructured_file#unstructured-api).

```python
from langchain_community.document_loaders import UnstructuredAPIFileIOLoader
```

### UnstructuredAPIFileLoader

See a [usage example](/docs/integrations/document_loaders/unstructured_file#unstructured-api).

```python
from langchain_community.document_loaders import UnstructuredAPIFileLoader
```

### UnstructuredCHMLoader

`CHM` means `Microsoft Compiled HTML Help`.

See a usage example in the API documentation.

```python
from langchain_community.document_loaders import UnstructuredCHMLoader
```

### UnstructuredCSVLoader

A `comma-separated values` (`CSV`) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas.

See a [usage example](/docs/integrations/document_loaders/csv#unstructuredcsvloader).

```python
from langchain_community.document_loaders import UnstructuredCSVLoader
```

### UnstructuredEmailLoader

See a [usage example](/docs/integrations/document_loaders/email).

```python
from langchain_community.document_loaders import UnstructuredEmailLoader
```

### UnstructuredEPubLoader

[EPUB](https://en.wikipedia.org/wiki/EPUB) is an `e-book file format` that uses the “.epub” file extension. The term is short for electronic publication and is sometimes styled `ePub`. `EPUB` is supported by many e-readers, and compatible software is available for most smartphones, tablets, and computers.

See a [usage example](/docs/integrations/document_loaders/epub).

```python
from langchain_community.document_loaders import UnstructuredEPubLoader
```

### UnstructuredExcelLoader

See a [usage example](/docs/integrations/document_loaders/microsoft_excel).

```python
from langchain_community.document_loaders import UnstructuredExcelLoader
```

### UnstructuredFileIOLoader

See a [usage example](/docs/integrations/document_loaders/google_drive#passing-in-optional-file-loaders).

```python
from langchain_community.document_loaders import UnstructuredFileIOLoader
```

### UnstructuredFileLoader

See a [usage example](/docs/integrations/document_loaders/unstructured_file).

```python
from langchain_community.document_loaders import UnstructuredFileLoader
```

### UnstructuredHTMLLoader

See a [usage example](/docs/modules/data_connection/document_loaders/html).

```python
from langchain_community.document_loaders import UnstructuredHTMLLoader
```

### UnstructuredImageLoader

See a [usage example](/docs/integrations/document_loaders/image).

```python
from langchain_community.document_loaders import UnstructuredImageLoader
```

### UnstructuredMarkdownLoader

See a [usage example](/docs/integrations/vectorstores/starrocks).

```python
from langchain_community.document_loaders import UnstructuredMarkdownLoader
```

### UnstructuredODTLoader

The `Open Document Format for Office Applications (ODF)`, also known as `OpenDocument`, is an open file format for word processing documents, spreadsheets, presentations, and graphics that uses ZIP-compressed XML files. It was developed with the aim of providing an open, XML-based file format specification for office applications.

See a [usage example](/docs/integrations/document_loaders/odt).

```python
from langchain_community.document_loaders import UnstructuredODTLoader
```

### UnstructuredOrgModeLoader

[Org Mode](https://en.wikipedia.org/wiki/Org-mode) is a document editing, formatting, and organizing mode designed for notes, planning, and authoring within the free software text editor Emacs.

See a [usage example](/docs/integrations/document_loaders/org_mode).

```python
from langchain_community.document_loaders import UnstructuredOrgModeLoader
```

### UnstructuredPDFLoader

See a [usage example](/docs/modules/data_connection/document_loaders/pdf#using-unstructured).

```python
from langchain_community.document_loaders import UnstructuredPDFLoader
```

### UnstructuredPowerPointLoader

See a [usage example](/docs/integrations/document_loaders/microsoft_powerpoint).

```python
from langchain_community.document_loaders import UnstructuredPowerPointLoader
```

### UnstructuredRSTLoader

A `reStructuredText` (`RST`) file is a file format for textual data used primarily in the Python programming language community for technical documentation.

See a [usage example](/docs/integrations/document_loaders/rst).

```python
from langchain_community.document_loaders import UnstructuredRSTLoader
```

### UnstructuredRTFLoader

See a usage example in the API documentation.

```python
from langchain_community.document_loaders import UnstructuredRTFLoader
```

### UnstructuredTSVLoader

A `tab-separated values` (`TSV`) file is a simple, text-based file format for storing tabular data. Records are separated by newlines, and values within a record are separated by tab characters.

See a [usage example](/docs/integrations/document_loaders/tsv).

```python
from langchain_community.document_loaders import UnstructuredTSVLoader
```

### UnstructuredURLLoader

See a [usage example](/docs/integrations/document_loaders/url).

```python
from langchain_community.document_loaders import UnstructuredURLLoader
```

### UnstructuredWordDocumentLoader

See a [usage example](/docs/integrations/document_loaders/microsoft_word#using-unstructured).

```python
from langchain_community.document_loaders import UnstructuredWordDocumentLoader
```

### UnstructuredXMLLoader

See a [usage example](/docs/integrations/document_loaders/xml).

```python
from langchain_community.document_loaders import UnstructuredXMLLoader
```
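
Because every file-based loader on this page is imported from `langchain_community.document_loaders`, one way to tie them together is to pick a loader by file extension. The following is a minimal sketch rather than an official LangChain utility: the class names come from this page, but the extension table, `loader_name_for`, and `load_documents` are hypothetical, and it assumes each listed loader accepts a file path as its first argument and exposes `.load()`.

```python
import importlib
from pathlib import Path

# Hypothetical extension-to-loader table. The class names are the ones listed
# on this page; which extension maps to which loader is an assumption.
LOADER_BY_EXTENSION = {
    ".chm": "UnstructuredCHMLoader",
    ".csv": "UnstructuredCSVLoader",
    ".eml": "UnstructuredEmailLoader",
    ".epub": "UnstructuredEPubLoader",
    ".xlsx": "UnstructuredExcelLoader",
    ".html": "UnstructuredHTMLLoader",
    ".png": "UnstructuredImageLoader",
    ".jpg": "UnstructuredImageLoader",
    ".md": "UnstructuredMarkdownLoader",
    ".odt": "UnstructuredODTLoader",
    ".org": "UnstructuredOrgModeLoader",
    ".pdf": "UnstructuredPDFLoader",
    ".pptx": "UnstructuredPowerPointLoader",
    ".rst": "UnstructuredRSTLoader",
    ".rtf": "UnstructuredRTFLoader",
    ".tsv": "UnstructuredTSVLoader",
    ".docx": "UnstructuredWordDocumentLoader",
    ".xml": "UnstructuredXMLLoader",
}

# Fallback when the extension is not in the table above.
DEFAULT_LOADER = "UnstructuredFileLoader"


def loader_name_for(path: str) -> str:
    """Return the loader class name for a path, based on its extension."""
    suffix = Path(path).suffix.lower()
    return LOADER_BY_EXTENSION.get(suffix, DEFAULT_LOADER)


def load_documents(path: str):
    """Import the matching loader lazily and load the file.

    Requires `langchain_community` and `unstructured` to be installed.
    """
    module = importlib.import_module("langchain_community.document_loaders")
    loader_cls = getattr(module, loader_name_for(path))
    return loader_cls(path).load()
```

The import is deferred to `load_documents` so the lookup itself works without the optional dependencies installed. `UnstructuredURLLoader` and the API and file-object loaders are left out because they are not constructed from a single local file path.
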