Per discussion on Discord. This adds a PDF reader built on `PyPDF`, a
simple PDF reading library. It also tracks page numbers in per-split metadata.
Here's an example:
```python
from langchain.document_loaders import PagedPDFSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings
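
# Split the PDF into chunks; each chunk gets metadata recording its page(s).
loader = PagedPDFSplitter(chunk_size=250)
splits, metadatas = loader.load_and_split("examples/example_data/layout-parser-paper.pdf")

# Index the chunks together with their metadata.
faiss_index = FAISS.from_texts(splits, OpenAIEmbeddings(), metadatas=metadatas)

# Retrieve the two most similar chunks and print the pages they came from.
docs = faiss_index.similarity_search("How will the community be engaged?", k=2)
for doc in docs:
    print(doc.metadata["pages"] + ":", doc.page_content)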
```
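For reviewers who want a mental model of the per-split page tracking, here is a rough sketch of one way it could work. It is not the code in this PR: `split_with_pages` is a hypothetical helper, it takes already-extracted page texts instead of calling `PyPDF`, it splits on a fixed character count, and the `"1-2"` string format for `pages` is an assumption (the example above only shows that `pages` is concatenated with a string).

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def split_with_pages(
    pages: List[str], chunk_size: int = 250
) -> Tuple[List[str], List[Dict[str, str]]]:
    """Hypothetical helper: chunk page texts and record each chunk's page range."""
    # Character offset at which each page starts in the concatenated text.
    starts: List[int] = []
    offset = 0
    for text in pages:
        starts.append(offset)
        offset += len(text)
    full_text = "".join(pages)

    splits: List[str] = []
    metadatas: List[Dict[str, str]] = []
    for begin in range(0, len(full_text), chunk_size):
        chunk = full_text[begin:begin + chunk_size]
        end = begin + len(chunk) - 1
        # The number of page starts at or before an offset is its 1-based page number.
        first = bisect_right(starts, begin)
        last = bisect_right(starts, end)
        # Assumed format: "3" for a single page, "3-4" when a chunk spans pages.
        label = str(first) if first == last else f"{first}-{last}"
        splits.append(chunk)
        metadatas.append({"pages": label})
    return splits, metadatas
```

The returned `splits` and `metadatas` lists are parallel, so they can be passed to `FAISS.from_texts` the same way as in the example above.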
## TODO
- [x] Learn where to add `pypdf` as dependency for building docs
- [x] Add unit test?
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>