Implements NLTK and Spacy-based TextSplitters (#103)

This PR addresses issue #88.

- [x] `make format`
- [x] `make lint`
- [x] `make tests`
Delip Rao
2022-11-09 23:45:30 -05:00
committed by GitHub
parent 28282ad099
commit 3ee6e332dd
4 changed files with 118 additions and 16 deletions

@@ -53,6 +53,8 @@ The following use cases require specific installs and api keys:
- _FAISS_:
- Install requirements with `pip install faiss` for Python 3.7 and `pip install faiss-cpu` for Python 3.10+.
If you are using the `NLTKTextSplitter` or the `SpacyTextSplitter`, you will also need to install the appropriate models. For example, if you want to use the `SpacyTextSplitter`, you will need to install the `en_core_web_sm` model with `python -m spacy download en_core_web_sm`. Similarly, if you want to use the `NLTKTextSplitter`, you will need to install the `punkt` model with `python -m nltk.downloader punkt`.
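
As a rough illustration of the model requirement above, the following sketch checks whether the two models appear to be installed and reports the download command for any that are missing. The helper name `missing_splitter_models` is hypothetical and not part of this project. It assumes the spaCy model is installed as a regular Python package and that the NLTK data sits under `tokenizers/punkt`, which may differ across library versions.

```python
import importlib.util


def missing_splitter_models():
    """Return {splitter name: download command} for models that look absent.

    Hypothetical helper for illustration only; not part of this project.
    """
    missing = {}

    # Assumption: spaCy models are installed as importable Python packages,
    # so a missing package spec means the model has not been downloaded.
    if importlib.util.find_spec("en_core_web_sm") is None:
        missing["SpacyTextSplitter"] = "python -m spacy download en_core_web_sm"

    # NLTK data lives outside site-packages, so ask NLTK itself to locate it.
    punkt_cmd = "python -m nltk.downloader punkt"
    if importlib.util.find_spec("nltk") is None:
        # NLTK itself is not installed; it must be installed before the model.
        missing["NLTKTextSplitter"] = punkt_cmd
    else:
        import nltk

        try:
            # Assumption: the punkt model is stored under tokenizers/punkt.
            nltk.data.find("tokenizers/punkt")
        except LookupError:
            missing["NLTKTextSplitter"] = punkt_cmd

    return missing


if __name__ == "__main__":
    for splitter, command in missing_splitter_models().items():
        print(f"{splitter}: run `{command}`")
```

Running the script prints nothing when both models are found. Otherwise it prints one line per splitter whose model still needs to be downloaded.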
## 🚀 What can I do with this
This project was largely inspired by a few projects seen on Twitter for which we thought it would make sense to have more explicit tooling. A lot of the initial functionality was done in an attempt to recreate those. Those are: