langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-10-04 03:42:29 +00:00

Author	SHA1	Message	Date
Lance Martin	ee3d0513ad	Add tests and update notebook for MarkdownHeaderTextSplitter (#6069 ) Add test and update notebook for `MarkdownHeaderTextSplitter`.	2023-06-13 09:07:52 -07:00
Lance Martin	b023f0c0f2	Text splitter for Markdown files by header (#5860 ) This creates a new kind of text splitter for markdown files. The user can supply a set of headers that they want to split the file on. We define a new text splitter class, `MarkdownHeaderTextSplitter`, that does a few things: (1) For each line, it determines the associated set of user-specified headers (2) It groups lines with common headers into splits See notebook for example usage and test cases.	2023-06-12 15:46:42 -07:00
Jens Madsen	8d9e9e013c	refactor: extract token text splitter function (#5179 ) # Token text splitter for sentence transformers The current TokenTextSplitter only works with OpenAi models via the `tiktoken` package. This is not clear from the name `TokenTextSplitter`. In this (first PR) a token based text splitter for sentence transformer models is added. In the future I think we should work towards injecting a tokenizer into the TokenTextSplitter to make ti more flexible. Could perhaps be reviewed by @dev2049 --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2023-06-04 14:41:44 -07:00
Harrison Chase	5ce74b5958	code splitter docs (#5480 ) Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	2023-05-31 07:11:53 -07:00
Harrison Chase	f72bb966f8	Harrison/html splitter (#5468 ) Co-authored-by: David Revillas <26328973+r3v1@users.noreply.github.com>	2023-05-30 21:06:07 -07:00
ByronHsu	9d658aaa5a	Add more code splitters (go, rst, js, java, cpp, scala, ruby, php, swift, rust) (#5171 ) As the title says, I added more code splitters. The implementation is trivial, so i don't add separate tests for each splitter. Let me know if any concerns. Fixes # (issue) https://github.com/hwchase17/langchain/issues/5170 ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: @eyurtsev @hwchase17 --------- Signed-off-by: byhsu <byhsu@linkedin.com> Co-authored-by: byhsu <byhsu@linkedin.com>	2023-05-30 11:04:05 -04:00
Harrison Chase	72f99ff953	Harrison/text splitter (#5417 ) adds support for keeping separators around when using recursive text splitter	2023-05-29 16:56:31 -07:00
Leonid Ganeline	c998569c8f	docs: text splitters improvements (#4490 ) #docs: text splitters improvements Changes are only in the Jupyter notebooks. - added links to the source packages and a short description of these packages - removed " Text Splitters" suffixes from the TOC elements (they made the list of the text splitters messy) - moved text splitters, based on the length function into a separate list. They can be mixed with any classes from the "Text Splitters", so it is a different classification. ## Who can review? @hwchase17 - project lead @eyurtsev @vowelparrot NOTE: please, check out the results of the `Python code` text splitter example (text_splitters/examples/python.ipynb). It looks suboptimal.	2023-05-17 21:33:34 -07:00
Alan Cha	e3b7a20454	Fix typo (#3728 )	2023-04-28 13:01:09 -07:00
Ikko Eltociear Ashimine	a4a1ee6b5d	Update huggingface_length_function.ipynb (#2203 ) HuggingFace -> Hugging Face	2023-03-30 20:43:58 -07:00
Chase Adams	b5449a866d	docs: tiny fix on docs verbiage (#2124 ) Changed `RecursiveCharaterTextSplitter` => `RecursiveCharacterTextSplitter`. GH's diff doesn't handle the long string well.	2023-03-28 22:56:29 -07:00
Harrison Chase	705431aecc	big docs refactor (#1978 ) Co-authored-by: Ankush Gola <ankush.gola@gmail.com>	2023-03-26 19:49:46 -07:00

12 Commits