Files
langchain/docs
Raviraj 5f30785b70 SemanticChunker : Feature Addition ("Semantic Splitting with gradient") (#22895)
```SemanticChunker``` currently provide three methods to split the texts semantically:
- percentile
- standard_deviation
- interquartile

I propose new method ```gradient```. In this method, the gradient of distance is used to split chunks along with the percentile method (technically) . This method is useful when chunks are highly correlated with each other or specific to a domain e.g. legal or medical. The idea is to apply anomaly detection on gradient array so that the distribution become wider and easy to identify boundaries in highly semantic data.
I have tested this merge on a set of 10 domain specific documents (mostly legal).

Details : 
    - **Issue:** Improvement
    - **Dependencies:** NA
    - **Twitter handle:** [x.com/prajapat_ravi](https://x.com/prajapat_ravi)


@hwchase17

---------

Co-authored-by: Raviraj Prajapat <raviraj.prajapat@sirionlabs.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
2024-06-20 13:52:26 -07:00
..
2024-06-20 13:52:19 -07:00
2024-05-01 17:34:05 -07:00
2024-06-20 13:52:26 -07:00
2024-05-08 12:29:59 -07:00

LangChain Documentation

For more information on contributing to our documentation, see the Documentation Contributing Guide