langchain/docs/docs/integrations/splitters/writer_text_splitter.ipynb
Yan d0c9b98171
docs: writer integration docs cosmetic fixes (#29984)
Fixed links at Writer partners integration docs
2025-02-27 10:52:49 -05:00

{
"cells": [
{
"cell_type": "markdown",
"id": "db23d51760310705",
"metadata": {},
"source": [
"# Writer Text Splitter\n",
"\n",
"This notebook provides a quick overview for getting started with Writer's [text splitter](/docs/concepts/text_splitters/).\n",
"\n",
"Writer's [context-aware splitting endpoint](https://dev.writer.com/api-guides/tools#context-aware-text-splitting) provides intelligent text splitting capabilities for long documents (up to 4000 words). Unlike simple character-based splitting, it preserves the semantic meaning and context between chunks, making it ideal for processing long-form content while maintaining coherence. In `langchain-writer`, we provide usage of Writer's context-aware splitting endpoint as a LangChain text splitter.\n",
"\n",
"## Overview\n",
"\n",
"### Integration details\n",
"| Class | Package | Local | Serializable | JS support | Package downloads | Package latest |\n",
"|:-----------------------------------------------------------------------------------------------------------------------------------------|:-----------------| :---: | :---: |:----------:|:------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------:|\n",
"| [WriterTextSplitter](https://github.com/writer/langchain-writer/blob/main/langchain_writer/text_splitter.py#L11) | [langchain-writer](https://pypi.org/project/langchain-writer/) | ❌ | ❌ | ❌ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-writer?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-writer?style=flat-square&label=%20) |"
]
},
{
"cell_type": "markdown",
"id": "c5f08d23df5dc127",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"The `WriterTextSplitter` is available in the `langchain-writer` package:"
]
},
{
"cell_type": "code",
"id": "a8d653f15b7ee32d",
"metadata": {},
"source": "%pip install --quiet -U langchain-writer",
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"id": "3b9709c26797edf",
"metadata": {},
"source": [
"### Credentials\n",
"\n",
"Sign up for [Writer AI Studio](https://app.writer.com/aistudio/signup?utm_campaign=devrel) to generate an API key (you can follow this [Quickstart](https://dev.writer.com/api-guides/quickstart)). Then, set the WRITER_API_KEY environment variable:"
]
},
{
"cell_type": "code",
"id": "2983e19c9d555e58",
"metadata": {},
"source": [
"import getpass\n",
"import os\n",
"\n",
"if not os.getenv(\"WRITER_API_KEY\"):\n",
" os.environ[\"WRITER_API_KEY\"] = getpass.getpass(\"Enter your Writer API key: \")"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"id": "92a22c77f03d43dc",
"metadata": {},
"source": [
"It's also helpful (but not needed) to set up [LangSmith](https://smith.langchain.com/) for best-in-class observability. If you wish to do so, you can set the `LANGCHAIN_TRACING_V2` and `LANGCHAIN_API_KEY` environment variables:"
]
},
{
"cell_type": "code",
"id": "98d8422ecee77403",
"metadata": {},
"source": [
"# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
"# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"id": "67ab78950a3da8ba",
"metadata": {},
"source": [
"### Instantiation\n",
"\n",
"Instantiate an instance of `WriterTextSplitter` with the `strategy` parameter set to one of the following:\n",
"\n",
"- `llm_split`: Uses language model for precise semantic splitting\n",
"- `fast_split`: Uses heuristic-based approach for quick splitting\n",
"- `hybrid_split`: Combines both approaches\n"
]
},
{
"cell_type": "code",
"id": "787b3ba8af32533f",
"metadata": {},
"source": [
"from langchain_writer.text_splitter import WriterTextSplitter\n",
"\n",
"splitter = WriterTextSplitter(strategy=\"fast_split\")"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"id": "d91c6f752fd31cee",
"metadata": {},
"source": [
"## Usage\n",
"The `WriterTextSplitter` can be used synchronously or asynchronously.\n",
"\n",
"### Synchronous usage\n",
"To use the `WriterTextSplitter` synchronously, call the `split_text` method with the text you want to split:"
]
},
{
"cell_type": "code",
"id": "d1a24b81a8a96f09",
"metadata": {},
"source": [
"text = \"\"\"Reeeeeeeeeeeeeeeeeeeeeaally long text you want to divide into smaller chunks. For example you can add a poem multiple times:\n",
"Two roads diverged in a yellow wood,\n",
"And sorry I could not travel both\n",
"And be one traveler, long I stood\n",
"And looked down one as far as I could\n",
"To where it bent in the undergrowth;\n",
"\n",
"Then took the other, as just as fair,\n",
"And having perhaps the better claim,\n",
"Because it was grassy and wanted wear;\n",
"Though as for that the passing there\n",
"Had worn them really about the same,\n",
"\n",
"And both that morning equally lay\n",
"In leaves no step had trodden black.\n",
"Oh, I kept the first for another day!\n",
"Yet knowing how way leads on to way,\n",
"I doubted if I should ever come back.\n",
"\n",
"I shall be telling this with a sigh\n",
"Somewhere ages and ages hence:\n",
"Two roads diverged in a wood, and I—\n",
"I took the one less traveled by,\n",
"And that has made all the difference.\n",
"\n",
"Two roads diverged in a yellow wood,\n",
"And sorry I could not travel both\n",
"And be one traveler, long I stood\n",
"And looked down one as far as I could\n",
"To where it bent in the undergrowth;\n",
"\n",
"Then took the other, as just as fair,\n",
"And having perhaps the better claim,\n",
"Because it was grassy and wanted wear;\n",
"Though as for that the passing there\n",
"Had worn them really about the same,\n",
"\n",
"And both that morning equally lay\n",
"In leaves no step had trodden black.\n",
"Oh, I kept the first for another day!\n",
"Yet knowing how way leads on to way,\n",
"I doubted if I should ever come back.\n",
"\n",
"I shall be telling this with a sigh\n",
"Somewhere ages and ages hence:\n",
"Two roads diverged in a wood, and I—\n",
"I took the one less traveled by,\n",
"And that has made all the difference.\n",
"\n",
"Two roads diverged in a yellow wood,\n",
"And sorry I could not travel both\n",
"And be one traveler, long I stood\n",
"And looked down one as far as I could\n",
"To where it bent in the undergrowth;\n",
"\n",
"Then took the other, as just as fair,\n",
"And having perhaps the better claim,\n",
"Because it was grassy and wanted wear;\n",
"Though as for that the passing there\n",
"Had worn them really about the same,\n",
"\n",
"And both that morning equally lay\n",
"In leaves no step had trodden black.\n",
"Oh, I kept the first for another day!\n",
"Yet knowing how way leads on to way,\n",
"I doubted if I should ever come back.\n",
"\n",
"I shall be telling this with a sigh\n",
"Somewhere ages and ages hence:\n",
"Two roads diverged in a wood, and I—\n",
"I took the one less traveled by,\n",
"And that has made all the difference.\n",
"\"\"\"\n",
"\n",
"chunks = splitter.split_text(text)\n",
"chunks"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"id": "e6d09fcf",
"metadata": {},
"source": [
"You can print the length of the chunks to see how many chunks were created:"
]
},
{
"cell_type": "code",
"id": "a470daa875d99006",
"metadata": {},
"source": [
"print(len(chunks))"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"id": "f89c048c7d23807a",
"metadata": {},
"source": [
"### Asynchronous usage\n",
"To use the `WriterTextSplitter` asynchronously, call the `asplit_text` method with the text you want to split:"
]
},
{
"cell_type": "code",
"id": "e2f7fd52b7188c6c",
"metadata": {},
"source": [
"async_chunks = await splitter.asplit_text(text)\n",
"async_chunks"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"id": "cb669ce7",
"metadata": {},
"source": [
"Print the length of the chunks to see how many chunks were created:"
]
},
{
"cell_type": "code",
"id": "a1439db14e687fa4",
"metadata": {},
"source": [
"print(len(async_chunks))"
],
"outputs": [],
"execution_count": null
},
{
"cell_type": "markdown",
"id": "ab25a3bed8437a05",
"metadata": {},
"source": [
"## API reference\n",
"For detailed documentation of all `WriterTextSplitter` features and configurations head to the [API reference](https://python.langchain.com/api_reference/writer/text_splitter/langchain_writer.text_splitter.WriterTextSplitter.html#langchain_writer.text_splitter.WriterTextSplitter).\n",
"\n",
"## Additional resources\n",
"You can find information about Writer's models (including costs, context windows, and supported input types) and tools in the [Writer docs](https://dev.writer.com/home)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}