langchain/docs/docs/integrations/splitters/writer_text_splitter.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "db23d51760310705",
   "metadata": {},
   "source": [
    "# Writer Text Splitter\n",
    "\n",
    "This notebook provides a quick overview for getting started with Writer's [text splitter](/docs/concepts/text_splitters/).\n",
    "\n",
    "Writer's [context-aware splitting endpoint](https://dev.writer.com/api-guides/tools#context-aware-text-splitting) provides intelligent text splitting capabilities for long documents (up to 4000 words). Unlike simple character-based splitting, it preserves the semantic meaning and context between chunks, making it ideal for processing long-form content while maintaining coherence. In `langchain-writer`, we provide usage of Writer's context-aware splitting endpoint as a LangChain text splitter.\n",
    "\n",
    "## Overview\n",
    "\n",
    "### Integration details\n",
    "| Class                                                                                                                                    | Package          | Local | Serializable | JS support |                                        Package downloads                                         |                                        Package latest                                         |\n",
    "|:-----------------------------------------------------------------------------------------------------------------------------------------|:-----------------| :---: | :---: |:----------:|:------------------------------------------------------------------------------------------------:|:---------------------------------------------------------------------------------------------:|\n",
    "| [WriterTextSplitter](https://github.com/writer/langchain-writer/blob/main/langchain_writer/text_splitter.py#L11) | [langchain-writer](https://pypi.org/project/langchain-writer/) |      ❌       |                                       ❌                                       | ❌ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-writer?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-writer?style=flat-square&label=%20) |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5f08d23df5dc127",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "The `WriterTextSplitter` is available in the `langchain-writer` package:"
   ]
  },
  {
   "cell_type": "code",
   "id": "a8d653f15b7ee32d",
   "metadata": {},
   "source": "%pip install --quiet -U langchain-writer",
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "3b9709c26797edf",
   "metadata": {},
   "source": [
    "### Credentials\n",
    "\n",
    "Sign up for [Writer AI Studio](https://app.writer.com/aistudio/signup?utm_campaign=devrel) to generate an API key (you can follow this [Quickstart](https://dev.writer.com/api-guides/quickstart)). Then, set the WRITER_API_KEY environment variable:"
   ]
  },
  {
   "cell_type": "code",
   "id": "2983e19c9d555e58",
   "metadata": {},
   "source": [
    "import getpass\n",
    "import os\n",
    "\n",
    "if not os.getenv(\"WRITER_API_KEY\"):\n",
    "    os.environ[\"WRITER_API_KEY\"] = getpass.getpass(\"Enter your Writer API key: \")"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "92a22c77f03d43dc",
   "metadata": {},
   "source": [
    "It's also helpful (but not needed) to set up [LangSmith](https://smith.langchain.com/) for best-in-class observability. If you wish to do so, you can set the `LANGCHAIN_TRACING_V2` and `LANGCHAIN_API_KEY` environment variables:"
   ]
  },
  {
   "cell_type": "code",
   "id": "98d8422ecee77403",
   "metadata": {},
   "source": [
    "# os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
    "# os.environ[\"LANGCHAIN_API_KEY\"] = getpass.getpass()"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "67ab78950a3da8ba",
   "metadata": {},
   "source": [
    "### Instantiation\n",
    "\n",
    "Instantiate an instance of `WriterTextSplitter` with the `strategy` parameter set to one of the following:\n",
    "\n",
    "- `llm_split`: Uses language model for precise semantic splitting\n",
    "- `fast_split`: Uses heuristic-based approach for quick splitting\n",
    "- `hybrid_split`: Combines both approaches\n"
   ]
  },
  {
   "cell_type": "code",
   "id": "787b3ba8af32533f",
   "metadata": {},
   "source": [
    "from langchain_writer.text_splitter import WriterTextSplitter\n",
    "\n",
    "splitter = WriterTextSplitter(strategy=\"fast_split\")"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "d91c6f752fd31cee",
   "metadata": {},
   "source": [
    "## Usage\n",
    "The `WriterTextSplitter` can be used synchronously or asynchronously.\n",
    "\n",
    "### Synchronous usage\n",
    "To use the `WriterTextSplitter` synchronously, call the `split_text` method with the text you want to split:"
   ]
  },
  {
   "cell_type": "code",
   "id": "d1a24b81a8a96f09",
   "metadata": {},
   "source": [
    "text = \"\"\"Reeeeeeeeeeeeeeeeeeeeeaally long text you want to divide into smaller chunks. For example you can add a poem multiple times:\n",
    "Two roads diverged in a yellow wood,\n",
    "And sorry I could not travel both\n",
    "And be one traveler, long I stood\n",
    "And looked down one as far as I could\n",
    "To where it bent in the undergrowth;\n",
    "\n",
    "Then took the other, as just as fair,\n",
    "And having perhaps the better claim,\n",
    "Because it was grassy and wanted wear;\n",
    "Though as for that the passing there\n",
    "Had worn them really about the same,\n",
    "\n",
    "And both that morning equally lay\n",
    "In leaves no step had trodden black.\n",
    "Oh, I kept the first for another day!\n",
    "Yet knowing how way leads on to way,\n",
    "I doubted if I should ever come back.\n",
    "\n",
    "I shall be telling this with a sigh\n",
    "Somewhere ages and ages hence:\n",
    "Two roads diverged in a wood, and I—\n",
    "I took the one less traveled by,\n",
    "And that has made all the difference.\n",
    "\n",
    "Two roads diverged in a yellow wood,\n",
    "And sorry I could not travel both\n",
    "And be one traveler, long I stood\n",
    "And looked down one as far as I could\n",
    "To where it bent in the undergrowth;\n",
    "\n",
    "Then took the other, as just as fair,\n",
    "And having perhaps the better claim,\n",
    "Because it was grassy and wanted wear;\n",
    "Though as for that the passing there\n",
    "Had worn them really about the same,\n",
    "\n",
    "And both that morning equally lay\n",
    "In leaves no step had trodden black.\n",
    "Oh, I kept the first for another day!\n",
    "Yet knowing how way leads on to way,\n",
    "I doubted if I should ever come back.\n",
    "\n",
    "I shall be telling this with a sigh\n",
    "Somewhere ages and ages hence:\n",
    "Two roads diverged in a wood, and I—\n",
    "I took the one less traveled by,\n",
    "And that has made all the difference.\n",
    "\n",
    "Two roads diverged in a yellow wood,\n",
    "And sorry I could not travel both\n",
    "And be one traveler, long I stood\n",
    "And looked down one as far as I could\n",
    "To where it bent in the undergrowth;\n",
    "\n",
    "Then took the other, as just as fair,\n",
    "And having perhaps the better claim,\n",
    "Because it was grassy and wanted wear;\n",
    "Though as for that the passing there\n",
    "Had worn them really about the same,\n",
    "\n",
    "And both that morning equally lay\n",
    "In leaves no step had trodden black.\n",
    "Oh, I kept the first for another day!\n",
    "Yet knowing how way leads on to way,\n",
    "I doubted if I should ever come back.\n",
    "\n",
    "I shall be telling this with a sigh\n",
    "Somewhere ages and ages hence:\n",
    "Two roads diverged in a wood, and I—\n",
    "I took the one less traveled by,\n",
    "And that has made all the difference.\n",
    "\"\"\"\n",
    "\n",
    "chunks = splitter.split_text(text)\n",
    "chunks"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "e6d09fcf",
   "metadata": {},
   "source": [
    "You can print the length of the chunks to see how many chunks were created:"
   ]
  },
  {
   "cell_type": "code",
   "id": "a470daa875d99006",
   "metadata": {},
   "source": [
    "print(len(chunks))"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "f89c048c7d23807a",
   "metadata": {},
   "source": [
    "### Asynchronous usage\n",
    "To use the `WriterTextSplitter` asynchronously, call the `asplit_text` method with the text you want to split:"
   ]
  },
  {
   "cell_type": "code",
   "id": "e2f7fd52b7188c6c",
   "metadata": {},
   "source": [
    "async_chunks = await splitter.asplit_text(text)\n",
    "async_chunks"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "cb669ce7",
   "metadata": {},
   "source": [
    "Print the length of the chunks to see how many chunks were created:"
   ]
  },
  {
   "cell_type": "code",
   "id": "a1439db14e687fa4",
   "metadata": {},
   "source": [
    "print(len(async_chunks))"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "ab25a3bed8437a05",
   "metadata": {},
   "source": [
    "## API reference\n",
    "For detailed documentation of all `WriterTextSplitter` features and configurations head to the [API reference](https://python.langchain.com/api_reference/writer/text_splitter/langchain_writer.text_splitter.WriterTextSplitter.html#langchain_writer.text_splitter.WriterTextSplitter).\n",
    "\n",
    "## Additional resources\n",
    "You can find information about Writer's models (including costs, context windows, and supported input types) and tools in the [Writer docs](https://dev.writer.com/home)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}