# Markdown

> [Markdown](https://en.wikipedia.org/wiki/Markdown) is a lightweight markup language for creating formatted text using a plain-text editor.

`MarkdownTextSplitter` splits text along Markdown headings, code blocks, or horizontal rules. It is implemented as a simple subclass of `RecursiveCharacterTextSplitter` with Markdown-specific separators. See the source code for the Markdown syntax expected by default.

1. How the text is split: by a list of Markdown-specific separators
2. How the chunk size is measured: by number of characters
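
The two points above can be illustrated with a minimal pure-Python sketch. This is not the library's implementation: `sketch_split` and `SKETCH_SEPARATORS` are hypothetical names, and the separator list is an abbreviated assumption rather than the library's defaults. It only shows the general idea of trying separators in priority order and measuring chunks by character count.

```python
from typing import List

# Hypothetical, abbreviated separator list for illustration only;
# the library's actual defaults are defined in its source code.
SKETCH_SEPARATORS = ["\n## ", "\n### ", "\n```\n", "\n\n", "\n", " "]


def sketch_split(text: str, separators: List[str], chunk_size: int) -> List[str]:
    """Split `text` into chunks of at most `chunk_size` characters."""
    text = text.strip()
    if not text:
        return []
    if len(text) <= chunk_size:
        return [text]

    # Use the first (highest-priority) separator that occurs in the text.
    for i, sep in enumerate(separators):
        if sep and sep in text:
            pieces, rest = text.split(sep), separators[i + 1:]
            break
    else:
        # No separator applies: fall back to fixed-width character cuts.
        return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        # Greedily merge neighbouring pieces while they still fit.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current.strip():
            chunks.append(current.strip())
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with finer separators.
            chunks.extend(sketch_split(piece, rest, chunk_size))
            current = ""
    if current.strip():
        chunks.append(current.strip())
    return chunks


sample = (
    "# LangChain\n\n"
    "Building applications with LLMs through composability\n\n"
    "## Quick Install\n\n"
    "pip install langchain\n\n"
    "As an open source project in a rapidly developing field, "
    "we are extremely open to contributions."
)
chunks = sketch_split(sample, SKETCH_SEPARATORS, chunk_size=100)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

Coarse separators such as headings are tried first, so a chunk boundary falls on a heading whenever possible. Finer separators are used only for pieces that are still longer than `chunk_size`.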

In [1]:
from langchain.text_splitter import MarkdownTextSplitter

In [2]:
markdown_text = """
# 🦜️🔗 LangChain

⚡ Building applications with LLMs through composability ⚡

## Quick Install

```bash
# Hopefully this code block isn't split
pip install langchain
```

As an open source project in a rapidly developing field, we are extremely open to contributions.
"""
markdown_splitter = MarkdownTextSplitter(chunk_size=100, chunk_overlap=0)

In [3]:
docs = markdown_splitter.create_documents([markdown_text])

In [4]:
docs

[Document(page_content='# 🦜️🔗 LangChain\n\n⚡ Building applications with LLMs through composability ⚡', metadata={}),
 Document(page_content="Quick Install\n\n```bash\n# Hopefully this code block isn't split\npip install langchain", metadata={}),
 Document(page_content='As an open source project in a rapidly developing field, we are extremely open to contributions.', metadata={})]

In [5]:
markdown_splitter.split_text(markdown_text)

['# 🦜️🔗 LangChain\n\n⚡ Building applications with LLMs through composability ⚡',
 "Quick Install\n\n```bash\n# Hopefully this code block isn't split\npip install langchain",
 'As an open source project in a rapidly developing field, we are extremely open to contributions.']
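
As a quick sanity check on the character-based size limit, the chunks returned above can be measured directly. This sketch assumes the splitter's length function is Python's built-in `len`, which counts Unicode code points, so an emoji may count as one or more characters. The strings are copied from the output above and `chunk_size` mirrors the value passed to the splitter.

```python
# Chunks copied from the split_text output shown above.
chunks = [
    "# 🦜️🔗 LangChain\n\n⚡ Building applications with LLMs through composability ⚡",
    "Quick Install\n\n```bash\n# Hopefully this code block isn't split\npip install langchain",
    "As an open source project in a rapidly developing field, we are extremely open to contributions.",
]
chunk_size = 100  # same value as passed to MarkdownTextSplitter

for chunk in chunks:
    # Print each chunk's character count and whether it respects the limit.
    print(len(chunk), len(chunk) <= chunk_size)
```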