mirror of https://github.com/hwchase17/langchain.git synced 2026-05-14 10:53:52 +00:00

Ian Gregory e5472b5eb8 Framework for supporting more languages in LanguageParser (#13318)

## Description

I am submitting this for a school project as part of a team of 5. Other
team members are @LeilaChr, @maazh10, @Megabear137, @jelalalamy. This PR
also has contributions from community members @Harrolee and @Mario928.

Initial context is in the issue we opened (#11229).

This pull request adds:

- Generic framework for expanding the languages that `LanguageParser`
  can handle, using the
  [tree-sitter](https://github.com/tree-sitter/py-tree-sitter#py-tree-sitter)
  parsing library and existing language-specific parsers written for it
- Support for the following additional languages in `LanguageParser`:
  - C
  - C++
  - C#
  - Go
  - Java (contributed by @Mario928,
    https://github.com/ThatsJustCheesy/langchain/pull/2)
  - Kotlin
  - Lua
  - Perl
  - Ruby
  - Rust
  - Scala
  - TypeScript (contributed by @Harrolee,
    https://github.com/ThatsJustCheesy/langchain/pull/1)
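
As a rough illustration of what a language-keyed framework can look like, here is a minimal sketch in plain Python. It does not use tree-sitter, and every name in it (`PrefixSegmenter`, `SEGMENTERS`, `segment`) is hypothetical and not taken from this PR. It only shows the general shape: one registry entry per language, each pointing at language-specific splitting logic.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PrefixSegmenter:
    """Hypothetical segmenter: starts a new chunk at lines beginning with a keyword."""

    keywords: Tuple[str, ...]

    def split(self, source: str) -> List[str]:
        chunks: List[str] = []
        current: List[str] = []
        for line in source.splitlines():
            # Close the current chunk whenever a new top-level definition begins.
            if line.startswith(self.keywords) and current:
                chunks.append("\n".join(current))
                current = []
            current.append(line)
        if current:
            chunks.append("\n".join(current))
        return chunks


# Hypothetical registry: supporting a new language means adding one entry.
SEGMENTERS: Dict[str, PrefixSegmenter] = {
    "go": PrefixSegmenter(keywords=("func ",)),
    "rust": PrefixSegmenter(keywords=("fn ", "pub fn ", "struct ")),
}


def segment(language: str, source: str) -> List[str]:
    """Look up the segmenter for a language and split the source with it."""
    try:
        segmenter = SEGMENTERS[language]
    except KeyError:
        raise ValueError(f"Unsupported language: {language!r}") from None
    return segmenter.split(source)
```

A real implementation would walk a parsed syntax tree instead of matching line prefixes; the prefix check here is only a stand-in so the sketch runs without extra dependencies.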

Here is the [design
document](https://docs.google.com/document/d/17dB14cKCWAaiTeSeBtxHpoVPGKrsPye8W0o_WClz2kk)
if curious, but no need to read it.

## Issues

- Closes #11229
- Closes #10996
- Closes #8405

## Dependencies

`tree_sitter` and `tree_sitter_languages` on PyPI. We have tried to add
these as optional dependencies.
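
One common way to keep a dependency optional is to import it lazily and fail with a clear message only when the feature is actually used. The sketch below assumes that pattern; the helper name `require_optional` is hypothetical, and this is not necessarily how the PR wires up `tree_sitter` and `tree_sitter_languages`.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, pip_name: str) -> ModuleType:
    """Import an optional dependency, or raise an error that says how to install it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Could not import {module_name!r}. "
            f"Install it with `pip install {pip_name}`."
        ) from exc
```

A parser would call something like `require_optional("tree_sitter", "tree_sitter")` inside the method that needs it, so users who never parse those languages are not forced to install the package.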

## Documentation

We have updated the list of supported languages, and also added a
section to `source_code.ipynb` detailing how to add support for
additional languages using our framework.

## Maintainer

- @hwchase17 (previously reviewed
https://github.com/langchain-ai/langchain/pull/6486)

Thanks!!

## Git commits

We will gladly squash any/all of our commits (esp merge commits) if
necessary. Let us know if this is desirable, or if you will be
squash-merging anyway.

---------

Co-authored-by: Maaz Hashmi <mhashmi373@gmail.com>
Co-authored-by: LeilaChr <87657694+LeilaChr@users.noreply.github.com>
Co-authored-by: Jeremy La <jeremylai511@gmail.com>
Co-authored-by: Megabear137 <zubair.alnoor27@gmail.com>
Co-authored-by: Lee Harrold <lhharrold@sep.com>
Co-authored-by: Mario928 <88029051+Mario928@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>

2024-02-13 08:45:49 -08:00

| File | Last commit | Date |
| --- | --- | --- |
| .devcontainer | Update README.md (#8570) | 2023-11-12 22:07:49 -08:00 |
| .github | infra: pr template nit (#17438) | 2024-02-12 16:19:14 -08:00 |
| cookbook | docs: add use case for managing chat messages via Apache Kafka (#16771) | 2024-02-13 08:09:15 -08:00 |
| docker | langchain[minor], community[minor], core[minor]: Async Cache support and AsyncRedisCache (#15817) | 2024-02-07 22:06:09 -05:00 |
| docs | Framework for supporting more languages in LanguageParser (#13318) | 2024-02-13 08:45:49 -08:00 |
| libs | Framework for supporting more languages in LanguageParser (#13318) | 2024-02-13 08:45:49 -08:00 |
| templates | langchain[patch], templates[patch]: fix multi query retriever, web re… (#17434) | 2024-02-12 22:52:07 -08:00 |
| .gitattributes | Update dev container (#6189) | 2023-06-16 15:42:14 -07:00 |
| .gitignore | API Reference building script update (#13587) | 2023-12-07 11:43:42 -08:00 |
| .readthedocs.yaml | docs: allow pdf download of api ref (#16550) | 2024-01-24 17:17:52 -08:00 |
| CITATION.cff | rename repo namespace to langchain-ai (#11259) | 2023-10-01 15:30:58 -04:00 |
| LICENSE | Library Licenses (#13300) | 2023-11-28 17:34:27 -08:00 |
| Makefile | core[patch], langchain[patch]: fix required deps (#14373) | 2023-12-07 14:24:58 -08:00 |
| MIGRATE.md | Update main readme (#13298) | 2023-11-13 17:37:54 -08:00 |
| poetry.lock | infra: install integration deps for test linting (#16963) | 2024-02-02 15:59:10 -08:00 |
| poetry.toml | Unbreak devcontainer (#8154) | 2023-07-23 19:33:47 -07:00 |
| pyproject.toml | infra: install integration deps for test linting (#16963) | 2024-02-02 15:59:10 -08:00 |
| README.md | docs: Added LangGraph in framework parts of readme file (#17279) | 2024-02-08 17:19:47 -08:00 |
| SECURITY.md | Update SECURITY.md email address. (#9558) | 2023-08-21 14:52:21 -04:00 |

README.md

🦜️🔗 LangChain

⚡ Build context-aware reasoning applications ⚡

Looking for the JS/TS library? Check out LangChain.js.

To help you ship LangChain apps to production faster, check out LangSmith. LangSmith is a unified developer platform for building, testing, and monitoring LLM applications. Fill out this form to get off the waitlist or speak with our sales team.

Quick Install

With pip:

pip install langchain

With conda:

conda install langchain -c conda-forge

🤔 What is LangChain?

LangChain is a framework for developing applications powered by language models. It enables applications that:

- Are context-aware: connect a language model to sources of context (prompt instructions, few shot examples, content to ground its response in, etc.)
- Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)

This framework consists of several parts.

- LangChain Libraries: The Python and JavaScript libraries. Contains interfaces and integrations for a myriad of components, a basic run time for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.
- LangChain Templates: A collection of easily deployable reference architectures for a wide variety of tasks.
- LangServe: A library for deploying LangChain chains as a REST API.
- LangSmith: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.
- LangGraph: LangGraph is a library for building stateful, multi-actor applications with LLMs, built on top of (and intended to be used with) LangChain. It extends the LangChain Expression Language with the ability to coordinate multiple chains (or actors) across multiple steps of computation in a cyclic manner.

The LangChain libraries themselves are made up of several different packages.

- langchain-core: Base abstractions and LangChain Expression Language.
- langchain-community: Third party integrations.
- langchain: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.

🧱 What can you build with LangChain?

❓ Retrieval augmented generation

Documentation
End-to-end Example: Chat LangChain and repo

💬 Analyzing structured data

Documentation
End-to-end Example: SQL Llama2 Template

🤖 Chatbots

Documentation
End-to-end Example: Web LangChain (web researcher chatbot) and repo

And much more! Head to the Use cases section of the docs for more.

🚀 How does LangChain help?

The main value props of the LangChain libraries are:

- Components: composable tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not
- Off-the-shelf chains: built-in assemblages of components for accomplishing higher-level tasks

Off-the-shelf chains make it easy to get started. Components make it easy to customize existing chains and build new ones.

Components fall into the following modules:

📃 Model I/O:

This includes prompt management, prompt optimization, a generic interface for all LLMs, and common utilities for working with LLMs.

📚 Retrieval:

Data Augmented Generation involves specific types of chains that first interact with an external data source to fetch data for use in the generation step. Examples include summarization of long pieces of text and question/answering over specific data sources.
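
The fetch-then-generate pattern described above can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: it scores documents by word overlap rather than embeddings, it stops at building the prompt instead of calling a model, and the function names `retrieve` and `build_prompt` are hypothetical, not LangChain APIs.

```python
from typing import List


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents sharing the most words with the query."""
    query_terms = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, context: List[str]) -> str:
    """Combine the retrieved context and the question into one prompt string."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"
```

The string returned by `build_prompt` is what would be passed to a language model in the generation step.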

🤖 Agents:

Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end-to-end agents.
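
The decide, act, observe loop described above can be sketched without any model at all. In this minimal example a scripted function stands in for the LLM; `run_agent`, `scripted_policy`, and `add_tool` are hypothetical names used for illustration and are not LangChain's agent interface.

```python
from typing import Callable, Dict, List, Tuple

# The policy stands in for the LLM: given the history of (action, observation)
# pairs, it returns the next (action, input), or ("finish", answer) when done.
History = List[Tuple[str, str]]
Policy = Callable[[History], Tuple[str, str]]


def run_agent(policy: Policy, tools: Dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    """Repeat decide -> act -> observe until the policy finishes or steps run out."""
    history: History = []
    for _ in range(max_steps):
        action, action_input = policy(history)
        if action == "finish":
            return action_input
        observation = tools[action](action_input)
        history.append((action, observation))
    raise RuntimeError("Agent did not finish within max_steps")


def scripted_policy(history: History) -> Tuple[str, str]:
    """Hypothetical stand-in for an LLM: call the add tool once, then answer."""
    if not history:
        return ("add", "2,3")
    _, last_observation = history[-1]
    return ("finish", f"The sum is {last_observation}")


def add_tool(arg: str) -> str:
    """Toy tool: sum a comma-separated list of integers."""
    return str(sum(int(x) for x in arg.split(",")))
```

Calling `run_agent(scripted_policy, {"add": add_tool})` takes one tool step and then returns the final answer. The `max_steps` cap is a design choice to keep a misbehaving policy from looping forever.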

📖 Documentation

Please see here for full documentation, which includes:

- Getting started: installation, setting up the environment, simple examples
- Overview of the interfaces, modules, and integrations
- Use case walkthroughs and best practice guides
- LangSmith, LangServe, and LangChain Template overviews
- Reference: full API docs

💁 Contributing

As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.

For detailed information on how to contribute, see here.

🌟 Contributors

Description

⚡ Building applications with LLMs through composability ⚡
