@gorkachea

Description:

This PR adds support for the R programming language to LangChain's text splitters, enabling users to intelligently split R code while preserving semantic structure.

Changes:

  • Added Language.R enum value to the Language enum
  • Implemented R-specific separators in RecursiveCharacterTextSplitter.get_separators_for_language()
  • Splits R code along:
    • Function definitions (<- function, = function)
    • Package loading (library(), require())
    • Control flow statements (if(), for(), while(), switch())
    • Data structure creation (data.frame(), list())
    • Standard separators (double newlines, single newlines, spaces)
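
The ordering above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: the `R_SEPARATORS` values and the `split_on_first_match` helper are hypothetical, and the real splitter additionally recurses into oversized pieces and merges small ones up to `chunk_size`.

```python
import re

# Hypothetical separators, ordered from most to least specific.
# The values in the PR itself may differ.
R_SEPARATORS = [
    r"\n[\w.]+\s*(?:<-|=)\s*function\s*\(",  # function definitions
    r"\n(?:library|require)\(",              # package loading
    r"\n(?:if|for|while|switch)\s*\(",       # control flow
    r"\n\n",
    r"\n",
    r" ",
]


def split_on_first_match(text, separators):
    """Split text at the first separator pattern that occurs in it."""
    for pattern in separators:
        if re.search(pattern, text):
            # A lookahead keeps each separator at the start of the piece it introduces.
            pieces = re.split(f"(?={pattern})", text)
            return [piece for piece in pieces if piece.strip()]
    return [text]


r_code = (
    "library(dplyr)\n\n"
    "calculate_mean <- function(x) {\n"
    "  mean(x, na.rm = TRUE)\n"
    "}\n\n"
    "square = function(x) x^2\n"
)

pieces = split_on_first_match(r_code, R_SEPARATORS)
for piece in pieces:
    print(repr(piece.strip()))
```

Because the function-definition pattern is tried first, both the `<-` and `=` forms start their own piece, and the leading `library()` call is left as a separate piece.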

Why This Matters:

R is widely used in data science, statistical analysis, machine learning, and bioinformatics. This change enables users working with R codebases to properly chunk R scripts for RAG applications and build AI assistants that understand R code structure.

Example Usage:

from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

r_code = """
library(dplyr)

calculate_mean <- function(x) {
  mean(x, na.rm = TRUE)
}
"""

splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.R,
    chunk_size=100,
    chunk_overlap=0
)

chunks = splitter.split_text(r_code)
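# Split points prefer library calls and function definitions.
# Note: this sample is shorter than chunk_size=100, so it fits in a single chunk;
# lower chunk_size to see it split.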

Testing:
Tested locally with various R code samples including package imports, function definitions (both <- and = syntax), control flow statements, and data structure creation. All separators work correctly with proper regex escaping.
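
As a rough illustration of why escaping matters for these separators, the sketch below uses only Python's `re` module. The separator strings and the `is_valid_regex` helper are assumptions made for the example, not values taken from the PR.

```python
import re


def is_valid_regex(pattern):
    """Return True if the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


raw = "\nlibrary("                      # unescaped: the "(" opens a group that never closes
escaped = "\n" + re.escape("library(")  # the "(" becomes a literal character

print(is_valid_regex(raw))
print(is_valid_regex(escaped))

# An unescaped "." matches any character, so it can match more than intended.
loose = r"data.frame\("
strict = re.escape("data.frame(")
print(bool(re.search(loose, "dataXframe(")))
print(bool(re.search(strict, "dataXframe(")))
```

Separators such as `library(` and `data.frame(` contain regex metacharacters, so they need escaping whenever the splitter treats separators as regular expressions.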

Issue: Fixes #33824

Dependencies: None

- Add Language.R enum value
- Implement R-specific separators for RecursiveCharacterTextSplitter
- Split along function definitions (<- function, = function)
- Split along package loading (library, require)
- Split along control flow statements (if, for, while, switch)
- Split along data structure creation (data.frame, list)

This enables users to properly split R code files while preserving
semantic structure, addressing feature request from community.

Signed-off-by: Gorka Bengochea <gorkachea@gmail.com>
@github-actions (bot) added the text-splitters label (Related to the package `text-splitters`) on Nov 11, 2025
@gorkachea changed the title from "✨ Add R programming language support to text splitters" to "feat(text-splitters): add R programming language support" on Nov 11, 2025
@makkruo

makkruo commented Nov 22, 2025

@gorkachea
Hello there! I noticed that you submitted this PR last week, but so far there have been no reviews. A few days ago I submitted a similar PR, contributing MySQL text splitter rules for LangChain, and the scope of your changes closely aligns with mine. My PR is currently stuck at the CodeQL scanning stage: that workflow hasn’t even started, which has prevented me from requesting a review. I’m not sure whether your PR is facing the same problem or whether this is a bug in the project's text-splitters workflow. All I can see is that you’ve done similar work and also haven’t requested a review. If you’re experiencing the same problem or have any workarounds or ideas, I’d be very glad to discuss them with you. Wishing you a smooth merge of your contribution!

@gorkachea
Author

gorkachea commented Nov 22, 2025

Hey @makkruo, thanks for the note! I’m seeing the same thing: no option to request a review, and CodeQL hasn’t kicked off. From what I understand, external contributors can’t request reviewers directly; that’s something maintainers do when they triage PRs. I’ll keep an eye on this one and see if it gets picked up on their review board.

@gorkachea
Author

Hi @mdrxy! Since you labeled the related issue earlier, I just wanted to give you a quick heads-up that the PR implementing R support is ready and includes tests. No rush at all. If you have a moment to take a look, or if something should be adjusted, let me know and I’d be happy to update it. Thanks!

Labels

feature text-splitters Related to the package `text-splitters`

Development

Successfully merging this pull request may close these issues.

Add R programming to langchain_text_splitters.Language

2 participants