feat(evaluation): Add CJK tokenizer support for ROUGE-1 evaluation #4143

maru0804 · 2026-01-13T13:33:40Z

Summary

The default ROUGE tokenizer only recognizes ASCII alphanumeric characters ([a-z0-9]), causing ROUGE-1 scores to be 0.0 for CJK (Chinese, Japanese, Korean) text. This PR adds CJK language support through an opt-in tokenizer.
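To make the failure mode concrete, here is a minimal sketch of an ASCII-only tokenizer. It only approximates the behavior described above (lowercase, keep runs of [a-z0-9]); the function name `ascii_tokenize` is hypothetical and is not the library's actual implementation.

```python
import re


def ascii_tokenize(text: str) -> list:
    """Approximation of an ASCII-only tokenizer: lowercase, keep [a-z0-9] runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


# English text yields word tokens.
print(ascii_tokenize("Hello, World 42"))
# Japanese text contains no ASCII alphanumerics, so no tokens survive.
print(ascii_tokenize("東京は日本の首都です"))
```

With an empty token list on both the reference and the candidate side there is nothing to overlap, which is why ROUGE-1 comes out as 0.0.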

Changes

New Features

CJKTokenizer: A character-based tokenizer for CJK languages that:
- Tokenizes CJK characters individually (1 char = 1 token)
- Preserves word-based tokenization for ASCII alphanumeric text
- Removes CJK punctuation (U+3000-U+303F)
- Skips other scripts (Greek, Cyrillic, fullwidth alphanumeric, etc.)
RougeScoreCriterion: A new criterion class for specifying tokenizer options
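The tokenization rules listed above can be sketched with a single regular expression. This is an illustration only, not the PR's code: the function name `cjk_tokenize` is hypothetical, and the Unicode ranges (Hiragana/Katakana U+3040-U+30FF, CJK Unified Ideographs U+4E00-U+9FFF, Hangul Syllables U+AC00-U+D7AF) are an assumption about what counts as "CJK characters" here.

```python
import re

# Assumed ranges: ASCII alphanumeric runs, or a single kana / ideograph /
# Hangul syllable. CJK punctuation (U+3000-U+303F) is not in any range,
# so it is dropped, as are fullwidth, Greek, and Cyrillic characters.
_TOKEN_RE = re.compile(r"[a-z0-9]+|[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]")


def cjk_tokenize(text):
    """Return ASCII words plus one token per CJK character."""
    if not text:
        return []
    return _TOKEN_RE.findall(text.lower())


print(cjk_tokenize("ADK は便利。"))  # ASCII word kept whole, CJK split per char
print(cjk_tokenize("ＡＢＣ"))        # fullwidth letters are skipped
```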

Modified

RougeEvaluator: Updated to support custom tokenizers and log a warning (once) when CJK text is detected without proper tokenizer configuration
ResponseEvaluator: Updated to pass eval_metric (including criterion) to RougeEvaluator
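The warn-once behavior could look roughly like the sketch below. The class `WarnOnceChecker`, its `check` method, and the detection ranges are all hypothetical stand-ins chosen for illustration; the actual RougeEvaluator may structure this differently.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Assumed detection ranges: kana, CJK ideographs, Hangul syllables.
_CJK_RE = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]")


class WarnOnceChecker:
    """Hypothetical helper: warns at most once per instance."""

    def __init__(self, tokenizer_name=None):
        self._tokenizer_name = tokenizer_name
        self._warned = False

    def check(self, text) -> bool:
        """Return True if this call emitted the warning."""
        if self._warned or self._tokenizer_name == "cjk" or not text:
            return False
        if _CJK_RE.search(text):
            logger.warning(
                "CJK text detected but no CJK tokenizer is configured; "
                "ROUGE-1 may be 0.0."
            )
            self._warned = True
            return True
        return False
```

Keeping the flag on the instance rather than at module level matches the "once per instance" wording: a second evaluator would warn again.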

Usage

from google.adk.evaluation.eval_metrics import EvalMetric, RougeScoreCriterion

criterion = RougeScoreCriterion(threshold=0.8, tokenizer='cjk')
eval_metric = EvalMetric(
    metric_name='response_match_score',
    threshold=0.8,
    criterion=criterion,
)

Backward Compatibility

✅ Default behavior unchanged (ASCII-only tokenization)
✅ Existing tests pass (365 evaluation tests)
✅ Warning logged when CJK detected without tokenizer

Limitations (documented in docstrings)

- Fullwidth alphanumeric characters (Ａ-Ｚ, ０-９) are skipped
- Greek, Cyrillic, and other non-CJK scripts are skipped
- Tokenization is character-based, not morphological analysis (for Japanese morphological analysis, consider MeCab)
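As a worked example of what character-based scoring implies, the sketch below computes a ROUGE-1 F-measure over per-character tokens. The helper `rouge1_f` is hypothetical and uses the standard unigram-overlap definition; it is not the evaluator's code and applies no stemming.

```python
from collections import Counter


def rouge1_f(ref_tokens, cand_tokens):
    """Unigram-overlap F-measure over two token lists."""
    if not ref_tokens or not cand_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# One token per character: the two strings differ only in the particle.
reference = list("東京は首都")
candidate = list("東京が首都")
print(rouge1_f(reference, candidate))  # approximately 0.8 (4 of 5 characters match)
```

Because matching happens per character, two different words that share a character still contribute overlap. A morphological analyzer would instead compare whole words, which is the trade-off the last limitation refers to.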

Test Coverage

Added 29 new tests covering:

- CJKTokenizer tokenization behavior
- ROUGE score calculation with/without CJK tokenizer
- Warning behavior (logged once per instance)
- Edge cases (empty strings, None, mixed text, punctuation)

Fixes google#4122

The default ROUGE tokenizer only recognizes ASCII alphanumeric characters, causing ROUGE-1 scores to be 0.0 for CJK (Chinese, Japanese, Korean) text.

Changes:
- Add CJKTokenizer class that handles CJK characters individually while preserving word-based tokenization for ASCII alphanumeric characters
- Add RougeScoreCriterion to allow explicit tokenizer specification
- Update RougeEvaluator to support custom tokenizers
- Add warning when CJK text is detected without CJK tokenizer
- Preserve backward compatibility: default behavior unchanged

Usage:

    criterion = RougeScoreCriterion(threshold=0.8, tokenizer='cjk')
    eval_metric = EvalMetric(
        metric_name='response_match_score',
        threshold=0.8,
        criterion=criterion,
    )

Limitations:
- Fullwidth alphanumeric (Ａ-Ｚ, ０-９) are skipped
- Greek, Cyrillic, and other non-CJK scripts are skipped
- This is character-based tokenization, not morphological analysis

gemini-code-assist · 2026-01-13T13:34:30Z

Summary of Changes

Hello @maru0804, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the ROUGE-1 evaluation framework by introducing robust support for CJK (Chinese, Japanese, Korean) languages. Previously, the default ROUGE tokenizer's ASCII-only nature led to inaccurate, often zero, scores for CJK text. The changes address this by providing an opt-in CJK-specific tokenizer and a mechanism to configure it, ensuring that evaluations involving multilingual content yield meaningful results. The update also includes a helpful warning system to alert users when CJK text is present but no appropriate tokenizer is configured, promoting correct usage.

Highlights

CJK Tokenizer Introduction: A new CJKTokenizer is added, specifically designed for CJK (Chinese, Japanese, Korean) languages, which tokenizes CJK characters individually, preserves word-based tokenization for ASCII alphanumeric text, and removes CJK punctuation.
Configurable ROUGE Evaluation: A RougeScoreCriterion class is introduced, allowing users to specify custom tokenizer options for ROUGE-1 evaluation, enabling proper scoring for CJK text.
Enhanced ROUGE Evaluator: The RougeEvaluator is updated to support these custom tokenizers and now logs a warning (once per instance) if CJK text is detected without a suitable tokenizer configured, guiding users to better evaluation practices.
Improved Metric Propagation: The ResponseEvaluator is modified to correctly pass the eval_metric (including the new RougeScoreCriterion) to the RougeEvaluator, ensuring that tokenizer configurations are applied.
Comprehensive Testing: 29 new tests have been added to validate the CJKTokenizer's behavior, ROUGE score calculations with and without the CJK tokenizer, and the warning mechanism, ensuring robustness and correctness.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request does an excellent job of adding CJK language support for ROUGE-1 evaluation, addressing a significant limitation of the default tokenizer. The introduction of CJKTokenizer and RougeScoreCriterion is well-designed, and the changes are thoughtfully integrated to ensure backward compatibility. The addition of comprehensive unit tests covering the new tokenizer, scoring logic, and warning behavior is particularly commendable. My review includes a couple of suggestions to improve performance and code conciseness, but overall, this is a high-quality contribution.

src/google/adk/evaluation/final_response_match_v1.py

…instantiation

Address Gemini Code Assist review feedback:

1. Simplify CJKTokenizer.tokenize() using re.findall instead of manual loop
   - More concise and idiomatic Python
   - Same functionality with fewer lines of code
2. Optimize RougeScorer instantiation for performance
   - Create RougeScorer once in __init__ instead of per invocation
   - Reuse self._scorer across all evaluate_invocations calls
   - Avoids unnecessary object creation in loops
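The second point is the usual construct-once pattern. The sketch below uses a stand-in `FakeScorer` class with an instance counter so the effect is observable; both class names are hypothetical, and the real code would hold a RougeScorer in `self._scorer` instead.

```python
class FakeScorer:
    """Stand-in for an expensive-to-build scorer; counts constructions."""

    instances = 0

    def __init__(self):
        FakeScorer.instances += 1

    def score(self, reference, candidate):
        # Placeholder metric: exact match only.
        return float(reference == candidate)


class Evaluator:
    def __init__(self):
        # Built once here and reused by every evaluate_invocations call.
        self._scorer = FakeScorer()

    def evaluate_invocations(self, pairs):
        return [self._scorer.score(ref, cand) for ref, cand in pairs]


evaluator = Evaluator()
evaluator.evaluate_invocations([("a", "a"), ("a", "b")])
evaluator.evaluate_invocations([("x", "x")])
print(FakeScorer.instances)  # one construction, regardless of call count
```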

adk-bot added the eval [Component] This issue is related to evaluation label Jan 13, 2026

gemini-code-assist bot reviewed Jan 13, 2026

maru0804 mentioned this pull request Jan 13, 2026

response_match_score (ROUGE-1) is not effectively in Japanese (with manual tokenization) #4122

Open