
Conversation

@maru0804

Summary

Fixes #4122

The default ROUGE tokenizer only recognizes ASCII alphanumeric characters ([a-z0-9]), causing ROUGE-1 scores to be 0.0 for CJK (Chinese, Japanese, Korean) text. This PR adds CJK language support through an opt-in tokenizer.
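To illustrate why the score collapses, here is a minimal sketch, assuming the default tokenizer lowercases the text and keeps only runs of [a-z0-9] as described above. The names ascii_tokenize and rouge1_f are illustrative only; they are not part of ADK or of any ROUGE library.

```python
import re
from collections import Counter


def ascii_tokenize(text: str) -> list[str]:
    """Keep only lowercase ASCII alphanumeric runs (assumed default behavior)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f(reference: str, candidate: str) -> float:
    """Unigram-overlap F-measure over the tokens produced above."""
    ref = Counter(ascii_tokenize(reference))
    cand = Counter(ascii_tokenize(candidate))
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(ascii_tokenize("The weather is nice"))  # ['the', 'weather', 'is', 'nice']
print(ascii_tokenize("今日は晴れです"))  # []
print(rouge1_f("今日は晴れです", "今日は晴れです"))  # 0.0, even for identical text
```

Because no token survives, there is nothing to overlap, so even an exact match scores 0.0.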

Changes

New Features

  • CJKTokenizer: A character-based tokenizer for CJK languages that:

    • Tokenizes CJK characters individually (1 char = 1 token)
    • Preserves word-based tokenization for ASCII alphanumeric
    • Removes CJK punctuation (U+3000-U+303F)
    • Skips other scripts (Greek, Cyrillic, fullwidth alphanumeric, etc.)
  • RougeScoreCriterion: New criterion class to specify tokenizer options
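The tokenization rules listed above can be sketched roughly as follows. This is not the PR's actual implementation: the class name SketchCJKTokenizer and the exact Unicode ranges are assumptions made for illustration.

```python
import re

# Assumed ranges for this sketch: Hiragana and Katakana (U+3040-U+30FF),
# CJK Unified Ideographs (U+4E00-U+9FFF), Hangul Syllables (U+AC00-U+D7AF).
_TOKEN_RE = re.compile(
    r"[a-z0-9]+"  # ASCII alphanumeric runs stay whole words
    r"|[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]"  # one CJK char = one token
)


class SketchCJKTokenizer:
    """Hypothetical stand-in for the PR's CJKTokenizer."""

    def tokenize(self, text):
        if not text:
            return []
        # Anything the pattern does not match (CJK punctuation, fullwidth
        # alphanumerics, Greek, Cyrillic, ...) is simply dropped.
        return _TOKEN_RE.findall(text.lower())


tok = SketchCJKTokenizer()
print(tok.tokenize("ADK は便利です。"))  # ['adk', 'は', '便', '利', 'で', 'す']
```

Dropping unmatched characters, rather than raising on them, is what makes the punctuation removal and the "skipped scripts" behavior fall out of a single pattern.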

Modified

  • RougeEvaluator: Updated to support custom tokenizers and log a warning (once) when CJK text is detected without proper tokenizer configuration

  • ResponseEvaluator: Updated to pass eval_metric (including criterion) to RougeEvaluator
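The warn-once behavior might look something like the sketch below. The class SketchRougeEvaluator, the method check_text, the logger name, and the detection ranges are all hypothetical; only the idea of a per-instance flag comes from the description above.

```python
import logging
import re

logger = logging.getLogger("rouge_sketch")

# Assumed detection ranges; the PR's actual check may differ.
_CJK_RE = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]")


class SketchRougeEvaluator:
    """Hypothetical evaluator showing only the warn-once logic."""

    def __init__(self, tokenizer=None):
        self._tokenizer = tokenizer
        self._warned_cjk = False  # per-instance flag

    def check_text(self, text):
        """Warn the first time CJK text is seen with no tokenizer configured."""
        if self._tokenizer is not None or self._warned_cjk:
            return
        if text and _CJK_RE.search(text):
            logger.warning(
                "CJK text detected without a CJK tokenizer; "
                "ROUGE-1 scores may be 0.0."
            )
            self._warned_cjk = True
```

Keeping the flag on the instance rather than at module level means each evaluator warns at most once, which matches the "once per instance" behavior the tests are said to cover.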

Usage

from google.adk.evaluation.eval_metrics import EvalMetric, RougeScoreCriterion

criterion = RougeScoreCriterion(threshold=0.8, tokenizer='cjk')
eval_metric = EvalMetric(
    metric_name='response_match_score',
    threshold=0.8,
    criterion=criterion,
)

Backward Compatibility

  • ✅ Default behavior unchanged (ASCII-only tokenization)
  • ✅ Existing tests pass (365 evaluation tests)
  • ✅ Warning logged when CJK detected without tokenizer

Limitations (documented in docstrings)

  • Fullwidth alphanumeric characters (Ａ-Ｚ, ０-９) are skipped
  • Greek, Cyrillic, and other non-CJK scripts are skipped
  • Character-based tokenization, not morphological analysis (for Japanese morphological analysis, consider MeCab)

Test Coverage

Added 29 new tests covering:

  • CJKTokenizer tokenization behavior
  • ROUGE score calculation with/without CJK tokenizer
  • Warning behavior (logged once per instance)
  • Edge cases (empty strings, None, mixed text, punctuation)

Fixes google#4122

The default ROUGE tokenizer only recognizes ASCII alphanumeric characters,
causing ROUGE-1 scores to be 0.0 for CJK (Chinese, Japanese, Korean) text.

Changes:
- Add CJKTokenizer class that handles CJK characters individually while
  preserving word-based tokenization for ASCII alphanumeric characters
- Add RougeScoreCriterion to allow explicit tokenizer specification
- Update RougeEvaluator to support custom tokenizers
- Add warning when CJK text is detected without CJK tokenizer
- Preserve backward compatibility: default behavior unchanged

Usage:
  criterion = RougeScoreCriterion(threshold=0.8, tokenizer='cjk')
  eval_metric = EvalMetric(
      metric_name='response_match_score',
      threshold=0.8,
      criterion=criterion,
  )

Limitations:
- Fullwidth alphanumeric characters (Ａ-Ｚ, ０-９) are skipped
- Greek, Cyrillic, and other non-CJK scripts are skipped
- This is character-based tokenization, not morphological analysis
@gemini-code-assist
Contributor

Summary of Changes

Hello @maru0804, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the ROUGE-1 evaluation framework by introducing robust support for CJK (Chinese, Japanese, Korean) languages. Previously, the default ROUGE tokenizer's ASCII-only nature led to inaccurate, often zero, scores for CJK text. The changes address this by providing an opt-in CJK-specific tokenizer and a mechanism to configure it, ensuring that evaluations involving multilingual content yield meaningful results. The update also includes a helpful warning system to alert users when CJK text is present but no appropriate tokenizer is configured, promoting correct usage.

Highlights

  • CJK Tokenizer Introduction: A new CJKTokenizer is added, specifically designed for CJK (Chinese, Japanese, Korean) languages, which tokenizes CJK characters individually, preserves word-based tokenization for ASCII alphanumeric text, and removes CJK punctuation.
  • Configurable ROUGE Evaluation: A RougeScoreCriterion class is introduced, allowing users to specify custom tokenizer options for ROUGE-1 evaluation, enabling proper scoring for CJK text.
  • Enhanced ROUGE Evaluator: The RougeEvaluator is updated to support these custom tokenizers and now logs a warning (once per instance) if CJK text is detected without a suitable tokenizer configured, guiding users to better evaluation practices.
  • Improved Metric Propagation: The ResponseEvaluator is modified to correctly pass the eval_metric (including the new RougeScoreCriterion) to the RougeEvaluator, ensuring that tokenizer configurations are applied.
  • Comprehensive Testing: 29 new tests have been added to validate the CJKTokenizer's behavior, ROUGE score calculations with and without the CJK tokenizer, and the warning mechanism, ensuring robustness and correctness.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@adk-bot added the "eval" label ([Component] This issue is related to evaluation) on Jan 13, 2026
Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request does an excellent job of adding CJK language support for ROUGE-1 evaluation, addressing a significant limitation of the default tokenizer. The introduction of CJKTokenizer and RougeScoreCriterion is well-designed, and the changes are thoughtfully integrated to ensure backward compatibility. The addition of comprehensive unit tests covering the new tokenizer, scoring logic, and warning behavior is particularly commendable. My review includes a couple of suggestions to improve performance and code conciseness, but overall, this is a high-quality contribution.

…instantiation

Address Gemini Code Assist review feedback:

1. Simplify CJKTokenizer.tokenize() using re.findall instead of manual loop
   - More concise and idiomatic Python
   - Same functionality with fewer lines of code

2. Optimize RougeScorer instantiation for performance
   - Create RougeScorer once in __init__ instead of per invocation
   - Reuse self._scorer across all evaluate_invocations calls
   - Avoids unnecessary object creation in loops
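The second point can be sketched as below. SketchScorer is a hypothetical stand-in for the real scorer class (it only counts how many times it is constructed and computes a toy unigram recall), and SketchEvaluator is not the PR's RougeEvaluator; only the names self._scorer and evaluate_invocations are taken from the commit message.

```python
import re
from collections import Counter


class SketchScorer:
    """Hypothetical stand-in for a ROUGE scorer; counts how often it is built."""

    instances_created = 0

    def __init__(self):
        SketchScorer.instances_created += 1

    def score(self, reference, candidate):
        """Unigram recall over ASCII alphanumeric tokens."""
        ref = Counter(re.findall(r"[a-z0-9]+", reference.lower()))
        cand = Counter(re.findall(r"[a-z0-9]+", candidate.lower()))
        total = sum(ref.values())
        return sum((ref & cand).values()) / total if total else 0.0


class SketchEvaluator:
    def __init__(self):
        # Built once here rather than inside the per-invocation loop.
        self._scorer = SketchScorer()

    def evaluate_invocations(self, pairs):
        # Every call reuses the same scorer instance.
        return [self._scorer.score(ref, cand) for ref, cand in pairs]


evaluator = SketchEvaluator()
evaluator.evaluate_invocations([("a b", "a b"), ("a b", "a c")])
evaluator.evaluate_invocations([("x", "y")])
print(SketchScorer.instances_created)  # 1
```

The trade-off is that the tokenizer choice has to be known when the evaluator is constructed, which fits a design where the criterion is passed in up front.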
Development

Successfully merging this pull request may close these issues.

response_match_score (ROUGE-1) is not effectively in Japanese (with manual tokenization)

2 participants