Conversation

@NiraliPopat
Collaborator

@NiraliPopat NiraliPopat commented Jan 22, 2026

Summary

This PR adds demo evaluation tasks and the changes needed to support them.

Features implemented:

  • Question answering eval task added at tasks/eval/question_answering/simpleqa
  • Classification eval task added at tasks/eval/classification/simpleqa
  • precision, recall, and f1_score metrics that report the average score as well as the per-class scores
  • unit_metric registry added
  • Fixes to support eval tasks
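
For reviewers who want a reference point for the per-class and averaged scores mentioned above, here is a minimal standalone sketch. It is not the code in this PR: the function names `per_class_prf` and `macro_average` are hypothetical, and macro averaging (an unweighted mean over classes) is an assumption, since the description does not say which averaging is used.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def per_class_prf(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Dict[Hashable, Dict[str, float]]:
    """Compute precision, recall, and F1 for every label seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed the true class

    scores = {}
    for label in sorted(set(y_true) | set(y_pred), key=str):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1_score": f1}
    return scores


def macro_average(scores: Dict[Hashable, Dict[str, float]]) -> Dict[str, float]:
    """Unweighted mean of each metric over classes (assumed averaging scheme)."""
    metrics = ("precision", "recall", "f1_score")
    if not scores:
        return {m: 0.0 for m in metrics}
    return {m: sum(s[m] for s in scores.values()) / len(scores) for m in metrics}
```

Classes with no predictions or no true examples get a score of 0.0 here rather than raising, which is one common convention; the PR may handle that case differently.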

Performance impact (if any):

  • N/A

How to Test the feature

Steps for reviewers to verify functionality:

  1. Run the task tasks/eval/question_answering/simpleqa
  2. Inspect the results in tasks/eval/question_answering/simpleqa/MetricCollatorPostProcessor_.json
  3. Run the task tasks/eval/classification/simpleqa
  4. Inspect the results in tasks/eval/classification/simpleqa/MetricCollatorPostProcessor_.json
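
To inspect the result files from the steps above without opening them by hand, a small helper along these lines may be useful. It is only a sketch: `load_metric_results` is a hypothetical name, the glob pattern assumes the file name may carry a suffix after the trailing underscore, and nothing is assumed about the JSON schema beyond it being valid JSON.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_metric_results(task_dir: str) -> Dict[str, Any]:
    """Load every MetricCollatorPostProcessor_*.json file found in task_dir.

    Returns a mapping of file name to parsed JSON content. The pattern also
    matches a file named exactly MetricCollatorPostProcessor_.json.
    """
    results: Dict[str, Any] = {}
    for path in sorted(Path(task_dir).glob("MetricCollatorPostProcessor_*.json")):
        with path.open(encoding="utf-8") as handle:
            results[path.name] = json.load(handle)
    return results


if __name__ == "__main__":
    # Task directories taken from the steps above.
    for task in (
        "tasks/eval/question_answering/simpleqa",
        "tasks/eval/classification/simpleqa",
    ):
        for name, content in load_metric_results(task).items():
            print(task, name)
            print(json.dumps(content, indent=2))
```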

Screenshots (if applicable)

NA

Checklist

  • Lint fixes and unit testing done
  • End to end task testing
  • Documentation updated

@NiraliPopat NiraliPopat requested a review from a team as a code owner January 22, 2026 08:28
@NiraliPopat NiraliPopat marked this pull request as draft January 22, 2026 08:29
@NiraliPopat NiraliPopat marked this pull request as ready for review January 22, 2026 08:55
@NiraliPopat NiraliPopat marked this pull request as draft January 22, 2026 09:20
@NiraliPopat NiraliPopat marked this pull request as ready for review January 22, 2026 09:41
@bidyapati-p
Collaborator

We need to update the documentation, since this is the first task on the sygra platform. Let's discuss and update the documentation as part of this PR.

@bidyapati-p
Collaborator

@NiraliPopat
Can you please attach the result file so we can review the output schema?
