-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[spi] Handle Scala Map/List in defaultNullValue serialization #17534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[spi] Handle Scala Map/List in defaultNullValue serialization #17534
Conversation
a2d6be6 to
be791e0
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #17534 +/- ##
============================================
- Coverage 63.27% 63.20% -0.07%
- Complexity 1433 1476 +43
============================================
Files 3133 3170 +37
Lines 186078 189530 +3452
Branches 28408 29002 +594
============================================
+ Hits 117733 119797 +2064
- Misses 59279 60423 +1144
- Partials 9066 9310 +244
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
cc63f58 to
fe7913b
Compare
fe7913b to
b341ac6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a serialization bug in FieldSpec.getStringValue() where Scala collections (when present via jackson-module-scala) are improperly serialized using toString() instead of JSON, causing downstream parsing failures. The fix adds explicit JSON serialization for Java Map/List types and special handling for Scala collections that don't implement java.util.Map/List.
Changes:
- Modified
FieldSpec.getStringValue()to serialize Java collections to JSON usingJsonUtils.objectToString() - Added Scala collection detection via class name pattern matching (
scala.collection.*) - Implemented
serializeScalaCollection()to handle empty Scala collections by returning appropriate JSON ({}for Maps,[]for Lists/Seq/etc.) - Added comprehensive test coverage for default null value conversions, JSON serialization, and Scala collection handling
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java | Core fix: Added JSON serialization for Map/List types and Scala collection handling with helper methods |
| pinot-spi/src/test/java/org/apache/pinot/spi/data/FieldSpecTest.java | Comprehensive test suite covering default null values, JSON serialization round-trips, error cases, and Scala collection behavior |
| // Check if empty using reflection (all Scala collections have isEmpty method) | ||
| try { | ||
| final boolean isEmpty = (boolean) scalaCollection.getClass().getMethod("isEmpty").invoke(scalaCollection); |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reflection is used on every Scala collection serialization. Consider caching the isEmpty Method object in a static map keyed by class name to avoid repeated reflection lookups for the same collection types.
| if (className.contains("Map")) { | ||
| return "{}"; | ||
| } | ||
| // Nil$ is Scala's empty list singleton | ||
| if (className.contains("List") || className.contains("Seq") || className.contains("Set") | ||
| || className.contains("Vector") || className.contains("Array") || className.contains("Buffer")) { |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using contains() for class name matching can produce false positives. For example, a class named MyArrayList or CustomListProcessor would incorrectly match. Consider using more precise pattern matching such as checking for className.matches('.*\\.(List|Seq|Set|Vector|Array|Buffer)(\\$|$)') to match only when these terms appear as actual class name components.
| if (className.contains("Map")) { | |
| return "{}"; | |
| } | |
| // Nil$ is Scala's empty list singleton | |
| if (className.contains("List") || className.contains("Seq") || className.contains("Set") | |
| || className.contains("Vector") || className.contains("Array") || className.contains("Buffer")) { | |
| if (className.matches(".*\\.(Map)(\\$|$)")) { | |
| return "{}"; | |
| } | |
| // Nil$ is Scala's empty list singleton | |
| if (className.matches(".*\\.(List|Seq|Set|Vector|Array|Buffer)(\\$|$)")) { |
| * Returns the appropriate empty JSON representation for a Scala collection based on its class name. | ||
| */ | ||
| private static String getEmptyJsonForScalaCollection(final String className) { | ||
| if (className.contains("Map")) { |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using contains('Map') for class name matching can produce false positives. A class like MyMapper or EmailMapperService would incorrectly be identified as a Map type. Use more precise pattern matching such as className.matches('.*\\.Map(\\$|$)') to ensure 'Map' appears as an actual class name component.
| if (className.contains("Map")) { | |
| if (className.matches(".*\\.Map(\\$|$)")) { |
| final Object scalaStyleMap = new HashMap<String, Object>() { | ||
| @Override | ||
| public String toString() { | ||
| return "Map()"; // This is what Scala Map.toString() returns | ||
| } | ||
| }; |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test verifies behavior for Java HashMap with overridden toString(), but doesn't actually test the Scala collection detection logic since the class name is still java.util.HashMap$1 (anonymous class), not scala.collection.*. The Scala detection code path at line 365 (className.startsWith('scala.collection')) is never exercised. Consider adding a test with a mock class whose name actually starts with 'scala.collection' to cover this branch.
| Map<String, Object> circularMap = new HashMap<>(); | ||
| circularMap.put("self", circularMap); // Circular reference |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test covers the JsonProcessingException catch block for Java collections, but the similar error path in serializeScalaCollection() (lines 377-384 where reflection fails) is not tested. Add a test that triggers the reflection exception branch to ensure complete coverage of error handling.
xiangfu0
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss more on the issue. I feel we should store serialized format in "defaultNullValue", instead of the complex structure
| // Handle Java collections (Map, List) | ||
| if (value instanceof Map || value instanceof List) { | ||
| try { | ||
| return JsonUtils.objectToString(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it work if we invoke JsonUtils.objectToString() on scala object?
Issue
FieldSpec.getStringValue()fails to serialize Scala collections to valid JSONDescription
When
jackson-module-scalais on the classpath (via Kafka connectors), Jackson deserializes empty JSON objects{}as Scala collections instead of Java collections. The existinggetStringValue()method only handles JavaMap/Listviainstanceofchecks, but Scala collections don't implementjava.util.Maporjava.util.List, causing them to fall through totoString()which produces invalid JSON.For example, a schema with ComplexFieldSpec:
When Jackson deserializes this with
jackson-module-scalaon the classpath:setDefaultNullValue(scalaMap)callsgetStringValue(scalaMap)scalaMap.toString()→ "Map()"setDataType(DataType.MAP)triggersgetDefaultNullValue()which callsDataType.MAP.convert("Map()")JsonUtils.stringToObject("Map()", Map.class)fails because "Map()" is not valid JSONBug
The bug is in FieldSpec.getStringValue():
Fix
This PR fixes that by detecting Scala collections via class name and serializing them to JSON
Testing