-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-48858: [C++][Parquet] Avoid re-serializing footer for signature verification #48859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| PARQUET_DEPRECATED("Deprecated in 24.0.0. Use the two-argument overload instead.") | ||
| bool VerifySignature(const void* signature); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opted to deprecate this but we might also remove it, as it seems this API is meant for internal use? @adamreeve @EnricoMi
| bool VerifySignature(std::span<const uint8_t> serialized_metadata, | ||
| std::span<const uint8_t> signature) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented this as a method of FileMetaData but the only member it uses is the FileDecryptor, so perhaps this should be moved elsewhere (or made static?).
|
Hmm, it looks like we need to wait for #48819 for the R failures. |
|
If I remember correctly in rust we verify directly on decrypted buffer (skipping reserialization). |
I'm not surprised that Rust is saner than C++ :-) |
35d6c51 to
e853dee
Compare
|
@github-actions crossbow submit -g cpp |
|
Revision: e853dee Submitted crossbow builds: ursacomputing/crossbow @ actions-f90353001e |
Rationale for this change
When reading an encrypted Parquet file with a plaintext footer, the Parquet reader is able to verify footer integrity by comparing the signature in the file with the one computed by encrypting the footer.
However, the way it does this is to first re-serializes the deserialized footer using Thrift. This has several issues:
Reason 3 is what allowed this to be uncovered by OSS-Fuzz (see https://oss-fuzz.com/testcase-detail/4740205688193024).
This PR switches to reusing the original serialized metadata.
Are these changes tested?
Yes, by existing tests and new fuzz regression file.
Are there any user-facing changes?
No.