-
Notifications
You must be signed in to change notification settings - Fork 49
[bitreq] Check utf-8 while deserializing JSON body #486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[bitreq] Check utf-8 while deserializing JSON body #486
Conversation
|
This came up while me and @tnull were working on migrating to bitreq in lightningdevkit/vss-client#56 |
tnull
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK a3ae780
| T: serde::de::Deserialize<'a>, | ||
| { | ||
| match serde_json::from_str(self.as_str()?) { | ||
| match serde_json::from_slice(self.as_bytes()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really prefer the serde error over the utf8 error? I'm skeptical the claimed perf difference matters (notably when deserializing a string in the json serde skips utf8 validation if you call from_str whereas has to do it if you call from_slice). I don't have a strong feeling on the error but the commit should explain why we prefer the swap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Matt see below, I have expanded the commit message.
I'm skeptical the claimed perf difference matters
The claim is that a reduction in cache misses leads to non-trivial performance gains.
We still do the same amount of work on the data.
While deserializing, `serde_json::from_slice` validates utf-8 as needed. So instead of making two passes on the response body, one to validate utf-8, and another to deserialize the object, we let `serde_json::from_slice` check utf-8 as needed during deserialization. Making a single pass over large response bodies reduces the number of cache misses, and hence decreases the cycles taken to fully deserialize such responses. `Response::json` now returns `Error::SerdeJsonError` instead of `Error::InvalidUtf8InBody` if invalid utf-8 is found during deserialization. For this error case, the `Error::SerdeJsonError` inner type `serde_json::error::Error` is of category `serde_json::error::Category::Syntax`. Other JSON syntax errors are also assigned to this category. We accept this loss of information given the performance gain described above.
We avoid creating a `String` only to immediately convert it back to its inner `Vec<u8>`. This also mirrors the `serde_json::from_slice` call made when parsing a `Response` body as JSON.
a3ae780 to
2c3bbf2
Compare
TheBlueMatt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm, fair enough.
ACK 2c3bbf2
and