⚠ This page is served via a proxy. Original site: https://github.com
This service does not collect credentials or authentication data.
Skip to content

Conversation

@tankyleo
Copy link
Contributor

    [bitreq] Check utf-8 while deserializing JSON body

    While deserializing, `serde_json::from_slice` validates utf-8 as
    needed. So instead of making two passes on the response body, one
    to validate utf-8, and another to deserialize the object, we can
    let `serde_json::from_slice` check utf-8 as needed during
    deserialization.

    `Response::json` now returns `Error::SerdeJsonError` instead of
    `Error::InvalidUtf8InBody` if invalid utf-8 bytes are found during
    deserialization.

and

    [bitreq] Serialize `Request` body into `Vec<u8>`

    We avoid creating a `String` only to immediately convert it back to its
    inner `Vec<u8>`.

    This also mirrors the `serde_json::from_slice` call made when parsing a
    `Response` body as JSON.

@tankyleo
Copy link
Contributor Author

This came up while me and @tnull were working on migrating to bitreq in lightningdevkit/vss-client#56

Copy link
Collaborator

@tnull tnull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK a3ae780

T: serde::de::Deserialize<'a>,
{
match serde_json::from_str(self.as_str()?) {
match serde_json::from_slice(self.as_bytes()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really prefer the serde error over the utf8 error? I'm skeptical the claimed perf difference matters (notably when deserializing a string in the json serde skips utf8 validation if you call from_str whereas has to do it if you call from_slice). I don't have a strong feeling on the error but the commit should explain why we prefer the swap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Matt see below, I have expanded the commit message.

I'm skeptical the claimed perf difference matters

The claim is that a reduction in cache misses leads to non-trivial performance gains.

We still do the same amount of work on the data.

While deserializing, `serde_json::from_slice` validates utf-8 as
needed. So instead of making two passes on the response body, one
to validate utf-8, and another to deserialize the object, we let
`serde_json::from_slice` check utf-8 as needed during deserialization.

Making a single pass over large response bodies reduces the number of
cache misses, and hence decreases the cycles taken to fully deserialize
such responses.

`Response::json` now returns `Error::SerdeJsonError` instead of
`Error::InvalidUtf8InBody` if invalid utf-8 is found during
deserialization.

For this error case, the `Error::SerdeJsonError` inner type
`serde_json::error::Error` is of category
`serde_json::error::Category::Syntax`. Other JSON syntax errors are
also assigned to this category.

We accept this loss of information given the performance gain described
above.
We avoid creating a `String` only to immediately convert it back to its
inner `Vec<u8>`.

This also mirrors the `serde_json::from_slice` call made when parsing a
`Response` body as JSON.
@tankyleo tankyleo force-pushed the 26-01-check-utf8-while-flying branch from a3ae780 to 2c3bbf2 Compare January 30, 2026 19:08
@tankyleo tankyleo requested a review from TheBlueMatt January 30, 2026 19:15
Copy link
Member

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm, fair enough.

ACK 2c3bbf2

@tnull tnull merged commit cfba70d into rust-bitcoin:master Jan 30, 2026
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants