-
Notifications
You must be signed in to change notification settings - Fork 572
fix(integrations): pydantic-ai: properly format binary input message parts to be conformant with the gen_ai.request.messages structure
#5251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…parts to be conformant with the `gen_ai.request.messages` structure
…-report-image-inputs
Semver Impact of This PR🟢 Patch (bug fixes) 📋 Changelog PreviewThis is how your changes will appear in the changelog. New Features ✨
Bug Fixes 🐛
Documentation 📚
Internal Changes 🔧
🤖 This preview updates automatically when you update the PR. |
| "type": "blob", | ||
| "modality": item.media_type.split("/")[0], | ||
| "mime_type": item.media_type, | ||
| "content": f"data:{item.media_type};base64,{base64.b64encode(item.data).decode('utf-8')}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should encode item.data here because it is stripped away anyway by the new blob redaction. I assume it's an operation that might fail, and if we don't emit the result, there's no point risking it.
can we just remove the "content" entry from the dictionary here? The assignment to BLOB_DATA_SUBSTITUTE will happen down the line anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you said theres two considerations: making the conversion that could fail and doing work that may be undone later.
ad 1: I think we should make it failsafe, try/catch and simply don't set the content if we fail
ad 2: In the sense of the single responsibility principle: the integrations job is to properly report the messages back and forth and IMHO should not be concerned with redaction. Because if we later decide that we do not want to redact we would just need to change one area, not look through all integrations. Sure, a little contrived example, but I would consider it cleaner.
On the other hand I'm not the primary maintainer, so if you prefer to change it I would be okay with that. What do you say?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking more about the contract that we have with the SDK users that we do the minimum work that is reasonable to collect the telemetry we emit and nothing else.
So, while I generally agree with your point about single-responsibility, I would class decoding a potentially large object as non-trivial enough that we shouldn't do it now. But yes, we will have to keep in mind that some downstream logic is bleeding into the integration as maintainers 😄
Issues
Closes https://linear.app/getsentry/issue/TET-1634/redact-images-pydantic-ai