-
Notifications
You must be signed in to change notification settings - Fork 487
JAMES-4156 refactoring Blobstore deleted message vault for appending messages into single bucket #2902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
JAMES-4156 refactoring Blobstore deleted message vault for appending messages into single bucket #2902
Conversation
…ingle bucket Keep old code as appendV1 for testing retro-compatibility
quantranhong1999
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
...ssages-vault/src/test/java/org/apache/james/vault/blob/BlobStoreDeletedMessageVaultTest.java
Outdated
Show resolved
Hide resolved
...ssages-vault/src/test/java/org/apache/james/vault/blob/BlobStoreDeletedMessageVaultTest.java
Outdated
Show resolved
Hide resolved
chibenwa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great as a first step!
I propose to wait to have the purge working on top of the v2 (in addition to the v1) in order not to ship broken code
Thanks @Arsnael for proposing this change.
Alright will add the purge refactoring on this PR then. Thanks for the feedback EDIT: I think technically the purge with old messages works as demonstrated in the adapted test. It's just not doing it for the new single bucket yet. |
Yes I know. But not with the new. Hence my remark. |
…vault single bucket
|
CF remarks on #2894 While there's no objection continuing this work so that it's ready I would actually like to also get the "S3 contained alternative" and potentially bring up a debate on the ML on the two implementations. |
I've seen yes. I've been thinking worth continuing in case, as the remarks were not widely discussed and validated with the community yet and might require more work. |
… deleted blob id from string
…t design into account
|
Might have a few bits left:
In DeletedMessageVaultIntegrationTest , the test vaultDeleteShouldDeleteAllMessagesHavingSameBlobContent . Here it fails, it doesn't work anymore. I guess maybe because same blob content message used to have the same blobId, but maybe here with the different time prefix, it's not the case? Is that case still relevant or not with this new design is what I'm wondering |
…t bucket design into account
...ssages-vault/src/test/java/org/apache/james/vault/blob/BlobStoreDeletedMessageVaultTest.java
Outdated
Show resolved
Hide resolved
| .hasSize(0); | ||
| } | ||
|
|
||
| @Disabled("JAMES-4156") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said in a comment above, I don't even understand if this case is supposed to be valid or not, even more with the new design.
…rbageCollectionTask execution
...d-messages-vault/src/main/java/org/apache/james/vault/blob/BlobStoreDeletedMessageVault.java
Outdated
Show resolved
Hide resolved
...d-messages-vault/src/main/java/org/apache/james/vault/blob/BlobStoreDeletedMessageVault.java
Outdated
Show resolved
Hide resolved
...d-messages-vault/src/main/java/org/apache/james/vault/blob/BlobStoreDeletedMessageVault.java
Outdated
Show resolved
Hide resolved
...ssages-vault/src/test/java/org/apache/james/vault/blob/BlobStoreDeletedMessageVaultTest.java
Outdated
Show resolved
Hide resolved
…VaultGarbageCollectionTask execution
|
Closing related changes to deleted message vault for a lack of consensus reason. Moving work to linagora tmail. |
|
CF mailing list |
| this.removeStatement = prepareRemove(session); | ||
| this.readStatement = prepareRead(session); | ||
| this.blobIdFactory = blobIdFactory; | ||
| this.blobIdTimeGenerator = blobIdTimeGenerator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not have a custom BlobId.Factory implementation here to avoid changing the API and affecting the cassandra variant for a S3 only problem ?
check https://github.com/apache/james-project/blob/master/server/mailrepository/mailrepository-blob/src/main/scala/org/apache/james/mailrepository/blob/BlobMailRepositoryFactory.scala#L29 for an example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @jeantil
-> I wonder if this piece of coude could not work with PlainBlobId as the cassandra implem is blobId transparent. That's a technical nitpick that do not address the core of your remarks but could be a nice simplification I presume. Worth testing...
a custom BlobId.Factory implementation
On the principle mostly agree. This would mean acting at the blobStoreDAO level and not on the BlobStore. Which do not seem like a major issue to me.
I;ll try to push a fix in this direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would mean acting at the blobStoreDAO level and not on the BlobStore.
🤔 not sure I follow : it does require creating a custom blobstore instance but it has no impact on the blobstoreDAO itself. This is how we did it for the blob backed mailrepository
It could arguably be managed at the injection layer by using a Qualified blobid factory instance ( this is a limitation of the BlobMailRepositoryFactory which forces usage of a passthrough blobstore and cannot use deduplication but I was so happy to have something working I stopped without the deduplication, also deduplication makes less sense for a mail repository which I don't expect to store mails for years )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I see
val metadataBlobStore = BlobStoreFactory.builder()
.blobStoreDAO(blobStoreDao)
.blobIdFactory(metadataIdFactory)
.bucket(defaultBucketName)
.passthrough()
So re-assembling a ad-hoc blobStore tailor made for the feature.
Interesting.
Sorry do like ... comment was hard for me to interprete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry do like ... comment was hard for me to interprete.
my apologies, I should have been clearer
This information can be deduced without hacks from the storage information.
… PlainBlobId Care is taken not to concentrate all blobs in a single folder
- Build a tailor made blobStore from blobDAO - BlobIdTimeGenerator can be static
|
LGTM, thanks for the changes @chibenwa |
...d-messages-vault/src/main/java/org/apache/james/vault/blob/BlobStoreDeletedMessageVault.java
Outdated
Show resolved
Hide resolved
| import com.google.common.base.Preconditions; | ||
|
|
||
| public class VaultConfiguration { | ||
| public static final String DEFAULT_SINGLE_BUCKET_NAME = "james-deleted-message-vault"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this allow nesting below root ?
Imagine I can or want to have a single s3 bucket for all james related stuff. My bucket is free-hosting-foo-23402
how do I configure the Vault so that deleted messages are nested under
free-hosting-foo-23402/james-deleted-message-vault/xxxx
so I can have the mailqueue and other s3 backed store write in other subdirectories ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No the bucket is (by default) james-deleted-message-vault
Here it is not a subfolder of the main S3 bucket.
The goal of this work is not to get a single bucket
But operate distributed james server off 3 buckets (default, james-uploads, james-deleted-message-vault)
(So as an Ops I need to create 3 buckets and not one each month)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using the term Bucket or bucket name for the blobstore was a mistake. it overlaps in a bad way with the s3 notions.
Here you are manipulating a blobstore bucket.
I think this can be stored as a subpath of a single underlying S3 bucket by using the prefix property on the S3 configuration
This is all very confusing (to me at least) as this prefix only allows to shift the whole namespace in which case it is impossible to split storage of some use cases among different bucket
On a more direct musing I think the previous design influences the naming too much here, the term single will become confusing quite fast. I think I would use DEFAULT_VAULT_BUCKET instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using the term Bucket or bucket name for the blobstore was a mistake.
+1
But as said CF previous refactoring that may not be something easy that we are willing to change.
I think this can be stored as a subpath of a single underlying S3 bucket
I don't want this to mess up with mailbox GC. I'd rather have a separate blob store bucket if given the choice.
Also a separate blob store bucket makes loads of sense as the storage constraints are not the standard james ones (glacier?)
I think I would use DEFAULT_VAULT_BUCKET instead
Cf comment above.
| import org.apache.james.blob.api.BlobId; | ||
| import org.apache.james.blob.api.PlainBlobId; | ||
|
|
||
| public class BlobIdTimeGenerator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why the whole PR goes around the BlobId.Factory design 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please re read the interface.
BlobId.Factory is about *parsing* BlobId
This work is about *generating* them.
Maybe I am missing something, despite investing several hours on the topic. Maybe a hand on example could help?
Replacing #2899