-
Notifications
You must be signed in to change notification settings - Fork 261
feat: add Dataframe.cache() factory (no planner handling)
#1420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
milenkovicm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @killzoner
I have one suggestion
|
no need to implement anything on scheduler side at the moment |
7d48acc to
e76826a
Compare
|
You need to serialise logical plan extension, showcase project https://github.com/milenkovicm/ballista_extensions |
Thanks, I was looking at this part and I was not sure I needed to update ballista proto, this gives more context 🙏 |
fafed16 to
e313d85
Compare
|
I will work on adapting the datafusion example apache/datafusion#19139 with ballista custom query planner once this one is done |
no need to provide ballista planner at the moment, its ok to fail |
4674dc9 to
903d000
Compare
cache() factory
cache() factoryDataframe.cache() factory
Dataframe.cache() factoryDataframe.cache() factory (no planner handling)
903d000 to
d688e14
Compare
milenkovicm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thansk @killzoner few comments and one idea.
what do you think if we add configuration option which would make cache operation a no op instead of throwing exception?
| } | ||
|
|
||
| fn expressions(&self) -> Vec<Expr> { | ||
| self.exprs.clone() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just return empty vector instead of having property exprs, will they ever be non empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from what I get from the trait, it could be non-empty if you use with_exprs_and_inputs ? maybe it's not intended this way, I'm not very familiar with this
| plan: LogicalPlan, | ||
| session_state: &SessionState, | ||
| ) -> datafusion::error::Result<LogicalPlan> { | ||
| Ok(LogicalPlan::Extension(Extension { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if we should have a ballista config option to "shortcut" cache. At the moment cache() will fail as there is no cache functionality implemented. But if we flip the switch, instead of returning LogicalPlan extension we can return plan. There will be no cache but pipelines relying on cache will not fail (it will be noop), wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it not defeat the purpose of the initial issue #1395 of informing explicitely the user that the feature is not supported?
On the other hand I guess the issue is there for a while so we could just provide cache functionality to users already trying to use it transparently when support is added
Thanks for the review! All the points make sense to me, putting this to draft and addressing them in the future days |
Which issue does this PR close?
Closes #1395 (partially if we want to expand on future configurable cache later).
Rationale for this change
Disable explicitly
cache()method with good user level error rather than relying on currentpanicWhat changes are included in this PR?
CacheFactorywithDataFrame::cachedatafusion#18893)Are there any user-facing changes?
User level error instead of panic