⚠ This page is served via a proxy. Original site: https://github.com
This service does not collect credentials or authentication data.
Skip to content

Conversation

@killzoner
Copy link
Contributor

@killzoner killzoner commented Jan 28, 2026

Which issue does this PR close?

Closes #1395 (partially if we want to expand on future configurable cache later).

Rationale for this change

Disable explicitly cache() method with good user level error rather than relying on current panic

What changes are included in this PR?

Are there any user-facing changes?

User level error instead of panic

@killzoner killzoner marked this pull request as ready for review January 28, 2026 19:01
Copy link
Contributor

@milenkovicm milenkovicm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @killzoner
I have one suggestion

@killzoner killzoner marked this pull request as draft January 28, 2026 19:38
@milenkovicm
Copy link
Contributor

no need to implement anything on scheduler side at the moment

@killzoner killzoner force-pushed the issue-1395 branch 5 times, most recently from 7d48acc to e76826a Compare February 3, 2026 11:09
@milenkovicm
Copy link
Contributor

You need to serialise logical plan extension, showcase project https://github.com/milenkovicm/ballista_extensions

@killzoner
Copy link
Contributor Author

You need to serialise logical plan extension, showcase project https://github.com/milenkovicm/ballista_extensions

Thanks, I was looking at this part and I was not sure I needed to update ballista proto, this gives more context 🙏

@killzoner killzoner force-pushed the issue-1395 branch 2 times, most recently from fafed16 to e313d85 Compare February 9, 2026 19:32
@killzoner killzoner marked this pull request as ready for review February 10, 2026 17:17
@killzoner
Copy link
Contributor Author

killzoner commented Feb 10, 2026

I will work on adapting the datafusion example apache/datafusion#19139 with ballista custom query planner once this one is done

@milenkovicm
Copy link
Contributor

I will work on adapting the datafusion example apache/datafusion#19139 with ballista custom query planner once this one is done

no need to provide ballista planner at the moment, its ok to fail

@killzoner killzoner force-pushed the issue-1395 branch 2 times, most recently from 4674dc9 to 903d000 Compare February 11, 2026 09:13
@killzoner killzoner changed the title feat: disable DataFrame.cache() for ballista feat: add cache() factory Feb 11, 2026
@killzoner killzoner changed the title feat: add cache() factory feat: add Dataframe.cache() factory Feb 11, 2026
@killzoner killzoner changed the title feat: add Dataframe.cache() factory feat: add Dataframe.cache() factory (no planner handling) Feb 11, 2026
Copy link
Contributor

@milenkovicm milenkovicm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thansk @killzoner few comments and one idea.

what do you think if we add configuration option which would make cache operation a no op instead of throwing exception?

}

fn expressions(&self) -> Vec<Expr> {
self.exprs.clone()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just return empty vector instead of having property exprs, will they ever be non empty?

Copy link
Contributor Author

@killzoner killzoner Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from what I get from the trait, it could be non-empty if you use with_exprs_and_inputs ? maybe it's not intended this way, I'm not very familiar with this

https://github.com/apache/datafusion-ballista/pull/1420/changes#diff-77305d1314ffee1d338be417750116a3501c19274c426a3380e9efd967f25e6dR751-R765

plan: LogicalPlan,
session_state: &SessionState,
) -> datafusion::error::Result<LogicalPlan> {
Ok(LogicalPlan::Extension(Extension {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if we should have a ballista config option to "shortcut" cache. At the moment cache() will fail as there is no cache functionality implemented. But if we flip the switch, instead of returning LogicalPlan extension we can return plan. There will be no cache but pipelines relying on cache will not fail (it will be noop), wdyt?

Copy link
Contributor Author

@killzoner killzoner Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it not defeat the purpose of the initial issue #1395 of informing explicitely the user that the feature is not supported?

On the other hand I guess the issue is there for a while so we could just provide cache functionality to users already trying to use it transparently when support is added

@killzoner
Copy link
Contributor Author

killzoner commented Feb 11, 2026

thansk @killzoner few comments and one idea.

what do you think if we add configuration option which would make cache operation a no op instead of throwing exception?

Thanks for the review! All the points make sense to me, putting this to draft and addressing them in the future days

@killzoner killzoner marked this pull request as draft February 11, 2026 22:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: Disable DataFrame.cache() for ballista

2 participants