Apply residuals when reading a table #1654

Open
wants to merge 7 commits into
base: main

Conversation

Fokko
Contributor

@Fokko Fokko commented Feb 12, 2025

No description provided.

@Fokko Fokko force-pushed the fd-integrate-residuals branch from f5b8eaf to 569d1b1 Compare February 12, 2025 18:31

@corleyma corleyma left a comment

LGTM, though I wonder if there's a good place to document this a little more (for future developers and for folks who use PyIceberg as a reference implementation).

@Fokko Fokko added this to the PyIceberg 0.10.0 milestone Mar 4, 2025
@Fokko Fokko requested a review from kevinjqliu March 12, 2025 22:05
@Fokko
Contributor Author

Fokko commented Mar 18, 2025

@corleyma Good point, I think most of the docs around residuals are here:

class ResidualVisitor(BoundBooleanExpressionVisitor[BooleanExpression], ABC):
    """Finds the residuals for an Expression the partitions in the given PartitionSpec.

    A residual expression is made by partially evaluating an expression using partition values.
    For example, if a table is partitioned by day(utc_timestamp) and is read with a filter expression
    utc_timestamp > a and utc_timestamp < b, then there are 4 possible residuals expressions
    for the partition data, d:

    1. If d > day(a) and d < day(b), the residual is always true
    2. If d == day(a) and d != day(b), the residual is utc_timestamp > a
    3. if d == day(b) and d != day(a), the residual is utc_timestamp < b
    4. If d == day(a) == day(b), the residual is utc_timestamp > a and utc_timestamp < b

    Partition data is passed using StructLike. Residuals are returned by residualFor(StructLike).
    """
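
To make the four cases in that docstring concrete, here is a minimal toy sketch. It is not PyIceberg's `ResidualVisitor` and uses no PyIceberg API: the function name `residual_for_day` and the string return values are hypothetical, chosen only for illustration. It also returns `"false"` for partitions entirely outside the filter range, a case the docstring does not list.

```python
from datetime import date, datetime


def residual_for_day(partition_day: date, a: datetime, b: datetime) -> str:
    """Toy residual for the filter `utc_timestamp > a and utc_timestamp < b`.

    Assumes the table is partitioned by day(utc_timestamp) and that
    `partition_day` is the partition value d. Hypothetical helper.
    """
    day_a, day_b = a.date(), b.date()

    # Assumption (not in the docstring): a partition wholly outside
    # [day(a), day(b)] cannot contain any matching row.
    if partition_day < day_a or partition_day > day_b:
        return "false"

    needs_lower = partition_day == day_a  # rows may still fall at or before a
    needs_upper = partition_day == day_b  # rows may still fall at or after b

    if needs_lower and needs_upper:
        return "utc_timestamp > a and utc_timestamp < b"  # case 4
    if needs_lower:
        return "utc_timestamp > a"  # case 2
    if needs_upper:
        return "utc_timestamp < b"  # case 3
    return "true"  # case 1: every row in the partition matches


a = datetime(2025, 2, 10, 12, 0)
b = datetime(2025, 2, 12, 18, 31)
for d in (date(2025, 2, 9), date(2025, 2, 10), date(2025, 2, 11), date(2025, 2, 12)):
    print(d, "->", residual_for_day(d, a, b))
```

The point of the residual is that a reader only has to re-check the part of the filter the partition value could not decide; for a partition strictly between day(a) and day(b), nothing is left to evaluate per row.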

Any thoughts?

2 participants