Update groupby().first() documentation to clarify behavior with missing data (#27578) #61345
4234958 to b4d068f
Thanks for the PR!
@@ -3232,9 +3232,12 @@ def first(
self, numeric_only: bool = False, min_count: int = -1, skipna: bool = True
) -> NDFrameT:
"""
Compute the first entry of each column within each group.
Compute the first non-null entry of each column within each group.
pandas documentation is quite consistent with using `NA` instead of `null`. Can you use `NA` throughout.
In addition, this line is incorrect as you can pass `skipna=False`.
core.groupby.DataFrameGroupBy.nth : Take the nth row from each group.
DataFrame.groupby : Group DataFrame using a mapper or by a Series of columns.
Series.groupby : Group Series using a mapper or by a Series of values.
GroupBy.nth : Take the nth row from each group.
`GroupBy` is not public. Can you use `DataFrameGroupBy` instead.
>>> df.groupby("A").nth(0)
B C D
A
1 NaN 1 2000-03-11
3 6.0 3 2000-03-13
I think this page should only include documentation on `first`. Can you remove the use of other methods.
This PR enhances the docstring for `GroupBy.first()` to clarify how it differs from `.nth(0)` and `.head(1)` in its treatment of missing values.
Fixes part of issue #27578.
Ready for review.
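For context, the difference this description refers to can be sketched as follows. The frame `df` and the variable names are illustrative assumptions, not taken from the PR:

```python
import pandas as pd

# Hypothetical data: the first row of group 1 has a missing value in column B.
df = pd.DataFrame({"A": [1, 1, 3], "B": [None, 5.0, 6.0], "C": [1, 2, 3]})
grouped = df.groupby("A")

# first(): by default, picks the first non-NA entry of each column separately.
first_values = grouped.first()

# head(1): returns the first row of each group as-is, missing values included.
first_rows = grouped.head(1)

# nth(0): selects the row at position 0 of each group, missing values included.
nth_rows = grouped.nth(0)

print(first_values)
print(first_rows)
print(nth_rows)
```

Note that for group 1 the result of `first()` combines `B` from the second row with `C` from the first row, so it need not correspond to any single row of the input, whereas `head(1)` and `nth(0)` always return actual rows.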