Update groupby().first() documentation to clarify behavior with missing data (#27578) #61345
base: main
@@ -3232,9 +3232,12 @@ def first(
         self, numeric_only: bool = False, min_count: int = -1, skipna: bool = True
     ) -> NDFrameT:
         """
-        Compute the first entry of each column within each group.
+        Compute the first non-null entry of each column within each group.

-        Defaults to skipping NA elements.
+        This method operates column-wise, returning the first non-null value
+        in each column for every group. Unlike `nth(0)`, which returns the
+        first row (even if it contains nulls), `first()` skips over NA/null
+        values in each column independently.

         Parameters
         ----------
@@ -3251,15 +3254,15 @@ def first(
         Returns
         -------
         Series or DataFrame
-            First values within each group.
+            First non-null values within each group, selected independently per column.

         See Also
         --------
-        DataFrame.groupby : Apply a function groupby to each row or column of a
-            DataFrame.
-        core.groupby.DataFrameGroupBy.last : Compute the last non-null entry
-            of each column.
-        core.groupby.DataFrameGroupBy.nth : Take the nth row from each group.
+        DataFrame.groupby : Group DataFrame using a mapper or by a Series of columns.
+        Series.groupby : Group Series using a mapper or by a Series of values.
+        GroupBy.nth : Take the nth row from each group.
+        GroupBy.head : Return the first `n` rows from each group.
+        GroupBy.last : Compute the last non-null entry of each column.

         Examples
         --------
@@ -3272,23 +3275,38 @@ def first(
         ... )
         ... )
         >>> df["D"] = pd.to_datetime(df["D"])
+
         >>> df.groupby("A").first()
-        B C D
+        B C D
         A
         1 5.0 1 2000-03-11
         3 6.0 3 2000-03-13
+
+        >>> df.groupby("A").nth(0)
+        B C D
+        A
+        1 NaN 1 2000-03-11
+        3 6.0 3 2000-03-13
Comment on lines +3285 to +3289:
I think this page should only include documentation on
+
+        >>> df.groupby("A").head(1)
+        A B C D
+        0 1 NaN 1 2000-03-11
+        2 3 6.0 3 2000-03-13
+
         >>> df.groupby("A").first(min_count=2)
         B C D
         A
-        1 NaN 1.0 2000-03-11
-        3 NaN NaN NaT
+        1 NaN 1.0 2000-03-11
+        3 NaN NaN NaT
+
         >>> df.groupby("A").first(numeric_only=True)
-        B C
+        B C
         A
         1 5.0 1
         3 6.0 3
         """

+
         def first_compat(obj: NDFrameT):
             def first(x: Series):
                 """Helper function for first item that isn't NA."""
Review comment: pandas documentation is quite consistent with using `NA` instead of `null`. Can you use `NA` throughout?

Review comment: In addition, this line is incorrect, as you can pass `skipna=False`.
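
To make the `skipna=False` point concrete, here is a minimal pure-Python sketch of the per-column semantics under discussion. It is not pandas code and does not reflect how pandas implements `first()`: `is_na` and `first_per_column` are hypothetical helpers written for illustration only, missing values are simplified to `None` and float NaN, and the sample group mirrors the `A == 1` rows (columns `B` and `C`) from the docstring example.

```python
import math


def is_na(value):
    """Treat None and float NaN as missing (a simplification of NA handling)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def first_per_column(rows, columns, skipna=True):
    """Return the 'first' value of each column for one group of row dicts.

    Hypothetical helper: with skipna=True each column is scanned
    independently for its first non-missing value; with skipna=False the
    value from the first row is returned even if it is missing.
    """
    result = {}
    for col in columns:
        values = [row[col] for row in rows]
        if skipna:
            # First non-missing value, or None when the column is all missing.
            result[col] = next((v for v in values if not is_na(v)), None)
        else:
            # Positional first value, missing or not.
            result[col] = values[0] if values else None
    return result


# Rows of the group A == 1 from the docstring example (B and C only).
group = [{"B": None, "C": 1}, {"B": 5.0, "C": 2}]

print(first_per_column(group, ["B", "C"]))                # {'B': 5.0, 'C': 1}
print(first_per_column(group, ["B", "C"], skipna=False))  # {'B': None, 'C': 1}
```

Under this model, "first non-NA entry" describes only the default; with `skipna=False` the result is simply the first entry, which is why the one-line summary needs to cover both cases.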