Update groupby().first() documentation to clarify behavior with missing data (#27578) #61345

ericcht · 2025-04-23T18:39:30Z

This PR enhances the docstring for GroupBy.first() to clarify that:

- it returns the first non-null value per column;
- it differs from .nth(0) and .head(1) in how it treats missing values.

It also includes comparative examples for better understanding.
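A minimal sketch of the difference described above, assuming a recent pandas and a made-up three-row frame (the column names `key`, `B` and `C` are illustrative only). `first()` picks the first non-NA value column by column, while `nth(0)` and `head(1)` return the first row of each group as-is, NA included.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the first row of group "a" has NA in column "B".
df = pd.DataFrame(
    {
        "key": ["a", "a", "b"],
        "B": [np.nan, 5.0, 6.0],
        "C": [1, 2, 3],
    }
)
gb = df.groupby("key")

# first(): per column, the first non-NA value in each group (skipna=True default).
# For group "a", column "B" comes from the second row and column "C" from the first.
first = gb.first()

# nth(0) and head(1): the first *row* of each group, NA values kept.
nth0 = gb.nth(0)
head1 = gb.head(1)

print(first)
print(nth0)
print(head1)
```

Note that the index of the `nth(0)` result depends on the pandas version (it behaves as a filter from pandas 2.0 on), so the sketch only relies on row order, not on the index.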

Fixes part of issue #27578

Ready for review.

rhshadrach

Thanks for the PR!

rhshadrach · 2025-04-24T20:43:36Z

pandas/core/groupby/groupby.py

@@ -3232,9 +3232,12 @@ def first(
        self, numeric_only: bool = False, min_count: int = -1, skipna: bool = True
    ) -> NDFrameT:
        """
-        Compute the first entry of each column within each group.
+        Compute the first non-null entry of each column within each group.


pandas documentation is quite consistent with using NA instead of null. Can you use NA throughout.

In addition, this line is incorrect as you can pass skipna=False.

rhshadrach · 2025-04-24T20:45:44Z

pandas/core/groupby/groupby.py

-        core.groupby.DataFrameGroupBy.nth : Take the nth row from each group.
+        DataFrame.groupby : Group DataFrame using a mapper or by a Series of columns.
+        Series.groupby : Group Series using a mapper or by a Series of values.
+        GroupBy.nth : Take the nth row from each group.


GroupBy is not public. Can you use DataFrameGroupBy instead.

rhshadrach · 2025-04-24T20:49:16Z

pandas/core/groupby/groupby.py

+        >>> df.groupby("A").nth(0)
+            B  C          D
+        A
+        1  NaN  1 2000-03-11
+        3  6.0  3 2000-03-13


I think this page should only include documentation on first. Can you remove the use of other methods.
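Examples that use only `first` can still show the missing-data behavior. Below is a sketch on a frame consistent with the `nth(0)` output quoted above; the values in the second row (`B=5`, `C=2`, `D=2000-03-12`) are assumed, since that output does not show them.

```python
import pandas as pd

# Frame consistent with the quoted nth(0) output; the middle row is assumed.
df = pd.DataFrame(
    {
        "A": [1, 1, 3],
        "B": [None, 5, 6],
        "C": [1, 2, 3],
        "D": pd.to_datetime(["2000-03-11", "2000-03-12", "2000-03-13"]),
    }
)
gb = df.groupby("A")

# Default: first non-NA value of each column within each group.
print(gb.first())

# min_count=2: a group/column pair needs at least two non-NA values,
# otherwise the result is NA (NaN for numbers, NaT for datetimes).
print(gb.first(min_count=2))

# numeric_only=True: the datetime column D is dropped.
print(gb.first(numeric_only=True))
```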

Update groupby.py

c1de316

ericcht requested a review from rhshadrach as a code owner April 23, 2025 18:39

Update groupby.py

b4d068f

ericcht force-pushed the doc-groupby-first-update branch from 4234958 to b4d068f April 23, 2025 18:50

ericcht marked this pull request as draft April 23, 2025 19:04

ericcht closed this Apr 23, 2025

ericcht deleted the doc-groupby-first-update branch April 23, 2025 19:07

ericcht changed the title ~~DOC: clarify column-wise behavior of GroupBy.first and its handling of nulls (#42406)~~ Update groupby().first() documentation to clarify behavior with missing data (#27578) Apr 23, 2025

ericcht restored the doc-groupby-first-update branch April 23, 2025 19:09

ericcht reopened this Apr 23, 2025

rhshadrach requested changes Apr 24, 2025


rhshadrach added Docs Groupby labels Apr 24, 2025
