QST: best way to extend/subclass pandas.DataFrame #61362

Open
2 tasks done
rwijtvliet opened this issue Apr 26, 2025 · 1 comment
Labels
Closing Candidate May be closeable, needs more eyeballs Usage Question

Comments

@rwijtvliet

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://stackoverflow.com/questions/79594258/best-way-to-extend-subclass-pandas-dataframe

Question about pandas

I've written a package to work with energy-related timeseries. At its center is a class (PfLine) that is essentially a wrapper around pandas.DataFrame, and it implements various methods and properties that are also available on DataFrames - like .loc, .asfreq(), .index, etc.

I am currently in the middle of a rewrite of this package, and think it would be a good idea to have closer integration with pandas. This page lays out several possibilities, and I am unsure which route to take - I was hoping to find a sparring partner here.

Let me describe a bit what I'm trying to accomplish with the PfLine class:

  • Behaves like a DataFrame, with specific column names allowed and some data conversion (and validation) on initialisation.

  • Is immutable, to prevent the data from becoming inconsistent.

  • Has additional methods.

The methods could be directly under PfLine.method() or under e.g. df.pfl.method().
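For illustration, the df.pfl.method() variant could be sketched with pandas' accessor registration. This is a minimal sketch, not the package's actual code: the allowed column names (q, p, r), the class name PfLineAccessor, and the revenue() method are all hypothetical, since the post does not list them.

```python
import pandas as pd

# Hypothetical set of allowed column names; the post does not list the real ones.
ALLOWED_COLUMNS = {"q", "p", "r"}


@pd.api.extensions.register_dataframe_accessor("pfl")
class PfLineAccessor:
    """Hypothetical accessor exposing extra methods under df.pfl."""

    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        # Validate on first access; an AttributeError signals that the
        # accessor does not apply to this frame.
        unknown = set(pandas_obj.columns) - ALLOWED_COLUMNS
        if unknown:
            raise AttributeError(f"Unexpected columns: {sorted(unknown)}")
        self._obj = pandas_obj

    def revenue(self) -> pd.Series:
        # Illustrative method: quantity times price, row by row.
        return (self._obj["q"] * self._obj["p"]).rename("r")


df = pd.DataFrame({"q": [1.0, 2.0], "p": [10.0, 20.0]})
result = df.pfl.revenue()
```

Note that an accessor only adds a namespace; the underlying object stays a regular, mutable DataFrame, so this route does not give immutability by itself.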

Probably important: the user needs a way to specify a configuration object (commodity, still under development) when initialising the PfLine. This object contains information used in coercing the data, e.g. the correct units and the timezones allowed for the index.

@rwijtvliet rwijtvliet added Needs Triage Issue that has not been reviewed by a pandas team member Usage Question labels Apr 26, 2025
@rhshadrach
Member

Behaves like a DataFrame... Is immutable

These two are in conflict: pandas is not designed to be immutable. At the very least, you'd have to work around:

  • __setitem__, .loc, .iloc, .at, .iat
  • .to_numpy(), .values
  • Any method with an inplace argument
  • Any method which acts inplace (e.g. update, insert)
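As a small illustration of the list above, each of the following statements changes a plain DataFrame in place. The column names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"q": [1.0, 2.0], "p": [10.0, 20.0]})

df["r"] = df["q"] * df["p"]             # __setitem__ adds a column
df.loc[0, "q"] = 5.0                    # label-based assignment
df.iat[1, 1] = 99.0                     # positional scalar assignment (row 1, column "p")
df.insert(0, "flag", [True, False])     # insert acts in place and returns None
df.update(pd.DataFrame({"p": [11.0]}))  # update acts in place and returns None
```

An immutable wrapper or subclass would need to intercept every one of these paths.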

But with these requirements, I believe the only feasible option would be to subclass DataFrame / Series.
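A minimal sketch of what such a subclass might look like, under stated assumptions: the commodity keyword argument and the choice to block only `__setitem__` are illustrative, not a complete or recommended design. `_constructor` and `_metadata` are the hooks pandas documents for subclassing.

```python
import pandas as pd


class PfLine(pd.DataFrame):
    # Attribute names listed in _metadata are carried over to the results
    # of pandas operations that call __finalize__ (e.g. copy, slicing).
    _metadata = ["commodity"]

    def __init__(self, data=None, *args, commodity=None, **kwargs):
        # 'commodity' is a hypothetical configuration object; pandas itself
        # calls this constructor without it, and __finalize__ restores it.
        super().__init__(data, *args, **kwargs)
        self.commodity = commodity

    @property
    def _constructor(self):
        # Make pandas operations return PfLine instead of DataFrame.
        return PfLine

    def __setitem__(self, key, value):
        # Blocks only one of the mutation paths listed above.
        raise TypeError("PfLine is immutable")


pfl = PfLine({"q": [1.0, 2.0], "p": [10.0, 20.0]}, commodity="power")
copied = pfl.copy()
```

Here `copied` is again a PfLine with the same commodity. The indexers (.loc, .iloc, .at, .iat), inplace arguments, and in-place methods would each still need their own handling, and blocking `__setitem__` may also break pandas methods that rely on it internally.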

@rhshadrach rhshadrach added Closing Candidate May be closeable, needs more eyeballs and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 27, 2025