
[wip] adding a part for pandas.df to hash_value #575


Open · wants to merge 1 commit into main

Conversation

@djarecka (Collaborator) commented Sep 2, 2022

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Summary

Adds a branch to hash_value for pandas.DataFrame inputs: the DataFrame is converted to a dict before the hash value is calculated.
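
A minimal sketch of the approach, assuming hash_value dispatches on the input type; the function body below is illustrative, not the exact patch (the impacted-files list suggests the real hash_value lives in pydra/engine/helpers.py):

```python
# Illustrative sketch only, not the exact patch in this PR.
import hashlib

import pandas as pd


def hash_value(value):
    """Compute a content-based hash for an input value."""
    if isinstance(value, pd.DataFrame):
        # Convert the DataFrame to a plain dict so that two frames with the
        # same contents hash identically, regardless of object identity.
        value = value.to_dict()
    return hashlib.sha256(repr(value).encode()).hexdigest()
```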

@yibeichan - could you please see if this small change fixes some of your issues? It did fix the problem I had with running my workflow within a Jupyter notebook...

Checklist

  • I have added tests to cover my changes (if necessary)
  • I have updated documentation (if necessary)

codecov bot commented Sep 2, 2022

Codecov Report

Base: 77.06% // Head: 77.04% // Decreases project coverage by 0.02% ⚠️

Coverage data is based on head (875fd17) compared to base (219c721).
Patch coverage: 0.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #575      +/-   ##
==========================================
- Coverage   77.06%   77.04%   -0.02%     
==========================================
  Files          20       20              
  Lines        4316     4322       +6     
  Branches     1213     1217       +4     
==========================================
+ Hits         3326     3330       +4     
- Misses        802      803       +1     
- Partials      188      189       +1     
Flag        Coverage Δ
unittests   76.95% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files            Coverage Δ
pydra/engine/helpers.py   79.52% <0.00%> (-0.38%) ⬇️
pydra/engine/core.py      89.04% <0.00%> (+0.06%) ⬆️


☔ View full report at Codecov.

@yibeichan (Collaborator) commented

@djarecka thank you! I'll try it today and let you know.

@yibeichan (Collaborator) commented

@djarecka this change is specific to a list of pandas.DataFrame inputs, correct?
My notebook works fine with or without this change, possibly because I'm using Python 3.7. If the change solves your problem, it should also address potential problems on newer Python versions.
However, when I exported my notebook as a .py script I got errors (not related to your change here):

File "/Users/yibeichen/miniconda3/envs/pydra/lib/python3.7/asyncio/futures.py", line 181, in result
raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

This error happens after secondlevel_estimation, when the statistical testing starts. Only the .py script hits this error; the notebook works fine.

@tclose (Contributor) commented Apr 2, 2025

I think this should be covered by the sequence hashing functionality, shouldn't it, @effigies?

@effigies (Contributor) commented Apr 2, 2025

Seems like it's worth just testing.

@tclose (Contributor) commented Apr 3, 2025

Doesn't work; different data frames evaluate to the same hash... Looks like we need to do some more work on the backstop object hash.
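
For reference, a minimal reproduction along these lines (the import path pydra.utils.hash.hash_function is an assumption about the current entry point, not confirmed by this thread):

```python
# Hypothetical reproduction of the collision described above; the import
# path below is assumed, and the collision only occurs if the backstop
# object hash ignores the frame's contents.
import pandas as pd

from pydra.utils.hash import hash_function

df_a = pd.DataFrame({"x": [1, 2, 3]})
df_b = pd.DataFrame({"x": [4, 5, 6]})

# If the backstop hash ignores contents, both lines print the same value.
print(hash_function(df_a))
print(hash_function(df_b))
```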

At the moment we are a bit stuck whichever way we err. If the same value produces different hashes, you run the risk of workflows getting stuck halfway through because a downstream node can no longer identify its upstream node's cached result (bad). However, if different values map onto the same hash, you run the risk of producing wrong results (worse). If #784 is implemented we could avoid workflows getting stuck; then we could default to cloudpickle to guarantee that, at least, different values map onto different hashes, and just throw a warning that the cache is likely to be missed in subsequent runs.
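
A hedged sketch of that cloudpickle fallback (backstop_hash and the warning text are illustrative names, not pydra's API):

```python
# Illustrative sketch of the proposed fallback: serialize the object with
# cloudpickle and hash the resulting bytes, warning that cache hits across
# runs are unlikely. Not pydra's actual implementation.
import hashlib
import warnings

import cloudpickle


def backstop_hash(obj) -> str:
    warnings.warn(
        "Falling back to cloudpickle for hashing; the cache is likely "
        "to be missed in subsequent runs."
    )
    # Distinct values serialize to distinct bytes (up to pickle determinism),
    # so at least different values map onto different hashes.
    return hashlib.blake2b(cloudpickle.dumps(obj), digest_size=16).hexdigest()
```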
