[wip] adding a part for pandas.df to hash_value #575
base: main
Conversation
Codecov Report
Base: 77.06% // Head: 77.04% // Decreases project coverage by -0.02%.

@@ Coverage Diff @@
##           master     #575      +/-   ##
==========================================
- Coverage   77.06%   77.04%   -0.02%
==========================================
  Files          20       20
  Lines        4316     4322       +6
  Branches     1213     1217       +4
==========================================
+ Hits         3326     3330       +4
- Misses        802      803       +1
- Partials     188      189       +1
==========================================
Flags with carried forward coverage won't be shown.
@djarecka thank you! I'll try it today and let you know
@djarecka this change is specific for a list of
this error happens after
I think this should be covered by the sequence hashing functionality, shouldn't it, @effigies?
Seems like it's worth just testing.
Doesn't work; different data frames evaluate to the same hash. Looks like we need to do some more work on the backstop object hash. At the moment we are a bit stuck no matter which way we err: if the same value produces a different hash, you run the risk of workflows getting stuck halfway through because the downstream node can't identify the upstream node (bad). However, if different values map onto the same hash, you run the risk of producing the wrong results (worse). If #784 is implemented we could avoid the workflows getting stuck, and then we could just default to cloudpickle to guarantee that different values at least map onto different hashes, and throw a warning that the cache is likely to be missed in subsequent runs.
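A minimal sketch of the serialize-everything fallback described above (the function name `hash_obj_fallback` is illustrative, not pydra's API, and stdlib `pickle` stands in for cloudpickle so the example is self-contained): hashing the full byte stream guarantees that different values map onto different hashes, at the cost that the bytes may differ across sessions, which is exactly why a cache-miss warning would be needed.

```python
import hashlib
import pickle

import pandas as pd


def hash_obj_fallback(obj):
    # Hypothetical backstop hash: serialize the whole object so that
    # distinct values cannot collide. (The discussion proposes
    # cloudpickle; stdlib pickle is used here for portability.)
    return hashlib.sha256(pickle.dumps(obj)).hexdigest()


df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3, 4]})

# Different frames now get different hashes, avoiding the
# "wrong results" failure mode, which is the worse of the two errors.
assert hash_obj_fallback(df1) != hash_obj_fallback(df2)
```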
Types of changes
Summary
Adding a section to `hash_value` that is for `pandas.DataFrame`: changing the df to a dict before calculating the hash value.

@yibeichan - could you please see if this small change fixes some of your issues? It did fix the issue that I had with running my workflow within the Jupyter notebook...
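The proposed change can be sketched roughly like this (an illustrative stand-in, not the actual pydra code; the `isinstance` dispatch and the `repr`-based digest are assumptions made for the example):

```python
import hashlib

import pandas as pd


def hash_value(value):
    # Sketch of the proposed branch: a DataFrame is converted to a
    # plain dict first, so it is hashed by content rather than by
    # object identity.
    if isinstance(value, pd.DataFrame):
        value = value.to_dict()
    return hashlib.sha256(repr(value).encode()).hexdigest()


df = pd.DataFrame({"x": [1, 2, 3]})
same = pd.DataFrame({"x": [1, 2, 3]})
other = pd.DataFrame({"x": [1, 2, 4]})

assert hash_value(df) == hash_value(same)   # equal content, equal hash
assert hash_value(df) != hash_value(other)  # different content, different hash
```

With the dict conversion, two separately constructed but equal DataFrames hash identically, which is what lets a rerun workflow find its cached upstream results.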
Checklist